CS726: Information Retrieval Techniques

Course Overview

Course Synopsis

This course discusses the theory, design, and implementation of text-based information retrieval systems. The core components of an information retrieval system covered here include the statistical characteristics of text, the representation of information needs and documents, several important retrieval models (Boolean, vector space, probabilistic, inference net, language modeling, link analysis), clustering algorithms, automatic text categorization, recommender systems, search computing, search engine optimization, multimedia IR, the semantic web, and experimental evaluation. The software architecture components include the design and implementation of high-capacity text retrieval and text filtering systems. Queries related to the “deep web” are discussed under the topic of Search Computing. Finally, PageRank computation, Latent Semantic Indexing, other advanced topics, and the latest research trends are also covered.

Course Learning Outcomes

Develop an understanding of the theory and practice of text retrieval techniques

  • You will be able to understand the theory of IR systems, how such systems work, and how they are applied to real-life problems.


Course Calendar

Topic | Lecture | Resource | Page
Introduction | 1 | Textbook | Chapter 1
Introduction; Information Retrieval Models; Boolean Retrieval Model | 2 | Textbook | Chapter 1
Boolean Retrieval Model; Ranked Retrieval Model | 3 | Textbook | Chapter 1, Boolean Retrieval
Vector Space Retrieval Model | 4 | Textbook | Chapter 6, Sections 6.2 to 6.4.3; https://janav.wordpress.com/2013/10/27/tf-idf-and-cosine-similarity/ ; https://www.bionicspirit.com/blog/2012/01/16/cosine-similarity-euclidean-distance.html
TF-IDF Weighting; Document Representation in Vector Space; Query Representation in Vector Space; Similarity Measures | 5 | Lecture Slides | Lecture Slides
Similarity Measures; Cosine Similarity Measure | 6 | Textbook | Chapter 6
Parsing Documents | 7 | Textbook | Chapter 1
Token; Numbers; Stop Words | 8 | Reference Books | Managing Gigabytes, Sections 3.6, 4.3; MIR 7.2; Porter’s stemmer: http://www.sims.berkeley.edu/~hearst/irbook/porter.html ; H.E. Williams, J. Zobel, and D. Bahle, “Fast Phrase Querying with Combined Indexes”, ACM Transactions on Information Systems, http://www.seg.rmit.edu.au/research/research.php?author=4
Assignment 1
Terms Normalization | 9 | Reference Books | Managing Gigabytes, Sections 3.6, 4.3; MIR 7.2; Porter’s stemmer: http://www.sims.berkeley.edu/~hearst/irbook/porter.html ; H.E. Williams, J. Zobel, and D. Bahle, “Fast Phrase Querying with Combined Indexes”, ACM Transactions on Information Systems, http://www.seg.rmit.edu.au/research/research.php?author=4
Lemmatization; Stemming | 10 | Reference Books | Managing Gigabytes, Sections 3.6, 4.3; MIR 7.2; Porter’s stemmer: http://www.sims.berkeley.edu/~hearst/irbook/porter.html ; H.E. Williams, J. Zobel, and D. Bahle, “Fast Phrase Querying with Combined Indexes”, ACM Transactions on Information Systems, http://www.seg.rmit.edu.au/research/research.php?author=4
Compression | 11 | Textbook | Chapter 5; http://ifnlp.org/ir ; Original publication on word-aligned binary codes: Anh and Moffat (2005), also Anh and Moffat (2006a); Original publication on variable byte codes: Scholer, Williams, Yiannis and Zobel (2002); More details on compression (including compression of positions and frequencies): Zobel and Moffat (2006)
Compression | 12 | Textbook | Chapter 5; http://ifnlp.org/ir ; Original publication on word-aligned binary codes: Anh and Moffat (2005), also Anh and Moffat (2006a); Original publication on variable byte codes: Scholer, Williams, Yiannis and Zobel (2002); More details on compression (including compression of positions and frequencies): Zobel and Moffat (2006)
Compression | 13 | Textbook | Chapter 5; http://ifnlp.org/ir ; Original publication on word-aligned binary codes: Anh and Moffat (2005), also Anh and Moffat (2006a); Original publication on variable byte codes: Scholer, Williams, Yiannis and Zobel (2002); More details on compression (including compression of positions and frequencies): Zobel and Moffat (2006)
Index Construction | 14 | Textbook; Reference Book | IIR Chapter 4; MG Chapter 5; Original publication on MapReduce: Dean and Ghemawat (2004); Original publication on SPIMI: Heinz and Zobel (2003)
Merge Sort | 15 | Textbook; Reference Book | IIR Chapter 4; MG Chapter 5; Original publication on MapReduce: Dean and Ghemawat (2004); Original publication on SPIMI: Heinz and Zobel (2003)
Phrase Queries | 16 | Reference Books | MG 3.6, 4.3; MIR 7.2; Porter’s stemmer: http://www.sims.berkeley.edu/~hearst/irbook/porter.html ; H.E. Williams, J. Zobel, and D. Bahle, “Fast Phrase Querying with Combined Indexes”, ACM Transactions on Information Systems, http://www.seg.rmit.edu.au/research/research.php?author=4
Processing a Phrase Query; Proximity Queries | 17 | Reference Books | MG 3.6, 4.3; MIR 7.2; Porter’s stemmer: http://www.sims.berkeley.edu/~hearst/irbook/porter.html ; H.E. Williams, J. Zobel, and D. Bahle, “Fast Phrase Querying with Combined Indexes”, ACM Transactions on Information Systems, http://www.seg.rmit.edu.au/research/research.php?author=4
Wildcard Queries; B-Tree | 18 | Textbook; Reference Book | IIR 3, MG 4.2; Efficient spell retrieval: K. Kukich, Techniques for automatically correcting words in text, ACM Computing Surveys 24(4), Dec 1992; J. Zobel and P. Dart, Finding approximate matches in large lexicons, Software - practice and experience 25(3), March 1995, http://citeseer.ist.psu.edu/zobel95finding.html ; Mikael Tillenius, Efficient Generation and Ranking of Spelling Error Corrections, Master’s thesis at Sweden’s Royal Institute of Technology, http://citeseer.ist.psu.edu/179155.html
Permuterm Index; k-gram | 19 | Textbook; Reference Book | IIR 3, MG 4.2; Nice, easy reading on spell correction: Peter Norvig, How to write a spelling corrector, http://norvig.com/spell-correct.html
Spelling Correction | 20 | Textbook; Reference Book | IIR 3, MG 4.2; Efficient spell retrieval: K. Kukich, Techniques for automatically correcting words in text, ACM Computing Surveys 24(4), Dec 1992; J. Zobel and P. Dart, Finding approximate matches in large lexicons, Software - practice and experience 25(3), March 1995, http://citeseer.ist.psu.edu/zobel95finding.html ; Mikael Tillenius, Efficient Generation and Ranking of Spelling Error Corrections, Master’s thesis at Sweden’s Royal Institute of Technology, http://citeseer.ist.psu.edu/179155.html
Spelling Correction | 21 | Textbook; Reference Book | IIR 3, MG 4.2; Nice, easy reading on spell correction: Peter Norvig, How to write a spelling corrector, http://norvig.com/spell-correct.html
Spelling Correction | 22 | Textbook; Reference Book | IIR 3, MG 4.2; Nice, easy reading on spell correction: Peter Norvig, How to write a spelling corrector, http://norvig.com/spell-correct.html
Mid Term Exam
Performance Evaluation of Information Retrieval Systems | 23 | Textbook | Chapter 8
Benchmarks for the Evaluation of IR Systems | 24 | Textbook | Chapter 8
Benchmarks for the Evaluation of IR Systems | 25 | Textbook | Chapter 8
Precision and Recall | 26 | Textbook | Chapter 8
Mean Average Precision; Non-Binary Relevance; DCG; NDCG | 27 | Textbook | Chapter 8
Using User Clicks | 28 | Textbook | Chapter 9
Cosine Ranking | 29 | Textbook | Chapter 9
Sampling and Pre-grouping | 30 | Textbook | Chapter 13
Assignment 2
Dimensionality Reduction | 31 | Reference Books | MG 4.6; MIR 2.7.2; Random projection theorem: http://citeseer.nj.nec.com/dasgupta99elementary.html ; Faster random projection: http://citeseer.nj.nec.com/frieze98fast.html ; Latent semantic indexing: http://citeseer.nj.nec.com/deerwester90indexing.html
Web Search | 32 | Textbook | Chapter 19
Spidering | 33 | Textbook | Chapter 19
Web Crawler | 34 | Textbook | Chapter 20
Distributed Index | 35 | Textbook | Chapter 4
Link Analysis | 36 | Textbook | Chapter 21; http://www2004.org/proceedings/docs/1p309.pdf ; http://www2004.org/proceedings/docs/1p595.pdf ; http://www2003.org/cdrom/papers/refereed/p270/kamvar-270-xhtml/index.html ; http://www2003.org/cdrom/papers/refereed/p641/xhtml/p641-mccurley.html
Markov Chains | 37 | Textbook | Chapter 21; http://www2004.org/proceedings/docs/1p309.pdf ; http://www2004.org/proceedings/docs/1p595.pdf ; http://www2003.org/cdrom/papers/refereed/p270/kamvar-270-xhtml/index.html ; http://www2003.org/cdrom/papers/refereed/p641/xhtml/p641-mccurley.html
HITS | 38 | Textbook | Chapter 21; http://www2004.org/proceedings/docs/1p309.pdf ; http://www2004.org/proceedings/docs/1p595.pdf ; http://www2003.org/cdrom/papers/refereed/p270/kamvar-270-xhtml/index.html ; http://www2003.org/cdrom/papers/refereed/p641/xhtml/p641-mccurley.html
Search Computing | 39 | Search Computing by Stefano Ceri et al. | www.search-computing.org
Top-k Query Processing | 40 | Search Computing by Stefano Ceri et al. | http://dl.acm.org/citation.cfm?id=1391730
Clustering | 41 | Textbook | Chapter 13
Classification | 42 | Textbook | Chapter 13
Term Paper + Presentation
Viva Voce Examination (Oral Exam)
Clustering | 43 | Textbook | Chapter 17
Recommender Systems | 44 | C. Sammut, G. Webb (eds.), Encyclopedia of Machine Learning, Springer-Verlag Berlin Heidelberg, 2010 |
Final Notes on Information Retrieval | 45 | |
Final Exam
 
 