CS726: Information Retrieval Techniques

Course Overview

Course Synopsis

This course discusses the theory, design, and implementation of text-based information retrieval (IR) systems. The core topics include the statistical characteristics of text; the representation of information needs and documents; several important retrieval models (Boolean, vector space, probabilistic, inference network, language modeling, and link analysis); clustering algorithms; automatic text categorization; recommender systems; search computing; search engine optimization; multimedia IR; the semantic web; and experimental evaluation. The software architecture component covers the design and implementation of high-capacity text retrieval and text filtering systems. Queries related to the “deep web” are discussed under the topic of Search Computing. Finally, PageRank computation, Latent Semantic Indexing, other advanced topics, and recent research trends are also covered.

Course Learning Outcomes

Developing an understanding of the theory and practice of text retrieval techniques

  • You will be able to understand the theory of IR systems, how such systems work, and how they are applied to real-life problems.


Course Calendar

1 Introduction
2 Information Retrieval Models, Boolean Retrieval Model
3 Boolean Retrieval Model, Ranked Retrieval Model
4 Vector Space Retrieval Model
5 TF-IDF Weighting, Document Representation in Vector Space, Query Representation in Vector Space, Similarity Measures
6 Similarity Measures; Cosine Similarity Measure
7 Parsing Documents
8 Tokens, Numbers, Stop Words
9 Term Normalization
10 Lemmatization, Stemming
11 Compression
12 Compression (contd.)
13 Compression (contd.)
14 Index Construction
15 Merge Sort
16 Phrase Queries
17 Processing a Phrase Query, Proximity Queries
18 Wildcard Queries, B-Trees
19 Permuterm Index, k-gram Index
20 Spelling Correction
21 Spelling Correction (contd.)
22 Spelling Correction (contd.)
23 Performance Evaluation of Information Retrieval Systems
24 Benchmarks for the Evaluation of IR Systems
25 Benchmarks for the Evaluation of IR Systems (contd.)
26 Precision and Recall
27 Mean Average Precision, Non-Binary Relevance, DCG, NDCG
28 Using User Clicks
29 Cosine Ranking
30 Sampling and Pre-grouping
31 Dimensionality Reduction
32 Web Search
33 Spidering
34 Web Crawler
35 Distributed Index
36 Link Analysis
37 Markov Chains
38 HITS
39 Search Computing
40 Top-k Query Processing
41 Clustering
42 Classification
43 Clustering (contd.)
44 Recommender Systems
45 Final Notes on Information Retrieval