Information Retrieval
Fall 2009
Tuesdays, 9:10 a.m. ~ 12:00 noon
Dr. Berlin Chen (陳柏琳)

Tentative List of Topics:


Course Overview & Introduction

09/22 Retrieval Models (I) - Classic Retrieval Models (Boolean, Vector Space and Probabilistic Models)


Retrieval Performance Evaluation - Measures

HW-1: Evaluation for IR (Due 10/30)


Retrieval Performance Evaluation - Collections

Retrieval Models (II) - Improved Approaches (Fuzzy Set, Extended Boolean, Generalized Vector Space Models)


Query Operations (Query Expansion and Term Re-weighting)

HW-2: Retrieval Models (Due 11/27)

Retrieval Models (III) - Latent Semantic Analysis (LSA)

11/03 Retrieval Models (III) - Latent Semantic Analysis (LSA)

Retrieval Models (IV) - Language Modeling Approaches

11/17 Retrieval Models (IV) - Language Modeling Approaches
11/24 Midterm (9:00 a.m. ~ 12:00 noon)

Text Analysis and Processing

12/08 Clustering
12/15 Description of Final Project
Recitation and Presentation Preparation
12/22 Presentations (each to be completed within 20 to 30 minutes)
1.   When More Is Less: The Paradox of Choice in Search Engine Use, SIGIR 2009 (presented by 賴敏軒)
2.   Extracting Key Terms From Noisy and Multi-theme Documents, WWW 2009 (presented by Ms. Nonhlanhla Shongwe)
3.   Reducing Long Queries Using Query Quality Predictors (presented by 顏嘉緯)
4.   A Proximity Language Model for Information Retrieval, SIGIR 2009 (presented by 陳珮寧)
12/29 Presentations (continued)
5.   Enhancing Cluster Labeling Using Wikipedia, SIGIR 2009 (presented by 林倚禛)
6.   Detecting Spammers and Content Promoters in Online Video Social Networks, SIGIR 2009 (presented by 王馨蘭)
7.   Compressing Term Positions in Web Indexes, SIGIR 2009 (presented by 石濤鳴)
8.   Predicting User Interests from Contextual Information, SIGIR 2009 (presented by 朱紋儀)
9.   SELC: A Self-Supervised Model for Sentiment Classification, CIKM 2009 (presented by 謝聿承)
10. Combining Audio Content and Social Context for Semantic Music Discovery, SIGIR 2009 (presented by 林憲駿)
01/05 Efficient Indexing and Searching

Web Search Basics and Link Analysis

Project: Web Search Engine (Due 01/29)
Learning to Rank Using Language Models and SVMs



1. R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval, Addison Wesley Longman, 1999.
2. Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.
3. D. A. Grossman and O. Frieder, Information Retrieval: Algorithms and Heuristics, Springer, 2004.
4. W. Bruce Croft, Donald Metzler, and Trevor Strohman, Search Engines: Information Retrieval in Practice, Addison Wesley, 2009.


1. W. B. Frakes and R. Baeza-Yates, Information Retrieval: Data Structures & Algorithms, Prentice-Hall, 1992.
2. T. K. Landauer, D. S. McNamara, S. Dennis, W. Kintsch (eds.), Handbook of Latent Semantic Analysis, Lawrence Erlbaum, 2007.
3. I. H. Witten, A. Moffat, and T. C. Bell, Managing Gigabytes: Compressing and Indexing Documents and Images, Morgan Kaufmann, 1999.
4. C. Manning and H. Schütze, Foundations of Statistical Natural Language Processing, MIT Press, 1999.
5. D. Jurafsky and J. H. Martin, Speech and Language Processing, Prentice-Hall, 2000.
6. W.B. Croft and J. Lafferty (eds.), Language Models for Information Retrieval, Kluwer International Series on Information Retrieval, Volume 13, Kluwer Academic Publishers, 2002.
7. C.X. Zhai, "Statistical Language Models for Information Retrieval: A Critical Review," Foundations and Trends in Information Retrieval, Vol. 2, No. 3, 2008. (Also see an extended version in: C.X. Zhai, "Statistical Language Models for Information Retrieval (Synthesis Lectures Series on Human Language Technologies)," Morgan & Claypool Publishers, 2008)
8. Stephen Robertson and Hugo Zaragoza, "The Probabilistic Relevance Framework: BM25 and Beyond," Foundations and Trends in Information Retrieval, Vol. 3, No. 4, pp. 333-389, 2009.



1. D. Blei, A. Ng, and M. Jordan, "Latent Dirichlet Allocation," Journal of Machine Learning Research, 3:993-1022, January 2003.
2. V. Lavrenko and W.B. Croft, "Relevance-Based Language Models," ACM SIGIR 2001.
3. C. H. Papadimitriou, P. Raghavan, H. Tamaki, and S. Vempala, "Latent Semantic Indexing: A Probabilistic Analysis" (analyzes an information retrieval technique related to principal component analysis).
4. X. Liu and W.B. Croft, "Statistical Language Modeling for Information Retrieval," Annual Review of Information Science and Technology, Vol. 39, 2005.
5. L. Huang, "A Survey on Web Information Retrieval Technologies," 2000.
6. D. Hiemstra, "Information Retrieval Models," in: A. Göker, J. Davies, and M. Graham (eds.), Information Retrieval: Searching in the 21st Century, Wiley, 2009.
7. M. Steyvers and T. Griffiths, "Probabilistic Topic Models," in: T. K. Landauer, D. S. McNamara, S. Dennis, W. Kintsch (eds.), Handbook of Latent Semantic Analysis, Mahwah, NJ: Lawrence Erlbaum, 2007.
8. X. Yi and J. Allan, "A Comparative Study of Utilizing Topic Models for Information Retrieval," in Proceedings of ECIR 2009.
9. R. Nallapati, "Discriminative Models for Information Retrieval," in Proceedings of SIGIR 2004.
10. T. Joachims and F. Radlinski, "Search Engines that Learn from Implicit Feedback," IEEE Computer, 40(8), pp. 34-40, 2007.
11. B. Chen, H.M. Wang, and L.S. Lee, "A Discriminative HMM/N-gram-based Retrieval Approach for Mandarin Spoken Documents," ACM Transactions on Asian Language Information Processing, Vol. 3, No. 2, pp. 128-145, June 2004.


Information Retrieval Resources

1. SIGIR Information Retrieval Resources