**Information Retrieval and Extraction**
Fall 2007

**Homework Webpage**


**Homework #1: Evaluation Measures**

The query-document relevance information (AssessmentTrainSet.txt) for a set of 16 queries and a collection of 2,265 documents is provided. An IR model has been tested on this query set, and the corresponding ranking results are saved in a file (ResultsTrainSet.txt). Please evaluate the overall model performance using the following two measures.

**1. Interpolated Recall-Precision Curve:**

(for each query)

(overall performance)
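The per-query and overall curves can be sketched roughly as follows. This is a minimal illustration, not the course's reference solution: it assumes the common convention that the interpolated precision at a recall level is the highest precision observed at any recall at or above that level, evaluated at 11 evenly spaced levels, and that every query has at least one relevant document. The function names and the list/set inputs are hypothetical; parsing of ResultsTrainSet.txt and AssessmentTrainSet.txt is left out because their layout is not described here.

```python
def interpolated_precision(ranked_docs, relevant_docs, levels=11):
    """Interpolated precision of one query at evenly spaced recall levels.

    Assumes relevant_docs is non-empty and ranked_docs is ordered best first.
    """
    relevant = set(relevant_docs)
    points = []  # (recall, precision) measured at each relevant document
    hits = 0
    for rank, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))

    curve = []
    for j in range(levels):
        level = j / (levels - 1)
        # Assumed convention: best precision at any recall >= this level,
        # and 0.0 when that recall level is never reached.
        candidates = [p for r, p in points if r >= level - 1e-12]
        curve.append(max(candidates, default=0.0))
    return curve


def average_curve(curves):
    """Overall performance: average the per-query curves level by level."""
    n = len(curves)
    return [sum(c[j] for c in curves) / n for j in range(len(curves[0]))]
```

Calling `interpolated_precision` once per query and passing the resulting list of curves to `average_curve` gives the values to plot against recall levels 0.0, 0.1, ..., 1.0.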

**2. (Non-interpolated) Mean Average Precision:**

Here, "non-interpolated average precision" is the "average precision at seen relevant documents" introduced in the textbook.
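A small sketch of this measure may help. It assumes each ranking covers the whole collection, so every relevant document is eventually seen; under that assumption, dividing by the number of relevant documents is the same as dividing by the number of relevant documents seen. The function names and the dictionary inputs (query id to ranked list, query id to relevant set) are hypothetical.

```python
def average_precision(ranked_docs, relevant_docs):
    """Average of the precision values measured at each relevant document."""
    relevant = set(relevant_docs)
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at this relevant document
    # Assumption: divide by the number of relevant documents for the query.
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(rankings, assessments):
    """Mean of the per-query average precision over all queries.

    rankings: dict query_id -> ranked list of doc ids
    assessments: dict query_id -> collection of relevant doc ids
    """
    scores = [average_precision(rankings[q], assessments[q]) for q in rankings]
    return sum(scores) / len(scores)
```

For instance, a ranking `["d1", "d2", "d3", "d4"]` with relevant documents `d1` and `d3` has precision 1 at rank 1 and 2/3 at rank 3, so its average precision is (1 + 2/3) / 2.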

**Example 1**: Interpolated Recall-Precision Curve

**Example 2**: (Non-interpolated) Mean Average Precision

**mAP=0.63787418**

**Homework #2: Retrieval Models and Query Reformulation**

A set of text queries (16 queries) and a collection of text documents (2,265 documents) are provided, in which each word is represented as a number, except that the number "-1" is a delimiter.
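As an illustration of this encoding, the sketch below turns the text of one query or document into lists of term ids. It assumes the numbers are whitespace-separated and that "-1" separates consecutive segments; what a segment corresponds to is not stated here, so the function simply returns the segments. The name `parse_term_ids` is hypothetical.

```python
def parse_term_ids(text, delimiter=-1):
    """Split whitespace-separated term ids into segments at each delimiter."""
    segments = []
    current = []
    for token in text.split():
        value = int(token)
        if value == delimiter:
            # Close the current segment; skip empty ones (repeated delimiters).
            if current:
                segments.append(current)
                current = []
        else:
            current.append(value)
    if current:
        segments.append(current)
    return segments
```

Flattening the segments, for example with `[t for seg in segments for t in seg]`, gives a bag of term ids suitable for term weighting.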

1. Implement an information retrieval system based on the **Vector (Space) Model** (or **Probabilistic Model**, **Generalized Vector Space Model**, **Latent Semantic Analysis**, **Language Model**, etc.), as well as different **term weighting schemes**. The query-document relevance information is in "AssessmentTrainSet.txt". You should evaluate your system with the two measures described in HW#1.

2. Integrate query expansion and term re-weighting into the retrieval system built in 1. Either (automatic) relevance feedback or local analysis can be adopted as the strategy, but local analysis is preferred.
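For item 1, one possible starting point is a Vector Space Model with TF-IDF weights and cosine similarity. The sketch below is only one of many acceptable weighting schemes: it assumes a logarithmic term frequency `1 + log(tf)` and an inverse document frequency `log(N / df)`, and it takes documents as a dictionary from a document id to a list of term ids. All function names are hypothetical.

```python
import math
from collections import Counter


def weight(terms, idf):
    """Sparse TF-IDF vector; terms unseen in the collection are dropped."""
    tf = Counter(terms)
    return {t: (1 + math.log(f)) * idf[t] for t, f in tf.items() if t in idf}


def build_index(docs):
    """docs: dict doc_id -> list of term ids. Returns (idf, doc_vectors)."""
    n = len(docs)
    df = Counter()
    for terms in docs.values():
        df.update(set(terms))  # count each term once per document
    idf = {t: math.log(n / d) for t, d in df.items()}
    vectors = {doc_id: weight(terms, idf) for doc_id, terms in docs.items()}
    return idf, vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank(query_terms, idf, vectors):
    """Return doc ids sorted by decreasing cosine similarity to the query."""
    q = weight(query_terms, idf)
    scores = {d: cosine(q, v) for d, v in vectors.items()}
    return sorted(scores, key=lambda d: (-scores[d], d))
```

Swapping the body of `weight` is enough to compare different term weighting schemes while keeping the ranking and evaluation code unchanged.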

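For item 2, the sketch below shows automatic (pseudo) relevance feedback with a Rocchio-style update, chosen here because it is short. It is not local analysis, which the assignment prefers, and it omits the negative-feedback term of the full Rocchio formula. It assumes the top-ranked documents from a first retrieval pass are treated as relevant, and that queries and documents are sparse vectors stored as dictionaries from term id to weight. The function name and the default values of `alpha`, `beta`, and `max_new_terms` are illustrative assumptions.

```python
from collections import defaultdict


def rocchio_expand(query_vec, top_doc_vecs, alpha=1.0, beta=0.75,
                   max_new_terms=10):
    """Re-weight the query terms and add the strongest feedback terms.

    query_vec: dict term -> weight
    top_doc_vecs: list of dict term -> weight, assumed relevant
    """
    # Centroid of the documents assumed to be relevant.
    centroid = defaultdict(float)
    for vec in top_doc_vecs:
        for term, w in vec.items():
            centroid[term] += w / len(top_doc_vecs)

    # Term re-weighting: original weight plus the centroid contribution.
    new_query = {t: alpha * w + beta * centroid.get(t, 0.0)
                 for t, w in query_vec.items()}

    # Query expansion: add the highest-weighted terms not already present.
    candidates = sorted((t for t in centroid if t not in query_vec),
                        key=lambda t: (-centroid[t], t))
    for term in candidates[:max_new_terms]:
        new_query[term] = beta * centroid[term]
    return new_query
```

The expanded vector can then be scored against the document vectors in a second retrieval pass, and the two HW#1 measures show whether the feedback step helped.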