**Information Retrieval and
Extraction
**
Spring 2018

**Homework Webpage **

¡@

**
Homework #1 :Evaluation Metrics for IR**

The the query-document relevance information (AssessmentTrainSet.txt) for a set of queries (16 queries) and a collection of 2,265 documents is provided. An IR model is then tested on this query set and save the corresponding ranking results in a file (ResultsTrainSet.txt) . Please evaluate the overall model performance using the following two measures.

**1.
Interpolated Recall-Precision Curve: **

(for each query)

(overall performance)

**2.
(Non-interpolated) Mean Average Precision:**

, where "non-interpolated average precision" is "average precision at seen relevant documents" introduced in the textbook.

**
Example 1**: Interpolated Recall-Precision Curve

**
Example 2**: (Non-interpolated) Mean Average Precision

**mAP=0.63787418**

**3.
Normalized Discounted Cumulated Gain (NDCG) :**

**
Homework #2 : Retrieval
Models**

A set of text queries (16 queries) and a collection of text documents ( 2,265 documents) is provided, in which each word is represented as a number except that the number "-1" is a delimiter.

Implement an information retrieval system based on the
**Vector
(Space) Model **(or **Probabilistic Model**, **Generalized Vector Space
Model**, **Latent Semantic Analysis**, **Language Model**, etc.). The query-document relevance information is in "AssessmentTrainSet.txt".
You should evaluated you system with the two measures described in HW#1.