|Assignment-4: Retrieval models-1|
|Assigned on: 09/17/2008, Due on: 09/21/2008|
|1. Build an index using the Porter stemmer and stopword removal, taking the descriptions of the first 10 countries in the CIA World Factbook as documents. Run the following three queries with the TFIDF, Okapi, and KL retrieval methods: (1) columbus island, (2) british colony, (3) independence. The three models may return different results for the same query; investigate these differences. Go through the results retrieved by each model and decide which one you "trust" most and which one you trust least. Submit your three result sets along with your comments.
- Correct result sets (3 points)
- Comments on the differences between the result sets and on which model performs better or worse (3 points)
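To see why the three models can rank the same documents differently, here is a minimal Python sketch of one common form of each scoring function. It is not the Lemur implementation. The toy documents (d1 to d3), the BM25 parameters k1=1.2 and b=0.75, and the Dirichlet prior mu=10 are all assumptions chosen for illustration, and the tokens are assumed to be already Porter-stemmed with stopwords removed (for example, "columbus" stems to "columbu" and "colony" to "coloni"). The KL score relies on the fact that, with a maximum-likelihood query model, ranking by negative KL divergence is equivalent to ranking by Dirichlet-smoothed query likelihood.

```python
import math
from collections import Counter

# Hypothetical toy documents (NOT Factbook text), already stemmed and stopword-filtered.
DOCS = {
    "d1": "island coloni british independ island".split(),
    "d2": "columbu island discov coloni".split(),
    "d3": "independ establish republ".split(),
}

def stats(docs):
    """Collect document frequencies, collection term counts, and average doc length."""
    df, coll = Counter(), Counter()
    for toks in docs.values():
        df.update(set(toks))   # each document counts once per term
        coll.update(toks)      # every occurrence counts
    avgdl = sum(len(t) for t in docs.values()) / len(docs)
    return df, coll, avgdl

def tfidf(query, toks, df, n):
    # Raw tf times log(N/df); one of many TF-IDF variants.
    tf = Counter(toks)
    return sum(tf[q] * math.log(n / df[q]) for q in query if df[q])

def bm25(query, toks, df, n, avgdl, k1=1.2, b=0.75):
    # Okapi BM25; the +1 inside the log keeps the IDF part positive.
    tf = Counter(toks)
    score = 0.0
    for q in query:
        if not df[q]:
            continue
        idf = math.log((n - df[q] + 0.5) / (df[q] + 0.5) + 1)
        norm = tf[q] + k1 * (1 - b + b * len(toks) / avgdl)
        score += idf * tf[q] * (k1 + 1) / norm
    return score

def kl(query, toks, coll, mu=10.0):
    # Negative KL divergence, computed as Dirichlet-smoothed log query likelihood.
    tf = Counter(toks)
    total = sum(coll.values())
    score = 0.0
    for q in query:
        p_c = coll[q] / total
        if p_c == 0:
            continue  # term absent from the collection: skip instead of log(0)
        score += math.log((tf[q] + mu * p_c) / (len(toks) + mu))
    return score

def rank(query):
    """Return, per model, the document IDs sorted by descending score."""
    df, coll, avgdl = stats(DOCS)
    n = len(DOCS)
    q = query.split()
    models = {
        "tfidf": lambda t: tfidf(q, t, df, n),
        "okapi": lambda t: bm25(q, t, df, n, avgdl),
        "kl": lambda t: kl(q, t, coll),
    }
    out = {}
    for name, score_fn in models.items():
        scores = {d: score_fn(t) for d, t in DOCS.items()}
        out[name] = sorted(scores, key=scores.get, reverse=True)
    return out

if __name__ == "__main__":
    for query in ["columbu island", "british coloni", "independ"]:
        print(query, rank(query))
```

Comparing the three lists that `rank` returns for each stemmed query mirrors, on a tiny scale, the comparison the assignment asks you to make on the real index.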
|2. Find the IDF values of the following (stemmed) terms in the above index: communist, combat, independ, establish. (Hint: use dumpindex to extract statistics about a term.) (4 points)|
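Once dumpindex has given you a term's document frequency (df) and the number of documents N in the index, IDF is a one-line computation. The sketch below assumes the classic definition idf = log(N/df) with a natural log; Lemur's retrieval models and your course notes may use a different variant or log base, so check which one is expected. The df values in the example are made up purely to show the call and are not the answers.

```python
import math

def idf(num_docs: int, doc_freq: int) -> float:
    """Classic IDF, log(N / df) with a natural log (assumed definition)."""
    if doc_freq <= 0 or doc_freq > num_docs:
        raise ValueError("doc_freq must be between 1 and num_docs")
    return math.log(num_docs / doc_freq)

if __name__ == "__main__":
    N = 10  # the index holds the first 10 country descriptions
    # Hypothetical document frequencies: replace with what dumpindex reports.
    example_df = {"communist": 2, "combat": 1, "independ": 9, "establish": 6}
    for term, df in example_df.items():
        print(f"{term}: df={df}, idf={idf(N, df):.4f}")
```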