|Assignment-3: Learning to index|
|Assigned on: 01/29/2009, Due on: 02/03/2009|
|1. Build two indices, one using the Porter stemmer and one using the Krovetz stemmer, taking the descriptions of the first 10 countries in the CIA World Factbook as documents. Compare their sizes and other statistics. Do you see any significant differences? Report these statistics and comment on them in 2-4 sentences.
- Proper indexing with the required documents, stemming, and stop word removal, as shown by the index statistics (2 points)
- Comparison (2 points)
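As a rough illustration of what "sizes and other statistics" can mean when two indices differ only in their stemmer, the sketch below counts unique terms and postings for a toy collection. It does not use Lemur, and `light_stem` and `aggressive_stem` are hypothetical stand-ins invented for this example; they are not the Porter or Krovetz algorithms. The two sample documents are also made up.

```python
def light_stem(token):
    # Hypothetical stand-in for a conservative stemmer: strips one trailing "s".
    return token[:-1] if token.endswith("s") and len(token) > 3 else token


def aggressive_stem(token):
    # Hypothetical stand-in for a more aggressive stemmer: keeps the first 5 characters.
    return token[:5]


def index_stats(docs, stem):
    """Build a toy inverted index with the given stemmer and summarize it."""
    postings = {}
    for doc_id, text in enumerate(docs):
        for token in text.lower().split():
            # Each stemmed term maps to the set of documents that contain it.
            postings.setdefault(stem(token), set()).add(doc_id)
    return {
        "unique_terms": len(postings),
        "postings": sum(len(ids) for ids in postings.values()),
    }


# Made-up sample documents, not taken from the CIA World Factbook.
docs = ["government governs the governed", "economies and economic growth"]
light = index_stats(docs, light_stem)
aggressive = index_stats(docs, aggressive_stem)
print("light:", light)
print("aggressive:", aggressive)
```

A stemmer that conflates more word forms yields fewer unique terms and fewer postings, which is the kind of difference the comparison asks you to look for in the real index statistics.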
|2. Imagine you are starting a new search engine. Would you choose to use stemming? Why? If yes, which stemmer would you use? Why? Write your thoughts in 4-8 sentences. (4 points)|
|3. Using indices built with Lemur, find the reduction in size of the full-text index when stop words are removed. (2 points)|
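The reduction can be reported as a percentage of the full index size. The sketch below shows that arithmetic on a toy in-memory index, using the posting count as a rough proxy for size. It does not use Lemur; the stop word list, the sample documents, and the function names are assumptions made for illustration, and for the assignment the two sizes would come from the indices you actually built.

```python
from collections import defaultdict

# Hypothetical, tiny stop word list for illustration only.
STOP_WORDS = {"the", "of", "and", "a", "in", "is", "to"}


def build_index(docs, stop_words=frozenset()):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            if token not in stop_words:
                index[token].add(doc_id)
    return index


def posting_count(index):
    """Total number of (term, document) postings, a rough proxy for index size."""
    return sum(len(doc_ids) for doc_ids in index.values())


def percent_reduction(full_size, reduced_size):
    """Relative size reduction, as a percentage of the full size."""
    return 100.0 * (full_size - reduced_size) / full_size


# Made-up sample documents.
docs = {
    1: "the capital of france is paris",
    2: "the economy of brazil is growing",
}
full = posting_count(build_index(docs))
stopped = posting_count(build_index(docs, STOP_WORDS))
print(f"{percent_reduction(full, stopped):.1f}% fewer postings")
```

The same `percent_reduction` formula applies unchanged if you substitute on-disk index sizes in bytes for the posting counts.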
|Email your answers to the instructor with "INLS 490: Assignment-3" in the subject field.|