Build an index with the Krovetz stemmer and stop-word removal from the two provided TREC files of NY Times articles (nearly 20,000 documents). Using "Bush Kerry debates" as the query, run the KL retrieval method with pseudo-relevance feedback, setting feedback docs=10, feedback terms=10, and retrieved docs=100. Use the modified 'RetEval' application to obtain the term IDs of the top feedback terms and their respective weights, then use the 'dumpTerm' utility to find the "real" terms associated with these IDs. Assume that you show these top terms to the user and that the user selects every other term, starting with term 1 (that is, terms 1, 3, 5, 7, and 9). Construct a structured query from the selected terms and their associated weights, and re-run the retrieval with this new query, this time without any of the feedback parameters. Send the following files to the instructor with "INLS-490: Assignment-6" in the subject line. Make sure they are named and labeled properly.
- Retrieval parameter file (1 point)
- Screen output of your first RetEval that should show term IDs and weights (2 points)
- A list of "real" terms with weights (3 points)
- Reformulated weighted query (2 points)
- Retrieval results of the final run (2 points)
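
The term-selection and query-construction steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the terms and weights in the usage note are placeholders rather than real RetEval output, and the `#weight( ... )` operator is an assumed Indri-style syntax. Substitute whatever structured-query operator your Lemur setup actually expects.

```python
def select_every_other(weighted_terms):
    """Keep terms 1, 3, 5, ... (1-based), i.e. list indices 0, 2, 4, ..."""
    return weighted_terms[::2]


def build_weighted_query(weighted_terms, operator="#weight"):
    """Format (term, weight) pairs as a single weighted structured query.

    The default operator name is an assumption, not something the
    assignment specifies; pass the one your toolkit requires.
    """
    # Each selected term is written as "<weight> <term>".
    parts = " ".join(f"{weight:g} {term}" for term, weight in weighted_terms)
    return f"{operator}( {parts} )"
```

For example, given a ranked list such as `[("termA", 0.5), ("termB", 0.3), ("termC", 0.2)]` (placeholder values), `build_weighted_query(select_every_other(ranked))` keeps `termA` and `termC` and pairs each with its original weight. The weights are passed through unchanged; whether they should be renormalized after dropping terms is left to your judgment.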