INLS 490-154: Introduction to Information Retrieval System Design and Implementation

Fall 2008. Wednesdays 9:30am-12:15pm, Manning 214


Assignment-11: Web crawling
Assigned on: 11/05/2008, Due on: 11/09/2008
Crawl a website, or a portion of one, with 'wget', collecting 1,000-20,000 documents on the class server. Index that collection, including HTML, PDF, and text files. Prepare a UI that allows one to search the collection. Make sure your UI displays a message telling the user which collection they are searching. Display the results as a ranked list, showing the title and a snippet for each result, and link each title to either the crawled local file or the live file on the web.
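
The result display described above could be sketched roughly as follows. This is only an illustration, not a required approach: the helper names extract_title and make_snippet, the 160-character window, and the "(untitled)" fallback are all assumptions made for the example. Files without an HTML title (PDF or plain text) fall back to a placeholder, which you might replace with the file name.

```python
import html
import re


def extract_title(page_html, fallback="(untitled)"):
    """Return the text of the first <title> element, or a fallback label."""
    match = re.search(r"<title[^>]*>(.*?)</title>", page_html,
                      re.IGNORECASE | re.DOTALL)
    if not match:
        return fallback
    # Decode entities such as &amp; and collapse runs of whitespace.
    title = " ".join(html.unescape(match.group(1)).split())
    return title or fallback


def make_snippet(text, query, width=160):
    """Return about `width` characters of text around the earliest query-term hit."""
    text = " ".join(text.split())  # normalize whitespace
    positions = []
    for term in query.split():
        hit = re.search(re.escape(term), text, re.IGNORECASE)
        if hit:
            positions.append(hit.start())
    # Center the window on the earliest hit; with no hit, start at the beginning.
    start = max(0, min(positions) - width // 2) if positions else 0
    end = min(len(text), start + width)
    snippet = text[start:end]
    if start > 0:
        snippet = "..." + snippet
    if end < len(text):
        snippet = snippet + "..."
    return snippet
```

A ranked list can then pair each title (as the link text) with its snippet. Highlighting the query terms inside the snippet is a natural extension but is not shown here.
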
Email the instructor (1) your wget command [3 points], (2) your parameter file for indexing [2 points], and (3) a link to your working site [5 points], with "INLS 490: Assignment-11" in the subject field.
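
As an illustration of the kind of wget command item (1) refers to, here is a minimal sketch that assembles one in Python. The function name build_wget_command, the start URL, the output directory, the depth, and the wait time are placeholders chosen for the example, not values prescribed by the assignment. The options themselves (--recursive, --level, --no-parent, --accept, --wait, --directory-prefix) are standard GNU wget options.

```python
import shlex


def build_wget_command(start_url, out_dir="crawl", depth=5,
                       extensions=("html", "htm", "pdf", "txt"),
                       wait_seconds=1):
    """Return a wget argument list for a bounded, polite recursive crawl."""
    return [
        "wget",
        "--recursive",                        # follow links from the start page
        f"--level={depth}",                   # limit the recursion depth
        "--no-parent",                        # stay below the starting directory
        f"--accept={','.join(extensions)}",   # keep only these file types
        f"--wait={wait_seconds}",             # pause between requests
        f"--directory-prefix={out_dir}",      # directory for the crawled files
        start_url,
    ]


if __name__ == "__main__":
    # Hypothetical start URL; substitute the site you actually crawl.
    command = build_wget_command("http://www.example.edu/docs/")
    print(shlex.join(command))
```

Printing the command rather than running it lets you inspect it first. Adjust the depth and the accepted extensions until the crawl lands in the required 1,000-20,000 document range.
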

| Chirag Shah | Last update: December 19, 2008 |