INLS 490-154: Information Retrieval Systems Design and Implementation

Spring 2009. Thursdays 5:30-8:00pm, Greenlaw Hall 104

Assignment-9: Web crawling
Assigned on: 03/26/2009, Due on: 03/31/2009
Crawl a website, or a portion of one, with 'wget', collecting between 1,000 and 20,000 documents on the class server. Index that collection, including its HTML, PDF, and text files. Build a search UI for the collection, and display a message on it telling the user which collection they are searching. Show the results as a ranked list with a title and snippet for each result, and link each title to either the crawled local file or the live file on the web.
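The title-and-snippet requirement can be illustrated with a minimal sketch. It assumes the crawled pages are HTML and uses only the Python standard library; the names `title_and_snippet`, `_TitleAndText`, and the `width` parameter are hypothetical and not part of the assignment. Your indexing toolkit may already provide snippet generation, in which case prefer that.

```python
from html.parser import HTMLParser


class _TitleAndText(HTMLParser):
    """Collect the <title> text and the visible text of one HTML page."""

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.text_parts = []
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)
        elif not self._skip_depth:
            self.text_parts.append(data)


def title_and_snippet(html, query, width=80):
    """Return (title, snippet) for one page.

    The snippet is a window of at most `width` characters placed around
    the earliest occurrence of any query term (case-insensitive). If no
    term occurs, the snippet is the start of the page text.
    """
    parser = _TitleAndText()
    parser.feed(html)
    parser.close()

    # Collapse runs of whitespace so the output fits on one line.
    title = " ".join("".join(parser.title_parts).split()) or "(untitled)"
    text = " ".join(" ".join(parser.text_parts).split())

    lowered = text.lower()
    positions = [lowered.find(term.lower()) for term in query.split()]
    positions = [p for p in positions if p >= 0]
    start = max(min(positions) - width // 2, 0) if positions else 0
    return title, text[start:start + width]
```

In a results page, the returned title would become the link text pointing at the local or live copy of the document, with the snippet shown beneath it. This sketch cuts the window at character offsets, so a snippet may begin or end mid-word; trimming to word boundaries is a reasonable refinement.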
Email the instructor, with "INLS 490: Assignment-9" in the subject field, the following: (1) your wget command [3 points], (2) your parameter file for indexing [2 points], and (3) a link to your working site [5 points].

| Chirag Shah | Last update: May 3, 2009 |