INLS 490-154: Introduction to Information Retrieval System Design and Implementation

Fall 2008. Wednesdays 9:30am-12:15pm, Manning 214

# | Date | Topics and Readings | Objectives | Assignment
1. 08/20/2008 Introduction [Slides | Handout]
  • Introduction to the course
  • Set up your machine with the necessary tools
  • Overview of some UNIX commands and utilities
  • Structured data access from a MySQL database
      Reading: UNIX Primer
  • Become familiar with the basic terminology of a search system environment.
  • Practice accessing structured data.
Due: 08/24/2008
2. 08/27/2008 IR with MySQL and Text Files [Slides | Handout]
  • Structured data access and display in a webpage
  • Unstructured data access from (1) MySQL tables and (2) text files
      Reading: Getting Started with MySQL
  • Practice accessing and processing structured data using MySQL.
  • Demonstrate how textual data can be accessed from MySQL as well as from flat files.
Due: 09/07/2008
3. 09/03/2008 Learning to index [Slides | Handout]
  • Effective indexing of text documents
  • Basic understanding of the information retrieval model
  • Work with the Lemur Toolkit
      Reading: The Anatomy of a Large-Scale Hypertextual Web Search Engine
      Reading: Overview of Lemur
  • Describe a general model of information retrieval.
  • Explain the importance of indexing, stemming, and stopword removal.
  • Demonstrate how these processes are executed in a typical search environment.
  • Configure the Lemur Toolkit and related tools.
  • Use Lemur to index a set of documents.
4. 09/10/2008 Query processing and retrieval [Slides | Handout]
  • Represent a query "by hand" and then using Lemur
  • Retrieve documents
      Reading: Google Basic Search Guidelines
  • Process a text query so it can be matched against an indexed collection.
  • Retrieve a set of relevant documents matching the query using the vector space model.
Due: 09/14/2008
5. 09/17/2008 Retrieval models-1 [Slides | Handout]
  • Vector space, Boolean, and language models
      Reading: Boolean retrieval [PDF] by Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze
      Reading: A language modeling approach to information retrieval by Jay Ponte and W. Bruce Croft
  • Demonstrate use of various retrieval models.
  • Describe the pros and cons of these models.
Due: 09/21/2008
6. 09/24/2008 Retrieval models-2 [Slides | Handout]
  • Probabilistic models
  • Relevance models
      Reading: A probabilistic model of information retrieval: development and comparative experiments. Part 1 [PDF] by K. Sparck Jones, S. Walker and S. E. Robertson
      Reading: A probabilistic model of information retrieval: development and comparative experiments. Part 2 [PDF] by K. Sparck Jones, S. Walker and S. E. Robertson
      Reading: Relevance based language models by Victor Lavrenko and W. Bruce Croft
  • Demonstrate use of language and probabilistic models for retrieving and ranking.
  • Utilize (pseudo-)relevance feedback in the retrieval process.
Due: 09/28/2008
7. 10/01/2008 Structured query processing [Slides | Handout]
  • Query term weighting
  • Query term suggestions
      Reading: Helping people find what they don't know by Nicholas J. Belkin
      Reading: Using terminological feedback for web search refinement: a log-based study by Peter Anick
  • Employ a method of providing relevance feedback in a retrieval setup.
  • Demonstrate how the system can provide term suggestions for a query.
Due: 10/05/2008
8. 10/08/2008 Evaluation-1 [Slides | Handout]
  • Recall and precision measures in IR
  • TREC evaluation
      Reading: Evaluation of Evaluation in Information Retrieval [PDF] by Tefko Saracevic
  • Demonstrate ways to evaluate retrieval performance.
  • Employ TREC measures to evaluate and report retrieval effectiveness of an IR system.
Due: 10/12/2008
9. 10/15/2008 Evaluation-2 [Slides | Handout]
  • GMAP and bpref measures
  • Mean reciprocal rank and other rank-based measures
  • Comparing ranked lists
  • Demonstrate ways to evaluate retrieval performance with measures other than standard recall and precision.
  • Employ TREC measures to evaluate, compare, and report retrieval effectiveness of IR systems.
Due: 10/19/2008
10. 10/22/2008 User interface for search [Slides | Handout]
  • Basic UI for search services
  • Dynamic UI for search services with AJAX
      Reading: AJAX Tutorial
  • Develop a functional and user-friendly UI for search.
  • Add dynamic interaction components to a UI for search.
Due: 10/26/2008
11. 10/29/2008 IR on Web 2.0 [Slides | Handout]
  • Using REST-based services
  • Parsing XML documents
      Reading: XML Tutorial
  • Use a REST-based service.
  • Demonstrate how an XML document can be parsed.
Due: 11/02/2008
12. 11/05/2008 Web crawling [Slides | Handout]
  • Web crawling with "wget" and "Heritrix" crawlers
  • Building a custom crawler
      Reading: Focused crawling: a new approach to topic-specific Web resource discovery by Soumen Chakrabarti, Martin van den Berg, and Byron Dom
      Reading: Random Web Crawls [PDF] by Toufik Bennouas and Fabien de Montgolfier
  • Collect documents from the Web using crawlers.
Due: 11/09/2008
-- 11/12/2008 No class. Instructor away.
13. 11/19/2008 Information organization [Slides | Handout]
  • Organizing information using (1) term-clouds and (2) clustering
  • Prepare a collection visualization interface using term-clouds.
  • Demonstrate how documents can be clustered based on their contents.
Due: 11/23/2008
14. 12/03/2008 Wrap-up [Slides]
  • Advanced topics
  • Review
  • Get acquainted with some issues that are out of scope for this course but are still related and important.

| Chirag Shah | Last update: December 19, 2008 |