INLS 490-154W: Information Retrieval Systems Design and Implementation

Fall 2009. Web-based Course


# | Date | Topics and Readings | Objectives | Quizzes & Assignments
1. | 08/28/2009 | 1. Introduction [Slides | Notes]
  • Introduction to the course
  • Set up your machine with the necessary tools
  • Overview of some UNIX commands and utilities
  • Structured data access from a MySQL database
      Reading: UNIX Primer
  • Become familiar with the basic terminology of a search system environment.
  • Practice accessing structured data.
Quiz-1
Due on: 08/28/2009
Assignment-1
Due on: 09/01/2009
2. | 09/03/2009 | 2. IR with MySQL and Text Files [Slides | Notes]
  • Structured data access and display in a webpage
  • Unstructured data access from (1) MySQL tables and (2) text files
      Reading: Getting Started with MySQL
      Reading: HTML Tutorial, HTML Forms and Input
      Reading: PHP introduction, installation, syntax, variables.
  • Practice accessing and processing structured data using MySQL.
  • Demonstrate how textual data can be accessed from MySQL as well as flat-files.
Assignment-2
Due on: TBD
3. | 09/10/2009 | 3. Learning to index [Slides | Notes]
  • Effective indexing of text documents
  • Basic understanding of the information retrieval model
  • Work with Lemur Toolkit
      Reading: The Anatomy of a Large-Scale Hypertextual Web Search Engine
      Reading: Overview of Lemur
  • Describe a general model of information retrieval.
  • Explain the importance of indexing, stemming, and stopword removal.
  • Demonstrate how these processes are executed in a typical search environment.
  • Configure Lemur Toolkit and related tools.
  • Use Lemur to index a set of documents.
Assignment-3
Due on: TBD
4. | 09/17/2009 | 4. Query processing and retrieval [Slides | Notes]
  • Represent a query "by hand" and then using Lemur
  • Retrieve documents
      Reading: Google Basic Search Guidelines
  • Process a text query for matching it with an indexed collection.
  • Retrieve a set of relevant documents matching the query using the vector space model.
Assignment-4
Due on: TBD
5. | 09/24/2009 | 5. Retrieval models-1 [Slides | Notes]
  • Vector space, Boolean, and Language models
      Reading: Boolean retrieval [PDF] by Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze
      Reading: A language modeling approach to information retrieval by Jay Ponte and W. Bruce Croft
  • Demonstrate use of various retrieval models.
  • Describe the pros and cons of these models.
Assignment-5
Due on: TBD
6. | 10/01/2009 | 6. Retrieval models-2 [Slides | Notes]
  • Probabilistic models
  • Relevance models
      Reading: A probabilistic model of information retrieval: development and comparative experiments. Part 1 [PDF] by K. Sparck Jones, S. Walker and S. E. Robertson
      Reading: A probabilistic model of information retrieval: development and comparative experiments. Part 2 [PDF] by K. Sparck Jones, S. Walker and S. E. Robertson
      Reading: Relevance based language models by Victor Lavrenko and W. Bruce Croft
  • Demonstrate use of language and probabilistic models for retrieving and ranking.
  • Utilize (pseudo-)relevance feedback in the retrieval process.
Assignment-6
Due on: TBD
7. | 10/08/2009 | 7. Structured query processing [Slides | Notes]
  • Query term weighting
  • Query term suggestions
      Reading: Helping people find what they don't know by Nicholas J. Belkin
      Reading: Using terminological feedback for web search refinement: a log-based study by Peter Anick

8. Evaluation-1 [Slides | Notes]
  • Recall and precision measures in IR
  • TREC evaluation
      Reading: Evaluation of Evaluation in Information Retrieval [PDF] by Tefko Saracevic
  • Employ a method of providing relevance feedback in a retrieval setup.
  • Demonstrate how the system can provide term suggestions for a query.
  • Demonstrate ways to evaluate retrieval performance.
  • Employ TREC measures to evaluate and report retrieval effectiveness of an IR system.
Assignment-7
Due on: TBD
8. | 10/15/2009 | 9. Evaluation-2 [Slides | Notes]
  • GMAP and bpref measures
  • Mean reciprocal rank and other rank-based measures
  • Comparing rank-lists
  • Demonstrate ways to evaluate retrieval performance with measures other than standard recall and precision.
  • Employ TREC measures to evaluate, compare, and report retrieval effectiveness of IR systems.
Assignment-8
Due on: TBD
9. | 10/29/2009 | 10. UI for search [Slides | Notes]
  • Basic UI for search services
  • Dynamic UI for search services with AJAX
      Reading: AJAX Tutorial
  • Develop a functional and user-friendly UI for search.
  • Add dynamic interaction components to a UI for search.
--
10. | 11/05/2009 | 11. Web crawling [Slides | Notes]
  • Web crawling with "wget" and "Heritrix" crawlers
  • Building a custom crawler
      Reading: Focused crawling: a new approach to topic-specific Web resource discovery by Soumen Chakrabarti, Martin van den Berg, and Byron Dom
      Reading: Random Web Crawls [PDF] by Toufik Bennouas and Fabien de Montgolfier

  • Collect documents from the Web using crawlers
  • Use a service employing the REST protocol
  • Demonstrate how an XML document can be parsed
Assignment-9
Due on: TBD
11. | 11/12/2009 | 12. IR on Web 2.0 [Slides | Notes]
  • Using REST protocol-based services
  • Parsing XML documents
      Reading: XML Tutorial
  • Get acquainted with some issues that are out of scope for this course but are still related and important
--
12. | 11/19/2009 | 13. Information organization [Slides | Notes]
  • Organizing information using (1) term-clouds and (2) clustering
  • Prepare a collection visualization interface using term-clouds
  • Demonstrate how documents can be clustered based on their contents
Assignment-10
Due on: TBD
13. | 12/03/2009 | 14. Wrap-up [Slides]
  • Advanced topics
  • Review
  • Get acquainted with some issues that are out of scope for this course but are still related and important
--

| Chirag Shah | Last update: August 23, 2009 |