|#||Date||Topics and Readings||Objectives||Assignments|
|1.||01/15/2009||Introduction [Slides | Notes]|
Reading: UNIX Primer
- Introduction to the course
- Set up your machine with the necessary tools
- Overview of some UNIX commands and utilities
- Structured data access from a MySQL database
- Become familiar with the basic terminology of a search system environment.
- Practice accessing structured data.
Due on: 01/20/2009
|2.||01/22/2009||IR with MySQL and Text Files [Slides | Notes]|
Reading: Getting Started with MySQL
- Structured data access and display in a webpage
- Unstructured data access from (1) MySQL tables and (2) text files
Reading: HTML Tutorial, HTML Forms and Input
Reading: PHP introduction, installation, syntax, variables.
- Practice accessing and processing structured data using MySQL.
- Demonstrate how textual data can be accessed from MySQL as well as from flat files.
Due on: 01/27/2009
|3.||01/29/2009||Learning to index [Slides | Notes]|
Reading: The Anatomy of a Large-Scale Hypertextual Web Search Engine
- Effective indexing of text documents
- Basic understanding of an information retrieval model
- Work with the Lemur Toolkit
Reading: Overview of Lemur
- Describe a general model of information retrieval.
- Explain the importance of indexing, stemming, and stopword removal.
- Demonstrate how these processes are executed in a typical search environment.
- Configure the Lemur Toolkit and related tools.
- Use Lemur to index a set of documents.
Due on: 02/03/2009
|4.||02/05/2009||Query processing and retrieval [Slides | Notes]|
Reading: Google Basic Search Guidelines
- Represent a query "by hand" and then using Lemur
- Retrieve documents
- Process a text query to match it against an indexed collection.
- Retrieve a set of relevant documents matching the query using the vector space model.
Due on: 02/10/2009
|5.||02/12/2009||Retrieval models-1 [Slides | Notes]|
Reading: Boolean retrieval [PDF] by Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze
- Vector space, Boolean, and language models
Reading: A language modeling approach to information retrieval by Jay Ponte and W. Bruce Croft
- Demonstrate use of various retrieval models.
- Describe the pros and cons of these models.
Due on: 02/17/2009
|6.||02/19/2009||Retrieval models-2 [Slides | Notes]|
Reading: A probabilistic model of information retrieval: development and comparative experiments. Part 1 [PDF] by K. Sparck Jones, S. Walker and S. E. Robertson
- Probabilistic models
- Relevance models
Reading: A probabilistic model of information retrieval: development and comparative experiments. Part 2 [PDF] by K. Sparck Jones, S. Walker and S. E. Robertson
Reading: Relevance based language models by Victor Lavrenko and W. Bruce Croft
- Demonstrate use of language and probabilistic models for retrieving and ranking.
- Utilize (pseudo-)relevance feedback in the retrieval process.
Due on: 02/24/2009
|7.||02/26/2009||Structured query processing [Slides | Notes]|
Reading: Helping people find what they don't know by Nicholas J. Belkin
- Query term weighting
- Query term suggestions
Reading: Using terminological feedback for web search refinement: a log-based study by Peter Anick
Evaluation-1 [Slides | Notes]
Reading: Evaluation of Evaluation in Information Retrieval [PDF] by Tefko Saracevic
- Recall and precision measures in IR
- TREC evaluation
- Employ a method of providing relevance feedback in a retrieval setup.
- Demonstrate how the system can provide term suggestions for a query.
- Demonstrate ways to evaluate retrieval performance.
- Employ TREC measures to evaluate and report retrieval effectiveness of an IR system.
Due on: 03/03/2009
|8.||03/05/2009||Evaluation-2 [Slides | Notes]|
- GMAP and bpref measures
- Mean reciprocal rank and other rank-based measures
- Comparing ranked lists
- Demonstrate ways to evaluate retrieval performance with measures other than standard recall and precision.
- Employ TREC measures to evaluate, compare, and report retrieval effectiveness of IR systems.
Due on: 03/17/2009
|9.||03/19/2009||UI for search [Slides | Notes]|
Reading: AJAX Tutorial
- Basic UI for search services
- Dynamic UI for search services with AJAX
- Develop a functional and user-friendly UI for search.
- Add dynamic interaction components to a UI for search.
|10.||03/26/2009||Web crawling [Slides | Notes]|
Reading: Focused crawling: a new approach to topic-specific Web resource discovery by Soumen Chakrabarti, Martin van den Berg, and Byron Dom
- Web crawling with "wget" and "Heritrix" crawlers
- Building a custom crawler
Reading: Random Web Crawls [PDF] by Toufik Bennouas and Fabien de Montgolfier
IR on Web 2.0 [Slides | Notes]
Reading: XML Tutorial
- Using REST-based services
- Parsing XML documents
- Collect documents from the Web using crawlers
- Use a REST-based service
- Demonstrate how an XML document can be parsed
Due on: 03/31/2009
|11.||04/02/2009||Information organization [Slides | Notes]|
- Organizing information using (1) term-clouds and (2) clustering
- Prepare a collection visualization interface using term-clouds
- Demonstrate how documents can be clustered based on their contents
Due on: 04/07/2009
|--||04/09/2009||No class. Instructor away.||--||--|
|--||04/16/2009||No class. Instructor away.||--||--|
- Get acquainted with some issues that are out of scope for this course but are still related and important