# | Date | Topics and Readings | Objectives | Assignment |
1. | 08/20/2008 | Introduction [Slides | Handout]- Introduction to the course
- Set up your machine with the necessary tools
- Overview of some UNIX commands and utilities
- Structured data access from a MySQL database
Reading: UNIX Primer | - Become familiar with the basic terminology of a search system environment.
- Practice accessing structured data.
| Assignment-1 Due: 08/24/2008 |
2. | 08/27/2008 | IR with MySQL and Text Files
[Slides | Handout]- Structured data access and display in a webpage
- Unstructured data access from (1) MySQL tables and (2) text files
Reading: Getting Started with MySQL | - Practice accessing and processing structured data using MySQL.
- Demonstrate how textual data can be accessed from MySQL as well as from flat files.
| Assignment-2 Due: 09/07/2008 |
3. | 09/03/2008 | Learning to index [Slides | Handout]- Effective indexing of text documents
- Basic understanding of the information retrieval model
- Work with the Lemur Toolkit
Reading: The Anatomy of a Large-Scale Hypertextual Web Search Engine Reading: Overview of Lemur | - Describe a general model of information retrieval.
- Explain the importance of indexing, stemming, and stopword removal.
- Demonstrate how these processes are executed in a typical search environment.
- Configure the Lemur Toolkit and related tools.
- Use Lemur to index a set of documents.
| -- |
4. | 09/10/2008 | Query processing and retrieval [Slides | Handout]- Represent a query "by hand" and then using Lemur
- Retrieve documents
Reading: Google Basic Search Guidelines | - Process a text query to match it against an indexed collection.
- Retrieve a set of relevant documents matching the query using the vector space model.
| Assignment-3 Due: 09/14/2008 |
5. | 09/17/2008 | Retrieval models-1 [Slides | Handout]- Vector space, Boolean, and language models
Reading: Boolean retrieval [PDF] by Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze Reading: A language modeling approach to information retrieval by Jay Ponte and W. Bruce Croft | - Demonstrate use of various retrieval models.
- Describe the pros and cons of these models.
| Assignment-4 Due: 09/21/2008 |
6. | 09/24/2008 | Retrieval models-2 [Slides | Handout]- Probabilistic models
- Relevance models
Reading: A probabilistic model of information retrieval: development and comparative experiments. Part 1 [PDF] by K. Sparck Jones, S. Walker and S. E. Robertson Reading: A probabilistic model of information retrieval: development and comparative experiments. Part 2 [PDF] by K. Sparck Jones, S. Walker and S. E. Robertson Reading: Relevance based language models by Victor Lavrenko and W. Bruce Croft | - Demonstrate use of language and probabilistic models for retrieving and ranking.
- Utilize (pseudo-)relevance feedback in the retrieval process.
| Assignment-5 Due: 09/28/2008 |
7. | 10/01/2008 | Structured query processing [Slides | Handout]- Query term weighting
- Query term suggestions
Reading: Helping people find what they don't know by Nicholas J. Belkin Reading: Using terminological feedback for web search refinement: a log-based study by Peter Anick | - Employ a method of providing relevance feedback in a retrieval setup.
- Demonstrate how the system can provide term suggestions for a query.
| Assignment-6 Due: 10/05/2008 |
8. | 10/08/2008 | Evaluation-1 [Slides | Handout]- Recall and precision measures in IR
- TREC evaluation
Reading: Evaluation of Evaluation in Information Retrieval [PDF] by Tefko Saracevic | - Demonstrate ways to evaluate retrieval performance.
- Employ TREC measures to evaluate and report retrieval effectiveness of an IR system.
| Assignment-7 Due: 10/12/2008 |
9. | 10/15/2008 | Evaluation-2 [Slides | Handout]- GMAP and bpref measures
- Mean reciprocal rank and other rank-based measures
- Comparing rank-lists
|
- Demonstrate ways to evaluate retrieval performance with measures other than standard recall and precision.
- Employ TREC measures to evaluate, compare, and report retrieval effectiveness of IR systems.
| Assignment-8 Due: 10/19/2008 |
10. | 10/22/2008 | User interface for search [Slides | Handout]- Basic UI for search services
- Dynamic UI for search services with AJAX
Reading: AJAX Tutorial | - Develop a functional and user-friendly UI for search.
- Add dynamic interaction components to a UI for search.
| Assignment-9 Due: 10/26/2008 |
11. | 10/29/2008 | IR on Web 2.0 [Slides | Handout]- Using REST protocol based services
- Parsing XML documents
Reading: XML Tutorial | - Use a service employing the REST protocol
- Demonstrate how an XML document can be parsed
| Assignment-10 Due: 11/02/2008 |
12. | 11/05/2008 | Web crawling [Slides | Handout]- Web crawling with "wget" and "Heritrix" crawlers
- Building a custom crawler
Reading: Focused crawling: a new approach to topic-specific Web resource discovery by Soumen Chakrabarti, Martin van den Berg, and Byron Dom Reading: Random Web Crawls [PDF] by Toufik Bennouas and Fabien de Montgolfier. | - Collect documents from the Web using crawlers
| Assignment-11 Due: 11/09/2008 |
-- | 11/12/2008 | No class. Instructor away. | -- | -- |
13. | 11/19/2008 | Information organization [Slides | Handout]- Organizing information using (1) term-clouds and (2) clustering
| - Prepare a collection visualization interface using term-clouds
- Demonstrate how documents can be clustered based on their contents
| Assignment-12 Due: 11/23/2008 |
-- | 11/26/2008 | Thanksgiving | -- | -- |
14. | 12/03/2008 | Wrap-up [Slides] | - Get acquainted with some issues that are out of scope for this course but still related and important
| -- |