|#||Date||Topics and Readings||Objectives||Assignment|
|1.||08/20/2008||Introduction [Slides | Handout]|
Reading: UNIX Primer
- Introduction to the course
- Set up your machine with the necessary tools
- Overview of some UNIX commands and utilities
- Structured data access from a MySQL database
- Become familiar with the basic terminology of a search system environment.
- Practice accessing structured data.
|2.||08/27/2008||IR with MySQL and Text Files [Slides | Handout]|
Reading: Getting Started with MySQL
- Structured data access and display in a webpage
- Unstructured data access from (1) MySQL tables and (2) text files
- Practice accessing and processing structured data using MySQL.
- Demonstrate how textual data can be accessed from MySQL as well as from flat files.
|3.||09/03/2008||Learning to index [Slides | Handout]|
Reading: The Anatomy of a Large-Scale Hypertextual Web Search Engine
- Effective indexing of text documents
- Basic understanding of the information retrieval model
- Working with the Lemur Toolkit
Reading: Overview of Lemur
- Describe a general model of information retrieval.
- Explain the importance of indexing, stemming, and stopword removal.
- Demonstrate how these processes are executed in a typical search environment.
- Configure the Lemur Toolkit and related tools.
- Use Lemur to index a set of documents.
|4.||09/10/2008||Query processing and retrieval [Slides | Handout]|
Reading: Google Basic Search Guidelines
- Representing a query "by hand" and then using Lemur
- Retrieving documents
- Process a text query to match it against an indexed collection.
- Retrieve a set of relevant documents matching the query using the vector space model.
|5.||09/17/2008||Retrieval models-1 [Slides | Handout]|
Reading: Boolean retrieval [PDF] by Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze
- Vector space, Boolean, and language models
Reading: A language modeling approach to information retrieval by Jay Ponte and W. Bruce Croft
- Demonstrate the use of various retrieval models.
- Describe the pros and cons of these models.
|6.||09/24/2008||Retrieval models-2 [Slides | Handout]|
Reading: A probabilistic model of information retrieval: development and comparative experiments. Part 1 [PDF] by K. Sparck Jones, S. Walker and S. E. Robertson
- Probabilistic models
- Relevance models
Reading: A probabilistic model of information retrieval: development and comparative experiments. Part 2 [PDF] by K. Sparck Jones, S. Walker and S. E. Robertson
Reading: Relevance based language models by Victor Lavrenko and W. Bruce Croft
- Demonstrate the use of language models and probabilistic models for retrieval and ranking.
- Utilize (pseudo-)relevance feedback in the retrieval process.
|7.||10/01/2008||Structured query processing [Slides | Handout]|
Reading: Helping people find what they don't know by Nicholas J. Belkin
- Query term weighting
- Query term suggestions
Reading: Using terminological feedback for web search refinement: a log-based study by Peter Anick
- Employ a method of providing relevance feedback in a retrieval setup.
- Demonstrate how the system can provide term suggestions for a query.
|8.||10/08/2008||Evaluation-1 [Slides | Handout]|
Reading: Evaluation of Evaluation in Information Retrieval [PDF] by Tefko Saracevic
- Recall and precision measures in IR
- TREC evaluation
- Demonstrate ways to evaluate retrieval performance.
- Employ TREC measures to evaluate and report retrieval effectiveness of an IR system.
|9.||10/15/2008||Evaluation-2 [Slides | Handout]|
- GMAP and bpref measures
- Mean reciprocal rank and other rank-based measures
- Comparing ranked lists
- Demonstrate ways to evaluate retrieval performance with measures other than standard recall and precision.
- Employ TREC measures to evaluate, compare, and report retrieval effectiveness of IR systems.
|10.||10/22/2008||User interface for search [Slides | Handout]|
Reading: AJAX Tutorial
- Basic UI for search services
- Dynamic UI for search services with AJAX
- Develop a functional and user-friendly UI for search.
- Add dynamic interaction components to a UI for search.
|11.||10/29/2008||IR on Web 2.0 [Slides | Handout]|
Reading: XML Tutorial
- Using REST-based services
- Parsing XML documents
- Use a service employing the REST protocol
- Demonstrate how an XML document can be parsed
|12.||11/05/2008||Web crawling [Slides | Handout]|
Reading: Focused crawling: a new approach to topic-specific Web resource discovery by Soumen Chakrabarti, Martin van den Berg, and Byron Dom
- Web crawling with "wget" and "Heritrix" crawlers
- Building a custom crawler
Reading: Random Web Crawls [PDF] by Toufik Bennouas and Fabien de Montgolfier
- Collect documents from the Web using crawlers
|--||11/12/2008||No class. Instructor away.||--||--|
|13.||11/19/2008||Information organization [Slides | Handout]|
- Organizing information using (1) term-clouds and (2) clustering
- Prepare a collection visualization interface using term-clouds
- Demonstrate how documents can be clustered based on their contents
- Get acquainted with some issues that are out of scope for this course but are still related and important