Overview

Tutorial on Information Retrieval & Text Analytics

Seminar of the CUSO Ph.D. program in Computer Science
In Neuchatel, 9:15 AM, Tuesday, April 2nd, 2013

Just before CORIA 2013, 3 - 5 April 2013 in Neuchatel
The seminar will be held at UniNE at Uni Mail (Room F200, second floor, Building F).
A map to reach the Computer Science Department

Speakers

Donna Harman
National Institute of Standards and Technology (NIST)

Prof. Jamie Callan
Language Technologies Institute
Carnegie Mellon University (USA)

Topics

A Tutorial on Text Analytics

Prof. Jamie Callan

Many organizations need to analyze large amounts of text to discover useful information. This tutorial provides students with an understanding of common and emerging methods of summarizing and analyzing material in large collections of unstructured and lightly-structured text ('text analytics'). The tutorial begins by covering basic and advanced text representation techniques and similarity metrics used in full-text search engines and text mining software. It then explores the use of these core techniques to accomplish different types of analysis tasks, for example, frequency and co-occurrence analysis, sentiment analysis, and expert finding. This tutorial assumes a typical Computer Science background, and a very basic knowledge of linear algebra, probability, and statistics.

Information Retrieval: Evaluation

Donna Harman

Evaluation has always been a critical component of experimentation in all areas of research, and the information retrieval community has been particularly fortunate to have had excellent methodologies for evaluation right from its beginning. These methodologies, often referred to as the Cranfield paradigm, first led to a solid foundation of research, followed by continuous improvements since then. The tutorial will start with an introduction to the Cranfield paradigm, with an emphasis on the reasoning behind its development. This will be followed by an extensive examination of how this paradigm has been adapted to current research in information access, using the TREC evaluations as a case study.

The second part of the tutorial will look at evaluation techniques that involve the interaction between users and information access technologies. This includes the types of user studies done at commercial search engines and those done in academic settings, with discussion of both usability studies and user studies designed for more generalized testing of new information access techniques.

The final part of the tutorial will be a short summary of some of the evaluation methodologies used in other related fields of human language technology, including summarization, question answering, speech recognition, video retrieval and machine translation. The goal here is to compare and contrast the differing evaluation philosophies in these areas.

Timetable

Room: F200, Aula Louis-Guillaume

9h15 - 10h45: Text Analytics, Part I (text representation, retrieval models, clustering) (Jamie Callan)

10h45 - 11h05: Coffee break

11h05 - 12h30: Information Retrieval Evaluation, Part I (Cranfield and TREC paradigm) (Donna Harman)

12h30 - 13h30: Lunch

13h30 - 14h45: Information Retrieval Evaluation, Part II (user-centered evaluation) (Donna Harman)

14h45 - 15h05: Coffee break

15h05 - 16h30: Text Analytics, Part II (named-entities, frequency, sentiment analysis, expert finding) (Jamie Callan)
