Inside the CHiC - CLEF 2013

Polish Track at CLEF 2013

Co-organized by the University of Neuchatel (Switzerland),
the University of Wroclaw, and Nicolaus Copernicus University (Poland)
Guidelines for Participation and Submission

The Polish Track is a CHiC subtask of the CLEF 2013 evaluation campaign

Proposed tasks based on the Europeana corpus

General Comment

The following guidelines cover the CLEF 2013 CHiC topics, data manipulation, query construction, and result submission for the Polish CHiC task (ad-hoc automatic or ad-hoc manual).

To register for this task, fill in the registration form located here.
A full description of the Polish task is available here. Our comments on the Polish corpus are available here.

Topic format and requirements

The topics were extracted from real Europeana query logs and comprise queries for people, places, work titles, events or subjects. The topics have the following format:

<topic lang="pl">
<identifier>CHIC-2013-PL-008 </identifier>
<title>ruch robotniczy </title>
</topic>

<topic lang="en">
<identifier>CHIC-2013-PL-008 </identifier>
<title>workers movement </title>
<description>A relevant CH object description must provide information about the location and reason of the corresponding workers movement.</description>
</topic>

<topic lang="en"> indicates the beginning and language of a topic.
<identifier> contains the query identifier.
<title> contains the actual query.
<description> contains a description of the content that will be used for relevance assessment. Not every topic contains text in the description field.

The description field must not be used for retrieval experiments.
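As an illustration of the topic format above, the following minimal Python sketch reads the identifier and title of each topic in one language and never touches the description field. The `<topics>` wrapper element, the `SAMPLE` string and the `load_queries` function name are assumptions made for this example; the layout of the distributed topic file may differ.

```python
import xml.etree.ElementTree as ET

# Assumption: the topics are wrapped in a single <topics> root element and
# each topic is closed with </topic>, so that the text is well-formed XML.
SAMPLE = """<topics>
<topic lang="pl">
<identifier>CHIC-2013-PL-008 </identifier>
<title>ruch robotniczy </title>
</topic>
<topic lang="en">
<identifier>CHIC-2013-PL-008 </identifier>
<title>workers movement </title>
<description>A relevant CH object description must provide information about the location and reason of the corresponding workers movement.</description>
</topic>
</topics>"""


def load_queries(xml_text, lang):
    """Return {identifier: title} for all topics in the given language.

    The <description> element is deliberately never read, because it is
    reserved for relevance assessment.
    """
    root = ET.fromstring(xml_text)
    queries = {}
    for topic in root.iter("topic"):
        if topic.get("lang") != lang:
            continue
        # Strip the trailing blanks that appear inside the sample elements.
        identifier = (topic.findtext("identifier") or "").strip()
        title = (topic.findtext("title") or "").strip()
        if identifier and title:
            queries[identifier] = title
    return queries


if __name__ == "__main__":
    print(load_queries(SAMPLE, "pl"))
```

Keeping the description out of the returned dictionary makes it harder to feed it into a retrieval run by accident.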

Polish task submission requirements

Goal: To retrieve relevant documents for a given query (a result list of up to 1,000 documents per query is expected). Fifty topics will be provided (“CHIC-2013-PL-001” to “CHIC-2013-PL-050”).

Collection processing: All document fields can be used for retrieval. The collections may not be altered in response to the CHiC 2013 topics; that is, new content may not be added to adapt the documents specifically to the topics. Other alterations that are not specific to the queries (e.g., document translation or expansion) are permitted. External resources are also permitted, but must be noted in the run description upon submission.

Conditions for participation: All groups should submit at least one run, which can be either automatic or manual (with a maximum of 5 automatic runs and 5 manual runs).

Result format: Submitted results should be plain ASCII text, with one line per retrieved document. The lines have to be formatted as follows:

  CHIC-2013-PL-001     Q0         0         0.7416         RunA1    
    1        2        3        4         5        6    

Fields must be separated by ONE blank space and have the following meanings:

  1. Query identifier (content of the <identifier> element of the topic, e.g. "CHIC-2013-PL-001").
  2. Query iteration (will be ignored. Please choose "Q0" for all experiments).
  3. Document number (content of the “ims:identifier” attribute in the element).
  4. Rank 0..n (0 is the best matching document. If you retrieve 1,000 documents per query, the rank will be 0..999, with 0 as the best and 999 as the worst). Note that the ranking starts at 0 (zero) and not 1 (one). MUST BE SORTED IN INCREASING ORDER PER QUERY.
  5. RSV (retrieval status value: a system-specific value that expresses how relevant your system deems a document to be. This is a floating point value. High relevance should be expressed with a high value). If a document D1 is considered more relevant than a document D2, then RSV1 > RSV2 must hold. If RSV1 = RSV2, the documents may be randomly reordered during calculation of the evaluation measures. Please use a decimal point ".", not a comma. Do not use any form of separators for thousands. RSVs must NOT be negative numbers. The only valid characters for the RSV are 0-9 and the decimal point. RSV MUST BE SORTED IN DECREASING ORDER PER QUERY.
  6. Run identifier (please choose a unique ID for each experiment you submit). Use only a-z, A-Z and 0-9. Do not use any special characters, accents, etc.

The result file should contain nothing but the lines formatted in the way described above.
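The line format described above can be produced with a few lines of Python. This is only a sketch: the function name `format_run`, the four-decimal formatting of the RSV, and the document numbers used in the usage example are assumptions chosen for illustration, not part of the official guidelines.

```python
def format_run(query_id, scored_docs, run_id, max_docs=1000):
    """Return the result lines for one query.

    scored_docs is an iterable of (document_number, rsv) pairs.
    """
    # Sort by decreasing RSV. Ties are broken by document number so that the
    # output is deterministic (an assumption; the guidelines leave tie order
    # to the evaluation software).
    ranked = sorted(scored_docs, key=lambda pair: (-pair[1], pair[0]))[:max_docs]
    lines = []
    for rank, (doc_no, rsv) in enumerate(ranked):  # ranks start at 0
        if rsv < 0:
            raise ValueError(f"negative RSV for {doc_no}: {rsv}")
        # Fixed-point formatting avoids exponents, signs and thousands
        # separators; fields are joined by exactly one blank space.
        lines.append(f"{query_id} Q0 {doc_no} {rank} {rsv:.4f} {run_id}")
    return lines


if __name__ == "__main__":
    # "docA" and "docB" are hypothetical document numbers.
    for line in format_run("CHIC-2013-PL-001", [("docB", 0.5), ("docA", 0.7416)], "RunA1"):
        print(line)
```

Scores that differ only beyond the fourth decimal place are printed as equal values here, so a system that relies on finer distinctions would need more digits.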

You are expected to retrieve 1,000 documents per query. An experiment that retrieves a maximum of 1,000 documents for each of the 50 queries therefore produces a file that contains a maximum of 50,000 lines.

The effectiveness measures used in CLEF evaluate the performance of systems at various points of recall. Participants must thus return at most 1,000 documents per query in their results. By its nature, the average precision measure does not penalize systems that return extra irrelevant documents at the bottom of their result lists. Therefore, you will usually want to return the maximum allowable number of documents in your official submissions. If you knowingly retrieved fewer than 1,000 documents for a topic, please make a note of it and compare your counts with those reported by the system during submission.
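Before uploading, a run file can be checked locally against the rules stated in these guidelines. The sketch below is not the official validator; the function name `validate_run` and the regular expressions are assumptions, and two checks are stricter than the text requires (ranks must be contiguous from 0, and an RSV must have a digit before the decimal point).

```python
import re
from collections import defaultdict

INT_RE = re.compile(r"[0-9]+")
# Stricter than the guidelines: at least one digit before the decimal point.
RSV_RE = re.compile(r"[0-9]+(\.[0-9]+)?")
RUN_ID_RE = re.compile(r"[A-Za-z0-9]+")


def validate_run(lines, max_docs=1000):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    per_query = defaultdict(list)  # query id -> [(rank, rsv), ...] in file order
    for line_no, raw in enumerate(lines, start=1):
        fields = raw.rstrip("\n").split(" ")
        if len(fields) != 6 or "" in fields:
            problems.append(f"line {line_no}: expected 6 fields separated by one space")
            continue
        query_id, iteration, _doc_no, rank, rsv, run_id = fields
        usable = True
        if iteration != "Q0":
            problems.append(f"line {line_no}: query iteration should be Q0")
        if not INT_RE.fullmatch(rank):
            problems.append(f"line {line_no}: rank is not a non-negative integer")
            usable = False
        if not RSV_RE.fullmatch(rsv):
            problems.append(f"line {line_no}: RSV may contain only 0-9 and a decimal point")
            usable = False
        if not RUN_ID_RE.fullmatch(run_id):
            problems.append(f"line {line_no}: run identifier may contain only a-z, A-Z, 0-9")
        if usable:
            per_query[query_id].append((int(rank), float(rsv)))

    for query_id, rows in per_query.items():
        if len(rows) > max_docs:
            problems.append(f"{query_id}: more than {max_docs} documents")
        ranks = [rank for rank, _ in rows]
        if ranks != list(range(len(rows))):
            problems.append(f"{query_id}: ranks must run 0, 1, 2, ... in increasing order")
        rsvs = [rsv for _, rsv in rows]
        if any(a < b for a, b in zip(rsvs, rsvs[1:])):
            problems.append(f"{query_id}: RSV must be in decreasing order")
    return problems
```

An empty list means that none of these particular checks failed; it does not guarantee that the DIRECT system will accept the run.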

Submission: Please submit your runs to the DIRECT system, which will be opened soon (a username and password will be sent to you). Result files should be uploaded as zip files and validated through the DIRECT system before the final submission. Runs can be deleted or added as necessary.
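Since result files are to be uploaded as zip files, a run can be packaged with the Python standard library. This is a sketch under the assumption that each archive holds exactly one result file; the guidelines do not state how the archive should be organized, and the helper name `zip_run` is hypothetical.

```python
import zipfile
from pathlib import Path


def zip_run(run_path):
    """Create <name>.zip next to the run file and return the archive path."""
    run_path = Path(run_path)
    zip_path = run_path.with_suffix(".zip")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # Store the file under its bare name, without directory components.
        archive.write(run_path, arcname=run_path.name)
    return zip_path
```

For example, `zip_run("RunA1.txt")` would write `RunA1.zip` in the same directory.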


Prof. Jacques Savoy
Dept. of Computer Science
University of Neuchatel (Switzerland)
Dr. Piotr Malak
Dept. of Computer Science
University of Neuchatel (Switzerland) and
Nicolaus Copernicus University (Poland)
Prof. Adam Pawlowski
Information Sciences Institute
University of Wroclaw (Poland)
For more information, please contact Prof. Jacques Savoy