DOI: 10.1145/1835449.1835654

Retrieval system evaluation: automatic evaluation versus incomplete judgments

Published: 19 July 2010

Abstract

In information retrieval (IR), research aiming to reduce the cost of retrieval system evaluations has been conducted along two lines: (i) the evaluation of IR systems with reduced amounts of manual relevance assessments, and (ii) the fully automatic evaluation of IR systems, thus foregoing the need for manual assessments altogether. The proposed methods in both areas are commonly evaluated by comparing their performance estimates for a set of systems to a ground truth (provided for instance by evaluating the set of systems according to mean average precision). In contrast, in this poster we compare an automatic system evaluation approach directly to two evaluations based on incomplete manual relevance assessments. For the particular case of TREC's Million Query track, we show that the automatic evaluation leads to results which are highly correlated to those achieved by approaches relying on incomplete manual judgments.
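
The comparison described above is usually reported as a rank correlation between the system orderings produced by two evaluation methods. The sketch below is a minimal illustration of that idea, not the poster's actual experimental code: the run names and scores are hypothetical, and Kendall's tau is used as one plausible choice of correlation measure.

```python
from itertools import combinations

def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two lists of per-system scores.

    A pair of systems is concordant if both evaluation methods order it
    the same way, discordant if they disagree; ties count as neither.
    """
    assert len(scores_a) == len(scores_b)
    concordant = discordant = 0
    for i, j in combinations(range(len(scores_a)), 2):
        direction = (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    n_pairs = len(scores_a) * (len(scores_a) - 1) / 2
    return (concordant - discordant) / n_pairs

# Hypothetical effectiveness estimates for five retrieval runs, one list
# from a fully automatic evaluation and one from an evaluation based on
# incomplete manual judgments (e.g. statAP- or MTC-style estimates).
automatic = [0.31, 0.22, 0.45, 0.18, 0.40]
incomplete_judgments = [0.24, 0.26, 0.48, 0.15, 0.37]

print(f"Kendall's tau = {kendall_tau(automatic, incomplete_judgments):.2f}")
```

A tau close to 1 (here 0.80) would indicate that the automatic evaluation and the judgment-based evaluation rank the systems nearly identically, which is the sense in which the poster reports the two approaches as highly correlated.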




    Published In

    SIGIR '10: Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
    July 2010, 944 pages
    ISBN: 9781450301534
    DOI: 10.1145/1835449
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery, New York, NY, United States

    Publication History

    Published: 19 July 2010


    Author Tag

    1. automatic system evaluation

    Qualifiers

    • Poster

    Conference

    SIGIR '10

    Acceptance Rates

    SIGIR '10 Paper Acceptance Rate: 87 of 520 submissions, 17%
    Overall Acceptance Rate: 792 of 3,983 submissions, 20%


