ABSTRACT
Empirical investigation of the effectiveness of information retrieval (IR) systems requires a test collection, a set of query topics, and relevance judgments made by human assessors for each query. Previous experiments show that differences in human relevance assessments do not affect the relative performance of retrieval systems. Based on this observation, we propose and evaluate a new approach that replaces human relevance judgments with an automatic method. The ranking of retrieval systems produced by our methodology correlates positively and significantly with that produced by human-based evaluations. In the experiments, we assume a Web-like imperfect environment: indexing information is available for all documents, but some documents may not be available for retrieval, for example because of document deletions or network problems. Our method of simulating imperfect environments can be used to assess Web search engines and to estimate the effects of network conditions (e.g., network unreliability) on IR system performance.
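To make the evaluation setting concrete, the sketch below illustrates the two measurable ingredients the abstract describes: comparing an automatic system ranking against a human-based one with a rank correlation (Kendall's tau is a common choice for this comparison, though the abstract does not name the statistic used), and simulating an imperfect environment by randomly withholding documents at retrieval time while leaving the index intact. This is a minimal illustration under those assumptions, not the paper's implementation; all identifiers and the example rankings are hypothetical.

```python
# Hypothetical sketch of the two ideas in the abstract: (1) correlating an
# automatic system ranking with a human-based one, and (2) simulating an
# imperfect environment by dropping documents at retrieval time.
import random

def kendall_tau(rank_a, rank_b):
    """Kendall's tau between two rankings of the same systems.

    Each ranking is a list of system identifiers ordered best to worst;
    +1 means identical order, -1 means fully reversed order.
    """
    pos_b = {system: i for i, system in enumerate(rank_b)}
    n = len(rank_a)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            # A pair is concordant if both rankings order it the same way.
            if pos_b[rank_a[i]] < pos_b[rank_a[j]]:
                concordant += 1
            else:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

def simulate_unavailability(collection, loss_rate, seed=0):
    """Randomly drop a fraction of documents to mimic deletions or
    network failures. Indexing information is assumed to exist for every
    document; only retrieval-time availability is affected."""
    rng = random.Random(seed)
    return [doc for doc in collection if rng.random() >= loss_rate]

# Illustrative usage with made-up rankings of five systems.
human_ranking = ["sysA", "sysB", "sysC", "sysD", "sysE"]
auto_ranking = ["sysA", "sysC", "sysB", "sysD", "sysE"]
print(kendall_tau(human_ranking, auto_ranking))  # 0.8: one swapped pair

docs = [f"doc{i}" for i in range(1000)]
available = simulate_unavailability(docs, loss_rate=0.2)
print(len(available))  # roughly 800 documents remain retrievable
```

A high, statistically significant tau between the human-based and automatic rankings is what "correlates positively and significantly" would mean in this setup.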