ABSTRACT
We consider the problem of evaluating retrieval systems using a limited number of relevance judgments. Recent work has demonstrated that average precision can be accurately estimated from a judged pool consisting of a relatively small random sample of documents. In this work, we demonstrate that, given values or estimates of average precision, one can accurately infer the relevance of unjudged documents. Combined, these results show how a large judged pool can be inferred efficiently and accurately from a relatively small number of judged documents, permitting accurate and efficient retrieval evaluation on a large scale.
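To make the idea concrete, the following is a minimal sketch, not the method of the paper. It computes average precision for a ranked list under the standard definition, then uses a toy brute-force search (an assumption made purely for illustration) to find which relevance assignments for unjudged documents agree with a known average precision value. The function names, the use of `None` for unjudged documents, and the assumption that every relevant document appears in the ranked list are all hypothetical.

```python
from itertools import product


def average_precision(relevance):
    """Average precision of a ranked list of 0/1 relevance labels.

    Assumes (for illustration) that every relevant document appears
    in the list, so the number of relevant documents is sum(relevance).
    """
    total_relevant = sum(relevance)
    if total_relevant == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            # Precision at the rank of each relevant document.
            precision_sum += hits / rank
    return precision_sum / total_relevant


def consistent_assignments(judged, target_ap, tol=1e-9):
    """Toy brute force: enumerate labels for unjudged documents (None)
    and keep the assignments whose average precision matches target_ap.

    Exponential in the number of unjudged documents; illustrative only.
    """
    unknown = [i for i, r in enumerate(judged) if r is None]
    matches = []
    for bits in product((0, 1), repeat=len(unknown)):
        candidate = list(judged)
        for i, b in zip(unknown, bits):
            candidate[i] = b
        if abs(average_precision(candidate) - target_ap) <= tol:
            matches.append(candidate)
    return matches


# Ranked list with two judged and two unjudged documents.
judged = [1, None, 0, None]
print(consistent_assignments(judged, target_ap=0.75))
```

In this small case only one assignment of the two unjudged documents is consistent with an average precision of 0.75, which illustrates how a known or estimated average precision constrains the relevance of unjudged documents. With a single list and many unjudged documents the constraint is far weaker, so a practical approach would need more information than this sketch uses.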
Index Terms
- Inferring document relevance via average precision