ABSTRACT
Information retrieval test collections are traditionally built by judging a pool of documents retrieved by a combination of automatic and manual runs. The quality of the resulting judgments depends on the diversity of the submitted runs and on the depth of the pool. In this work, we explore fully automated approaches to pool construction. By combining a simple voting scheme over the automatic runs with machine learning from the documents those runs retrieve, we identify a large portion of the relevant documents that would normally be found only through manual runs. These initial results are promising and can be extended in future studies to help test collection curators maintain adequate judgment coverage across complete document collections.
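To make the approach concrete, the following is a minimal sketch of a voting-plus-classifier pooling pipeline of the kind the abstract describes. It is an illustration under stated assumptions, not the paper's actual implementation: the function names (`vote_pool`, `extend_pool`), the TF-IDF features, and the vote threshold are hypothetical, and scikit-learn's `LinearSVC` (a wrapper around LIBLINEAR) stands in for whatever classifier the full paper uses.

```python
# Hypothetical sketch: pool documents by voting across automatic runs, then
# train a linear classifier on the judged pool to surface additional likely
# relevant documents that no automatic run ranked highly.
from collections import Counter
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC


def vote_pool(runs, depth=100, min_votes=3):
    """Pool documents that at least `min_votes` automatic runs rank in
    their top `depth` results. `runs` maps run_id -> ranked doc_id list."""
    votes = Counter()
    for ranking in runs.values():
        votes.update(ranking[:depth])
    return {doc for doc, n in votes.items() if n >= min_votes}


def extend_pool(judged, texts, candidates, top_n=200):
    """Train a linear classifier on judged pool documents and return the
    highest-scoring unjudged candidates as additions to the pool.

    judged:     dict doc_id -> 1 (relevant) / 0 (non-relevant)
    texts:      dict doc_id -> raw document text
    candidates: list of unjudged doc_ids retrieved by automatic runs
    """
    vec = TfidfVectorizer(stop_words="english")
    X_train = vec.fit_transform([texts[d] for d in judged])
    y_train = [judged[d] for d in judged]

    clf = LinearSVC()  # linear SVM, LIBLINEAR-style
    clf.fit(X_train, y_train)

    # Rank candidates by classifier score and keep the top candidates.
    scores = clf.decision_function(vec.transform([texts[d] for d in candidates]))
    ranked = sorted(zip(candidates, scores), key=lambda p: -p[1])
    return [doc for doc, _ in ranked[:top_n]]
```

In this sketch the voting step plays the role of the traditional depth-k pool over automatic runs, while the classifier step approximates the extra coverage that manual runs would otherwise contribute.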