Abstract
This work tackles the perennial problem of reproducible baselines in information retrieval research, focusing on bag-of-words ranking models. Although academic information retrieval researchers have a long history of building and sharing systems, those systems are primarily designed to facilitate the publication of research papers. As such, they are often incomplete, inflexible, poorly documented, difficult to use, and slow, particularly in the context of modern web-scale collections. Furthermore, the growing complexity of modern software ecosystems and the resource constraints under which most academic research groups operate make maintaining open-source systems a constant struggle. Meanwhile, with the exception of a small number of companies (mostly commercial web search engines) that deploy custom infrastructure, Lucene has become the de facto platform in industry for building search applications. Lucene has an active developer base, a large audience of users, and diverse capabilities for working with heterogeneous collections at scale. However, it lacks systematic support for ad hoc experimentation using standard test collections. We describe Anserini, an information retrieval toolkit built on Lucene that fills this gap. Our goal is to simplify ad hoc experimentation and allow researchers to easily reproduce results with modern bag-of-words ranking models on diverse test collections. With Anserini, we demonstrate that Lucene provides a suitable framework for supporting information retrieval research. Experiments show that our system efficiently indexes large web collections, provides modern ranking models that are on par with research implementations in terms of effectiveness, and supports low-latency query evaluation to facilitate rapid experimentation.