Abstract
This work tackles the perennial problem of reproducible baselines in information retrieval research, focusing on bag-of-words ranking models. Although academic information retrieval researchers have a long history of building and sharing systems, those systems are primarily designed to facilitate the publication of research papers. As such, they are often incomplete, inflexible, poorly documented, difficult to use, and slow, particularly in the context of modern web-scale collections. Furthermore, the growing complexity of modern software ecosystems and the resource constraints under which most academic research groups operate make maintaining open-source systems a constant struggle. Meanwhile, with the exception of a small number of companies (mostly commercial web search engines) that deploy custom infrastructure, Lucene has become the de facto platform in industry for building search applications. Lucene has an active developer base, a large audience of users, and diverse capabilities for working with heterogeneous collections at scale. However, it lacks systematic support for ad hoc experimentation using standard test collections. We describe Anserini, an information retrieval toolkit built on Lucene that fills this gap. Our goal is to simplify ad hoc experimentation and allow researchers to easily reproduce results with modern bag-of-words ranking models on diverse test collections. With Anserini, we demonstrate that Lucene provides a suitable framework for supporting information retrieval research. Experiments show that our system efficiently indexes large web collections, provides modern ranking models that are on par with research implementations in terms of effectiveness, and supports low-latency query evaluation to facilitate rapid experimentation.