Research Article
Open Access

Anserini: Reproducible Ranking Baselines Using Lucene

Published: 29 October 2018

Abstract

This work tackles the perennial problem of reproducible baselines in information retrieval research, focusing on bag-of-words ranking models. Although academic information retrieval researchers have a long history of building and sharing systems, these systems are primarily designed to facilitate the publication of research papers. As such, they are often incomplete, inflexible, poorly documented, difficult to use, and slow, particularly in the context of modern web-scale collections. Furthermore, the growing complexity of modern software ecosystems and the resource constraints most academic research groups operate under make maintaining open-source systems a constant struggle. Meanwhile, except for a small number of companies (mostly commercial web search engines) that deploy custom infrastructure, Lucene has become the de facto platform in industry for building search applications. Lucene has an active developer base, a large audience of users, and diverse capabilities to work with heterogeneous collections at scale. However, it lacks systematic support for ad hoc experimentation using standard test collections. We describe Anserini, an information retrieval toolkit built on Lucene that fills this gap. Our goal is to simplify ad hoc experimentation and allow researchers to easily reproduce results with modern bag-of-words ranking models on diverse test collections. With Anserini, we demonstrate that Lucene provides a suitable framework for supporting information retrieval research. Experiments show that our system efficiently indexes large web collections, provides modern ranking models that are on par with research implementations in terms of effectiveness, and supports low-latency query evaluation to facilitate rapid experimentation.
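
To make concrete what bag-of-words ranking on top of Lucene involves, the sketch below uses the plain Lucene API (not Anserini's own classes) to run a BM25 query over an existing index. The index path, the field names "contents" and "id", the BM25 parameters, and the query string are illustrative assumptions rather than values taken from the paper.

    import java.nio.file.Paths;

    import org.apache.lucene.analysis.en.EnglishAnalyzer;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.similarities.BM25Similarity;
    import org.apache.lucene.store.FSDirectory;

    public class Bm25SearchSketch {
      public static void main(String[] args) throws Exception {
        // Open an existing Lucene index (the path is a placeholder).
        try (DirectoryReader reader =
                 DirectoryReader.open(FSDirectory.open(Paths.get("/path/to/index")))) {
          IndexSearcher searcher = new IndexSearcher(reader);

          // Score documents with BM25; k1 = 0.9 and b = 0.4 are common IR defaults.
          searcher.setSimilarity(new BM25Similarity(0.9f, 0.4f));

          // Treat the topic as a bag of words over the document text field.
          QueryParser parser = new QueryParser("contents", new EnglishAnalyzer());
          Query query = parser.parse("reproducible ranking baselines");

          // Retrieve the top 10 documents and print their ids and BM25 scores.
          for (ScoreDoc hit : searcher.search(query, 10).scoreDocs) {
            System.out.println(searcher.doc(hit.doc).get("id") + " " + hit.score);
          }
        }
      }
    }

Anserini builds this kind of Lucene machinery into an end-to-end pipeline for standard test collections, which is what makes the ad hoc experiments described above straightforward to reproduce.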



Published in: Journal of Data and Information Quality, Volume 10, Issue 4: Reproducibility in Information Retrieval: Tools and Infrastructures, December 2018, 106 pages.
ISSN: 1936-1955
EISSN: 1936-1963
DOI: 10.1145/3289400

            Copyright © 2018 Owner/Author

This work is licensed under a Creative Commons Attribution 4.0 International License.

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 29 October 2018
            • Accepted: 1 July 2018
            • Revised: 1 April 2018
            • Received: 1 October 2017
