research-article

Ranking Documents by Answer-Passage Quality

Authors:
Evi Yulianti, RMIT University, Melbourne, Australia
Ruey-Cheng Chen, SEEK Ltd., Melbourne, Australia
Falk Scholer, RMIT University, Melbourne, Australia
W. Bruce Croft, RMIT University, Melbourne, Australia
Mark Sanderson, RMIT University, Melbourne, Australia

SIGIR '18: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, June 2018, Pages 335–344. https://doi.org/10.1145/3209978.3210028

Published: 27 June 2018


ABSTRACT

Evidence derived from passages that closely represent likely answers to a posed query can be useful input to the ranking process. Based on a novel use of Community Question Answering data, we present an approach for the creation of such passages. A general framework for extracting answer passages and estimating their quality is proposed, and this evidence is integrated into ranking models. Our experiments on two web collections show that such quality estimates from answer passages provide a strong indication of document relevance and compare favorably to previous passage-based methods. Combining such evidence can significantly improve over a set of state-of-the-art ranking models, including Quality-Biased Ranking, External Expansion, and a combination of both. A final ranking model that incorporates all quality estimates achieves further improvements on both collections.

References

Eugene Agichtein, Eric Brill, and Susan Dumais. 2006. Improving web search ranking by incorporating user behavior information. In Proc. of SIGIR. ACM, 19–26.
Gianni Amati and Cornelis Joost van Rijsbergen. 2002. Probabilistic models of information retrieval based on measuring the divergence from randomness. ACM Trans. Inf. Syst., Vol. 20, 4 (2002), 357–389.
Michael Bendersky, W. Bruce Croft, and Yanlei Diao. 2011. Quality-biased Ranking of Web Documents. In Proc. of WSDM. ACM, 95–104.
Michael Bendersky and Oren Kurland. 2008. Utilizing passage-based language models for document retrieval. In Proc. of ECIR. Springer, 162–174.
Michael Bendersky, Donald Metzler, and W. Bruce Croft. 2010. Learning Concept Importance Using a Weighted Dependence Model. In Proc. of WSDM. ACM, 31–40.
Jiang Bian, Yandong Liu, Eugene Agichtein, and Hongyuan Zha. 2008. Finding the right facts in the crowd: factoid question answering over social media. In Proc. of WWW. ACM, 467–476.
Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2016. Enriching Word Vectors with Subword Information. arXiv preprint arXiv:1607.04606 (2016).
James P. Callan. 1994. Passage-level Evidence in Document Retrieval. In Proc. of SIGIR. Springer-Verlag New York, Inc., 302–310.
Danqi Chen, Adam Fisch, Jason Weston, and Antoine Bordes. 2017. Reading Wikipedia to Answer Open-Domain Questions. In Proc. of ACL. Association for Computational Linguistics, 1870–1879.
Gordon V. Cormack, Mark D. Smucker, and Charles L. Clarke. 2011. Efficient and Effective Spam Filtering and Re-ranking for Large Web Datasets. Inf. Retr., Vol. 14, 5 (Oct. 2011), 441–465.
W. Bruce Croft. 2002. Combining approaches to information retrieval. In Proc. of ECIR. Springer, 1–36.
Fernando Diaz and Donald Metzler. 2006. Improving the Estimation of Relevance Models Using Large External Corpora. In Proc. of SIGIR. ACM, 154–161.
Dan Gillick and Benoit Favre. 2009. A scalable global model for summarization. In Proceedings of the Workshop on Integer Linear Programming for Natural Language Processing. Association for Computational Linguistics, 10–18.
Jing He, Pablo Duboue, and Jian-Yun Nie. 2012. Bridging the Gap between Intrinsic and Perceived Relevance in Snippet Generation. In Proc. of COLING. 1129–1146.
Karl Moritz Hermann, Tomas Kocisky, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom. 2015. Teaching machines to read and comprehend. In Advances in Neural Information Processing Systems. 1693–1701.
Kalervo Järvelin and Jaana Kekäläinen. 2002. Cumulated gain-based evaluation of IR techniques. ACM Trans. Inf. Syst., Vol. 20, 4 (2002), 422–446.
Mostafa Keikha, Jae Hyun Park, and W. Bruce Croft. 2014. Evaluating answer passages using summarization measures. In Proc. of SIGIR. ACM, 963–966.
Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
Eyal Krikon and Oren Kurland. 2011. A study of the integration of passage-, document-, and cluster-based information for re-ranking search results. Inf. Retr., Vol. 14, 6 (2011), 593–616.
Oren Kurland and Lillian Lee. 2010. PageRank without hyperlinks: Structural reranking using links induced by language models. ACM Trans. Inf. Syst., Vol. 28, 4 (2010), Article 18.
Saar Kuzi, Anna Shtok, and Oren Kurland. 2016. Query expansion using word embeddings. In Proc. of CIKM. ACM, 1929–1932.
Adenike M. Lam-Adesina and Gareth J. F. Jones. 2001. Applying Summarization Techniques for Term Selection in Relevance Feedback. In Proc. of SIGIR. ACM, 1–9.
Victor Lavrenko and W. Bruce Croft. 2001. Relevance based language models. In Proc. of SIGIR. ACM, 120–127.
Hui Lin and Jeff Bilmes. 2010. Multi-document summarization via budgeted maximization of submodular functions. In Proc. of HLT/NAACL. Association for Computational Linguistics, 912–920.
Qiaoling Liu, Eugene Agichtein, Gideon Dror, Evgeniy Gabrilovich, Yoelle Maarek, Dan Pelleg, and Idan Szpektor. 2011. Predicting web searcher satisfaction with existing community-based answers. In Proc. of SIGIR. ACM, 415–424.
Yandong Liu, Jiang Bian, and Eugene Agichtein. 2008. Predicting information seeker satisfaction in community question answering. In Proc. of SIGIR. ACM, 483–490.
Craig Macdonald, Rodrygo L. T. Santos, and Iadh Ounis. 2012. On the Usefulness of Query Features for Learning to Rank. In Proc. of CIKM. ACM, 2559–2562.
Edgar Meij and Maarten de Rijke. 2010. Supervised query modeling using Wikipedia. In Proc. of SIGIR. ACM, 875–876.
Donald Metzler and W. Bruce Croft. 2005. A Markov Random Field Model for Term Dependencies. In Proc. of SIGIR. ACM, 472–479.
Donald Metzler and Tapas Kanungo. 2008. Machine Learned Sentence Selection Strategies for Query-Biased Summarization. In SIGIR Learning to Rank Workshop.
Bhaskar Mitra and Nick Craswell. 2017. Neural Models for Information Retrieval. arXiv preprint arXiv:1705.01509 (2017).
Alistair Moffat and Justin Zobel. 2008. Rank-biased precision for measurement of retrieval effectiveness. ACM Trans. Inf. Syst., Vol. 27, 1 (2008), Article 2.
John O'Connor. 1980. Answer-passage retrieval by text searching. Journal of the Association for Information Science and Technology, Vol. 31, 4 (1980), 227–239.
Jay M. Ponte and W. Bruce Croft. 1998. A language modeling approach to information retrieval. In Proc. of SIGIR. ACM, 275–281.
Dragomir Radev, Timothy Allison, Sasha Blair-Goldensohn, John Blitzer, Arda Çelebi, Stanko Dimitrov, Elliott Drabek, Ali Hakim, Wai Lam, Danyu Liu, Jahna Otterbacher, Hong Qi, Horacio Saggion, Simone Teufel, Michael Topper, Adam Winkel, and Zhu Zhang. 2004. MEAD: A platform for multidocument multilingual text summarization. In Proc. of LREC.
Fiana Raiber and Oren Kurland. 2013. Ranking document clusters using Markov random fields. In Proc. of SIGIR. ACM, 333–342.
Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ questions for machine comprehension of text. arXiv preprint arXiv:1606.05250 (2016).
Stephen E. Robertson. 1997. Overview of the Okapi projects. Journal of Documentation, Vol. 53, 1 (1997), 3–7.
Joseph John Rocchio. 1971. Relevance feedback in information retrieval. In The SMART retrieval system: experiments in automatic document processing (1971), 313–323.
Tetsuya Sakai and Karen Sparck-Jones. 2001. Generic Summaries for Indexing in Information Retrieval. In Proc. of SIGIR. ACM, 190–198.
Chirag Shah and Jeffrey Pomerantz. 2010. Evaluating and predicting answer quality in community QA. In Proc. of SIGIR. ACM, 411–418.
Hiroya Takamura and Manabu Okumura. 2009. Text summarization model based on maximum coverage problem and its variant. In Proc. of EACL. Association for Computational Linguistics, 781–789.
Anastasios Tombros and Mark Sanderson. 1998. Advantages of Query Biased Summaries in Information Retrieval. In Proc. of SIGIR. ACM, 2–10.
Ingmar Weber, Antti Ukkonen, and Aris Gionis. 2012. Answers, not links: extracting tips from Yahoo! Answers to address how-to web queries. In Proc. of WSDM. ACM, 613–622.
Wouter Weerkamp, Krisztian Balog, and Maarten de Rijke. 2012. Exploiting External Collections for Query Expansion. ACM Trans. Web, Vol. 6, 4 (2012), 1–29.
Ross Wilkinson. 1994. Effective Retrieval of Structured Documents. In Proc. of SIGIR. Springer-Verlag New York, Inc., 311–317.
Kristian Woodsend and Mirella Lapata. 2012. Multiple aspect summarization using integer linear programming. In Proc. of EMNLP. Association for Computational Linguistics, 233–243.
Chenyan Xiong, Jamie Callan, and Tie-Yan Liu. 2017. Word-Entity Duet Representations for Document Ranking. In Proc. of SIGIR. ACM, 763–772.
Xiaobing Xue, Jiwoon Jeon, and W. Bruce Croft. 2008. Retrieval models for question and answer archives. In Proc. of SIGIR. ACM, 475–482.
Zi Yang, Keke Cai, Jie Tang, Li Zhang, Zhong Su, and Juanzi Li. 2011. Social context summarization. In Proc. of SIGIR. ACM, 255–264.
Evi Yulianti, Ruey-Cheng Chen, Falk Scholer, W. Bruce Croft, and Mark Sanderson. 2018. Document summarization for answering non-factoid queries. IEEE Trans. Knowl. Data Eng., Vol. 30, 1 (2018), 15–28.
Hamed Zamani and W. Bruce Croft. 2016. Embedding-based query language models. In Proc. of ICTIR. ACM, 147–156.
Chengxiang Zhai and John Lafferty. 2004. A Study of Smoothing Methods for Language Models Applied to Information Retrieval. ACM Trans. Inf. Syst., Vol. 22, 2 (2004), 179–214.

Index Terms

1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking

Published in
SIGIR '18: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval
June 2018
1509 pages
ISBN: 9781450356572
DOI: 10.1145/3209978
General Chairs:
Kevyn Collins-Thompson, University of Michigan, United States
Qiaozhu Mei, University of Michigan, United States
Program Chairs:
Brian Davison, Lehigh University, United States
Yiqun Liu, Tsinghua University, China
Emine Yilmaz, University College London, United Kingdom
Copyright © 2018 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
answer passages
document ranking
quality estimation
Acceptance Rates
SIGIR '18 paper acceptance rate: 86 of 409 submissions, 21%. Overall acceptance rate: 792 of 3,983 submissions, 20%.

Article Metrics
- Total citations: 7
- Total downloads: 452
- Downloads (last 12 months): 20
- Downloads (last 6 weeks): 0