ABSTRACT
Evidence derived from passages that closely represent likely answers to a posed query can be useful input to the ranking process. Based on a novel use of Community Question Answering data, we present an approach for the creation of such passages. A general framework for extracting answer passages and estimating their quality is proposed, and this evidence is integrated into ranking models. Our experiments on two web collections show that such quality estimates from answer passages provide a strong indication of document relevance and compare favorably to previous passage-based methods. Combining such evidence can significantly improve over a set of state-of-the-art ranking models, including Quality-Biased Ranking, External Expansion, and a combination of both. A final ranking model that incorporates all quality estimates achieves further improvements on both collections.
Index Terms
- Ranking Documents by Answer-Passage Quality