DOI: 10.1145/3209978.3210028
research-article

Ranking Documents by Answer-Passage Quality

Published: 27 June 2018

ABSTRACT

Evidence derived from passages that closely represent likely answers to a posed query can be useful input to the ranking process. Based on a novel use of Community Question Answering data, we present an approach for the creation of such passages. A general framework for extracting answer passages and estimating their quality is proposed, and this evidence is integrated into ranking models. Our experiments on two web collections show that such quality estimates from answer passages provide a strong indication of document relevance and compare favorably to previous passage-based methods. Combining such evidence can significantly improve over a set of state-of-the-art ranking models, including Quality-Biased Ranking, External Expansion, and a combination of both. A final ranking model that incorporates all quality estimates achieves further improvements on both collections.
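The abstract does not say how the answer-passage quality estimates are combined with a baseline ranker. The following is a purely illustrative sketch of one common way to fold passage-level evidence into a document ranking: take each document's best passage-quality score, min-max normalize it and the baseline retrieval score over the candidate list, and linearly interpolate the two. This is not presented as the paper's model. The class `ScoredDoc`, the function `rerank`, the interpolation parameter `weight`, and the example scores are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ScoredDoc:
    """A candidate document (hypothetical structure for illustration only)."""
    doc_id: str
    base_score: float               # score from some baseline ranker (assumed given)
    passage_qualities: list[float]  # quality estimates of the doc's answer passages (assumed given)


def min_max_normalize(values: list[float]) -> list[float]:
    """Rescale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def rerank(docs: list[ScoredDoc], weight: float = 0.5) -> list[str]:
    """Order documents by (1 - weight) * base + weight * best passage quality.

    Both signals are min-max normalized over the candidate list first.
    A document with no extracted passage gets a raw best quality of 0.0.
    """
    if not docs:
        return []
    base = min_max_normalize([d.base_score for d in docs])
    best = min_max_normalize([max(d.passage_qualities, default=0.0) for d in docs])
    combined = [
        ((1.0 - weight) * b + weight * q, d.doc_id)
        for b, q, d in zip(base, best, docs)
    ]
    # Sort by combined score only; Python's sort is stable, so ties keep input order.
    combined.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in combined]


example = [
    ScoredDoc("d1", base_score=12.0, passage_qualities=[0.2, 0.1]),
    ScoredDoc("d2", base_score=10.0, passage_qualities=[0.9]),
    ScoredDoc("d3", base_score=11.0, passage_qualities=[]),
]
print(rerank(example, weight=0.8))  # ['d2', 'd1', 'd3']
```

With `weight=0.0` the baseline order (`d1`, `d3`, `d2`) is returned unchanged; raising the weight lets a document with a strong answer passage (`d2`) overtake documents with higher baseline scores. Normalizing first puts both signals on a comparable 0 to 1 scale, so `weight` can be read as the share of the final score given to passage quality.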

References

  1. Eugene Agichtein, Eric Brill, and Susan Dumais. 2006. Improving web search ranking by incorporating user behavior information. In Proc. of SIGIR. ACM, 19–26.
  2. Gianni Amati and Cornelis Joost van Rijsbergen. 2002. Probabilistic models of information retrieval based on measuring the divergence from randomness. ACM Trans. Inf. Syst. Vol. 20, 4 (2002), 357–389.
  3. Michael Bendersky, W. Bruce Croft, and Yanlei Diao. 2011. Quality-biased Ranking of Web Documents. In Proc. of WSDM. ACM, 95–104.
  4. Michael Bendersky and Oren Kurland. 2008. Utilizing passage-based language models for document retrieval. In Proc. of ECIR. Springer, 162–174.
  5. Michael Bendersky, Donald Metzler, and W. Bruce Croft. 2010. Learning Concept Importance Using a Weighted Dependence Model. In Proc. of WSDM. ACM, 31–40.
  6. Jiang Bian, Yandong Liu, Eugene Agichtein, and Hongyuan Zha. 2008. Finding the right facts in the crowd: factoid question answering over social media. In Proc. of WWW. ACM, 467–476.
  7. Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2016. Enriching Word Vectors with Subword Information. arXiv preprint arXiv:1607.04606 (2016).
  8. James P. Callan. 1994. Passage-level Evidence in Document Retrieval. In Proc. of SIGIR. Springer-Verlag New York, Inc., 302–310.
  9. Danqi Chen, Adam Fisch, Jason Weston, and Antoine Bordes. 2017. Reading Wikipedia to Answer Open-Domain Questions. In Proc. of ACL. Association for Computational Linguistics, 1870–1879.
  10. Gordon V. Cormack, Mark D. Smucker, and Charles L. Clarke. 2011. Efficient and Effective Spam Filtering and Re-ranking for Large Web Datasets. Inf. Retr. Vol. 14, 5 (Oct. 2011), 441–465.
  11. W. Bruce Croft. 2002. Combining approaches to information retrieval. In Proc. of ECIR. Springer, 1–36.
  12. Fernando Diaz and Donald Metzler. 2006. Improving the Estimation of Relevance Models Using Large External Corpora. In Proc. of SIGIR. ACM, 154–161.
  13. Dan Gillick and Benoit Favre. 2009. A scalable global model for summarization. In Proceedings of the Workshop on Integer Linear Programming for Natural Language Processing. Association for Computational Linguistics, 10–18.
  14. Jing He, Pablo Duboue, and Jian-Yun Nie. 2012. Bridging the Gap between Intrinsic and Perceived Relevance in Snippet Generation. In Proc. of COLING. 1129–1146.
  15. Karl Moritz Hermann, Tomas Kocisky, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom. 2015. Teaching machines to read and comprehend. In Advances in Neural Information Processing Systems. 1693–1701.
  16. Kalervo Järvelin and Jaana Kekäläinen. 2002. Cumulated gain-based evaluation of IR techniques. ACM Trans. Inf. Syst. Vol. 20, 4 (2002), 422–446.
  17. Mostafa Keikha, Jae Hyun Park, and W. Bruce Croft. 2014. Evaluating answer passages using summarization measures. In Proc. of SIGIR. ACM, 963–966.
  18. Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
  19. Eyal Krikon and Oren Kurland. 2011. A study of the integration of passage-, document-, and cluster-based information for re-ranking search results. Information Retrieval Vol. 14, 6 (2011), 593–616.
  20. Oren Kurland and Lillian Lee. 2010. PageRank without hyperlinks: Structural reranking using links induced by language models. ACM Trans. Inf. Syst. Vol. 28, 4 (2010), 18.
  21. Saar Kuzi, Anna Shtok, and Oren Kurland. 2016. Query expansion using word embeddings. In Proc. of CIKM. ACM, 1929–1932.
  22. Adenike M. Lam-Adesina and Gareth J. F. Jones. 2001. Applying Summarization Techniques for Term Selection in Relevance Feedback. In Proc. of SIGIR. ACM, 1–9.
  23. Victor Lavrenko and W. Bruce Croft. 2001. Relevance based language models. In Proc. of SIGIR. ACM, 120–127.
  24. Hui Lin and Jeff Bilmes. 2010. Multi-document summarization via budgeted maximization of submodular functions. In Proc. of HLT/NAACL. Association for Computational Linguistics, 912–920.
  25. Qiaoling Liu, Eugene Agichtein, Gideon Dror, Evgeniy Gabrilovich, Yoelle Maarek, Dan Pelleg, and Idan Szpektor. 2011. Predicting web searcher satisfaction with existing community-based answers. In Proc. of SIGIR. ACM, 415–424.
  26. Yandong Liu, Jiang Bian, and Eugene Agichtein. 2008. Predicting information seeker satisfaction in community question answering. In Proc. of SIGIR. ACM, 483–490.
  27. Craig Macdonald, Rodrygo L. T. Santos, and Iadh Ounis. 2012. On the Usefulness of Query Features for Learning to Rank. In Proc. of CIKM. ACM, 2559–2562.
  28. Edgar Meij and Maarten de Rijke. 2010. Supervised query modeling using Wikipedia. In Proc. of SIGIR. ACM, 875–876.
  29. Donald Metzler and W. Bruce Croft. 2005. A Markov Random Field Model for Term Dependencies. In Proc. of SIGIR. ACM, 472–479.
  30. Donald Metzler and Tapas Kanungo. 2008. Machine Learned Sentence Selection Strategies for Query-Biased Summarization. In SIGIR Learning to Rank Workshop.
  31. Bhaskar Mitra and Nick Craswell. 2017. Neural Models for Information Retrieval. arXiv preprint arXiv:1705.01509 (2017).
  32. Alistair Moffat and Justin Zobel. 2008. Rank-biased precision for measurement of retrieval effectiveness. ACM Trans. Inf. Syst. Vol. 27, 1 (2008), 2.
  33. John O'Connor. 1980. Answer-passage retrieval by text searching. Journal of the Association for Information Science and Technology Vol. 31, 4 (1980), 227–239.
  34. Jay M. Ponte and W. Bruce Croft. 1998. A language modeling approach to information retrieval. In Proc. of SIGIR. ACM, 275–281.
  35. Dragomir Radev, Timothy Allison, Sasha Blair-Goldensohn, John Blitzer, Arda Çelebi, Stanko Dimitrov, Elliott Drabek, Ali Hakim, Wai Lam, Danyu Liu, Jahna Otterbacher, Hong Qi, Horacio Saggion, Simone Teufel, Michael Topper, Adam Winkel, and Zhu Zhang. 2004. MEAD - A platform for multidocument multilingual text summarization. In Proc. of LREC.
  36. Fiana Raiber and Oren Kurland. 2013. Ranking document clusters using Markov random fields. In Proc. of SIGIR. ACM, 333–342.
  37. Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ Questions for Machine Comprehension of Text. arXiv preprint arXiv:1606.05250 (2016).
  38. Stephen E. Robertson. 1997. Overview of the Okapi projects. Journal of Documentation Vol. 53, 1 (1997), 3–7.
  39. Joseph John Rocchio. 1971. Relevance feedback in information retrieval. The SMART retrieval system: experiments in automatic document processing (1971), 313–323.
  40. Tetsuya Sakai and Karen Sparck-Jones. 2001. Generic Summaries for Indexing in Information Retrieval. In Proc. of SIGIR. ACM, 190–198.
  41. Chirag Shah and Jeffrey Pomerantz. 2010. Evaluating and predicting answer quality in community QA. In Proc. of SIGIR. ACM, 411–418.
  42. Hiroya Takamura and Manabu Okumura. 2009. Text summarization model based on maximum coverage problem and its variant. In Proc. of EACL. Association for Computational Linguistics, 781–789.
  43. Anastasios Tombros and Mark Sanderson. 1998. Advantages of Query Biased Summaries in Information Retrieval. In Proc. of SIGIR. ACM, 2–10.
  44. Ingmar Weber, Antti Ukkonen, and Aris Gionis. 2012. Answers, not links: extracting tips from Yahoo! Answers to address how-to web queries. In Proc. of WSDM. ACM, 613–622.
  45. Wouter Weerkamp, Krisztian Balog, and Maarten de Rijke. 2012. Exploiting External Collections for Query Expansion. ACM Trans. Web Vol. 6, 4 (2012), 1–29.
  46. Ross Wilkinson. 1994. Effective Retrieval of Structured Documents. In Proc. of SIGIR. Springer-Verlag New York, Inc., 311–317.
  47. Kristian Woodsend and Mirella Lapata. 2012. Multiple aspect summarization using integer linear programming. In Proc. of EMNLP. Association for Computational Linguistics, 233–243.
  48. Chenyan Xiong, Jamie Callan, and Tie-Yan Liu. 2017. Word-Entity Duet Representations for Document Ranking. In Proc. of SIGIR. ACM, 763–772.
  49. Xiaobing Xue, Jiwoon Jeon, and W. Bruce Croft. 2008. Retrieval models for question and answer archives. In Proc. of SIGIR. ACM, 475–482.
  50. Zi Yang, Keke Cai, Jie Tang, Li Zhang, Zhong Su, and Juanzi Li. 2011. Social context summarization. In Proc. of SIGIR. ACM, 255–264.
  51. Evi Yulianti, Ruey-Cheng Chen, Falk Scholer, W. Bruce Croft, and Mark Sanderson. 2018. Document summarization for answering non-factoid queries. IEEE Trans. Knowl. Data Eng. Vol. 30, 1 (2018), 15–28.
  52. Hamed Zamani and W. Bruce Croft. 2016. Embedding-based query language models. In Proc. of ICTIR. ACM, 147–156.
  53. Chengxiang Zhai and John Lafferty. 2004. A Study of Smoothing Methods for Language Models Applied to Information Retrieval. ACM Trans. Inf. Syst. Vol. 22, 2 (2004), 179–214.

Published in

SIGIR '18: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval
June 2018
1509 pages
ISBN: 9781450356572
DOI: 10.1145/3209978

Copyright © 2018 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance Rates

SIGIR '18 paper acceptance rate: 86 of 409 submissions (21%). Overall acceptance rate: 792 of 3,983 submissions (20%).
