DOI: 10.1145/2187836.2187939
research-article

Learning from the past: answering new questions with past answers

Published: 16 April 2012

Abstract

Community-based Question Answering sites, such as Yahoo! Answers or Baidu Zhidao, allow users to get answers to complex, detailed, and personal questions from other users. However, since answering a question depends on the ability and willingness of users to address the asker's needs, a significant fraction of questions remain unanswered. In Yahoo! Answers, we measured this fraction at 15% of all incoming English questions. At the same time, we found that around 25% of questions in certain categories recur, at least at the question-title level, over a period of one year.
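The abstract does not say exactly how title-level recurrence was measured. The following is a minimal sketch that assumes a question counts as recurrent when its normalized title (lowercased, punctuation stripped, whitespace collapsed) matches the title of an earlier question posted at most one year before it. The functions `normalize_title` and `recurrence_rate` and the sample titles are hypothetical illustrations, not the authors' procedure.

```python
import string
from datetime import datetime, timedelta

def normalize_title(title: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse whitespace (assumed normalization)."""
    no_punct = title.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(no_punct.split())

def recurrence_rate(questions, window=timedelta(days=365)):
    """Fraction of questions whose normalized title matches the title of an earlier
    question posted at most `window` before it. `questions` holds (timestamp, title) pairs."""
    last_seen = {}  # normalized title -> timestamp of its most recent occurrence
    recurrent = 0
    for ts, title in sorted(questions):
        key = normalize_title(title)
        prev = last_seen.get(key)
        if prev is not None and ts - prev <= window:
            recurrent += 1
        last_seen[key] = ts
    return recurrent / len(questions) if questions else 0.0

# Hypothetical sample: one of four questions repeats an earlier title.
qs = [
    (datetime(2011, 1, 5), "How do I get rid of hiccups?"),
    (datetime(2011, 3, 2), "how do i get rid of hiccups"),
    (datetime(2011, 4, 9), "Best laptop for college?"),
    (datetime(2011, 6, 1), "Why is the sky blue?"),
]
print(recurrence_rate(qs))  # 0.25
```

Exact matching after normalization is deliberately strict; it undercounts paraphrased repeats, which is consistent with the abstract's "at least at the question-title level" qualifier.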
We attempt to reduce the rate of unanswered questions in Yahoo! Answers by reusing the large repository of past resolved questions openly available on the site. More specifically, we estimate the probability that certain new questions can be satisfactorily answered by a best answer from the past, using a statistical model specifically trained for this task. We leverage concepts and methods from query-performance prediction and natural language processing to extract a wide range of features for our model. The key challenge is to achieve a level of quality similar to that provided by the best human answerers.
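The abstract names the ingredients (a trained statistical model and features drawn from query-performance prediction and NLP) but not the specific features or learner. The sketch below is illustrative only. It retrieves the most similar past question by cosine similarity over bag-of-words vectors and computes two hypothetical features: the top similarity, and its margin over the runner-up, a rough analogue of score-based query-performance predictors. A logistic function with made-up weights then combines the features. `best_past_answer`, `answer_probability`, `WEIGHTS`, and `BIAS` are all assumptions, not the paper's model.

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words term counts over lowercased whitespace tokens."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def best_past_answer(new_question, archive):
    """Return the best answer of the most similar past question plus two hypothetical
    features. `archive` is a list of (past_question, best_answer) pairs."""
    q = bow(new_question)
    scored = sorted(((cosine(q, bow(pq)), ans) for pq, ans in archive), reverse=True)
    if not scored:
        return None, {"top_sim": 0.0, "margin": 0.0}
    top = scored[0][0]
    second = scored[1][0] if len(scored) > 1 else 0.0
    return scored[0][1], {"top_sim": top, "margin": top - second}

# Made-up weights for illustration; a real model would be trained on labeled data.
WEIGHTS = {"top_sim": 6.0, "margin": 4.0}
BIAS = -4.0

def answer_probability(features):
    """Logistic combination of the features into a probability in (0, 1)."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

archive = [
    ("how do i get rid of hiccups", "Hold your breath and sip water slowly."),
    ("best laptop for college", "Pick something light with good battery life."),
]
answer, feats = best_past_answer("How do I get rid of hiccups fast", archive)
print(answer, round(answer_probability(feats), 3))
```

A close paraphrase of an archived question yields a high probability. An unrelated question such as "what is the capital of france" shares only a stopword with the archive and scores well below 0.5.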
We evaluated our algorithm on offline data extracted from Yahoo! Answers and, more interestingly, on online data, using three "live" answering robots that automatically provide past answers to new questions when a certain degree of confidence is reached. We report the success rate of these robots in three active Yahoo! Answers categories in terms of accuracy, coverage, and askers' satisfaction. To the best of our knowledge, this work is the first attempt at automatically answering questions of a social nature by reusing high-quality past answers.
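The robots answer only when confidence clears a threshold, and they are scored on coverage and accuracy. The following sketch assumes common definitions, not definitions taken from the paper. Coverage is the fraction of incoming questions the robot answers. Accuracy is the fraction of answered questions whose reused answer was judged satisfactory. The `Decision` record, `evaluate_robot`, the threshold values, and the sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Decision:
    question_id: str
    probability: float  # model's estimate that the reused answer will satisfy the asker
    satisfactory: bool  # offline judgment of whether that answer was in fact satisfactory

def evaluate_robot(decisions: List[Decision], threshold: float) -> Tuple[float, float]:
    """Return (coverage, accuracy) for a robot that posts a past answer only
    when the model's probability reaches `threshold`."""
    answered = [d for d in decisions if d.probability >= threshold]
    coverage = len(answered) / len(decisions) if decisions else 0.0
    accuracy = sum(d.satisfactory for d in answered) / len(answered) if answered else 0.0
    return coverage, accuracy

decisions = [
    Decision("q1", 0.95, True),
    Decision("q2", 0.90, True),
    Decision("q3", 0.70, False),
    Decision("q4", 0.60, True),
    Decision("q5", 0.20, False),
]
for t in (0.85, 0.5):
    print(t, evaluate_robot(decisions, t))
# 0.85 (0.4, 1.0)
# 0.5 (0.8, 0.75)
```

Raising the threshold trades coverage for accuracy. This is why a robot that aims for answer quality comparable to the best human answerers must leave many questions to people.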

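Cited By

  • (2024) Investigating the effects of nudges to promote knowledge-sharing behaviours on MOOC forums: a mixed method design. Behaviour & Information Technology 44(2):289-314. DOI: 10.1080/0144929X.2024.2316287. Online publication date: 16-Feb-2024.
  • (2021) KnowSum: Knowledge Inclusive Approach for Text Summarization Using Semantic Allignment. 2021 7th International Conference on Web Research (ICWR), 227-231. DOI: 10.1109/ICWR51868.2021.9443149. Online publication date: 19-May-2021.
  • (2021) Accurate Answers Selection and Expert Recommendation in Community Question Answers System. 2021 5th International Conference on Intelligent Computing and Control Systems (ICICCS), 1171-1174. DOI: 10.1109/ICICCS51141.2021.9432089. Online publication date: 6-May-2021.
  • (2021) Natural language processing based identification of Related Short Forum Posts Through Knowledge Based Conceptualization. 2021 International Conference on Artificial Intelligence and Smart Systems (ICAIS), 1733-1740. DOI: 10.1109/ICAIS50930.2021.9396051. Online publication date: 25-Mar-2021.
  • (2021) TSAR-based Expert Recommendation Mechanism for Community Question Answering. 2021 IEEE 24th International Conference on Computer Supported Cooperative Work in Design (CSCWD), 162-167. DOI: 10.1109/CSCWD49262.2021.9437843. Online publication date: 5-May-2021.
  • (2021) Technological troubleshooting based on sentence embedding with deep transformers. Journal of Intelligent Manufacturing 32(6):1699-1710. DOI: 10.1007/s10845-021-01797-w. Online publication date: 7-Jun-2021.
  • (2021) Detecting Duplicate Question Pairs Using GloVe Embeddings and Similarity Measures. Advances in Automation, Signal Processing, Instrumentation, and Control, 695-702. DOI: 10.1007/978-981-15-8221-9_63. Online publication date: 5-Mar-2021.
  • (2020) Efficient crowdsourcing of crowd-generated microtasks. PLOS ONE 15(12):e0244245. DOI: 10.1371/journal.pone.0244245. Online publication date: 17-Dec-2020.
  • (2020) Voice-based Reformulation of Community Answers. Proceedings of The Web Conference 2020, 2885-2891. DOI: 10.1145/3366423.3380053. Online publication date: 20-Apr-2020.
  • (2020) HCA: Hierarchical Compare Aggregate model for question retrieval in community question answering. Information Processing & Management 57(6):102318. DOI: 10.1016/j.ipm.2020.102318. Online publication date: Nov-2020.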

    Information

    Published In

    ACM Other conferences
    WWW '12: Proceedings of the 21st international conference on World Wide Web
    April 2012
    1078 pages
    ISBN:9781450312295
    DOI:10.1145/2187836
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    • Univ. de Lyon: Université de Lyon

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. automatic question answering
    2. community-based question answering

    Conference

    WWW 2012: 21st World Wide Web Conference 2012
    April 16-20, 2012
    Lyon, France

    Acceptance Rates

    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

    Article Metrics

    • Downloads (last 12 months): 29
    • Downloads (last 6 weeks): 3
    Reflects downloads up to 20 Feb 2025

