Research article · DOI: 10.1145/1935826.1935864

Using graded-relevance metrics for evaluating community QA answer selection

Published: 09 February 2011

ABSTRACT

Community Question Answering (CQA) sites such as Yahoo! Answers have emerged as rich knowledge resources for information seekers. However, answers posted to CQA sites can be irrelevant, incomplete, redundant, incorrect, biased, ill-formed or even abusive. Hence, automatic selection of "good" answers for a given posted question is a practical research problem that will help us manage the quality of accumulated knowledge. One way to evaluate answer selection systems for CQA would be to use the Best Answers (BAs) that are readily available from the CQA sites. However, BAs may be biased, and even if they are not, there may be other good answers besides BAs. To remedy these two problems, we propose system evaluation methods that involve multiple answer assessors and graded-relevance information retrieval metrics. Our main findings from experiments using the NTCIR-8 CQA task data are that, using our evaluation methods, (a) we can detect many substantial differences between systems that would have been overlooked by BA-based evaluation; and (b) we can better identify hard questions (i.e. those that are handled poorly by many systems and therefore require focussed investigation) compared to BA-based evaluation. We therefore argue that our approach is useful for building effective CQA answer selection systems despite the cost of manual answer assessments.
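To make the graded-relevance idea concrete: instead of scoring a system by whether it ranks the Best Answer first, each posted answer receives a graded label pooled from multiple assessors, and the system's entire ranking is scored with a graded-relevance metric. The sketch below is a minimal illustration using nDCG; the grade scale, the pooling of assessor judgments into a single gain value, and the choice of nDCG are assumptions for illustration, not the exact metrics and gain settings used in the paper.

```python
# Illustrative sketch: scoring one system's ranked answer list for a
# question against graded (multi-assessor) relevance labels with nDCG.
# Grade scale and pooling rule are hypothetical, not the paper's protocol.
import math

def dcg(gains):
    """Discounted cumulative gain of a ranked list of gain values."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))

def ndcg(system_ranking, gain_by_answer_id):
    """nDCG of one system's answer ranking for a single question.

    system_ranking    -- answer ids in the order the system ranked them
    gain_by_answer_id -- dict mapping answer id to its pooled graded gain
    """
    gains = [gain_by_answer_id.get(a, 0) for a in system_ranking]
    ideal = sorted(gain_by_answer_id.values(), reverse=True)[:len(gains)]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0

# Toy example: four posted answers, graded 0 (bad) to 3 (excellent).
grades = {"a1": 3, "a2": 0, "a3": 1, "a4": 2}
print(ndcg(["a2", "a1", "a4", "a3"], grades))  # bad answer ranked first: ~0.70
print(ndcg(["a1", "a4", "a3", "a2"], grades))  # ideal ordering: 1.0
```

Under such a metric, a system that ranks a good-but-not-Best answer near the top still earns partial credit, which is precisely the distinction that BA-based evaluation cannot capture.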


Supplemental Material

wsdm2011_sakai_ugr_01.mov (MOV, 143.5 MB)
wsdm2011_sakai_ugr_01.mp4 (MP4, 190.6 MB)


Published in

WSDM '11: Proceedings of the fourth ACM international conference on Web search and data mining
February 2011 · 870 pages
ISBN: 9781450304931
DOI: 10.1145/1935826
Copyright © 2011 ACM


Publisher: Association for Computing Machinery, New York, NY, United States



Acceptance Rates

WSDM '11 paper acceptance rate: 83 of 372 submissions (22%). Overall WSDM acceptance rate: 498 of 2,863 submissions (17%).

