ABSTRACT
Community Question Answering (CQA) sites such as Yahoo! Answers have emerged as rich knowledge resources for information seekers. However, answers posted to CQA sites can be irrelevant, incomplete, redundant, incorrect, biased, ill-formed or even abusive. Hence, automatic selection of "good" answers for a given posted question is a practical research problem that will help us manage the quality of accumulated knowledge. One way to evaluate answer selection systems for CQA would be to use the Best Answers (BAs) that are readily available from the CQA sites. However, BAs may be biased, and even if they are not, there may be other good answers besides BAs. To remedy these two problems, we propose system evaluation methods that involve multiple answer assessors and graded-relevance information retrieval metrics. Our main findings from experiments using the NTCIR-8 CQA task data are that, using our evaluation methods, (a) we can detect many substantial differences between systems that would have been overlooked by BA-based evaluation; and (b) we can better identify hard questions (i.e. those that are handled poorly by many systems and therefore require focussed investigation) compared to BA-based evaluation. We therefore argue that our approach is useful for building effective CQA answer selection systems despite the cost of manual answer assessments.
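To make the evaluation setup concrete, the sketch below shows how a graded-relevance metric such as nDCG (Järvelin and Kekäläinen, cited below) can score one system's ranked answer list for a single question. This is a minimal illustration, not the exact NTCIR-8 procedure: the gain mapping (counting how many of four hypothetical assessors judged each answer good) and all answer identifiers are assumptions made for the example.

```python
from math import log2

def ndcg(ranked_gains, all_gains, k=None):
    """nDCG@k: discounted cumulated gain of the system's ranking,
    normalised by the ideal (descending-gain) ranking."""
    def dcg(gains):
        # Rank 1 is discounted by log2(2) = 1, rank 2 by log2(3), etc.
        return sum(g / log2(i + 2) for i, g in enumerate(gains[:k]))
    ideal = dcg(sorted(all_gains, reverse=True))
    return dcg(ranked_gains) / ideal if ideal > 0 else 0.0

# Assumed gain values: number of assessors (out of four) who marked
# each posted answer as good. Answer IDs are hypothetical.
assessor_votes = {"a1": 3, "a2": 0, "a3": 4, "a4": 1, "a5": 0}

# A system's ranking of the five posted answers for this question.
system_ranking = ["a3", "a2", "a1", "a5", "a4"]
gains = [assessor_votes[a] for a in system_ranking]

print(ndcg(gains, list(assessor_votes.values())))  # 1.0 only if perfectly ordered
```

Because gains are graded rather than binary, a system that ranks a unanimously endorsed answer above a partially endorsed one scores higher than one that merely retrieves the single BA, which is what lets this style of evaluation separate systems that BA-based evaluation would treat as ties.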
REFERENCES
- Agichtein, E. et al.: Finding High-Quality Content in Social Media, ACM WSDM 2008 Proceedings, pp. 183--194 (2008).
- Agichtein, E., Liu, Y. and Bian, J.: Modeling Information-Seeker Satisfaction in Community Question Answering, ACM TKDD, Vol. 3, No. 2, Article 10 (2009).
- Cao, Y. et al.: Recommending Questions Using the MDL-based Tree Cut Model, ACM WWW 2008 Proceedings, pp. 81--90 (2008).
- Gyöngyi, Z., Koutrika, G., Pedersen, J. and Garcia-Molina, H.: Questioning Yahoo! Answers, QAWeb 2008 Proceedings (2008).
- Ishikawa, D., Sakai, T. and Kando, N.: Overview of the NTCIR-8 Community QA Pilot Task (Part I): The Test Collection and the Task, NTCIR-8 Proceedings, pp. 421--432 (2010).
- Järvelin, K. and Kekäläinen, J.: Cumulated Gain-Based Evaluation of IR Techniques, ACM TOIS, Vol. 20, No. 4, pp. 422--446 (2002).
- Jeon, J., Croft, W. B., Lee, J. H. and Park, S.: A Framework to Predict the Quality of Answers with Non-Textual Features, ACM SIGIR 2006 Proceedings, pp. 228--235 (2006).
- Lin, J. and Demner-Fushman, D.: Will Pyramids Built of Nuggets Topple Over?, HLT/NAACL 2006 Proceedings, pp. 383--390 (2006).
- Liu, Y., Li, S., Cao, Y., Lin, C.-Y., Han, D. and Yu, Y.: Understanding and Summarizing Answers in Community-based Question Answering Services, COLING 2008 Proceedings, pp. 497--504 (2008).
- Nenkova, A., Passonneau, R. and McKeown, K.: The Pyramid Method: Incorporating Human Content Selection Variation in Summarization Evaluation, ACM Transactions on Speech and Language Processing, Vol. 4, No. 2, Article 4 (2007).
- Sakai, T.: Evaluating Evaluation Metrics based on the Bootstrap, ACM SIGIR 2006 Proceedings, pp. 525--532 (2006).
- Sakai, T.: On Penalising Late Arrival of Relevant Documents in Information Retrieval Evaluation with Graded Relevance, Proceedings of the First Workshop on Evaluating Information Access (EVIA 2007), pp. 32--43 (2007).
- Sakai, T. and Robertson, S.: Modelling A User Population for Designing Information Retrieval Metrics, Proceedings of the Second Workshop on Evaluating Information Access (EVIA 2008), pp. 30--41 (2008).
- Sakai, T., Ishikawa, D. and Kando, N.: Overview of the NTCIR-8 Community QA Pilot Task (Part II): System Evaluation, NTCIR-8 Proceedings, pp. 433--457 (2010).
- Sakai, T., Ishikawa, D., Seki, Y., Kando, N. and Kuriyama, K.: Selecting Good Answers for Community QA: A Note on Evaluation Methods (in Japanese), Forum on Information Technology 2010, pp. 13--20 (2010).
- Sanderson, M.: Test Collection Based Evaluation of Information Retrieval Systems, Foundations and Trends in Information Retrieval, Vol. 4, No. 4, pp. 247--375 (2010).
- Suryanto, M. A., Lim, E.-P., Sun, A. and Chiang, R. H. L.: Quality-Aware Collaborative Question Answering: Methods and Evaluation, ACM WSDM 2009 Proceedings, pp. 142--151 (2009).
- Wang, X.-J. et al.: Ranking Community Answers by Modeling Question-Answer Relationships via Analogical Reasoning, ACM SIGIR 2009 Proceedings, pp. 179--186 (2009).
- Zhang, J., Ackerman, M. S. and Adamic, L.: Expertise Networks in Online Communities: Structure and Algorithms, ACM WWW 2007 Proceedings, pp. 221--230 (2007).