ABSTRACT
Community Question Answering (CQA) sites such as Yahoo! Answers have emerged as rich knowledge resources for information seekers. However, answers posted to CQA sites can be irrelevant, incomplete, redundant, incorrect, biased, ill-formed or even abusive. Hence, automatic selection of "good" answers for a given posted question is a practical research problem that will help us manage the quality of accumulated knowledge. One way to evaluate answer selection systems for CQA would be to use the Best Answers (BAs) that are readily available from the CQA sites. However, BAs may be biased, and even if they are not, there may be other good answers besides BAs. To remedy these two problems, we propose system evaluation methods that involve multiple answer assessors and graded-relevance information retrieval metrics. Our main findings from experiments using the NTCIR-8 CQA task data are that, using our evaluation methods, (a) we can detect many substantial differences between systems that would have been overlooked by BA-based evaluation; and (b) we can better identify hard questions (i.e. those that are handled poorly by many systems and therefore require focussed investigation) compared to BA-based evaluation. We therefore argue that our approach is useful for building effective CQA answer selection systems despite the cost of manual answer assessments.
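To make the evaluation setup concrete, the sketch below shows how a graded-relevance metric such as nDCG (Järvelin and Kekäläinen, cited below) can score one system's ranked answer list for a single question. This is a minimal illustration, not the exact NTCIR-8 procedure: the gain mapping (counting how many of four hypothetical assessors judged each answer good) and all answer identifiers are assumptions made for the example.

```python
from math import log2

def ndcg(ranked_gains, all_gains, k=None):
    """nDCG@k: discounted cumulated gain of the system's ranking,
    normalised by the ideal (descending-gain) ranking."""
    def dcg(gains):
        # Rank 1 is discounted by log2(2) = 1, rank 2 by log2(3), etc.
        return sum(g / log2(i + 2) for i, g in enumerate(gains[:k]))
    ideal = dcg(sorted(all_gains, reverse=True))
    return dcg(ranked_gains) / ideal if ideal > 0 else 0.0

# Assumed gain values: number of assessors (out of four) who marked
# each posted answer as good. Answer IDs are hypothetical.
assessor_votes = {"a1": 3, "a2": 0, "a3": 4, "a4": 1, "a5": 0}

# A system's ranking of the five posted answers for this question.
system_ranking = ["a3", "a2", "a1", "a5", "a4"]
gains = [assessor_votes[a] for a in system_ranking]

print(ndcg(gains, list(assessor_votes.values())))  # 1.0 only if perfectly ordered
```

Because gains are graded rather than binary, a system that ranks a unanimously endorsed answer above a partially endorsed one scores higher than one that merely retrieves the single BA, which is what lets this style of evaluation separate systems that BA-based evaluation would treat as ties.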
REFERENCES
- Agichtein, E. et al.: Finding High-Quality Content in Social Media, ACM WSDM 2008 Proceedings, pp. 183--194 (2008).
- Agichtein, E., Liu, Y. and Bian, J.: Modeling Information-Seeker Satisfaction in Community Question Answering, ACM TKDD, Vol. 3, No. 2, Article 10 (2009).
- Cao, Y. et al.: Recommending Questions Using the MDL-based Tree Cut Model, ACM WWW 2008 Proceedings, pp. 81--90 (2008).
- Gyöngyi, Z., Koutrika, G., Pedersen, J. and Garcia-Molina, H.: Questioning Yahoo! Answers, QAWeb 2008 Proceedings (2008).
- Ishikawa, D., Sakai, T. and Kando, N.: Overview of the NTCIR-8 Community QA Pilot Task (Part I): The Test Collection and the Task, NTCIR-8 Proceedings, pp. 421--432 (2010).
- Järvelin, K. and Kekäläinen, J.: Cumulated Gain-Based Evaluation of IR Techniques, ACM TOIS, Vol. 20, No. 4, pp. 422--446 (2002).
- Jeon, J., Croft, W. B., Lee, J. H. and Park, S.: A Framework to Predict the Quality of Answers with Non-Textual Features, ACM SIGIR 2006 Proceedings, pp. 228--235 (2006).
- Lin, J. and Demner-Fushman, D.: Will Pyramids Built of Nuggets Topple Over?, HLT/NAACL 2006 Proceedings, pp. 383--390 (2006).
- Liu, Y., Li, S., Cao, Y., Lin, C.-Y., Han, D. and Yu, Y.: Understanding and Summarizing Answers in Community-based Question Answering Services, COLING 2008 Proceedings, pp. 497--504 (2008).
- Nenkova, A., Passonneau, R. and McKeown, K.: The Pyramid Method: Incorporating Human Content Selection Variation in Summarization Evaluation, ACM Transactions on Speech and Language Processing, Vol. 4, No. 2, Article 4 (2007).
- Sakai, T.: Evaluating Evaluation Metrics based on the Bootstrap, ACM SIGIR 2006 Proceedings, pp. 525--532 (2006).
- Sakai, T.: On Penalising Late Arrival of Relevant Documents in Information Retrieval Evaluation with Graded Relevance, Proceedings of the First Workshop on Evaluating Information Access (EVIA 2007), pp. 32--43 (2007).
- Sakai, T. and Robertson, S.: Modelling A User Population for Designing Information Retrieval Metrics, Proceedings of the Second Workshop on Evaluating Information Access (EVIA 2008), pp. 30--41 (2008).
- Sakai, T., Ishikawa, D. and Kando, N.: Overview of the NTCIR-8 Community QA Pilot Task (Part II): System Evaluation, NTCIR-8 Proceedings, pp. 433--457 (2010).
- Sakai, T., Ishikawa, D., Seki, Y., Kando, N. and Kuriyama, K.: Selecting Good Answers for Community QA: A Note on Evaluation Methods (in Japanese), Forum on Information Technology 2010, pp. 13--20 (2010).
- Sanderson, M.: Test Collection Based Evaluation of Information Retrieval Systems, Foundations and Trends in Information Retrieval, Vol. 4, No. 4, pp. 247--375 (2010).
- Suryanto, M. A., Lim, E.-P., Sun, A. and Chiang, R. H. L.: Quality-Aware Collaborative Question Answering: Methods and Evaluation, ACM WSDM 2009 Proceedings, pp. 142--151 (2009).
- Wang, X.-J. et al.: Ranking Community Answers by Modeling Question-Answer Relationships via Analogical Reasoning, ACM SIGIR 2009 Proceedings, pp. 179--186 (2009).
- Zhang, J., Ackerman, M. S. and Adamic, L.: Expertise Networks in Online Communities: Structure and Algorithms, ACM WWW 2007 Proceedings, pp. 221--230 (2007).