DOI: 10.1145/1571941.1571973

A classification-based approach to question answering in discussion boards

Published: 19 July 2009

Abstract

Discussion boards and online forums are important platforms for people to share information. Users post questions or problems on discussion boards and rely on others to provide possible solutions, and such question-related content sometimes dominates an entire board. However, retrieving this kind of information automatically and effectively remains a non-trivial task. In addition, the presence of other types of information (e.g., announcements, plans, and elaborations) makes it unsafe to assume that every thread in a discussion board is about a question. We treat the problems of identifying question-related threads and their potential answers as classification tasks. Experimental results across multiple datasets demonstrate that our method significantly improves performance in both the question detection and answer finding subtasks. We also carefully compare how different types of features contribute to the final result and show that non-content features play a key role in improving overall performance. Finally, we show that a ranking scheme based on our classification approach yields much better performance than previously published methods.
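
To make the framing concrete, here is a minimal sketch of question detection cast as binary classification over a mix of content and non-content features. It is an illustration only, not the authors' method: the Thread fields, the QUESTION_WORDS list, the five features, the perceptron learner, and the toy training threads are all assumptions introduced for this example, and the abstract does not specify any of them.

```python
from dataclasses import dataclass

# Hypothetical cue words; the abstract does not list the lexical features used.
QUESTION_WORDS = {"what", "why", "how", "when", "where", "who", "which", "can", "does", "is"}


@dataclass
class Thread:
    first_post: str   # text of the opening post (content)
    num_replies: int  # non-content signal: replies the thread attracted
    num_authors: int  # non-content signal: distinct participants


def features(t: Thread) -> list:
    """Map a thread to a small vector of content and non-content features."""
    words = t.first_post.lower().split()
    return [
        1.0,                                                   # bias term
        1.0 if "?" in t.first_post else 0.0,                   # content: question mark
        1.0 if words and words[0] in QUESTION_WORDS else 0.0,  # content: leading cue word
        min(t.num_replies, 10) / 10.0,                         # non-content: capped reply count
        min(t.num_authors, 10) / 10.0,                         # non-content: capped author count
    ]


def score(w: list, t: Thread) -> float:
    """Linear score: dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, features(t)))


def train_perceptron(data, epochs: int = 20) -> list:
    """Plain perceptron; labels are +1 (question thread) or -1 (other)."""
    w = [0.0] * len(features(data[0][0]))
    for _ in range(epochs):
        for thread, label in data:
            if label * score(w, thread) <= 0:  # misclassified or on the boundary
                x = features(thread)
                w = [wi + label * xi for wi, xi in zip(w, x)]
    return w


def is_question(w: list, t: Thread) -> bool:
    return score(w, t) > 0


# Toy, hand-written training threads (not data from the paper).
TRAIN = [
    (Thread("How do I reset my router?", 4, 3), 1),
    (Thread("Why does the installer crash on startup?", 6, 4), 1),
    (Thread("Forum maintenance scheduled for Sunday.", 0, 1), -1),
    (Thread("Release notes for version 2 are posted.", 1, 2), -1),
]

weights = train_perceptron(TRAIN)

if __name__ == "__main__":
    new_thread = Thread("Can anyone recommend a good text editor?", 5, 4)
    print(is_question(weights, new_thread))
```

Answer finding could be sketched the same way by classifying each reply within a question thread, and the linear score could serve as a ranking key, though the specific ranking scheme the abstract mentions is not described here.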


      Published In

      SIGIR '09: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
      July 2009
      896 pages
      ISBN: 9781605584836
      DOI: 10.1145/1571941
      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. classification
      2. discussion boards
      3. online forums
      4. question answering

      Qualifiers

      • Research-article

      Conference

      SIGIR '09
      Acceptance Rates

      Overall Acceptance Rate 792 of 3,983 submissions, 20%

      Cited By

      • (2024) The power and potentials of Flexible Query Answering Systems: A critical and comprehensive analysis. Data & Knowledge Engineering 149 (102246). DOI: 10.1016/j.datak.2023.102246. Online publication date: Jan-2024
      • (2024) Structural complexity predicts consensus readability in online discussions. Social Network Analysis and Mining 14:1. DOI: 10.1007/s13278-024-01212-1. Online publication date: 4-Mar-2024
      • (2024) Multi-dimensional Edge-Embedded GCNs for Arabic Text Classification. Linking Theory and Practice of Digital Libraries (241-255). DOI: 10.1007/978-3-031-72437-4_14. Online publication date: 26-Sep-2024
      • (2023) Taming Entangled Accessibility Forum Threads for Efficient Screen Reading. Proceedings of the 28th International Conference on Intelligent User Interfaces (65-76). DOI: 10.1145/3581641.3584073. Online publication date: 27-Mar-2023
      • (2021) Towards a Toolbox for Mining QA-pairs and QAT-triplets from Conversational Data of Public Chats. 2021 29th Conference of Open Innovations Association (FRUCT) (94-101). DOI: 10.23919/FRUCT52173.2021.9435511. Online publication date: 12-May-2021
      • (2021) Impact of Lexical Features on Answer Detection Model in Discussion Forums. Complexity 2021 (1-8). DOI: 10.1155/2021/2893257. Online publication date: 14-Apr-2021
      • (2021) Investigating Responsible Factors for Interaction between Learners and Instructors in the Discussion Forum of MOOC. 2021 9th International Conference on Information and Education Technology (ICIET) (204-207). DOI: 10.1109/ICIET51873.2021.9419599. Online publication date: 27-Mar-2021
      • (2021) Natural language processing based identification of Related Short Forum Posts Through Knowledge Based Conceptualization. 2021 International Conference on Artificial Intelligence and Smart Systems (ICAIS) (1733-1740). DOI: 10.1109/ICAIS50930.2021.9396051. Online publication date: 25-Mar-2021
      • (2021) A Classification Method of the Learners’ Queries in the Discussion Forum of MOOC to Enhance the Effective Response Rate from Instructors. HCI International 2021 - Posters (109-115). DOI: 10.1007/978-3-030-78645-8_14. Online publication date: 3-Jul-2021
      • (2019) Quality dimensions features for identifying high-quality user replies in text forum threads using classification methods. PLOS ONE 14:5 (e0215516). DOI: 10.1371/journal.pone.0215516. Online publication date: 15-May-2019
