research-article

Concept Embedded Convolutional Semantic Model for Question Retrieval

Authors:
Pengwei Wang, South China University of Technology, Guangzhou, China
Yong Zhang, Weber State University, Ogden, USA
Lei Ji, Microsoft Research Asia, Beijing, China
Jun Yan, Microsoft Research Asia, Beijing, China
Lianwen Jin, South China University of Technology, Guangzhou, China

WSDM '17: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, February 2017, Pages 395–403. https://doi.org/10.1145/3018661.3018687

Published: 02 February 2017

ABSTRACT

Question retrieval, which aims to find questions similar to a given question, plays a pivotal role in various question answering (QA) systems. The task is challenging in three main respects: the lexical gap, polysemy, and word order. In this paper, we propose a unified framework that handles these three problems simultaneously. We combine each word with its corresponding concept information to handle polysemy. The concept embedding and word embedding are learned at the same time from both context-dependent and context-independent views. The lexical gap problem is handled because semantic information is encoded into the embeddings. We then propose a high-level feature embedded convolutional semantic model that learns the question embedding from the concept embedding and word embedding, without manually labeled training data. The proposed framework represents the hierarchical structures of word information and concept information in sentences through layer-by-layer composition and pooling. Finally, the framework is trained in a weakly supervised manner on question-answer pairs, which can be obtained directly without manual labeling. Experiments on two real question answering datasets show that the proposed framework significantly outperforms state-of-the-art solutions.
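
The abstract does not specify the architecture, so the following is only a minimal sketch of the general idea it describes: each word embedding is paired with a concept embedding, a convolution over sliding windows captures word order, and max pooling yields a fixed-length question embedding that can be compared by cosine similarity. The function names, dimensions, window size, activation, and random weights are all assumptions made for illustration; they are not the authors' model or training procedure.

```python
import numpy as np


def encode_question(word_vecs, concept_vecs, conv_weights, window=3):
    """Encode one question as a fixed-length vector (illustrative only).

    word_vecs:    (n_tokens, d_word) array of word embeddings
    concept_vecs: (n_tokens, d_concept) array of concept embeddings
    conv_weights: (window * (d_word + d_concept), n_filters) filter matrix
    """
    # Pair each word with its concept by concatenating the two embeddings.
    tokens = np.concatenate([word_vecs, concept_vecs], axis=1)
    # Zero-pad both ends so short questions still produce at least one window.
    pad = np.zeros((window - 1, tokens.shape[1]))
    padded = np.vstack([pad, tokens, pad])
    # Convolution: flatten each sliding window and project it through the filters.
    n_windows = len(padded) - window + 1
    windows = np.stack([padded[i:i + window].ravel() for i in range(n_windows)])
    features = np.tanh(windows @ conv_weights)
    # Max pooling over positions gives one value per filter.
    return features.max(axis=0)


def cosine(a, b):
    """Cosine similarity between two question embeddings."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


# Hypothetical sizes and random embeddings stand in for learned ones.
rng = np.random.default_rng(0)
d_word, d_concept, n_filters, window = 6, 2, 4, 3
weights = 0.1 * rng.standard_normal((window * (d_word + d_concept), n_filters))

q1 = encode_question(rng.standard_normal((5, d_word)),
                     rng.standard_normal((5, d_concept)), weights, window)
q2 = encode_question(rng.standard_normal((7, d_word)),
                     rng.standard_normal((7, d_concept)), weights, window)
similarity = cosine(q1, q2)
```

In this sketch, questions of different lengths map to vectors of the same size, so candidate questions can be ranked by `cosine` against the query. In a trained system the filter weights and embeddings would be learned, for example from question-answer pairs as the abstract describes, rather than drawn at random.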

Index Terms

Concept Embedded Convolutional Semantic Model for Question Retrieval
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
2. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Question answering

Published in
WSDM '17: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining
February 2017
868 pages
ISBN: 9781450346757
DOI: 10.1145/3018661
General Chairs: Maarten de Rijke (University of Amsterdam), Milad Shokouhi (Microsoft)
Program Chairs: Andrew Tomkins (Google), Min Zhang (Tsinghua University)
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
concept embedding
question embedding
question retrieval
Qualifiers
- research-article
Acceptance Rates
WSDM '17 Paper Acceptance Rate: 80 of 505 submissions, 16%
Overall Acceptance Rate: 498 of 2,863 submissions, 17%