ABSTRACT
Question retrieval, which aims to find questions similar to a given question, plays a pivotal role in various question answering (QA) systems. The task is challenging mainly in three respects: the lexical gap, polysemy, and word order. In this paper, we propose a unified framework that handles these three problems simultaneously. We combine each word with its corresponding concept information to handle polysemy. The concept embeddings and word embeddings are learned jointly from both context-dependent and context-independent views. The lexical gap is addressed because semantic information is encoded in the embeddings. We then propose a high-level feature embedded convolutional semantic model that learns question embeddings from the concept and word embeddings without manually labeled training data. The proposed framework represents the hierarchical structure of word and concept information in sentences through layer-by-layer composition and pooling. Finally, the framework is trained in a weakly supervised manner on question-answer pairs, which can be obtained directly without manual labeling. Experiments on two real question answering datasets show that the proposed framework significantly outperforms state-of-the-art solutions.
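As a rough illustration of the composition-and-pooling idea described in the abstract, the sketch below concatenates a word vector and a concept vector per token, applies one convolutional layer over sliding token windows, max-pools over positions to get a fixed-size question vector, and compares two questions by cosine similarity. Everything concrete here is an assumption made for illustration: the dimensions, the single layer, the tanh nonlinearity, the zero padding, and the random (untrained) filters and embeddings. The abstract does not specify these details, and the weakly supervised training on question-answer pairs is not shown.

```python
import math
import random


def concat_embeddings(word_vecs, concept_vecs):
    """Join each token's word vector with its concept vector (hypothetical inputs)."""
    return [w + c for w, c in zip(word_vecs, concept_vecs)]


def conv_max_pool(tokens, filters, window=2):
    """One convolution over token windows, tanh, then max-pooling over positions."""
    dim = len(tokens[0])
    # Zero-pad short questions so that at least one full window exists.
    padded = tokens + [[0.0] * dim] * max(0, window - len(tokens))
    # Each window is the concatenation of `window` consecutive token vectors.
    windows = [sum(padded[i:i + window], []) for i in range(len(padded) - window + 1)]
    return [
        max(math.tanh(sum(f_k * x_k for f_k, x_k in zip(f, win))) for win in windows)
        for f in filters
    ]


def cosine(u, v):
    """Cosine similarity between two question vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Illustrative sizes only; none of these values come from the paper.
WORD_DIM, CONCEPT_DIM, WINDOW, NUM_FILTERS = 3, 2, 2, 4
rng = random.Random(0)

# Random filters stand in for weights that would normally be learned.
filters = [
    [rng.uniform(-1, 1) for _ in range(WINDOW * (WORD_DIM + CONCEPT_DIM))]
    for _ in range(NUM_FILTERS)
]


def random_question(n_tokens):
    """Build a toy question from random word and concept embeddings."""
    words = [[rng.uniform(-1, 1) for _ in range(WORD_DIM)] for _ in range(n_tokens)]
    concepts = [[rng.uniform(-1, 1) for _ in range(CONCEPT_DIM)] for _ in range(n_tokens)]
    return concat_embeddings(words, concepts)


q1 = conv_max_pool(random_question(5), filters, WINDOW)
q2 = conv_max_pool(random_question(7), filters, WINDOW)
print(len(q1), cosine(q1, q2))
```

Max-pooling over positions is what lets questions of different lengths (5 and 7 tokens here) map to vectors of the same size, while the windowed convolution keeps some sensitivity to local word order.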
Index Terms
- Concept Embedded Convolutional Semantic Model for Question Retrieval