ABSTRACT
An ongoing event is typically discussed across several information sources on the Web. To get a complete picture of the event, it is important to retrieve information from multiple sources. We propose a novel neural-network-based model that integrates the embeddings from multiple sources and thus retrieves information from them jointly, as opposed to combining multiple retrieval results. A key advantage of the proposed model is that it requires no document-aligned comparable data. Experiments on posts related to a particular event from three different sources (Facebook, Twitter and WhatsApp) demonstrate the efficacy of the proposed model.
Index Terms
- Retrieving Information from Multiple Sources