ABSTRACT
In this paper, we leverage Wikipedia, a large-scale multilingual knowledge base, to analyze and organize Web information written in different languages. Based on the observation that one Wikipedia concept may be described by articles in different languages, we adapt an existing topic modeling algorithm to mine multilingual topics from this knowledge base. Each extracted 'universal' topic has multiple representations, one per language. Accordingly, new documents in different languages can be represented in a single space defined by a set of universal topics, which makes various multilingual Web applications feasible.
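To make the idea of a shared topic space concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes the universal topics have already been mined and are available as one word distribution per language. The toy vocabulary, the probabilities, and the names `TOPICS`, `infer_topic_mixture`, and `cosine` are all hypothetical. Documents are mapped into the topic space with a simple EM "folding-in" step that keeps the topic-word distributions fixed, which stands in for whatever inference procedure the adapted topic model actually uses.

```python
import math

# Hypothetical universal topics: topic k has one word distribution per language.
# The numbers are illustrative only and are not taken from the paper.
TOPICS = {
    "en": [
        {"goal": 0.5, "team": 0.4, "market": 0.05, "bank": 0.05},   # topic 0: sports
        {"goal": 0.05, "team": 0.05, "market": 0.5, "bank": 0.4},   # topic 1: finance
    ],
    "es": [
        {"gol": 0.5, "equipo": 0.4, "mercado": 0.05, "banco": 0.05},
        {"gol": 0.05, "equipo": 0.05, "mercado": 0.5, "banco": 0.4},
    ],
}


def infer_topic_mixture(tokens, lang, iterations=50):
    """Estimate topic proportions for one document by EM, topics held fixed."""
    topic_word = TOPICS[lang]
    k = len(topic_word)
    # Ignore tokens that no topic of this language knows about.
    words = [w for w in tokens if any(w in dist for dist in topic_word)]
    theta = [1.0 / k] * k
    if not words:
        return theta  # no evidence: fall back to a uniform mixture
    for _ in range(iterations):
        counts = [0.0] * k
        for w in words:
            # E-step: responsibility of each topic for this token.
            weights = [theta[z] * topic_word[z].get(w, 0.0) for z in range(k)]
            total = sum(weights)
            if total == 0.0:
                continue
            for z in range(k):
                counts[z] += weights[z] / total
        norm = sum(counts)
        if norm == 0.0:
            break
        # M-step: renormalize the expected counts into proportions.
        theta = [c / norm for c in counts]
    return theta


def cosine(u, v):
    """Cosine similarity between two topic-proportion vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Documents in different languages end up in the same topic space.
en_sports = infer_topic_mixture(["goal", "team", "goal"], "en")
es_sports = infer_topic_mixture(["gol", "equipo"], "es")
es_finance = infer_topic_mixture(["mercado", "banco", "banco"], "es")

print("en sports vs es sports:", cosine(en_sports, es_sports))
print("en sports vs es finance:", cosine(en_sports, es_finance))
```

Because both languages index the same topics, the English and Spanish vectors are directly comparable, so a classifier or similarity measure defined on topic proportions can be applied across languages under these assumptions.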
Index Terms
- Mining multilingual topics from wikipedia