DOI: 10.1145/1526709.1526904
poster

Mining multilingual topics from Wikipedia

Published: 20 April 2009

ABSTRACT

In this paper, we aim to leverage Wikipedia, a large-scale multilingual knowledge base, to help analyze and organize Web information written in different languages. Based on the observation that a single Wikipedia concept may be described by articles in several languages, we adapt an existing topic modeling algorithm to mine multilingual topics from this knowledge base. Each extracted "universal" topic has multiple representations, one per language. Accordingly, new documents written in different languages can be represented in a common space spanned by these universal topics, which makes various multilingual Web applications feasible.
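The abstract does not say which topic model is adapted, so the following is only a minimal sketch under an explicit assumption: a polylingual LDA-style variant in which all language versions of one Wikipedia concept share a single topic mixture θ, while each language keeps its own topic-word distribution φ. It is trained with collapsed Gibbs sampling, and a new document is then "folded in" with φ held fixed to obtain its coordinates in the shared topic space. The function names (`train_polylingual_lda`, `infer_topics`), the `{lang: [tokens]}` input format, the hyperparameters, and the toy English/German corpus are all hypothetical illustrations, not the authors' method.

```python
import random
from collections import defaultdict


def train_polylingual_lda(concepts, K, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for an ASSUMED polylingual LDA-style model.

    concepts: list of {lang: [token, ...]} dicts; each dict holds the articles
    describing ONE Wikipedia concept in several languages (hypothetical format).
    """
    rng = random.Random(seed)
    n_ck = [[0] * K for _ in concepts]  # concept-topic counts: theta shared across languages
    n_lkw = defaultdict(lambda: [defaultdict(int) for _ in range(K)])  # per-language phi counts
    n_lk = defaultdict(lambda: [0] * K)  # per-language topic totals
    vocab = defaultdict(set)
    z = []  # z[c][lang][i]: topic assigned to token i of concept c in language lang

    def update(c, lang, w, k, delta):
        n_ck[c][k] += delta
        n_lkw[lang][k][w] += delta
        n_lk[lang][k] += delta

    # Random initialisation of topic assignments.
    for c, articles in enumerate(concepts):
        zc = {}
        for lang, words in articles.items():
            vocab[lang].update(words)
            zc[lang] = [rng.randrange(K) for _ in words]
            for w, k in zip(words, zc[lang]):
                update(c, lang, w, k, +1)
        z.append(zc)

    for _ in range(iters):
        for c, articles in enumerate(concepts):
            for lang, words in articles.items():
                V = len(vocab[lang])
                for i, w in enumerate(words):
                    update(c, lang, w, z[c][lang][i], -1)  # remove current assignment
                    # Shared concept mixture times language-specific word distribution.
                    weights = [(n_ck[c][t] + alpha) * (n_lkw[lang][t][w] + beta)
                               / (n_lk[lang][t] + V * beta) for t in range(K)]
                    k = rng.choices(range(K), weights=weights)[0]
                    z[c][lang][i] = k
                    update(c, lang, w, k, +1)
    return {"K": K, "beta": beta, "n_lkw": n_lkw, "n_lk": n_lk, "vocab": vocab}


def infer_topics(model, lang, words, alpha=0.1, iters=50, seed=0):
    """Fold a new document into the universal topic space, keeping phi fixed."""
    rng = random.Random(seed)
    K, beta = model["K"], model["beta"]
    vocab = model["vocab"].get(lang, set())
    words = [w for w in words if w in vocab]  # ignore words unseen in training
    V = len(vocab)

    def phi(t, w):
        return (model["n_lkw"][lang][t][w] + beta) / (model["n_lk"][lang][t] + V * beta)

    z = [rng.randrange(K) for _ in words]
    n_k = [0] * K
    for k in z:
        n_k[k] += 1
    for _ in range(iters):
        for i, w in enumerate(words):
            n_k[z[i]] -= 1
            weights = [(n_k[t] + alpha) * phi(t, w) for t in range(K)]
            z[i] = rng.choices(range(K), weights=weights)[0]
            n_k[z[i]] += 1
    # theta: the document's mixture over the shared ("universal") topics.
    return [(n_k[t] + alpha) / (len(words) + K * alpha) for t in range(K)]


# Toy usage with a hypothetical English/German corpus of linked articles.
concepts = [
    {"en": ["football", "goal", "team", "match"], "de": ["fussball", "tor", "mannschaft", "spiel"]},
    {"en": ["piano", "music", "concert", "melody"], "de": ["klavier", "musik", "konzert", "melodie"]},
    {"en": ["team", "match", "goal", "league"], "de": ["spiel", "tor", "liga", "mannschaft"]},
    {"en": ["music", "melody", "orchestra", "concert"], "de": ["musik", "orchester", "melodie", "konzert"]},
]
model = train_polylingual_lda(concepts, K=2)
theta_en = infer_topics(model, "en", ["goal", "team", "league"])
theta_de = infer_topics(model, "de", ["tor", "mannschaft", "liga"])
```

Because `theta_en` and `theta_de` live in the same K-dimensional space, they can be compared directly (for example with cosine similarity), which is the kind of cross-language representation the abstract describes. Sharing θ per concept is what ties the per-language φ tables together; with real data one would use far more concepts, topics, and iterations.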


    Published in

      WWW '09: Proceedings of the 18th international conference on World wide web
      April 2009
      1280 pages
      ISBN: 9781605584874
      DOI: 10.1145/1526709

      Copyright © 2009 held by the author/owner(s)

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      Overall acceptance rate: 1,899 of 8,196 submissions, 23%
