DOI: 10.1145/1526709.1526904
poster

Mining multilingual topics from wikipedia

Published: 20 April 2009

Abstract

In this paper, we leverage Wikipedia, a large-scale multilingual knowledge base, to help analyze and organize Web information written in different languages. Based on the observation that a single Wikipedia concept may be described by articles in different languages, we adapt an existing topic modeling algorithm to mine multilingual topics from this knowledge base. Each extracted 'universal' topic has multiple representations, one per language. Accordingly, new documents in different languages can be represented in a common space spanned by a group of universal topics, which makes a variety of multilingual Web applications feasible.
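
The abstract does not spell out how the topic model is adapted, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes an LDA-style model trained by collapsed Gibbs sampling in which all articles describing the same Wikipedia concept share one topic mixture, while each topic keeps a separate word distribution per language. The function names `train_universal_topics` and `represent`, the input format, and the hyperparameters `alpha` and `beta` are all hypothetical.

```python
import random
from collections import defaultdict


def train_universal_topics(concepts, num_topics, iters=200,
                           alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for an LDA variant with a shared topic mixture.

    concepts: list of dicts, each mapping a language code to the token list
    of that language's article about one concept (assumed input format).
    Returns (phi, theta): phi[lang][k] is topic k's word distribution in
    `lang`; theta[c] is the topic mixture shared by all articles of concept c.
    """
    rng = random.Random(seed)
    langs = sorted({lang for c in concepts for lang in c})
    vocab = {l: sorted({w for c in concepts for w in c.get(l, [])})
             for l in langs}
    n_ct = [[0] * num_topics for _ in concepts]   # concept-topic counts (shared)
    n_tw = {l: [defaultdict(int) for _ in range(num_topics)] for l in langs}
    n_t = {l: [0] * num_topics for l in langs}    # tokens per topic, per language

    def bump(c, l, w, k, delta):
        n_ct[c][k] += delta
        n_tw[l][k][w] += delta
        n_t[l][k] += delta

    # Random initial topic assignment for every token.
    assign = []
    for c, articles in enumerate(concepts):
        assign.append({})
        for l, tokens in articles.items():
            assign[c][l] = [rng.randrange(num_topics) for _ in tokens]
            for w, k in zip(tokens, assign[c][l]):
                bump(c, l, w, k, +1)

    for _ in range(iters):
        for c, articles in enumerate(concepts):
            for l, tokens in articles.items():
                v = len(vocab[l])
                for i, w in enumerate(tokens):
                    # Remove the token, resample its topic, add it back.
                    bump(c, l, w, assign[c][l][i], -1)
                    weights = [(n_ct[c][k] + alpha)
                               * (n_tw[l][k][w] + beta) / (n_t[l][k] + v * beta)
                               for k in range(num_topics)]
                    k_new = rng.choices(range(num_topics), weights=weights)[0]
                    assign[c][l][i] = k_new
                    bump(c, l, w, k_new, +1)

    phi = {l: [{w: (n_tw[l][k][w] + beta) / (n_t[l][k] + len(vocab[l]) * beta)
                for w in vocab[l]} for k in range(num_topics)] for l in langs}
    theta = [[(n_ct[c][k] + alpha) / (sum(n_ct[c]) + num_topics * alpha)
              for k in range(num_topics)] for c in range(len(concepts))]
    return phi, theta


def represent(tokens, lang, phi, alpha=0.1, iters=50):
    """Fold a new document into the topic space with phi held fixed (EM)."""
    num_topics = len(phi[lang])
    known = [w for w in tokens if w in phi[lang][0]]  # skip unseen words
    theta = [1.0 / num_topics] * num_topics
    for _ in range(iters):
        expected = [0.0] * num_topics
        for w in known:
            r = [theta[k] * phi[lang][k][w] for k in range(num_topics)]
            total = sum(r)
            for k in range(num_topics):
                expected[k] += r[k] / total
        theta = [(expected[k] + alpha) / (len(known) + num_topics * alpha)
                 for k in range(num_topics)]
    return theta
```

Under these assumptions, calling `represent(tokens, "zh", phi)` and `represent(tokens, "en", phi)` yields vectors over the same topic indices, so documents in different languages become directly comparable. Sharing the mixture at the concept level is what ties topic `k` in one language to topic `k` in another.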

References

[1] D. Blei, A. Ng and M. Jordan. Latent Dirichlet Allocation. JMLR, 3:993-1022, 2003.
[2] G. Heinrich. Parameter estimation for text analysis. Technical report, 2005.
[3] http://projects.ldc.upenn.edu/Chinese/
[4] J. Olsson, D. Oard and J. Hajic. Cross-language text classification. In Proc. of SIGIR-05, pages 645-646, 2005.
[5] Y. Wu and D.W. Oard. Bilingual topic aspect classification with a few training examples. In Proc. of SIGIR-08, pages 203-210, 2008.

    Published In

    WWW '09: Proceedings of the 18th international conference on World wide web
    April 2009
    1280 pages
    ISBN: 9781605584874
    DOI: 10.1145/1526709

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. multilingual
    2. topic modeling
    3. universal-topics
    4. wikipedia

    Qualifiers

    • Poster

    Conference

    WWW '09

    Acceptance Rates

    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

    Cited By

    • (2024) Uncovering Flat and Hierarchical Topics by Community Discovery on Word Co-occurrence Network. Data Science and Engineering 9:1 (41-61). DOI: 10.1007/s41019-023-00239-2. Online publication date: 13-Mar-2024
    • (2023) Cross-lingual Related Events Recognition Methods Based on The Event Central News Sets. 2023 38th Youth Academic Annual Conference of Chinese Association of Automation (YAC) (539-545). DOI: 10.1109/YAC59482.2023.10401576. Online publication date: 27-Aug-2023
    • (2023) Research on high-performance English translation based on topic model. Digital Communications and Networks 9:2 (505-511). DOI: 10.1016/j.dcan.2022.03.015. Online publication date: Apr-2023
    • (2023) Between news and history: identifying networked topics of collective attention on Wikipedia. Journal of Computational Social Science 6:2 (845-875). DOI: 10.1007/s42001-023-00215-w. Online publication date: 8-Jul-2023
    • (2022) Deep Multilabel Multilingual Document Learning for Cross-Lingual Document Retrieval. Entropy 24:7 (943). DOI: 10.3390/e24070943. Online publication date: 7-Jul-2022
    • (2022) Domain-specific analysis of mobile app reviews using keyword-assisted topic models. Proceedings of the 44th International Conference on Software Engineering (762-773). DOI: 10.1145/3510003.3510201. Online publication date: 21-May-2022
    • (2022) Text Representation Model for Multiple Language Forms in Spoken Chinese Expression. International Journal of Pattern Recognition and Artificial Intelligence. DOI: 10.1142/S0218001422530044. Online publication date: 23-May-2022
    • (2022) Cross-lingual embeddings with auxiliary topic models. Expert Systems with Applications: An International Journal 190:C. DOI: 10.1016/j.eswa.2021.116194. Online publication date: 9-Apr-2022
    • (2021) Wikipedia Beyond the English Language Edition. Proceedings of the ACM on Human-Computer Interaction 5:CSCW1 (1-39). DOI: 10.1145/3449129. Online publication date: 22-Apr-2021
    • (2021) Cross-lingual Language Model Pretraining for Retrieval. Proceedings of the Web Conference 2021 (1029-1039). DOI: 10.1145/3442381.3449830. Online publication date: 19-Apr-2021
