poster

Mining multilingual topics from Wikipedia

Authors: Xiaochuan Ni, Jian-Tao Sun, Jian Hu, Zheng Chen

WWW '09: Proceedings of the 18th international conference on World wide web

Pages 1155 - 1156

https://doi.org/10.1145/1526709.1526904

Published: 20 April 2009


Abstract

In this paper, we try to leverage Wikipedia, a large-scale multilingual knowledge base, to help analyze and organize Web information written in different languages. Based on the observation that one Wikipedia concept may be described by articles in different languages, we adapt an existing topic modeling algorithm to mine multilingual topics from this knowledge base. The extracted 'universal' topics have multiple representations, one per language. Accordingly, new documents in different languages can be represented in a common space spanned by a group of universal topics, which makes various multilingual Web applications feasible.
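The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the adapted algorithm or its details, so the code assumes an LDA-style collapsed Gibbs sampler in which all language versions of one concept share a single topic mixture while each language keeps its own topic-word distributions. The function names `train_multilingual_topics` and `infer_topics`, the hyperparameters `alpha` and `beta`, and the toy corpus are all hypothetical.

```python
import random


def train_multilingual_topics(concepts, num_topics, iters=100,
                              alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling over concept-aligned articles (a sketch).

    concepts: list of {language: [tokens]} dicts, one dict per concept.
    Returns (theta, phi) with theta[c][k] and phi[lang][k][word].
    """
    rng = random.Random(seed)
    langs = sorted({lang for doc in concepts for lang in doc})
    vocab = {lang: sorted({w for doc in concepts for w in doc.get(lang, [])})
             for lang in langs}
    # Assumption: language versions of a concept share one topic mixture,
    # so concept-topic counts are pooled across languages.
    n_ck = [[0] * num_topics for _ in concepts]
    # Each language keeps its own topic-word counts (its own representation).
    n_kw = {lang: [dict.fromkeys(vocab[lang], 0) for _ in range(num_topics)]
            for lang in langs}
    n_k = {lang: [0] * num_topics for lang in langs}

    def update(c, lang, w, k, delta):
        n_ck[c][k] += delta
        n_kw[lang][k][w] += delta
        n_k[lang][k] += delta

    # Random initial topic assignment for every token.
    z = []
    for c, doc in enumerate(concepts):
        z.append({})
        for lang, tokens in doc.items():
            z[c][lang] = [rng.randrange(num_topics) for _ in tokens]
            for w, k in zip(tokens, z[c][lang]):
                update(c, lang, w, k, +1)

    for _ in range(iters):
        for c, doc in enumerate(concepts):
            for lang, tokens in doc.items():
                v_beta = len(vocab[lang]) * beta
                for i, w in enumerate(tokens):
                    update(c, lang, w, z[c][lang][i], -1)
                    weights = [(n_ck[c][t] + alpha)
                               * (n_kw[lang][t][w] + beta)
                               / (n_k[lang][t] + v_beta)
                               for t in range(num_topics)]
                    k = rng.choices(range(num_topics), weights=weights)[0]
                    z[c][lang][i] = k
                    update(c, lang, w, k, +1)

    theta = [[(n + alpha) / (sum(row) + num_topics * alpha) for n in row]
             for row in n_ck]
    phi = {lang: [{w: (n_kw[lang][k][w] + beta)
                   / (n_k[lang][k] + len(vocab[lang]) * beta)
                   for w in vocab[lang]}
                  for k in range(num_topics)]
           for lang in langs}
    return theta, phi


def infer_topics(tokens, phi_lang, num_topics, alpha=0.1, steps=50):
    """Map a new document in one language onto the shared topic space.

    Uses simple EM-style updates with phi held fixed; unknown words are skipped.
    """
    theta = [1.0 / num_topics] * num_topics
    known = [w for w in tokens if w in phi_lang[0]]
    for _ in range(steps):
        counts = [alpha] * num_topics
        for w in known:
            post = [theta[k] * phi_lang[k][w] for k in range(num_topics)]
            total = sum(post)
            for k in range(num_topics):
                counts[k] += post[k] / total
        norm = sum(counts)
        theta = [c / norm for c in counts]
    return theta


if __name__ == "__main__":
    # Hypothetical toy corpus: two concepts, each with an English and a Spanish article.
    toy = [
        {"en": ["cat", "dog", "pet", "cat"], "es": ["gato", "perro", "mascota"]},
        {"en": ["stock", "market", "bank"], "es": ["bolsa", "mercado", "banco"]},
    ]
    theta, phi = train_multilingual_topics(toy, num_topics=2)
    print(infer_topics(["perro", "gato"], phi["es"], 2))
    print(infer_topics(["dog", "cat"], phi["en"], 2))
```

Because both calls to `infer_topics` return vectors over the same topic indices, documents in different languages become directly comparable, for example with cosine similarity. Sharing the per-concept mixture is one simple way to tie the languages together; other coupling schemes are possible.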



Index Terms

Information systems > Information retrieval > Document representation


Published In

WWW '09: Proceedings of the 18th international conference on World wide web, April 2009, 1280 pages

ISBN: 9781605584874

DOI: 10.1145/1526709

General Chairs: Juan Quemada (DIT-UPM), Gonzalo León (DIT-UPM)

Program Chairs: Yoelle Maarek (Google Inc., Israel), Wolfgang Nejdl (L3S and Hannover University)

Publisher: Association for Computing Machinery, New York, NY, United States


Conference

WWW '09: The 18th International World Wide Web Conference, April 20 - 24, 2009, Madrid, Spain

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%


Article Metrics

Total citations: 64

Total downloads: 896 (reflects downloads up to 15 Feb 2025)
