poster

Fast on-line learning for multilingual categorization

Authors:
Michelle Kovesi, National Research Council Canada, Gatineau, PQ, Canada
Cyril Goutte, National Research Council Canada, Gatineau, PQ, Canada
Massih-Reza Amini, Laboratoire d'Informatique de Paris 6, Paris, France
SIGIR '12: Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval, August 2012, Pages 1071–1072. https://doi.org/10.1145/2348283.2348474

Published: 12 August 2012

ABSTRACT

Multiview learning has been shown to be a natural and efficient framework for supervised or semi-supervised learning of multilingual document categorizers. The state-of-the-art co-regularization approach relies on alternating minimization of a cost that combines language-specific categorization errors with a disagreement term between the outputs of the monolingual text categorizers. This is typically solved by repeatedly training the categorizer for each language with the appropriate regularizer. We extend and improve this approach by introducing an on-line learning scheme in which language-specific updates are interleaved so as to iteratively optimize the global cost in one pass. Our experimental results show that this achieves performance similar to that of the batch approach at a fraction of the computational cost.
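
The interleaved on-line scheme described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the loss, the disagreement measure, or the update rule, so everything concrete here is an assumption made for illustration: two views (languages), linear logistic categorizers, a squared difference between predicted probabilities as the disagreement term, and plain stochastic gradient steps. The names `online_coreg`, `make_toy_pairs`, `lr`, and `lam` are hypothetical and do not come from the paper.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def online_coreg(pairs, labels, dim1, dim2, lr=0.1, lam=0.5):
    """One pass over paired views with interleaved per-view SGD steps.

    Assumed per-example cost (an illustration, not the paper's cost):
        logloss(p1, y) + logloss(p2, y) + lam * (p1 - p2) ** 2
    where p_v = sigmoid(w_v . x_v) and y is 0 or 1.
    """
    w1 = [0.0] * dim1
    w2 = [0.0] * dim2
    for (x1, x2), y in zip(pairs, labels):
        # Step for view 1, holding the view 2 categorizer fixed.
        p1 = sigmoid(dot(w1, x1))
        p2 = sigmoid(dot(w2, x2))
        g1 = (p1 - y) + 2.0 * lam * (p1 - p2) * p1 * (1.0 - p1)
        w1 = [w - lr * g1 * x for w, x in zip(w1, x1)]
        # Step for view 2, using the freshly updated view 1 output.
        p1 = sigmoid(dot(w1, x1))
        g2 = (p2 - y) + 2.0 * lam * (p2 - p1) * p2 * (1.0 - p2)
        w2 = [w - lr * g2 * x for w, x in zip(w2, x2)]
    return w1, w2


def make_toy_pairs(n, seed=0):
    # Synthetic two-view data: both views are functions of one latent score s.
    # The second feature of each view is a constant bias input.
    rng = random.Random(seed)
    pairs, labels = [], []
    for _ in range(n):
        s = rng.uniform(-1.0, 1.0)
        pairs.append(([s, 1.0], [2.0 * s, 1.0]))
        labels.append(1 if s > 0 else 0)
    return pairs, labels


def accuracy(w, xs, ys):
    # Fraction of examples where the sign of w . x matches the 0/1 label.
    hits = sum((dot(w, x) > 0) == (y == 1) for x, y in zip(xs, ys))
    return hits / len(ys)
```

Calling `online_coreg(*make_toy_pairs(2000), 2, 2)` makes a single pass over the toy data and returns one weight vector per view. A batch co-regularization baseline would instead retrain each view's categorizer to convergence several times, which is where the cost difference mentioned in the abstract would come from under these assumptions.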

Index Terms

Fast on-line learning for multilingual categorization
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing

Published in
SIGIR '12: Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
August 2012
1236 pages
ISBN: 9781450314725
DOI: 10.1145/2348283
General Chair: William Hersh (Oregon Health & Science University, USA)
Program Chairs: Jamie Callan (Carnegie Mellon University, USA), Yoelle Maarek (Yahoo! Research, Israel), Mark Sanderson (Royal Melbourne Institute of Technology, Australia)
Copyright © 2012 Authors
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
multilingual text categorisation
on-line learning
Qualifiers
- poster
Acceptance Rates
Overall Acceptance Rate: 792 of 3,983 submissions, 20%
