DOI: 10.1145/2348283.2348474
poster

Fast on-line learning for multilingual categorization

Published: 12 August 2012

ABSTRACT

Multiview learning has been shown to be a natural and efficient framework for supervised or semi-supervised learning of multilingual document categorizers. The state-of-the-art co-regularization approach relies on alternating minimization of a combination of language-specific categorization errors and a disagreement term between the outputs of the monolingual text categorizers. This is typically solved by repeatedly training a categorizer on each language with the appropriate regularizer. We extend and improve this approach by introducing an on-line learning scheme in which language-specific updates are interleaved so as to iteratively optimize the global cost in a single pass. Our experimental results show that this achieves performance similar to that of the batch approach at a fraction of the computational cost.
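
The abstract gives no formulas, so the following is only an illustrative sketch of the general idea of interleaved, single-pass updates under a co-regularized cost. It is not the authors' algorithm. Everything in it is an assumption made for illustration: two views standing in for two languages, linear scorers, a logistic loss per view, a squared-difference disagreement penalty weighted by a hypothetical parameter `lam`, a fixed learning rate `lr`, and synthetic toy data from the hypothetical helper `make_toy_data`.

```python
import math
import random


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def view_step(w_own, x_own, f_other, y, lr, lam):
    """One SGD step on an assumed per-view cost:
    log(1 + exp(-y * f_own)) + (lam / 2) * (f_own - f_other) ** 2,
    where f_other (the other view's score) is held fixed."""
    f_own = dot(w_own, x_own)
    # Derivative of the cost with respect to f_own.
    g = -y * sigmoid(-y * f_own) + lam * (f_own - f_other)
    return [wi - lr * g * xi for wi, xi in zip(w_own, x_own)]


def one_pass(pairs, dim1, dim2, lr=0.1, lam=0.5):
    """Single pass over (x1, x2, y) triples, interleaving the two view updates."""
    w1, w2 = [0.0] * dim1, [0.0] * dim2
    for x1, x2, y in pairs:
        # Update view 1 against the current view-2 score ...
        w1 = view_step(w1, x1, dot(w2, x2), y, lr, lam)
        # ... then update view 2 against the freshly updated view-1 score.
        w2 = view_step(w2, x2, dot(w1, x1), y, lr, lam)
    return w1, w2


def make_toy_data(n=500, seed=0):
    """Synthetic two-view data (hypothetical): each view has one noisy
    informative feature plus a constant bias feature; y is in {-1, +1}."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.choice([-1, 1])
        x1 = [y + rng.gauss(0, 0.5), 1.0]
        x2 = [rng.gauss(0, 0.5) - y, 1.0]  # view 2 encodes the label with flipped sign
        data.append((x1, x2, y))
    return data


if __name__ == "__main__":
    data = make_toy_data()
    w1, w2 = one_pass(data, dim1=2, dim2=2)
    print("view-1 weights:", w1)
    print("view-2 weights:", w2)
```

In this sketch each example is visited once and both scorers are updated in turn. A batch alternating scheme would instead retrain one scorer over the full data set before switching to the other.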


Published in

SIGIR '12: Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
August 2012
1236 pages
ISBN: 9781450314725
DOI: 10.1145/2348283

Copyright © 2012 Authors

Publisher

Association for Computing Machinery
New York, NY, United States


Acceptance Rates

Overall acceptance rate: 792 of 3,983 submissions, 20%