Article
DOI: 10.1145/1143844.1143934

Constructing informative priors using transfer learning

Published: 25 June 2006

ABSTRACT

Many applications of supervised learning require good generalization from limited labeled data. In the Bayesian setting, we can try to achieve this goal by using an informative prior over the parameters, one that encodes useful domain knowledge. Focusing on logistic regression, we present an algorithm for automatically constructing a multivariate Gaussian prior with a full covariance matrix for a given supervised learning task. This prior relaxes a commonly used but overly simplistic independence assumption, and allows parameters to be dependent. The algorithm uses other "similar" learning problems to estimate the covariance of pairs of individual parameters. We then use a semidefinite program to combine these estimates and learn a good prior for the current learning task. We apply our methods to binary text classification, and demonstrate a 20 to 40% test error reduction over a commonly used prior.
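The approach the abstract describes has two ingredients: a full-covariance Gaussian prior estimated from related tasks, and MAP logistic regression under that prior. The sketch below is a hypothetical illustration, not the paper's algorithm: it replaces the paper's semidefinite program with a simple eigenvalue-clipping projection onto the PSD cone, and fits the MAP weights by plain gradient descent. All function names and parameters are illustrative assumptions.

```python
import numpy as np

def nearest_psd(C, eps=1e-6):
    """Project a symmetric matrix onto the PSD cone by clipping negative
    eigenvalues. A simple stand-in for the paper's semidefinite program,
    which additionally combines noisy pairwise covariance estimates."""
    C = (C + C.T) / 2.0                     # symmetrize
    vals, vecs = np.linalg.eigh(C)
    return vecs @ np.diag(np.clip(vals, eps, None)) @ vecs.T

def map_logistic_regression(X, y, mu, Sigma, lr=0.1, n_iter=2000):
    """MAP logistic-regression weights under a Gaussian prior N(mu, Sigma),
    fit by gradient descent on the (scaled) negative log-posterior."""
    Sigma_inv = np.linalg.inv(Sigma)
    w = mu.copy()
    n = len(y)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))    # sigmoid predictions
        # gradient of average negative log-likelihood plus prior term
        grad = (X.T @ (p - y) + Sigma_inv @ (w - mu)) / n
        w -= lr * grad
    return w
```

In use, `Sigma` would come from weight vectors fit on "similar" tasks, e.g. `Sigma = nearest_psd(np.cov(W_related.T))` where each row of `W_related` holds one related task's learned weights; a diagonal `Sigma` recovers the common independence-assumption prior the paper relaxes.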


Published in

ICML '06: Proceedings of the 23rd International Conference on Machine Learning
June 2006, 1154 pages
ISBN: 1595933832
DOI: 10.1145/1143844
Copyright © 2006 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

ICML '06 paper acceptance rate: 140 of 548 submissions, 26%.
