ABSTRACT
Many applications of supervised learning require good generalization from limited labeled data. In the Bayesian setting, we can try to achieve this goal by using an informative prior over the parameters, one that encodes useful domain knowledge. Focusing on logistic regression, we present an algorithm for automatically constructing a multivariate Gaussian prior with a full covariance matrix for a given supervised learning task. This prior relaxes a commonly used but overly simplistic independence assumption, and allows parameters to be dependent. The algorithm uses other "similar" learning problems to estimate the covariance of pairs of individual parameters. We then use a semidefinite program to combine these estimates and learn a good prior for the current learning task. We apply our methods to binary text classification, and demonstrate a 20 to 40% test error reduction over a commonly used prior.
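The core model in the abstract is Bayesian logistic regression with a multivariate Gaussian prior whose covariance matrix is full rather than diagonal. Below is a minimal sketch of MAP estimation under such a prior, assuming labels in {-1, +1} and a hand-chosen prior mean `mu0` and covariance `Sigma0`; the paper's actual contribution, constructing `Sigma0` from related tasks via a semidefinite program, is not reproduced here.

```python
import numpy as np
from scipy.optimize import minimize

def fit_map_logistic(X, y, mu0, Sigma0):
    """MAP logistic regression under a Gaussian prior N(mu0, Sigma0).

    Minimizes the negative log posterior
        sum_i log(1 + exp(-y_i x_i^T w)) + 0.5 (w - mu0)^T Sigma0^{-1} (w - mu0),
    which reduces to the usual L2-regularized fit when Sigma0 is a
    scaled identity; a full covariance lets parameters be dependent.
    """
    P = np.linalg.inv(Sigma0)  # prior precision matrix

    def neg_log_posterior(w):
        margins = y * (X @ w)
        data_term = np.sum(np.logaddexp(0.0, -margins))  # stable log(1+e^{-m})
        d = w - mu0
        return data_term + 0.5 * d @ P @ d

    def grad(w):
        margins = y * (X @ w)
        s = 1.0 / (1.0 + np.exp(margins))  # sigmoid(-margin)
        return -(X.T @ (y * s)) + P @ (w - mu0)

    res = minimize(neg_log_posterior, mu0.copy(), jac=grad, method="L-BFGS-B")
    return res.x

# Toy illustration: two well-separated clusters and a correlated prior.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (20, 2)), rng.normal(2, 1, (20, 2))])
y = np.array([-1.0] * 20 + [1.0] * 20)
mu0 = np.zeros(2)
Sigma0 = np.array([[1.0, 0.5], [0.5, 1.0]])  # assumed off-diagonal dependence
w = fit_map_logistic(X, y, mu0, Sigma0)
```

The off-diagonal entry of `Sigma0` is the quantity the paper estimates from "similar" learning problems; with a diagonal covariance the prior would collapse back to the independence assumption the abstract argues against.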
Index Terms
- Constructing informative priors using transfer learning