DOI: 10.1145/1273496.1273558
Article

Learning a meta-level prior for feature relevance from multiple related tasks

Published: 20 June 2007

ABSTRACT

In many prediction tasks, selecting relevant features is essential for good generalization performance. Most feature selection algorithms consider all features a priori equally likely to be relevant. In this paper, we use transfer learning---learning on an ensemble of related tasks---to construct an informative prior on feature relevance. We assume that the features themselves have meta-features that are predictive of their relevance to the prediction task, and we model relevance as a function of these meta-features through hyperparameters called meta-priors. We present a convex optimization algorithm that simultaneously learns the meta-priors and the feature weights from an ensemble of related prediction tasks sharing a similar relevance structure. Because our approach transfers the meta-priors among tasks, rather than the weights themselves, it can handle settings where tasks have non-overlapping features or where the relevance of a feature varies across tasks. We show that learning feature relevance improves performance on two real data sets that illustrate such settings: (1) predicting ratings in a collaborative filtering task, and (2) distinguishing the arguments of a verb in a sentence.
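The core idea in the abstract can be sketched in a few lines. The toy model below is a hypothetical illustration, not the authors' implementation: it assumes each feature j has a meta-feature vector M[j], models its prior relevance as r_j = exp(M[j] @ a) with a meta-prior `a` shared across tasks, and fits each task by ridge regression with a per-feature penalty inversely proportional to r_j. (The exponential link, the ridge form, and all dimensions are assumptions for the sketch; the paper's joint convex optimization over meta-priors and weights is not reproduced here.)

```python
import numpy as np

rng = np.random.default_rng(0)
T, n, d, k = 4, 15, 20, 2            # tasks, samples per task, features, meta-features

M = rng.normal(size=(d, k))          # meta-features describing each feature
a_true = np.array([1.5, -1.0])       # assumed ground-truth meta-prior
relevance = np.exp(M @ a_true)       # per-feature prior variance on weights

# Each related task draws its own weights from the shared relevance prior.
W_true = rng.normal(size=(T, d)) * np.sqrt(relevance)
X = rng.normal(size=(T, n, d))
y = np.einsum("tnd,td->tn", X, W_true) + 0.1 * rng.normal(size=(T, n))

def fit(a, lam=1.0):
    """Per-task ridge regression with per-feature penalty lam / exp(M @ a)."""
    pen = np.diag(lam / np.exp(M @ a))
    return np.stack([
        np.linalg.solve(X[t].T @ X[t] + pen, X[t].T @ y[t]) for t in range(T)
    ])

# A flat prior (a = 0) penalizes every feature equally; an informative
# meta-prior concentrates capacity on features marked relevant.
W_flat = fit(np.zeros(k))
W_meta = fit(a_true)
err_flat = np.mean((W_flat - W_true) ** 2)
err_meta = np.mean((W_meta - W_true) ** 2)
```

In this setup the meta-prior `a` would itself be learned (e.g., by minimizing held-out loss over the task ensemble); here it is fixed to the generating value only to keep the sketch short.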

Published in

ICML '07: Proceedings of the 24th International Conference on Machine Learning
June 2007, 1233 pages
ISBN: 9781595937933
DOI: 10.1145/1273496

      Copyright © 2007 ACM


Publisher

Association for Computing Machinery, New York, NY, United States



Acceptance Rates

Overall acceptance rate: 140 of 548 submissions, 26%
