ABSTRACT
In many prediction tasks, selecting relevant features is essential for good generalization. Most feature selection algorithms treat all features as a priori equally likely to be relevant. In this paper, we use transfer learning (learning on an ensemble of related tasks) to construct an informative prior on feature relevance. We assume that the features themselves have meta-features that are predictive of their relevance to the prediction task, and we model that relevance as a function of the meta-features using hyperparameters called meta-priors. We present a convex optimization algorithm that simultaneously learns the meta-priors and the feature weights from an ensemble of related prediction tasks sharing a similar relevance structure. Because our approach transfers the meta-priors, rather than the feature weights, across tasks, it can handle settings where tasks have non-overlapping features or where the relevance of a feature varies across tasks. We show that learning feature relevance improves performance on two real data sets that illustrate such settings: (1) predicting ratings in a collaborative filtering task, and (2) distinguishing the arguments of a verb in a sentence.
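To make the idea concrete, the sketch below shows one way such a scheme could look. It is a hypothetical simplification, not the paper's actual formulation: we assume each feature j gets a Gaussian prior whose precision is exp(-meta[j] @ nu), so the shared meta-prior nu (learned from all tasks jointly) decides how strongly each feature is shrunk, and we alternate closed-form per-task ridge fits with gradient steps on nu. The function name, the exponential link, and the alternating scheme are all illustrative choices.

```python
import numpy as np

def fit_meta_prior(tasks, meta, n_iter=100, lr=0.01):
    """Jointly learn per-task weights and a shared meta-prior.

    tasks : list of (X, y) regression tasks sharing a relevance structure
    meta  : (n_features, n_meta) array of meta-features, one row per feature

    Hypothetical model (illustrative, not the paper's formulation):
    feature j has Gaussian prior precision lam_j = exp(-meta[j] @ nu),
    so nu controls how strongly each feature is penalized.
    """
    T = len(tasks)
    nu = np.zeros(meta.shape[1])
    for _ in range(n_iter):
        lam = np.exp(-meta @ nu)                      # per-feature penalties
        Ws = [np.linalg.solve(X.T @ X + np.diag(lam), X.T @ y)
              for X, y in tasks]                      # per-task ridge fits
        S = sum(w ** 2 for w in Ws)                   # squared weight mass per feature
        # gradient of sum_j [lam_j * S_j / 2 + (T/2) * meta[j] @ nu],
        # the negative log Gaussian prior summed over tasks (convex in nu)
        grad = meta.T @ (T - lam * S) / 2.0
        nu -= lr * grad
    return nu, Ws
```

On synthetic tasks whose relevant features share a meta-feature, nu learns to penalize the irrelevant features more heavily than the relevant ones, which is the behavior the abstract describes.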
- Learning a meta-level prior for feature relevance from multiple related tasks