ABSTRACT
One major limitation of linear models is their inability to capture predictive information from interactions between features. Introducing high-order feature interaction terms can overcome this limitation, but doing so dramatically increases model complexity and poses significant challenges for learning due to overfitting. In this paper, we propose a novel Multi-Task feature Interaction Learning (MTIL) framework that exploits task relatedness from high-order feature interactions, providing better generalization performance through inductive transfer among tasks via shared representations of feature interactions. We formulate two concrete approaches under this framework: the shared interaction approach, which assumes tasks share the same set of interactions, and the embedded interaction approach, which assumes feature interactions from multiple tasks come from a shared subspace. We provide efficient algorithms for solving both approaches. Extensive empirical studies on both synthetic and real datasets demonstrate the effectiveness of the proposed framework.
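As an illustrative sketch of the shared-interaction idea (hypothetical code, not the paper's exact formulation or algorithm): each task t predicts y = w_t·x + xᵀQx, where the pairwise interaction matrix Q is shared across all tasks while the linear weights w_t remain task-specific. Fitting all tasks jointly by simple gradient descent on the squared loss shows how the shared Q pools interaction information across tasks.

```python
import numpy as np

# Hypothetical sketch: multi-task quadratic regression with a SHARED
# pairwise interaction matrix Q and task-specific linear weights w_t.
rng = np.random.default_rng(0)
d, n, T = 5, 200, 3  # features, samples per task, tasks

# Ground truth: one shared (symmetric) interaction matrix, per-task weights.
Q_true = rng.normal(size=(d, d)) * 0.3
Q_true = (Q_true + Q_true.T) / 2
W_true = rng.normal(size=(T, d))

X = [rng.normal(size=(n, d)) for _ in range(T)]
y = [X[t] @ W_true[t] + np.einsum('ni,ij,nj->n', X[t], Q_true, X[t])
     for t in range(T)]

# Joint least-squares fit by gradient descent.
W = np.zeros((T, d))
Q = np.zeros((d, d))
lr = 0.05
for _ in range(2000):
    gQ = np.zeros((d, d))
    for t in range(T):
        # Residual of task t's predictions.
        r = X[t] @ W[t] + np.einsum('ni,ij,nj->n', X[t], Q, X[t]) - y[t]
        W[t] -= lr * (X[t].T @ r) / n          # per-task linear update
        gQ += X[t].T @ (X[t] * r[:, None]) / n  # sum_n r_n x_n x_nᵀ
    Q -= lr * gQ / T                            # shared interaction update

mse = np.mean([np.mean((X[t] @ W[t]
               + np.einsum('ni,ij,nj->n', X[t], Q, X[t]) - y[t]) ** 2)
               for t in range(T)])
print("train MSE: %.4f" % mse)
```

Because Q enters every task's loss, its gradient aggregates evidence from all tasks, which is the inductive-transfer effect the abstract describes; the embedded interaction approach would instead factor per-task matrices Q_t through a shared low-dimensional subspace.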
Index Terms
- Multi-Task Feature Interaction Learning