ABSTRACT
One major limitation of linear models is their inability to capture predictive information from interactions between features. Introducing high-order feature interaction terms can overcome this limitation, but doing so dramatically increases model complexity and poses significant challenges for learning due to overfitting. In this paper, we propose a novel Multi-Task feature Interaction Learning (MTIL) framework that exploits task relatedness from high-order feature interactions, providing better generalization performance through inductive transfer among tasks via shared representations of feature interactions. We formulate two concrete approaches under this framework: the shared interaction approach, which assumes tasks share the same set of interactions, and the embedded interaction approach, which assumes feature interactions from multiple tasks come from a shared subspace. We provide efficient algorithms for solving both approaches. Extensive empirical studies on both synthetic and real datasets demonstrate the effectiveness of the proposed framework.
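As an illustrative sketch of the shared-interaction idea (hypothetical code, not the paper's exact formulation or algorithm): each task t predicts y = w_t·x + xᵀQx, where the pairwise interaction matrix Q is shared across all tasks while the linear weights w_t remain task-specific. Fitting all tasks jointly by simple gradient descent on the squared loss shows how the shared Q pools interaction information across tasks.

```python
import numpy as np

# Hypothetical sketch: multi-task quadratic regression with a SHARED
# pairwise interaction matrix Q and task-specific linear weights w_t.
rng = np.random.default_rng(0)
d, n, T = 5, 200, 3  # features, samples per task, tasks

# Ground truth: one shared (symmetric) interaction matrix, per-task weights.
Q_true = rng.normal(size=(d, d)) * 0.3
Q_true = (Q_true + Q_true.T) / 2
W_true = rng.normal(size=(T, d))

X = [rng.normal(size=(n, d)) for _ in range(T)]
y = [X[t] @ W_true[t] + np.einsum('ni,ij,nj->n', X[t], Q_true, X[t])
     for t in range(T)]

# Joint least-squares fit by gradient descent.
W = np.zeros((T, d))
Q = np.zeros((d, d))
lr = 0.05
for _ in range(2000):
    gQ = np.zeros((d, d))
    for t in range(T):
        # Residual of task t's predictions.
        r = X[t] @ W[t] + np.einsum('ni,ij,nj->n', X[t], Q, X[t]) - y[t]
        W[t] -= lr * (X[t].T @ r) / n          # per-task linear update
        gQ += X[t].T @ (X[t] * r[:, None]) / n  # sum_n r_n x_n x_nᵀ
    Q -= lr * gQ / T                            # shared interaction update

mse = np.mean([np.mean((X[t] @ W[t]
               + np.einsum('ni,ij,nj->n', X[t], Q, X[t]) - y[t]) ** 2)
               for t in range(T)])
print("train MSE: %.4f" % mse)
```

Because Q enters every task's loss, its gradient aggregates evidence from all tasks, which is the inductive-transfer effect the abstract describes; the embedded interaction approach would instead factor per-task matrices Q_t through a shared low-dimensional subspace.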
Index Terms
- Multi-Task Feature Interaction Learning