DOI: 10.1145/2939672.2939834
Research article · Public Access

Multi-Task Feature Interaction Learning

Published: 13 August 2016

ABSTRACT

One major limitation of linear models is their inability to capture predictive information from interactions between features. Introducing high-order feature interaction terms can overcome this limitation, but doing so dramatically increases model complexity and poses significant challenges for learning against overfitting. In this paper, we propose a novel Multi-Task feature Interaction Learning (MTIL) framework that exploits task relatedness through high-order feature interactions, achieving better generalization performance via inductive transfer among tasks through shared representations of feature interactions. We formulate two concrete approaches under this framework: the shared interaction approach, which assumes tasks share the same set of interactions, and the embedded interaction approach, which assumes the feature interactions of multiple tasks come from a shared subspace. We provide efficient algorithms for solving both approaches. Extensive empirical studies on both synthetic and real datasets demonstrate the effectiveness of the proposed framework.
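The abstract's premise can be made concrete with a small sketch. The code below is not the paper's MTIL algorithm; it is an illustrative toy (synthetic data, plain ridge regression, and the hypothetical helpers `interaction_features` and `ridge`) showing (a) how pairwise interaction terms expand a d-dimensional feature space by d(d+1)/2 columns, which is the complexity blow-up the abstract refers to, and (b) the shared-interaction assumption: two related tasks that differ in their linear weights but share the same active interaction.

```python
import numpy as np

rng = np.random.default_rng(0)

def interaction_features(X):
    """Augment X with all pairwise interaction terms x_i * x_j, i <= j.

    For d raw features this adds d*(d+1)/2 columns -- the source of the
    model-complexity blow-up when interactions are modeled explicitly.
    """
    n, d = X.shape
    pairs = [X[:, i] * X[:, j] for i in range(d) for j in range(i, d)]
    return np.hstack([X, np.column_stack(pairs)])

def ridge(Phi, y, lam=1e-3):
    """Plain single-task ridge regression on the expanded features."""
    k = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(k), Phi.T @ y)

# Two hypothetical tasks that share one true interaction (x_0 * x_1)
# but have independent task-specific linear weights.
d, n = 4, 200
W_shared = np.zeros((d, d))
W_shared[0, 1] = 2.0  # the only active interaction, common to both tasks

estimates = []
for _ in range(2):
    X = rng.normal(size=(n, d))
    w_task = rng.normal(size=d)  # task-specific linear part
    y = (X @ w_task
         + np.einsum("ni,ij,nj->n", X, W_shared, X)  # quadratic term x^T W x
         + 0.01 * rng.normal(size=n))
    beta = ridge(interaction_features(X), y)
    # Column d + 1 holds the x_0 * x_1 feature (it follows x_0 * x_0).
    estimates.append(beta[d + 1])

print(estimates)  # both estimates land close to the shared coefficient 2.0
```

A single-task learner recovers the shared interaction here only because each task has ample data; the framework in the paper targets the regime where data per task is scarce, so coupling the tasks' interaction estimates (same support, or a shared subspace) is what makes the expanded model learnable.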


Published in

KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2016, 2176 pages
ISBN: 9781450342322
DOI: 10.1145/2939672

Copyright © 2016 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

KDD '16 paper acceptance rate: 66 of 1,115 submissions (6%). Overall acceptance rate: 1,133 of 8,635 submissions (13%).
