Feature Selection: A Data Perspective

Published: 06 December 2017

Abstract

Feature selection, as a data preprocessing strategy, has proven effective and efficient in preparing data (especially high-dimensional data) for various data-mining and machine-learning problems. The objectives of feature selection include building simpler and more comprehensible models, improving data-mining performance, and preparing clean, understandable data. The recent proliferation of big data has presented substantial challenges and opportunities for feature selection. In this survey, we provide a comprehensive and structured overview of recent advances in feature selection research. Motivated by current challenges and opportunities in the era of big data, we revisit feature selection research from a data perspective and review representative feature selection algorithms for conventional data, structured data, heterogeneous data, and streaming data. Methodologically, to emphasize the differences and similarities among most existing feature selection algorithms for conventional data, we categorize them into four main groups: similarity-based, information-theoretic, sparse-learning-based, and statistics-based methods. To facilitate and promote research in this community, we also present an open source feature selection repository that contains most of the popular feature selection algorithms (http://featureselection.asu.edu/), and we use it as an example to show how to evaluate feature selection algorithms. We conclude the survey with a discussion of open problems and challenges that require more attention in future research.
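
The evaluation mentioned above typically follows a filter-style protocol: rank features on training data, retain the top k, and measure how a downstream classifier performs on the reduced feature set. Below is a minimal sketch of that protocol written with scikit-learn rather than the repository's own interface; the dataset, the mutual-information scorer, and the logistic-regression classifier are illustrative assumptions, not the survey's prescribed setup.

    # Minimal sketch of a filter-style feature selection evaluation.
    # Assumptions (not from the survey): breast-cancer dataset, mutual-information
    # scoring, and logistic regression as the downstream learner.
    from sklearn.datasets import load_breast_cancer
    from sklearn.feature_selection import SelectKBest, mutual_info_classif
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

    for k in (5, 10, 20):
        # Rank features on the training split only, then keep the top k.
        selector = SelectKBest(score_func=mutual_info_classif, k=k).fit(X_tr, y_tr)
        clf = LogisticRegression(max_iter=5000).fit(selector.transform(X_tr), y_tr)
        acc = accuracy_score(y_te, clf.predict(selector.transform(X_te)))
        print(f"top-{k} features -> test accuracy {acc:.3f}")

Swapping in a different scoring function (for example, a chi-squared test or a Fisher-score implementation) while holding the classifier and protocol fixed is the usual way such filter methods are compared.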


• Published in

  ACM Computing Surveys, Volume 50, Issue 6
  November 2018, 752 pages
  ISSN: 0360-0300
  EISSN: 1557-7341
  DOI: 10.1145/3161158
  • Editor: Sartaj Sahni

      Copyright © 2017 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 6 December 2017
      • Accepted: 1 August 2017
      • Revised: 1 July 2017
      • Received: 1 September 2016


      Qualifiers

      • survey
      • Research
      • Refereed
