DOI: 10.1145/3205651.3208227

Embedded feature selection using probabilistic model-based optimization

Published: 06 July 2018

Abstract

In machine learning, feature selection is a widely used technique for improving the predictive performance and interpretability of a trained model. Feature selection techniques are commonly classified into three approaches: filter, wrapper, and embedded. The embedded approach performs feature selection during model training and generally achieves a good balance between predictive performance and computational cost. In this paper, we propose a novel embedded feature selection method using probabilistic model-based evolutionary optimization. We introduce a multivariate Bernoulli distribution that determines which features are selected, and we optimize its parameters during training. The update rule for the distribution parameters is the same as that of population-based incremental learning (PBIL), but we simultaneously update the parameters of the machine learning model by ordinary gradient descent. The method can be easily incorporated into non-linear models such as neural networks. Moreover, we add a penalty term to the objective function to control the number of selected features. We apply the proposed method with a neural network model to three classification problems. The proposed method achieves competitive performance and reasonable computational cost compared with conventional feature selection methods.
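To make the procedure concrete, the following is a minimal numpy sketch of the idea, assuming a logistic model in place of the paper's neural network. The toy data, the sample size lam, the learning rates eta_p and eta_w, and the penalty weight gamma are all hypothetical choices for illustration, and the plain best-sample PBIL rule shown here is a simplification of the authors' exact update.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical toy data: features 0 and 1 are informative, the rest are noise.
    n, d = 200, 10
    X = rng.normal(size=(n, d))
    y = (X[:, 0] + X[:, 1] > 0).astype(float)

    w = np.zeros(d)          # weights of a logistic model (stand-in for the neural network)
    p = np.full(d, 0.5)      # parameters of the multivariate Bernoulli distribution
    lam, eta_p, eta_w, gamma = 8, 0.1, 0.5, 0.01  # sample size, learning rates, penalty weight

    def objective(mask, w):
        # Cross-entropy loss of the masked model plus a penalty on the number of selected features.
        z = (X * mask) @ w
        q = 1.0 / (1.0 + np.exp(-z))
        eps = 1e-9
        nll = -np.mean(y * np.log(q + eps) + (1.0 - y) * np.log(1.0 - q + eps))
        return nll + gamma * mask.sum()

    for step in range(300):
        # Sample feature-selection masks from the Bernoulli distribution.
        masks = (rng.random((lam, d)) < p).astype(float)
        losses = np.array([objective(m, w) for m in masks])
        best = masks[losses.argmin()]
        # PBIL-style update: move the probabilities toward the best-scoring mask.
        p = (1.0 - eta_p) * p + eta_p * best
        p = np.clip(p, 0.05, 0.95)  # keep some exploration
        # Simultaneously take an ordinary gradient step on the model weights.
        z = (X * best) @ w
        q = 1.0 / (1.0 + np.exp(-z))
        w -= eta_w * ((q - y) @ (X * best)) / n

    print(np.round(p, 2))

On this toy problem the selection probabilities of the two informative features typically rise toward the upper clip bound while those of the noise features drift down; the penalty weight gamma trades off accuracy against the number of selected features.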

References

[1] Shun-ichi Amari. 1998. Natural Gradient Works Efficiently in Learning. Neural Computation 10, 2 (1998), 251--276.
[2] Shumeet Baluja. 1994. Population-Based Incremental Learning: A Method for Integrating Genetic Search Based Function Optimization and Competitive Learning. Technical Report CMU-CS-94-163, Carnegie Mellon University, Pittsburgh, PA.
[3] Aaron Defazio, Francis Bach, and Simon Lacoste-Julien. 2014. SAGA: A Fast Incremental Gradient Method with Support for Non-Strongly Convex Composite Objectives. In Advances in Neural Information Processing Systems (NIPS) 27.
[4] Isabelle Guyon and André Elisseeff. 2003. An Introduction to Variable and Feature Selection. Journal of Machine Learning Research 3 (2003), 1157--1182.
[5] Georges R. Harik, Fernando G. Lobo, and David E. Goldberg. 1999. The Compact Genetic Algorithm. IEEE Transactions on Evolutionary Computation 3, 4 (1999), 287--297.
[6] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. In 2015 IEEE International Conference on Computer Vision (ICCV). IEEE, 1026--1034.
[7] Jonathan J. Hull. 1994. A Database for Handwritten Text Recognition Research. IEEE Transactions on Pattern Analysis and Machine Intelligence 16, 5 (1994), 550--554.
[8] Sergey Ioffe and Christian Szegedy. 2015. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In Proceedings of the 32nd International Conference on Machine Learning (ICML), PMLR 37, 448--456.
[9] Diederik P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In Proceedings of the 3rd International Conference on Learning Representations (ICLR).
[10] Ron Kohavi. 1996. Scaling Up the Accuracy of Naive-Bayes Classifiers: A Decision-Tree Hybrid. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD'96). AAAI Press, 202--207.
[11] Ken Lang. 1995. NewsWeeder: Learning to Filter Netnews. In Machine Learning Proceedings 1995, Armand Prieditis and Stuart Russell (Eds.). Morgan Kaufmann, San Francisco, CA, 331--339.
[12] Fan Li, Yiming Yang, and Eric P. Xing. 2006. From Lasso Regression to Feature Vector Machine. In Advances in Neural Information Processing Systems (NIPS) 18, 779--786.
[13] Vinod Nair and Geoffrey E. Hinton. 2010. Rectified Linear Units Improve Restricted Boltzmann Machines. In Proceedings of the 27th International Conference on Machine Learning (ICML), 807--814.
[14] Yann Ollivier, Ludovic Arnold, Anne Auger, and Nikolaus Hansen. 2017. Information-Geometric Optimization Algorithms: A Unifying Picture via Invariance Principles. Journal of Machine Learning Research 18, 18 (2017), 1--65.
[15] Hanchuan Peng, Fuhui Long, and Chris Ding. 2005. Feature Selection Based on Mutual Information: Criteria of Max-Dependency, Max-Relevance, and Min-Redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence 27, 8 (2005), 1226--1238.
[16] Marko Robnik-Šikonja and Igor Kononenko. 2003. Theoretical and Empirical Analysis of ReliefF and RReliefF. Machine Learning 53, 1-2 (2003), 23--69.
[17] Shinichi Shirakawa, Yasushi Iwata, and Youhei Akimoto. 2018. Dynamic Optimization of Neural Network Structures Using Probabilistic Modeling. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18).
[18] Robert Tibshirani. 1996. Regression Shrinkage and Selection via the Lasso. Journal of the Royal Statistical Society, Series B 58, 1 (1996), 267--288.
[19] Zhixiang Xu, Gao Huang, Kilian Q. Weinberger, and Alice X. Zheng. 2014. Gradient Boosted Feature Selection. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '14). ACM, New York, NY, USA, 522--531.
[20] Bing Xue, Mengjie Zhang, and Will N. Browne. 2013. Particle Swarm Optimization for Feature Selection in Classification: A Multi-Objective Approach. IEEE Transactions on Cybernetics 43, 6 (2013), 1656--1671.
[21] Bing Xue, Mengjie Zhang, Will N. Browne, and Xin Yao. 2016. A Survey on Evolutionary Computation Approaches to Feature Selection. IEEE Transactions on Evolutionary Computation 20, 4 (2016), 606--626.
[22] Makoto Yamada, Wittawat Jitkrittum, Leonid Sigal, Eric P. Xing, and Masashi Sugiyama. 2014. High-Dimensional Feature Selection by Feature-Wise Kernelized Lasso. Neural Computation 26, 1 (2014), 185--207.



Published In

GECCO '18: Proceedings of the Genetic and Evolutionary Computation Conference Companion
July 2018
1968 pages
ISBN:9781450357647
DOI:10.1145/3205651


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. embedded approach
  2. feature selection
  3. information geometric optimization
  4. natural gradient
  5. neural network


Conference

GECCO '18

Acceptance Rates

Overall Acceptance Rate 1,669 of 4,410 submissions, 38%


Cited By

  • (2024) Importance estimate of features via analysis of their weight and gradient profile. Scientific Reports 14, 1. DOI: 10.1038/s41598-024-72640-4. Online publication date: 9-Oct-2024.
  • (2023) Variational quantum algorithm for unconstrained black box binary optimization: Application to feature selection. Quantum 7, 909. DOI: 10.22331/q-2023-01-26-909. Online publication date: 26-Jan-2023.
  • (2022) Parameter Tuned Unsupervised Fuzzy Deep Learning for Clinical Data Classification. In 2022 International Conference on Electronic Systems and Intelligent Computing (ICESIC), 358--363. DOI: 10.1109/ICESIC53714.2022.9783488. Online publication date: 22-Apr-2022.
  • (2019) A multi-filter feature selection in detecting distributed denial-of-service attack. In Proceedings of the 3rd International Conference on Telecommunications and Communication Engineering, 57--62. DOI: 10.1145/3369555.3369572. Online publication date: 9-Nov-2019.
  • (2019) Joint Optimization of Convolutional Neural Network and Image Preprocessing Selection for Embryo Grade Prediction in In Vitro Fertilization. In Advances in Visual Computing, 14--24. DOI: 10.1007/978-3-030-33723-0_2. Online publication date: 21-Oct-2019.
  • (2019) Controlling Model Complexity in Probabilistic Model-Based Dynamic Optimization of Neural Network Structures. In Artificial Neural Networks and Machine Learning – ICANN 2019: Deep Learning, 393--405. DOI: 10.1007/978-3-030-30484-3_33. Online publication date: 9-Sep-2019.
