DOI: 10.1145/3205651.3208227

Embedded feature selection using probabilistic model-based optimization

Published: 06 July 2018

Abstract

In machine learning, feature selection is a widely used technique for improving the predictive performance and interpretability of a trained model. Feature selection techniques are commonly classified into three approaches: filter, wrapper, and embedded. The embedded approach performs feature selection during model training and generally achieves a good balance between predictive performance and computational cost. In this paper, we propose a novel embedded feature selection method using probabilistic model-based evolutionary optimization. We introduce a multivariate Bernoulli distribution that determines which features are selected, and we optimize its parameters during training. The update rule for the distribution parameters is the same as that of population-based incremental learning (PBIL), but we simultaneously update the parameters of the machine learning model by ordinary gradient descent. The method can be easily incorporated into non-linear models such as neural networks. Moreover, we add a penalty term to the objective function to control the number of selected features. We apply the proposed method with a neural network model to three classification problems. The proposed method achieves competitive performance and reasonable computational cost compared with conventional feature selection methods.
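To make the procedure concrete, the following is a minimal numpy sketch of the idea, assuming a logistic model in place of the paper's neural network. The toy data, the sample size lam, the learning rates eta_p and eta_w, and the penalty weight gamma are all hypothetical choices for illustration, and the plain best-sample PBIL rule shown here is a simplification of the authors' exact update.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical toy data: features 0 and 1 are informative, the rest are noise.
    n, d = 200, 10
    X = rng.normal(size=(n, d))
    y = (X[:, 0] + X[:, 1] > 0).astype(float)

    w = np.zeros(d)          # weights of a logistic model (stand-in for the neural network)
    p = np.full(d, 0.5)      # parameters of the multivariate Bernoulli distribution
    lam, eta_p, eta_w, gamma = 8, 0.1, 0.5, 0.01  # sample size, learning rates, penalty weight

    def objective(mask, w):
        # Cross-entropy loss of the masked model plus a penalty on the number of selected features.
        z = (X * mask) @ w
        q = 1.0 / (1.0 + np.exp(-z))
        eps = 1e-9
        nll = -np.mean(y * np.log(q + eps) + (1.0 - y) * np.log(1.0 - q + eps))
        return nll + gamma * mask.sum()

    for step in range(300):
        # Sample feature-selection masks from the Bernoulli distribution.
        masks = (rng.random((lam, d)) < p).astype(float)
        losses = np.array([objective(m, w) for m in masks])
        best = masks[losses.argmin()]
        # PBIL-style update: move the probabilities toward the best-scoring mask.
        p = (1.0 - eta_p) * p + eta_p * best
        p = np.clip(p, 0.05, 0.95)  # keep some exploration
        # Simultaneously take an ordinary gradient step on the model weights.
        z = (X * best) @ w
        q = 1.0 / (1.0 + np.exp(-z))
        w -= eta_w * ((q - y) @ (X * best)) / n

    print(np.round(p, 2))

On this toy problem the selection probabilities of the two informative features typically rise toward the upper clip bound while those of the noise features drift down; the penalty weight gamma trades off accuracy against the number of selected features.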

References

[1] Shun-ichi Amari. 1998. Natural Gradient Works Efficiently in Learning. Neural Computation 10, 2 (1998), 251--276.
[2] Shumeet Baluja. 1994. Population-Based Incremental Learning: A Method for Integrating Genetic Search Based Function Optimization and Competitive Learning. Technical Report CMU-CS-94-163, Carnegie Mellon University, Pittsburgh, PA.
[3] Aaron Defazio, Francis Bach, and Simon Lacoste-Julien. 2014. SAGA: A Fast Incremental Gradient Method with Support for Non-Strongly Convex Composite Objectives. In Advances in Neural Information Processing Systems (NIPS) 27.
[4] Isabelle Guyon and André Elisseeff. 2003. An Introduction to Variable and Feature Selection. Journal of Machine Learning Research 3 (2003), 1157--1182.
[5] Georges R. Harik, Fernando G. Lobo, and David E. Goldberg. 1999. The Compact Genetic Algorithm. IEEE Transactions on Evolutionary Computation 3, 4 (1999), 287--297.
[6] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. In 2015 IEEE International Conference on Computer Vision (ICCV). IEEE, 1026--1034.
[7] Jonathan J. Hull. 1994. A Database for Handwritten Text Recognition Research. IEEE Transactions on Pattern Analysis and Machine Intelligence 16, 5 (1994), 550--554.
[8] Sergey Ioffe and Christian Szegedy. 2015. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In Proceedings of the 32nd International Conference on Machine Learning (ICML), PMLR 37, 448--456.
[9] Diederik P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In Proceedings of the 3rd International Conference on Learning Representations (ICLR).
[10] Ron Kohavi. 1996. Scaling Up the Accuracy of Naive-Bayes Classifiers: A Decision-Tree Hybrid. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD'96). AAAI Press, 202--207.
[11] Ken Lang. 1995. NewsWeeder: Learning to Filter Netnews. In Machine Learning Proceedings 1995, Armand Prieditis and Stuart Russell (Eds.). Morgan Kaufmann, San Francisco, CA, 331--339.
[12] Fan Li, Yiming Yang, and Eric P. Xing. 2006. From Lasso Regression to Feature Vector Machine. In Advances in Neural Information Processing Systems (NIPS) 18, 779--786.
[13] Vinod Nair and Geoffrey E. Hinton. 2010. Rectified Linear Units Improve Restricted Boltzmann Machines. In Proceedings of the 27th International Conference on Machine Learning (ICML), 807--814.
[14] Yann Ollivier, Ludovic Arnold, Anne Auger, and Nikolaus Hansen. 2017. Information-Geometric Optimization Algorithms: A Unifying Picture via Invariance Principles. Journal of Machine Learning Research 18, 18 (2017), 1--65.
[15] Hanchuan Peng, Fuhui Long, and Chris Ding. 2005. Feature Selection Based on Mutual Information: Criteria of Max-Dependency, Max-Relevance, and Min-Redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence 27, 8 (2005), 1226--1238.
[16] Marko Robnik-Šikonja and Igor Kononenko. 2003. Theoretical and Empirical Analysis of ReliefF and RReliefF. Machine Learning 53, 1-2 (2003), 23--69.
[17] Shinichi Shirakawa, Yasushi Iwata, and Youhei Akimoto. 2018. Dynamic Optimization of Neural Network Structures Using Probabilistic Modeling. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18).
[18] Robert Tibshirani. 1996. Regression Shrinkage and Selection via the Lasso. Journal of the Royal Statistical Society, Series B 58, 1 (1996), 267--288.
[19] Zhixiang Xu, Gao Huang, Kilian Q. Weinberger, and Alice X. Zheng. 2014. Gradient Boosted Feature Selection. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '14). ACM, New York, NY, USA, 522--531.
[20] Bing Xue, Mengjie Zhang, and Will N. Browne. 2013. Particle Swarm Optimization for Feature Selection in Classification: A Multi-Objective Approach. IEEE Transactions on Cybernetics 43, 6 (2013), 1656--1671.
[21] Bing Xue, Mengjie Zhang, Will N. Browne, and Xin Yao. 2016. A Survey on Evolutionary Computation Approaches to Feature Selection. IEEE Transactions on Evolutionary Computation 20, 4 (2016), 606--626.
[22] Makoto Yamada, Wittawat Jitkrittum, Leonid Sigal, Eric P. Xing, and Masashi Sugiyama. 2014. High-Dimensional Feature Selection by Feature-Wise Kernelized Lasso. Neural Computation 26, 1 (2014), 185--207.



Published In

GECCO '18: Proceedings of the Genetic and Evolutionary Computation Conference Companion
July 2018
1968 pages
ISBN:9781450357647
DOI:10.1145/3205651


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. embedded approach
  2. feature selection
  3. information geometric optimization
  4. natural gradient
  5. neural network


Conference

GECCO '18

Acceptance Rates

Overall Acceptance Rate 1,669 of 4,410 submissions, 38%


Cited By

  • (2024) Importance estimate of features via analysis of their weight and gradient profile. Scientific Reports 14, 1. DOI: 10.1038/s41598-024-72640-4. Online publication date: 9-Oct-2024.
  • (2023) Variational quantum algorithm for unconstrained black box binary optimization: Application to feature selection. Quantum 7, 909. DOI: 10.22331/q-2023-01-26-909. Online publication date: 26-Jan-2023.
  • (2022) Parameter Tuned Unsupervised Fuzzy Deep Learning for Clinical Data Classification. In 2022 International Conference on Electronic Systems and Intelligent Computing (ICESIC), 358--363. DOI: 10.1109/ICESIC53714.2022.9783488. Online publication date: 22-Apr-2022.
  • (2019) A multi-filter feature selection in detecting distributed denial-of-service attack. In Proceedings of the 3rd International Conference on Telecommunications and Communication Engineering, 57--62. DOI: 10.1145/3369555.3369572. Online publication date: 9-Nov-2019.
  • (2019) Joint Optimization of Convolutional Neural Network and Image Preprocessing Selection for Embryo Grade Prediction in In Vitro Fertilization. In Advances in Visual Computing, 14--24. DOI: 10.1007/978-3-030-33723-0_2. Online publication date: 21-Oct-2019.
  • (2019) Controlling Model Complexity in Probabilistic Model-Based Dynamic Optimization of Neural Network Structures. In Artificial Neural Networks and Machine Learning – ICANN 2019: Deep Learning, 393--405. DOI: 10.1007/978-3-030-30484-3_33. Online publication date: 9-Sep-2019.
