Research Article · Public Access · DOI: 10.1145/2908812.2908916

Simple Evolutionary Optimization Can Rival Stochastic Gradient Descent in Neural Networks

Published: 20 July 2016

Abstract

While evolutionary algorithms (EAs) have long offered an alternative approach to optimization, in recent years backpropagation through stochastic gradient descent (SGD) has come to dominate the fields of neural network optimization and deep learning. One hypothesis for the absence of EAs in deep learning is that modern neural networks have become so high-dimensional that evolution, with its inexact gradient, cannot match the exact gradient calculations of backpropagation. Furthermore, the evaluation of a single individual in evolution on the big data sets now prevalent in deep learning would present a prohibitive obstacle to efficient optimization. This paper challenges these views, suggesting that EAs can be made to run significantly faster than previously thought by evaluating individuals only on a small number of training examples per generation. Surprisingly, this approach, even with only a simple EA (called the limited evaluation EA, or LEEA), is competitive with the state-of-the-art SGD variant RMSProp on several benchmarks using neural networks with over 1,000 weights. More investigation is warranted, but these initial results suggest the possibility that EAs could be the first viable training alternative for deep learning outside of SGD, thereby opening up deep learning to all the tools of evolutionary computation.
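To make the abstract's key mechanism concrete, below is a minimal Python sketch of a limited-evaluation EA in the spirit described above: a plain generational EA over the flattened weight vector of a small fixed-topology network, where fitness each generation is computed on only a small random batch of training examples rather than the full data set. The operators and hyperparameters here (truncation selection, Gaussian mutation, a batch size of 10) are illustrative assumptions, not the paper's exact LEEA, which the full text describes.

import numpy as np

# Minimal sketch of the "limited evaluation" idea from the abstract: each
# generation, individuals are scored on a small random batch of training
# examples instead of the full data set. The operators and hyperparameters
# below are illustrative assumptions, not the paper's exact LEEA.

def forward(genome, X, n_in, n_hidden, n_out):
    """Tiny fixed-topology MLP whose weights are decoded from a flat genome."""
    split = n_in * n_hidden
    W1 = genome[:split].reshape(n_in, n_hidden)
    W2 = genome[split:].reshape(n_hidden, n_out)
    return np.tanh(X @ W1) @ W2

def batch_fitness(pop, X, y, idx, dims):
    """Negative mean squared error on the sampled batch (higher is better)."""
    return np.array([-np.mean((forward(g, X[idx], *dims) - y[idx]) ** 2)
                     for g in pop])

def limited_evaluation_ea(X, y, dims, pop_size=50, batch_size=10,
                          generations=1000, mutation_scale=0.05, seed=0):
    """Evolve MLP weights; X has shape (N, n_in) and y has shape (N, n_out)."""
    n_in, n_hidden, n_out = dims
    rng = np.random.default_rng(seed)
    genome_len = n_in * n_hidden + n_hidden * n_out
    pop = rng.normal(0.0, 0.1, size=(pop_size, genome_len))
    for _ in range(generations):
        # Limited evaluation: draw a fresh, small batch for this generation.
        idx = rng.choice(len(X), size=batch_size, replace=False)
        scores = batch_fitness(pop, X, y, idx, dims)
        # Truncation selection: the top half survive as parents.
        parents = pop[np.argsort(scores)[-(pop_size // 2):]]
        # Refill the population with mutated copies of random parents.
        kids = parents[rng.integers(len(parents), size=pop_size - len(parents))]
        pop = np.vstack([parents,
                         kids + rng.normal(0.0, mutation_scale, kids.shape)])
    # Per-generation scores are noisy, so pick the winner on one more batch.
    idx = rng.choice(len(X), size=batch_size, replace=False)
    return pop[np.argmax(batch_fitness(pop, X, y, idx, dims))]

For instance, best = limited_evaluation_ea(X, y, dims=(4, 8, 1)) would evolve a 4-8-1 network (40 weights). Because each generation evaluates the population on only batch_size examples, a generation's cost scales with the batch size rather than with the size of the training set, which is the speedup the abstract claims; the trade-off is a noisy fitness estimate, which is why the sketch re-scores on a fresh batch before returning a winner.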




Published In

GECCO '16: Proceedings of the Genetic and Evolutionary Computation Conference 2016
July 2016
1196 pages
ISBN: 9781450342063
DOI: 10.1145/2908812
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. artificial intelligence
  2. deep learning
  3. machine learning
  4. neural networks
  5. pattern recognition and classification

Qualifiers

  • Research-article


Conference

GECCO '16: Genetic and Evolutionary Computation Conference
July 20-24, 2016
Denver, Colorado, USA

Acceptance Rates

GECCO '16 Paper Acceptance Rate: 137 of 381 submissions, 36%
Overall Acceptance Rate: 1,669 of 4,410 submissions, 38%


Article Metrics

  • Downloads (last 12 months): 342
  • Downloads (last 6 weeks): 46
Reflects downloads up to 16 Feb 2025

Cited By

  • (2025) A Neural Network Approach for Domain Reduction in High-Dimensional Optimization Problem. Computational and Experimental Simulations in Engineering, pp. 947-964. DOI: 10.1007/978-3-031-81673-4_69. Online: 3 January 2025.
  • (2024) Onboard Class Incremental Learning for Resource-Constrained scenarios using Genetic Algorithm and TinyML. Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 299-302. DOI: 10.1145/3638530.3654392. Online: 14 July 2024.
  • (2024) Revisiting Evolutionary Algorithms for Optimization for Deep Learning: Introducing DL-HEA. Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 435-438. DOI: 10.1145/3638530.3654341. Online: 14 July 2024.
  • (2024) Genetic Algorithm and Binary Masks for Co-Learning Multiple Dataset in Deep Neural Networks. 2024 10th International Conference on Control, Decision and Information Technologies (CoDIT), pp. 1-6. DOI: 10.1109/CoDIT62066.2024.10708280. Online: July 2024.
  • (2024) Competitive Swarm Optimizer: A decade survey. Swarm and Evolutionary Computation, 87:101543. DOI: 10.1016/j.swevo.2024.101543. Online: June 2024.
  • (2024) Activation function optimization scheme for image classification. Knowledge-Based Systems, 305:112502. DOI: 10.1016/j.knosys.2024.112502. Online: December 2024.
  • (2024) Prediction of the mechanical performance of polyethylene fiber-based engineered cementitious composite (PE-ECC). Low-carbon Materials and Green Construction, 2:1. DOI: 10.1007/s44242-024-00040-y. Online: 18 June 2024.
  • (2023) An Enhanced Differential Evolution Algorithm with Adaptive Weight Bounds for Efficient Training of Neural Networks. Informatyka, Automatyka, Pomiary w Gospodarce i Ochronie Środowiska, 13(1):4-13. DOI: 10.35784/iapgos.3366. Online: 31 March 2023.
  • (2023) Are alternatives to backpropagation useful for training Binary Neural Networks? An experimental study in image classification. Proceedings of the 38th ACM/SIGAPP Symposium on Applied Computing, pp. 1171-1178. DOI: 10.1145/3555776.3577674. Online: 27 March 2023.
  • (2023) Strengthening Gradient Descent by Sequential Motion Optimization for Deep Neural Networks. IEEE Transactions on Evolutionary Computation, 27(3):565-579. DOI: 10.1109/TEVC.2022.3171052. Online: June 2023.
