Research Article · Public Access · DOI: 10.1145/2908812.2908916

Simple Evolutionary Optimization Can Rival Stochastic Gradient Descent in Neural Networks

Published: 20 July 2016

Abstract

While evolutionary algorithms (EAs) have long offered an alternative approach to optimization, in recent years backpropagation through stochastic gradient descent (SGD) has come to dominate the fields of neural network optimization and deep learning. One hypothesis for the absence of EAs in deep learning is that modern neural networks have become so high-dimensional that evolution, with its inexact gradient, cannot match the exact gradient calculations of backpropagation. Furthermore, the evaluation of a single individual in evolution on the big data sets now prevalent in deep learning would present a prohibitive obstacle to efficient optimization. This paper challenges these views, suggesting that EAs can be made to run significantly faster than previously thought by evaluating individuals only on a small number of training examples per generation. Surprisingly, this approach, even with only a simple EA (called the limited evaluation EA, or LEEA), is competitive with the state-of-the-art SGD variant RMSProp on several benchmarks using neural networks with over 1,000 weights. More investigation is warranted, but these initial results suggest the possibility that EAs could be the first viable training alternative for deep learning outside of SGD, thereby opening up deep learning to all the tools of evolutionary computation.
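To make the abstract's key mechanism concrete, below is a minimal Python sketch of a limited-evaluation EA in the spirit described above: a plain generational EA over the flattened weight vector of a small fixed-topology network, where fitness each generation is computed on only a small random batch of training examples rather than the full data set. The operators and hyperparameters here (truncation selection, Gaussian mutation, a batch size of 10) are illustrative assumptions, not the paper's exact LEEA, which the full text describes.

import numpy as np

# Minimal sketch of the "limited evaluation" idea from the abstract: each
# generation, individuals are scored on a small random batch of training
# examples instead of the full data set. The operators and hyperparameters
# below are illustrative assumptions, not the paper's exact LEEA.

def forward(genome, X, n_in, n_hidden, n_out):
    """Tiny fixed-topology MLP whose weights are decoded from a flat genome."""
    split = n_in * n_hidden
    W1 = genome[:split].reshape(n_in, n_hidden)
    W2 = genome[split:].reshape(n_hidden, n_out)
    return np.tanh(X @ W1) @ W2

def batch_fitness(pop, X, y, idx, dims):
    """Negative mean squared error on the sampled batch (higher is better)."""
    return np.array([-np.mean((forward(g, X[idx], *dims) - y[idx]) ** 2)
                     for g in pop])

def limited_evaluation_ea(X, y, dims, pop_size=50, batch_size=10,
                          generations=1000, mutation_scale=0.05, seed=0):
    """Evolve MLP weights; X has shape (N, n_in) and y has shape (N, n_out)."""
    n_in, n_hidden, n_out = dims
    rng = np.random.default_rng(seed)
    genome_len = n_in * n_hidden + n_hidden * n_out
    pop = rng.normal(0.0, 0.1, size=(pop_size, genome_len))
    for _ in range(generations):
        # Limited evaluation: draw a fresh, small batch for this generation.
        idx = rng.choice(len(X), size=batch_size, replace=False)
        scores = batch_fitness(pop, X, y, idx, dims)
        # Truncation selection: the top half survive as parents.
        parents = pop[np.argsort(scores)[-(pop_size // 2):]]
        # Refill the population with mutated copies of random parents.
        kids = parents[rng.integers(len(parents), size=pop_size - len(parents))]
        pop = np.vstack([parents,
                         kids + rng.normal(0.0, mutation_scale, kids.shape)])
    # Per-generation scores are noisy, so pick the winner on one more batch.
    idx = rng.choice(len(X), size=batch_size, replace=False)
    return pop[np.argmax(batch_fitness(pop, X, y, idx, dims))]

For instance, best = limited_evaluation_ea(X, y, dims=(4, 8, 1)) would evolve a 4-8-1 network (40 weights). Because each generation evaluates the population on only batch_size examples, a generation's cost scales with the batch size rather than with the size of the training set, which is the speedup the abstract claims; the trade-off is a noisy fitness estimate, which is why the sketch re-scores on a fresh batch before returning a winner.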




Published In

GECCO '16: Proceedings of the Genetic and Evolutionary Computation Conference 2016
July 2016
1196 pages
ISBN: 9781450342063
DOI: 10.1145/2908812
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. artificial intelligence
  2. deep learning
  3. machine learning
  4. neural networks
  5. pattern recognition and classification

Qualifiers

  • Research-article


Conference

GECCO '16: Genetic and Evolutionary Computation Conference
July 20-24, 2016
Denver, Colorado, USA

Acceptance Rates

GECCO '16 Paper Acceptance Rate: 137 of 381 submissions, 36%
Overall Acceptance Rate: 1,669 of 4,410 submissions, 38%


Article Metrics

  • Downloads (last 12 months): 342
  • Downloads (last 6 weeks): 46
Reflects downloads up to 16 Feb 2025

Cited By

  • (2025) A Neural Network Approach for Domain Reduction in High-Dimensional Optimization Problem. Computational and Experimental Simulations in Engineering, pp. 947-964. DOI: 10.1007/978-3-031-81673-4_69. Online: 3 January 2025.
  • (2024) Onboard Class Incremental Learning for Resource-Constrained scenarios using Genetic Algorithm and TinyML. Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 299-302. DOI: 10.1145/3638530.3654392. Online: 14 July 2024.
  • (2024) Revisiting Evolutionary Algorithms for Optimization for Deep Learning: Introducing DL-HEA. Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 435-438. DOI: 10.1145/3638530.3654341. Online: 14 July 2024.
  • (2024) Genetic Algorithm and Binary Masks for Co-Learning Multiple Dataset in Deep Neural Networks. 2024 10th International Conference on Control, Decision and Information Technologies (CoDIT), pp. 1-6. DOI: 10.1109/CoDIT62066.2024.10708280. Online: July 2024.
  • (2024) Competitive Swarm Optimizer: A decade survey. Swarm and Evolutionary Computation, 87:101543. DOI: 10.1016/j.swevo.2024.101543. Online: June 2024.
  • (2024) Activation function optimization scheme for image classification. Knowledge-Based Systems, 305:112502. DOI: 10.1016/j.knosys.2024.112502. Online: December 2024.
  • (2024) Prediction of the mechanical performance of polyethylene fiber-based engineered cementitious composite (PE-ECC). Low-carbon Materials and Green Construction, 2:1. DOI: 10.1007/s44242-024-00040-y. Online: 18 June 2024.
  • (2023) An Enhanced Differential Evolution Algorithm with Adaptive Weight Bounds for Efficient Training of Neural Networks. Informatyka, Automatyka, Pomiary w Gospodarce i Ochronie Środowiska, 13(1):4-13. DOI: 10.35784/iapgos.3366. Online: 31 March 2023.
  • (2023) Are alternatives to backpropagation useful for training Binary Neural Networks? An experimental study in image classification. Proceedings of the 38th ACM/SIGAPP Symposium on Applied Computing, pp. 1171-1178. DOI: 10.1145/3555776.3577674. Online: 27 March 2023.
  • (2023) Strengthening Gradient Descent by Sequential Motion Optimization for Deep Neural Networks. IEEE Transactions on Evolutionary Computation, 27(3):565-579. DOI: 10.1109/TEVC.2022.3171052. Online: June 2023.
