ABSTRACT
Despite the popularity and success of neural networks in research, the number of resulting commercial or industrial applications has been limited. A primary cause of this lack of adoption is that neural networks are usually implemented as software running on general-purpose processors. Software implementations of a neural network typically scale as O(n²) in the number of nodes; as a result, they cannot provide the performance and scalability required in non-academic settings.
In this paper, we investigate how FPGAs can be used to exploit the inherent parallelism in neural networks to provide a better implementation in terms of scalability and performance. We focus on the Restricted Boltzmann Machine, a popular type of neural network, because its architecture is particularly well suited to hardware design. The proposed multi-purpose hardware framework reduces the O(n²) computation to an O(n) implementation while requiring only O(n) resources. The framework is tested on a Xilinx Virtex II-Pro XC2VP70 FPGA running at 100 MHz. The available resources support a Restricted Boltzmann Machine of 128x128 nodes, which results in a computational speed of 1.02 billion connection-updates-per-second and a speed-up of 35 fold over an optimized C program running on a 2.8 GHz Intel processor.
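To make the O(n²) cost concrete, the sketch below shows one half-step of Gibbs sampling in a Restricted Boltzmann Machine: every hidden unit sums contributions from every visible unit, so a sequential software implementation touches all n×n connections per update. This is an illustrative NumPy sketch, not the paper's implementation; the 128x128 size matches the network in the abstract, but the weight values and random seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n_visible, n_hidden = 128, 128                   # 128x128 RBM, as in the abstract
W = rng.normal(0.0, 0.01, (n_visible, n_hidden)) # connection weights (illustrative values)
v = rng.integers(0, 2, n_visible).astype(float)  # binary visible-layer state

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One half-step of Gibbs sampling: the matrix-vector product below visits all
# n_visible * n_hidden connections -- the O(n^2) work that a sequential CPU
# performs per update, and that the FPGA framework parallelizes to O(n) time.
p_h = sigmoid(v @ W)                             # hidden activation probabilities
h = (rng.random(n_hidden) < p_h).astype(float)   # stochastic binary hidden state
```

Each such half-step updates n_visible * n_hidden = 16,384 connections; the paper's reported 1.02 billion connection-updates-per-second refers to this per-connection unit of work.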