skip to main content
10.1145/1508128.1508139acmconferencesArticle/Chapter ViewAbstractPublication PagesfpgaConference Proceedingsconference-collections
research-article

A comparison of CPUs, GPUs, FPGAs, and massively parallel processor arrays for random number generation

Published: 22 February 2009 Publication History

Abstract

The future of high-performance computing is likely to rely on the ability to efficiently exploit huge amounts of parallelism. One way of taking advantage of this parallelism is to formulate problems as "embarrassingly parallel" Monte-Carlo simulations, which allow applications to achieve a linear speedup over multiple computational nodes, without requiring a super-linear increase in inter-node communication. However, such applications are reliant on a cheap supply of high quality random numbers, particularly for the three main maximum entropy distributions: uniform, used as a general source of randomness; Gaussian, for discrete-time simulations; and exponential, for discrete-event simulations. In this paper we look at four different types of platform: conventional multi-core CPUs (Intel Core2); GPUs (NVidia GTX 200); FPGAs (Xilinx Virtex-5); and Massively Parallel Processor Arrays (Ambric AM2000). For each platform we determine the most appropriate algorithm for generating each type of number, then calculate the peak generation rate and estimated power efficiency for each device.

References

[1]
J. H. Ahrens and U. Dieter. Computer methods for sampling from the exponential and normal distributions. Commun. ACM, 15(10):873--882, 1972.
[2]
Ambric, Inc. Am2000 Family Architecture Reference, May 2008.
[3]
G. E. P. Box and M. E. Muller. A note on the generation of random normal deviates. Annals of Mathematical Statistics, 29(2):610--611, 1958.
[4]
R. P. Brent. Algorithm 488: A Gaussian pseudo-random number generator. Commun. ACM, 17(12):704--706, 1974.
[5]
R. Cheung, D. Lee, W. Luk, and J. Villasenor. Hardware generation of arbitrary random number distributions from uniform distributions via the inverson method. IEEE Transactions on VLSI, 15(8):952--962, 2007.
[6]
L. Devroye. Non-Uniform Random Variate Generation. Springer-Verlag New York, 1996.
[7]
M. Y. et. al. FPGA implementation of a data-driven stochastic biochemical simulator with the next reaction method. In Proc. Int. Conf. on Field Programmable Logic and Applications, pages 254--259, 2007.
[8]
G. E. Forsythe. Von neumann's comparison method for random sampling from the normal and other distributions. Mathematics of Computation, 26(120):817--826, 1972.
[9]
L. Howes and D. Thomas. GPU Gems 3: Efficient Random Number Generation and Application Using CUDA, chapter 37. Addison-Wesley, 2007.
[10]
P. L'Ecuyer. Tables of maximally equidistributed combined LFSR generators. Mathematics of Computation, 68(225):261--269, 1999.
[11]
P. L'Ecuyer. Elsevier Handbooks in Operations Research and Management Science: Simulation, chapter 3: Random Number Generation, pages 55--81. Elsevier Science, 2006.
[12]
P. L'Ecuyer and R. Simard. TestU01 random number test suite. www.iro.umontreal.ca/~simardr/indexe.html, 2007.
[13]
D. Lee, J. Villasenor, W. Luk, and P. Leong. A hardware Gaussian noise generator using the box-muller method and its error analysis. IEEE Transactions on Computers, 55(6):659--671, 2006.
[14]
D.-U. Lee, W. Luk, J. D. Villasenor, and P. Y. Cheung. A Gaussian noise generator for hardware-based simulations. IEEE Transactions On Computers, 53(12):1523--1534, december 2004.
[15]
D.-U. Lee, W. Luk, J. D. Villasenor, G. Zhang, and P. H. Leong. A hardware Gaussian noise generator using the wallace method. IEEE Transactions on VLSI Systems, 13(8):911--920, 2005.
[16]
G. Marsaglia. Xorshift RNGs. Journal of Statistical Software, 8(14):1--6, 2003.
[17]
G. Marsaglia and W. W. Tsang. The ziggurat method for generating random variables. Journal of Statistical Software, 5(8):1--7, 2000.
[18]
F. Panneton, P. L'Ecuyer, and M. Matsumoto. Improved long-period generators based on linear recurrences modulo 2. ACM Transactions on Mathematical Software, 32(1):1--16, 2006.
[19]
R. A. Rueppel. Correlation immunity and the summation generator. In CRYPTO 85, pages 260--272, 1986.
[20]
M. Saito and M. Matsumoto. SIMD-oriented fast mersenne twister: a 128-bit pseudorandom number generator. In Monte-Carlo and Quasi-Monte Carlo Methods, pages 607--622, 2006.
[21]
D. B. Thomas and W. Luk. High quality uniform random number generation using LUT optimised state-transition matrices. Journal of VLSI Signal Processing, 47(1), 2007.
[22]
D. B. Thomas and W. Luk. Non-uniform random number generation through piecewise linear approximations. IET Computers and Digital Techniques, 1(4):312--321, 2007.
[23]
D. B. Thomas and W. Luk. Credit risk modelling using hardware accelerated monte-carlo simulation. In Proc. IEEE Symp. on FPGAs for Custom Computing Machines, 2008.
[24]
D. B. Thomas and W. Luk. Sampling from the exponential distribution using independent bernoulli variates. In Proc. Int. Conf. on Field Programmable Logic and Applications, 2008.
[25]
D. B. Thomas, W. Luk, P. H. Leong, and J. D. Villasenor. Gaussian random number generators. ACM Computing Surveys, 39(4):11, 2007.
[26]
C. S. Wallace. Fast pseudorandom generators for normal and exponential variates. ACM Transactions on Mathematical Software, 22(1):119--127, 1996.
[27]
G. L. Zhang, P. H. Leong, D.-U. Lee, J. D. Villasenor, R. C. Cheung, and W. Luk. Ziggurat-based hardware Gaussian random number generator. In Proc. Int. Conf. on Field Programmable Logic and Applications, pages 275--280. IEEE Computer Society Press, 2005.

Cited By

View all
  • (2025)Exploring Hybrid FitzHugh-Rinzel (FHR) Neuron Model Behavior: Cost-Effective FPGA Implementation for High-Frequency and High-Precision Matching by Electromagnetic Flux EffectsIEEE Transactions on Circuits and Systems I: Regular Papers10.1109/TCSI.2024.350342172:2(844-853)Online publication date: Feb-2025
  • (2024)FPGA Implementation of Complex-Valued Neural Network for Polar-Represented Image ClassificationSensors10.3390/s2403089724:3(897)Online publication date: 30-Jan-2024
  • (2024)Benchmarking Matrix Multiplications For Variable Qubit Size And Depth2024 ITU Kaleidoscope: Innovation and Digital Transformation for a Sustainable World (ITU K)10.23919/ITUK62727.2024.10772828(1-7)Online publication date: 21-Oct-2024
  • Show More Cited By

Index Terms

  1. A comparison of CPUs, GPUs, FPGAs, and massively parallel processor arrays for random number generation

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    FPGA '09: Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
    February 2009
    302 pages
    ISBN:9781605584102
    DOI:10.1145/1508128
    • General Chair:
    • Paul Chow,
    • Program Chair:
    • Peter Cheung
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 22 February 2009

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. fpga
    2. gpu
    3. monte-carlo
    4. mppa
    5. random numbers

    Qualifiers

    • Research-article

    Conference

    FPGA '09
    Sponsor:

    Acceptance Rates

    Overall Acceptance Rate 125 of 627 submissions, 20%

    Upcoming Conference

    FPGA '25

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)80
    • Downloads (Last 6 weeks)12
    Reflects downloads up to 22 Feb 2025

    Other Metrics

    Citations

    Cited By

    View all
    • (2025)Exploring Hybrid FitzHugh-Rinzel (FHR) Neuron Model Behavior: Cost-Effective FPGA Implementation for High-Frequency and High-Precision Matching by Electromagnetic Flux EffectsIEEE Transactions on Circuits and Systems I: Regular Papers10.1109/TCSI.2024.350342172:2(844-853)Online publication date: Feb-2025
    • (2024)FPGA Implementation of Complex-Valued Neural Network for Polar-Represented Image ClassificationSensors10.3390/s2403089724:3(897)Online publication date: 30-Jan-2024
    • (2024)Benchmarking Matrix Multiplications For Variable Qubit Size And Depth2024 ITU Kaleidoscope: Innovation and Digital Transformation for a Sustainable World (ITU K)10.23919/ITUK62727.2024.10772828(1-7)Online publication date: 21-Oct-2024
    • (2024)Accelerating electrostatic particle-in-cell simulation: A novel FPGA-based approach for efficient plasma investigationsPLOS ONE10.1371/journal.pone.030257819:6(e0302578)Online publication date: 3-Jun-2024
    • (2024)REC: REtime Convolutional Layers to Fully Exploit Harvested Energy for ReRAM-based CNN AcceleratorsACM Transactions on Embedded Computing Systems10.1145/365259323:6(1-25)Online publication date: 11-Sep-2024
    • (2024)A Systematic Study of Parallelization Strategies for Optimizing Scientific Computing Performance Bounds2024 IEEE 37th International System-on-Chip Conference (SOCC)10.1109/SOCC62300.2024.10737865(1-6)Online publication date: 16-Sep-2024
    • (2024)A Hybrid Parallelism Framework of SPH for the Applications in Automobile GearboxComputational and Experimental Simulations in Engineering10.1007/978-3-031-77489-8_34(432-443)Online publication date: 3-Dec-2024
    • (2023)An FPGA-Based Performance Analysis of Hardware Caching Techniques for Blockchain Key-Value DatabaseApplied Sciences10.3390/app1307409213:7(4092)Online publication date: 23-Mar-2023
    • (2023)Edge Intelligence with Distributed Processing of DNNs: A SurveyComputer Modeling in Engineering & Sciences10.32604/cmes.2023.023684136:1(5-42)Online publication date: 2023
    • (2023)An Energy-Efficient Bayesian Neural Network Accelerator With CiM and a Time-Interleaved Hadamard Digital GRNG Using 22-nm FinFETIEEE Journal of Solid-State Circuits10.1109/JSSC.2023.328318658:10(2826-2838)Online publication date: Oct-2023
    • Show More Cited By

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Figures

    Tables

    Media

    Share

    Share

    Share this Publication link

    Share on social media