DOI: 10.1145/2593069.2593156
research-article

Low Power GPGPU Computation with Imprecise Hardware

Published: 01 June 2014

ABSTRACT

Massively parallel computation in GPUs significantly boosts performance of compute-intensive applications but creates power and thermal issues that limit further performance scaling. This paper demonstrates significant GPGPU power savings by relaxing application accuracy requirements and enabling the use of low power imprecise hardware (IHW). A synthesized set of novel imprecise floating point arithmetic units is presented. GPGPU-Sim and GPUWattch are used to estimate impacts of IHW units on output quality and system-level power consumption, providing a quality-power tradeoff model for application-specific optimization. Experimental results for a 45 nm process show up to 32% power savings with negligible impacts on output quality.


      Reviews

      Kai Diethelm

      In high-performance computing, reducing the amount of energy required to perform the actual computations has recently become a highly important issue. In this paper, Zhang et al. deal with this topic in the framework of a general-purpose computing on graphics processing units (GPGPU)-based hardware platform. The authors observe that certain arithmetical operations are very energy intensive and could be replaced by corresponding first-order approximations requiring a significantly smaller amount of energy. Thus, they suggest using so-called “imprecise hardware” where, for example, a classical hardware multiplier is implemented in such a way that the usual 24×24-bit mantissa multiplication is replaced by a 25×25-bit addition. In combination with a suitable handling of the exponents, this leads to an approximate way of computing the product. Using appropriate simulation tools, the authors demonstrate that their approach leads to substantially smaller energy requirements. Similar ideas are introduced for other frequently used arithmetical operations. Clearly, such an approach has a negative impact on the accuracy of the final result, but theoretical analysis and some concrete examples show that the degradation of the output is usually not severe.

      Online Computing Reviews Service
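The idea of replacing a mantissa multiplication with an addition can be illustrated in software. The following is a minimal sketch, not the authors' hardware design: it assumes the first-order approximation (1 + a)(1 + b) ≈ 1 + a + b, where a and b are the fractional parts of the two mantissas, and it adds the exponents exactly. The function name `approx_mul` is hypothetical, and the sketch works on Python double-precision floats rather than the 24-bit single-precision mantissas mentioned in the review. Rounding, subnormals, infinities, and NaNs are ignored.

```python
import math


def approx_mul(x: float, y: float) -> float:
    """Approximate x * y by dropping the second-order mantissa term.

    Illustrative only: assumes finite inputs and ignores hardware
    details such as rounding modes and subnormal numbers.
    """
    if x == 0.0 or y == 0.0:
        return 0.0
    sign = -1.0 if (x < 0) != (y < 0) else 1.0

    # frexp gives |x| = m * 2**e with m in [0.5, 1).
    mx, ex = math.frexp(abs(x))
    my, ey = math.frexp(abs(y))

    # Rewrite as |x| = (1 + a) * 2**(e - 1) with a in [0, 1).
    a = 2.0 * mx - 1.0
    b = 2.0 * my - 1.0

    # Exact mantissa product is 1 + a + b + a*b; the a*b term is dropped,
    # so only an addition is needed. The result lies in [1, 3).
    mantissa = 1.0 + a + b

    # Exponents are simply added.
    return sign * math.ldexp(mantissa, (ex - 1) + (ey - 1))
```

For example, `approx_mul(2.0, 4.0)` returns the exact product 8.0 because both fractional parts are zero, while `approx_mul(1.5, 1.5)` returns 2.0 instead of 2.25. In this simplified scheme the dropped term a·b is never negative, so the magnitude is always underestimated, and the relative error a·b / ((1 + a)(1 + b)) stays below 25%. The paper's actual units may use a different or more refined approximation.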

      • Published in

        DAC '14: Proceedings of the 51st Annual Design Automation Conference
        June 2014
        1249 pages
        ISBN: 9781450327305
        DOI: 10.1145/2593069

        Copyright © 2014 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Qualifiers

        • research-article
        • Research
        • Refereed limited

        Acceptance Rates

        Overall Acceptance Rate: 1,770 of 5,499 submissions, 32%
