DOI: 10.1145/2934583.2934587
research-article

A Fully Parameterizable Low Power Design of Vector Fused Multiply-Add Using Active Clock-Gating Techniques

Published: 8 August 2016

ABSTRACT

The need for power efficiency is driving a rethink of design decisions in processor architectures. While vector processors succeeded in the high-performance market in the past, they need re-tailoring for the mobile market that they are now entering. The floating-point fused multiply-add unit, being a power-consuming functional unit, deserves special attention. Although clock-gating is a well-known method of reducing switching power in synchronous designs, there are unexplored opportunities for applying it to vector processors, especially in active operating mode. In this research, we comprehensively identify, propose, and evaluate the most suitable clock-gating techniques for vector fused multiply-add units (VFUs). These techniques deliver power savings without jeopardizing timing. Using vector masking and vector multi-lane-aware clock-gating, we report power reductions of up to 52%, assuming an active VFU operating at peak performance. Among other findings, we observe that vector instruction-based clock-gating techniques achieve power savings for all vector floating-point instructions. We perform this research in a fully parameterizable and automated fashion, using various tools at both the architectural and circuit levels.
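The abstract names vector masking and multi-lane-aware clock-gating without giving details. As a purely illustrative sketch, and not the authors' implementation, the Python model below assumes that element `i` of a vector executes on lane `i % num_lanes` in cycle `i // num_lanes`, and that a lane's clock can be gated whenever its element is masked off or the lane has no element in the final, partially filled cycle. The function names `lane_enable_schedule` and `gated_fraction` are hypothetical.

```python
from typing import List


def lane_enable_schedule(mask: List[bool], num_lanes: int) -> List[List[bool]]:
    """Return, for each cycle, the clock-enable bit of every lane.

    Assumption (not from the paper): element i runs on lane i % num_lanes
    in cycle i // num_lanes. A lane is clocked only if it holds an element
    and that element's mask bit is set.
    """
    if num_lanes <= 0:
        raise ValueError("num_lanes must be positive")
    cycles = -(-len(mask) // num_lanes)  # ceiling division
    schedule = []
    for cycle in range(cycles):
        row = []
        for lane in range(num_lanes):
            i = cycle * num_lanes + lane
            # Lanes past the vector length, or masked off, stay gated.
            row.append(i < len(mask) and mask[i])
        schedule.append(row)
    return schedule


def gated_fraction(mask: List[bool], num_lanes: int) -> float:
    """Fraction of lane-cycles whose clock could be gated in this toy model."""
    schedule = lane_enable_schedule(mask, num_lanes)
    total = sum(len(row) for row in schedule)
    if total == 0:
        return 0.0
    enabled = sum(row.count(True) for row in schedule)
    return 1.0 - enabled / total


if __name__ == "__main__":
    # Six elements on four lanes: the second cycle is only half full.
    example_mask = [True, False, True, True, False, True]
    print(lane_enable_schedule(example_mask, 4))
    print(gated_fraction(example_mask, 4))
```

The gated fraction counts idle lane-cycles only; it is not a power figure and should not be compared with the 52% reduction reported in the abstract, which comes from the authors' own circuit-level evaluation.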


Published in

ISLPED '16: Proceedings of the 2016 International Symposium on Low Power Electronics and Design
August 2016
392 pages
ISBN: 9781450341851
DOI: 10.1145/2934583

          Copyright © 2016 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States


Qualifiers

• research-article
• Research
• Refereed limited

Acceptance Rates

ISLPED '16 paper acceptance rate: 60 of 190 submissions, 32%
Overall acceptance rate: 398 of 1,159 submissions, 34%
