Heterogeneous Von Neumann/dataflow microprocessors

Published: 21 May 2019

Abstract

General-purpose processors (GPPs), which traditionally rely on a Von Neumann-based execution model, incur burdensome power overheads, largely due to the need to dynamically extract parallelism and maintain precise state. Further, it is extremely difficult to improve their performance without increasing energy usage. Decades-old explicit-dataflow architectures eliminate many Von Neumann overheads, but have not been successful as stand-alone alternatives because of poor performance on certain workloads, due to insufficient control speculation and communication overheads.
We observe a synergy between out-of-order (OOO) and explicit-dataflow processors, whereby dynamically switching between them according to the behavior of program phases can greatly improve performance and energy efficiency. This work studies the potential of such a paradigm of heterogeneous execution models by developing a specialization engine for explicit-dataflow (SEED) and integrating it with a standard OOO core. Integrated with a dual-issue OOO core, the combined design is both faster (1.33x) and dramatically more energy efficient (1.70x). Integrated with an in-order core, it is faster than even a dual-issue OOO core, with twice the energy efficiency.
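To put the headline factors in perspective, the short sketch below converts the 1.33x speedup and 1.70x energy-efficiency gain into time, energy, and average-power ratios relative to a baseline of 1.0. It assumes that "1.70x more energy efficient" means the same work is done with 1.70x less energy; that reading, along with the function and variable names, is an illustrative assumption and not something taken from the article.

```python
def relative_costs(speedup: float, energy_efficiency_gain: float) -> dict:
    """Convert improvement factors into ratios relative to a baseline of 1.0.

    Assumption: an energy-efficiency gain of g means the same work
    consumes 1/g of the baseline energy.
    """
    time_ratio = 1.0 / speedup                    # execution time vs. baseline
    energy_ratio = 1.0 / energy_efficiency_gain   # energy for the same work
    power_ratio = energy_ratio / time_ratio       # average power = energy / time
    return {"time": time_ratio, "energy": energy_ratio, "power": power_ratio}


# Figures quoted in the abstract for SEED integrated with a dual-issue OOO core.
costs = relative_costs(speedup=1.33, energy_efficiency_gain=1.70)
for name, ratio in costs.items():
    print(f"{name}: {ratio:.3f}")
```

Under that assumption, the combined design would finish the same work in roughly 75% of the time using roughly 59% of the energy, which corresponds to an average power of roughly 78% of the baseline.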

Published In

Communications of the ACM, Volume 62, Issue 6
June 2019
85 pages
ISSN: 0001-0782
EISSN: 1557-7317
DOI: 10.1145/3336127

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States

Qualifiers

• Research-article
• Research
• Refereed
