DOI: 10.1145/2818950.2818985
research-article
Public Access

Understanding Energy Aspects of Processing-near-Memory for HPC Workloads

Published: 05 October 2015

Abstract

Interest in the concept of processing-near-memory (PNM) has been reignited by recent improvements in 3D integration technology. In this work, we analyze the energy consumption characteristics of a system that comprises a conventional processor and a 3D memory stack with fully programmable cores. We construct a high-level analytical energy model based on the underlying architecture and the technology with which each component is built. From preliminary experiments with 11 HPC benchmarks from the Mantevo benchmark suite, we observed that an application's last-level cache (LLC) misses per kilo-instruction (MPKI) is one of the most important characteristics in determining how amenable it is to PNM execution.
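MPKI normalizes the number of LLC misses by the number of instructions executed, counted in thousands: MPKI = LLC misses / (instructions / 1000). The minimal sketch below computes it from raw counter values. The function name `mpki` and the sample counts are hypothetical, chosen only for illustration; they are not taken from the paper's model or measurements.

```python
def mpki(llc_misses: int, instructions: int) -> float:
    """Return last-level-cache misses per kilo-instruction (MPKI).

    MPKI = llc_misses / (instructions / 1000)
    """
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    # Multiply first to keep the arithmetic in one floating-point division.
    return llc_misses * 1000.0 / instructions


# Hypothetical counter values (not data from the paper).
print(mpki(llc_misses=25_000, instructions=10_000_000))  # 2.5
```

In practice the two inputs would come from hardware performance counters (LLC misses and retired instructions) collected over the same region of execution.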

Information

Published In

ACM Other Conferences
MEMSYS '15: Proceedings of the 2015 International Symposium on Memory Systems
October 2015
278 pages
ISBN: 9781450336048
DOI: 10.1145/2818950
© 2015 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Energy Estimation
  2. Offloading
  3. PNM

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

MEMSYS '15
MEMSYS '15: International Symposium on Memory Systems
October 5 - 8, 2015
Washington, DC, USA

Article Metrics

  • Downloads (last 12 months): 77
  • Downloads (last 6 weeks): 8
Reflects downloads up to 02 Mar 2025.

Cited By

  • (2022) Co-scheduling Ensembles of In Situ Workflows. 2022 IEEE/ACM Workshop on Workflows in Support of Large-Scale Science (WORKS), pp. 43-51. DOI: 10.1109/WORKS56498.2022.00011. Online publication date: Nov-2022.
  • (2021) DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks. IEEE Access, vol. 9, pp. 134457-134502. DOI: 10.1109/ACCESS.2021.3110993. Online publication date: 2021.
  • (2020) Making Better Use of Processing-in-Memory Through Potential-Based Task Offloading. IEEE Access, vol. 8, pp. 61631-61641. DOI: 10.1109/ACCESS.2020.2983432. Online publication date: 2020.
  • (2018) StaleLearn: Learning Acceleration with Asynchronous Synchronization Between Model Replicas on PIM. IEEE Transactions on Computers, vol. 67, no. 6, pp. 861-873. DOI: 10.1109/TC.2017.2780237. Online publication date: 1-Jun-2018.
  • (2018) PM3: Power Modeling and Power Management for Processing-in-Memory. 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 558-570. DOI: 10.1109/HPCA.2018.00054. Online publication date: Feb-2018.
  • (2017) Triple Engine Processor (TEP). ACM Transactions on Architecture and Code Optimization, vol. 14, no. 4, pp. 1-25. DOI: 10.1145/3155920. Online publication date: 18-Dec-2017.
  • (2016) Characterizing the Performance of Hybrid Memory Cube Using ApexMAP Application Probes. Proceedings of the Second International Symposium on Memory Systems, pp. 429-436. DOI: 10.1145/2989081.2989090. Online publication date: 3-Oct-2016.
  • (2016) Accelerating Linked-list Traversal Through Near-Data Processing. Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, pp. 113-124. DOI: 10.1145/2967938.2967958. Online publication date: 11-Sep-2016.
  • (2015) BSSync. Proceedings of the 2015 International Conference on Parallel Architecture and Compilation (PACT), pp. 241-252. DOI: 10.1109/PACT.2015.42. Online publication date: 18-Oct-2015.
