DOI:10.1145/1023833.1023838

Dynamic on-chip memory management for chip multiprocessors

Published: 22 September 2004

Abstract

One of the most important decisions in designing a chip multiprocessor is its on-chip memory organization. A poor on-chip memory design can have serious power and performance implications when running data-intensive embedded applications. While it is possible to design an application-specific memory architecture, this may not be the best option, particularly when the storage demands of individual processors and/or their data-sharing patterns change from one point in execution to another within the same application. In this paper, we consider dynamic configuration of the software-managed on-chip memory space to adapt to runtime variations in data storage demand and interprocessor sharing patterns. The proposed framework is fully implemented using an optimizing compiler, a polyhedral tool, and a memory partitioner (based on integer linear programming), and is tested using a suite of eight data-intensive embedded applications. Our experimental evaluation indicates that the proposed technique is effective in practice and consumes much less energy than all of the alternative memory management schemes tested, including one that designs an application-specific memory.
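
To make the motivation concrete, the following toy sketch compares a single fixed split of on-chip memory banks against a split that is re-chosen for each program phase. It is an illustration only and not the formulation used in the paper: exhaustive search stands in for the integer linear programming partitioner, reconfiguration cost and data sharing are ignored, and every name in it (`allocations`, `shortfall`, `best_static`, `best_dynamic`, the demand numbers) is a hypothetical chosen for this example.

```python
from itertools import product


def allocations(num_procs, num_banks):
    """Yield every way to split num_banks whole banks among num_procs processors."""
    for alloc in product(range(num_banks + 1), repeat=num_procs):
        if sum(alloc) == num_banks:
            yield alloc


def shortfall(alloc, demand):
    """Banks of demand that do not fit on chip (a crude proxy for off-chip accesses)."""
    return sum(max(0, d - a) for a, d in zip(alloc, demand))


def best_static(phases, num_banks):
    """Lowest total shortfall when one allocation must serve all phases."""
    num_procs = len(phases[0])
    return min(
        sum(shortfall(alloc, demand) for demand in phases)
        for alloc in allocations(num_procs, num_banks)
    )


def best_dynamic(phases, num_banks):
    """Lowest total shortfall when the allocation may change at each phase."""
    num_procs = len(phases[0])
    return sum(
        min(shortfall(alloc, demand) for alloc in allocations(num_procs, num_banks))
        for demand in phases
    )


# Assumed per-phase storage demand (in banks) of three processors.
phases = [(6, 1, 1), (1, 6, 1)]
print(best_static(phases, 8), best_dynamic(phases, 8))
```

With these assumed demands and eight banks, the best fixed split leaves five banks of demand unserved across the two phases, while re-splitting at the phase boundary leaves none. A real scheme would also have to charge for the cost of each reconfiguration, which this sketch omits.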

Published In

CASES '04: Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems
September 2004
324 pages
ISBN:1581138903
DOI:10.1145/1023833

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. chip multiprocessors
  2. memory bank
  3. optimizing compiler

Qualifiers

  • Article

Conference

CASES04

Acceptance Rates

Overall Acceptance Rate 52 of 230 submissions, 23%

Cited By

  • (2021) Flexible Cache Partitioning for Multi-Mode Real-Time Systems. 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 1156-1161. DOI:10.23919/DATE51398.2021.9474240. Online publication date: 1-Feb-2021
  • (2018) What Your DRAM Power Models Are Not Telling You. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 2:3, pp. 1-41. DOI:10.1145/3224419. Online publication date: 21-Dec-2018
  • (2017) ROHOM. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 36:3, pp. 357-369. DOI:10.1109/TCAD.2016.2584048. Online publication date: 1-Mar-2017
  • (2015) Dynamic Shared SPM Reuse for Real-Time Multicore Embedded Systems. ACM Transactions on Architecture and Code Optimization, 12:2, pp. 1-25. DOI:10.1145/2738051. Online publication date: 11-May-2015
  • (2011) Static bus schedule aware scratchpad allocation in multiprocessors. ACM SIGPLAN Notices, 46:5, pp. 11-20. DOI:10.1145/2016603.1967680. Online publication date: 11-Apr-2011
  • (2011) Static bus schedule aware scratchpad allocation in multiprocessors. Proceedings of the 2011 SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems, pp. 11-20. DOI:10.1145/1967677.1967680. Online publication date: 11-Apr-2011
  • (2010) Scratchpad allocation for concurrent embedded software. ACM Transactions on Programming Languages and Systems, 32:4, pp. 1-47. DOI:10.1145/1734206.1734210. Online publication date: 22-Apr-2010
  • (2008) Scratchpad allocation for concurrent embedded software. Proceedings of the 6th IEEE/ACM/IFIP international conference on Hardware/Software codesign and system synthesis, pp. 37-42. DOI:10.1145/1450135.1450145. Online publication date: 19-Oct-2008
  • (2006) Dynamic partitioning of processing and memory resources in embedded MPSoC architectures. Proceedings of the conference on Design, automation and test in Europe: Proceedings, pp. 690-695. DOI:10.5555/1131481.1131675. Online publication date: 6-Mar-2006
  • (2006) Selective code/data migration for reducing communication energy in embedded MpSoC architectures. Proceedings of the 16th ACM Great Lakes symposium on VLSI, pp. 386-391. DOI:10.1145/1127908.1127997. Online publication date: 30-Apr-2006
