Abstract
Previous research has shown that a scratchpad memory (SPM) consumes less energy than a cache of the same capacity. In this article, we place the SPM at the top level of the memory hierarchy to reduce energy consumption. Taking advantage of an SPM requires addressing two issues: first, the program's locality must be improved; second, the SPM must be managed. To tackle these two issues, we present a hardware/software framework for dynamically allocating both instructions and data in SPM. The software flow consists of three phases: locality improvement, locality extraction, and runtime SPM management. We improve the locality of a program without modifying the original compiler or the source code. An optimization algorithm is proposed to extract the SPM allocations, and at runtime an SPM management program is employed. In hardware, an address translation logic (ATL) is proposed to reduce the overhead of SPM management.
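The abstract does not spell out the optimization algorithm or the ATL design. The following Python sketch is a rough illustration only. It substitutes a textbook 0/1 knapsack for the allocation step, using profiled access counts as a stand-in for energy savings, which assumes every SPM access saves the same energy. It models address translation as a lookup table that redirects addresses inside allocated objects to the SPM. The object names, addresses, `SPM_BASE` and the capacity are hypothetical. The sketch also covers a single static allocation, not the article's dynamic allocation.

```python
from dataclasses import dataclass

SPM_BASE = 0xF000_0000  # hypothetical address at which the SPM is mapped


@dataclass(frozen=True)
class MemObject:
    name: str      # hypothetical object: a function (instructions) or a data block
    base: int      # start address in main memory
    size: int      # size in bytes
    accesses: int  # profiled access count within one program region


def select_for_spm(objects, capacity):
    """0/1 knapsack: pick objects that maximize SPM-served accesses within capacity."""
    best = [0] * (capacity + 1)  # best[c]: max accesses using at most c bytes
    took = [[False] * (capacity + 1) for _ in objects]
    for i, obj in enumerate(objects):
        # Iterate capacities downward so each object is used at most once.
        for c in range(capacity, obj.size - 1, -1):
            candidate = best[c - obj.size] + obj.accesses
            if candidate > best[c]:
                best[c] = candidate
                took[i][c] = True
    # Walk back through the recorded decisions to recover the chosen set.
    chosen, c = [], capacity
    for i in range(len(objects) - 1, -1, -1):
        if took[i][c]:
            chosen.append(objects[i])
            c -= objects[i].size
    return chosen[::-1]


def build_table(chosen):
    """Pack chosen objects contiguously; return (base, size, spm_offset) entries."""
    table, offset = [], 0
    for obj in chosen:
        table.append((obj.base, obj.size, offset))
        offset += obj.size
    return table


def translate(addr, table):
    """Redirect an address to the SPM if it falls inside an allocated object."""
    for base, size, offset in table:
        if base <= addr < base + size:
            return SPM_BASE + offset + (addr - base)
    return addr  # not allocated: the access goes to the next memory level


objects = [
    MemObject("f", 0x1000, 64, 500),
    MemObject("g", 0x2000, 128, 300),
    MemObject("buf", 0x8000, 96, 450),
]
chosen = select_for_spm(objects, capacity=192)
table = build_table(chosen)
print([o.name for o in chosen])       # ['f', 'buf']
print(hex(translate(0x8004, table)))  # 0xf0000044
```

With a 192-byte capacity, `f` and `buf` (950 accesses) are preferred over `f` and `g` (800 accesses). `g` and `buf` together (224 bytes) do not fit.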
The results show that the proposed framework reduces the energy-delay product (EDP) by 63% on average, compared with a traditional cache architecture. This reduction comes from allocating both instructions and data in SPM. Allocating only instructions in SPM reduces EDP by 45% on average, and allocating only data reduces it by 14% on average.
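EDP multiplies energy by execution time, so a reduction can come from either factor. The sketch below computes the fractional EDP reduction against a cache baseline. The energy and delay values are hypothetical, chosen only so that the result matches the 63% figure. The abstract does not state how results are averaged across benchmarks, so this shows a single benchmark.

```python
def edp(energy, delay):
    """Energy-delay product; units follow the inputs (e.g. mJ x ms)."""
    return energy * delay


def edp_reduction(baseline_edp, proposed_edp):
    """Fractional EDP reduction relative to the baseline (0.63 means 63%)."""
    return 1.0 - proposed_edp / baseline_edp


# Hypothetical numbers for one benchmark, not results from the article.
cache_edp = edp(energy=4.0, delay=5.0)  # cache-only baseline
spm_edp = edp(energy=2.0, delay=3.7)    # SPM-based configuration
print(round(edp_reduction(cache_edp, spm_edp), 2))  # 0.63
```

In this made-up example, halving energy alone would give a 50% reduction. The rest comes from the shorter delay: 0.5 × 0.74 = 0.37 of the baseline EDP.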
Index Terms
- A hardware/software framework for instruction and data scratchpad memory allocation