ABSTRACT
In this research, we propose a highly predictable, low-overhead, yet dynamic memory-allocation strategy for embedded systems with scratch-pad memory. A scratch pad is a fast, compiler-managed SRAM that replaces the hardware-managed cache. It is motivated by its better real-time guarantees compared with a cache and by its significantly lower overheads in energy consumption, area, and overall runtime, even with a simple allocation scheme. Existing scratch-pad allocation methods are primarily of two types. First, software-caching schemes emulate the workings of a hardware cache in software: instructions are inserted before each load/store to check the software-maintained cache tags. Such methods incur large overheads in runtime, code size, energy consumption, and SRAM space for tags and, just like hardware caches, deliver poor real-time guarantees. A second category of algorithms partitions variables at compile time into the two banks. A drawback of such static allocation schemes is that they do not account for dynamic program behavior: a data allocation that never changes at runtime cannot achieve the full locality benefits of a cache. We propose a dynamic allocation methodology for global and stack data and program code that (i) accounts for changing program requirements at runtime, (ii) has no software-caching tags, (iii) requires no runtime checks, (iv) has extremely low overheads, and (v) yields 100% predictable memory access times. In this method, data that is about to be accessed frequently is copied into the scratch pad using compiler-inserted code at fixed and infrequent points in the program. Earlier data is evicted if necessary. When compared to a provably optimal static allocation, our scheme reduces runtime by up to 39.8% and energy by up to 31.3%, averaged over our benchmarks, depending on the SRAM size used.
The actual gain depends on the SRAM size, but our results show that close to the maximum benefit in runtime and energy is achieved for a substantial range of small SRAM sizes commonly found in embedded systems. Our comparison with a direct-mapped cache shows that our method performs roughly as well as a cached architecture.
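The copy-in and eviction step described in the abstract can be illustrated with a minimal C sketch. It is not the paper's implementation: the names `spm`, `spm_copy_in`, `spm_evict`, and `run_regions`, the capacity `SPM_WORDS`, and the two-region program are all hypothetical, and the scratch pad is simulated by an ordinary array. The sketch also tracks the resident array in a runtime variable for brevity, whereas the method described above fixes residency at compile time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SPM_WORDS 8               /* hypothetical scratch-pad capacity, in ints */

static int spm[SPM_WORDS];        /* ordinary array standing in for on-chip SRAM */
static int *spm_home = NULL;      /* main-memory home of the resident data (simplification) */
static size_t spm_len = 0;        /* number of resident ints */

/* Evict: write the resident data back to its home in main memory. */
static void spm_evict(void) {
    if (spm_home != NULL) {
        memcpy(spm_home, spm, spm_len * sizeof(int));
        spm_home = NULL;
        spm_len = 0;
    }
}

/* Copy-in: the kind of call a compiler could insert at a region entry.
   Earlier data is evicted first, then the new data is brought in. */
static int *spm_copy_in(int *home, size_t len) {
    assert(len <= SPM_WORDS);     /* the sketch assumes the data fits */
    spm_evict();
    memcpy(spm, home, len * sizeof(int));
    spm_home = home;
    spm_len = len;
    return spm;
}

static int a[SPM_WORDS] = {1, 2, 3, 4, 5, 6, 7, 8};
static int b[SPM_WORDS] = {0};

/* Two program regions, each with one transfer at its entry point. */
int run_regions(void) {
    int sum = 0;

    int *fast = spm_copy_in(a, SPM_WORDS);    /* region 1 entry */
    for (int rep = 0; rep < 100; rep++)
        for (size_t i = 0; i < SPM_WORDS; i++)
            sum += fast[i];                   /* plain loads, no tag checks */

    fast = spm_copy_in(b, SPM_WORDS);         /* region 2 entry: a is evicted */
    for (size_t i = 0; i < SPM_WORDS; i++)
        fast[i] = (int)i * 2;                 /* plain stores into the scratch pad */

    spm_evict();                              /* b is written back to main memory */
    return sum;
}
```

Calling `run_regions()` performs two transfers in total, one per region entry, while every access inside the loops is an ordinary load or store with a fixed latency. This is the contrast with software caching, where a tag check would precede each access.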