Abstract
Efficient utilization of on-chip memory space is extremely important in modern embedded system applications based on processor cores. In addition to a data cache that interfaces with slower off-chip memory, many applications use a fast on-chip SRAM, called Scratch-Pad memory, so that critical data can be stored there with a guaranteed fast access time. We present a technique for efficiently exploiting on-chip Scratch-Pad memory by partitioning the application's scalar and array variables between off-chip DRAM and on-chip Scratch-Pad SRAM, with the goal of minimizing the total execution time of embedded applications. We also present extensions of our proposed memory assignment strategy to handle context switching between multiple programs, as well as a generalized memory hierarchy. Our experiments on code kernels from typical applications show that our technique results in significant performance improvements.
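To make the partitioning problem concrete, the following is a minimal sketch that treats it as a 0/1 knapsack: each variable has a size and an estimated access count, and the goal is to fill a fixed Scratch-Pad capacity so that as many accesses as possible are served on-chip. This is an illustrative simplification and not the technique of the paper; it ignores cache conflicts and variable lifetimes, and the names `Var`, `partition`, and the sample variables and numbers are all hypothetical.

```python
from typing import NamedTuple


class Var(NamedTuple):
    name: str
    size: int      # bytes occupied by the scalar or array (assumed known)
    accesses: int  # estimated number of accesses (assumed, e.g. from profiling)


def partition(variables, spm_capacity):
    """Split variables into (on_chip, off_chip) name sets.

    Simplified model: a 0/1 knapsack that maximizes the number of
    accesses served from the Scratch-Pad without exceeding its capacity.
    """
    # best[c] = (accesses served on-chip, chosen names) using at most c bytes
    best = [(0, ())] * (spm_capacity + 1)
    for v in variables:
        # Walk capacities downward so each variable is placed at most once.
        for c in range(spm_capacity, v.size - 1, -1):
            candidate = best[c - v.size][0] + v.accesses
            if candidate > best[c][0]:
                best[c] = (candidate, best[c - v.size][1] + (v.name,))
    on_chip = set(best[spm_capacity][1])
    off_chip = {v.name for v in variables} - on_chip
    return on_chip, off_chip


# Hypothetical variables of a small signal-processing kernel.
kernel_vars = [
    Var("coeff", 64, 5000),
    Var("frame", 4096, 8000),
    Var("hist", 256, 3000),
    Var("i", 4, 10000),
]
on_chip, off_chip = partition(kernel_vars, spm_capacity=512)
```

With a 512-byte Scratch-Pad, the large `frame` array cannot fit and stays in off-chip DRAM, while the small, frequently accessed variables go on-chip. A realistic cost model would also account for how off-chip variables interact in the data cache, which this sketch leaves out.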
Index Terms
- On-chip vs. off-chip memory: the data partitioning problem in embedded processor-based systems