skip to main content
article
Free Access

On-chip vs. off-chip memory: the data partitioning problem in embedded processor-based systems

Published:01 July 2000Publication History
Skip Abstract Section

Abstract

Efficient utilization of on-chip memory space is extremely important in modern embedded system applications based on processor cores. In addition to a data cache that interfaces with slower off-chip memory, a fast on-chip SRAM, called Scratch-Pad memory, is often used in several applications, so that critical data can be stored there with a guaranteed fast access time. We present a technique for efficiently exploiting on-chip Scratch-Pad memory by partitioning the application's scalar and arrayed variables into off-chip DRAM and on-chip Scratch-Pad SRAM, with the goal of minimizing the total execution time of embedded applications. We also present extensions of our proposed memory assignment strategy to handle context switching between multiple programs, as well as a generalized memory hierarchy. Our experiments on code kernels from typical applications show that our technique results in significant performance improvements.

References

  1. AHMAD, I. AND CHEN, C. Y. R. 1991. Post-processor for data path synthesis using multiport memories. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD '91, Santa Clara, CA, Nov. 11-14, 1991), IEEE Computer Society Press, Los Alamitos, CA, 276-279.Google ScholarGoogle ScholarCross RefCross Ref
  2. AHO, A., SETHI, R., AND ULLMAN, J. 1986. Compilers: Principles, Techniques, and Tools. Addison-Wesley, Reading, MA. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. BAKSHI, S. AND GAJSKI, D. D. 1995. A memory selection algorithm for high-performance pipelines. In Proceedings of the European Conference EURO-DAC '95 with EURO-VHDL '95 on Design Automation (Brighton, UK, Sept. 18-22), G. Musgrave, Ed. IEEE Computer Society Press, Los Alamitos, CA, 124-129. Google ScholarGoogle Scholar
  4. BALAKRISHNAN, M., BANERJI, D. K., MAJUMDAR, A. K., LINDERS, J. G., AND MAJITHIA, J. C. 1990. Allocation of multiport memories in data path synthesis. IEEE Trans. Comput.- Aided Des. 7, 4 (Apr. 1990), 536-540.Google ScholarGoogle Scholar
  5. BALASA, F., CATTHOOR, F., AND DE MAN, H. 1995. Background memory area estimation for multidimensional signal processing systems. IEEE Trans. Very Large Scale Integr. Syst. 3, 2 (June 1995), 157-172. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. CATTHOOR, F. AND SVENSSON, L. 1993. Application-Driven Architecture Synthesis. Kluwer Academic Publishers, Hingham, MA. Google ScholarGoogle Scholar
  7. GAJSKI, D. D., DUTT, N. D., Wu, A. C.-H., AND LIN, S.Y.-L. 1992. High-Level Synthesis: Introduction to Chip and System Design. Kluwer Academic Publishers, Hingham, MA. Google ScholarGoogle Scholar
  8. LE GALL, D. 1991. MPEG: a video compression standard for multimedia applications. Commun. ACM 34, 4 (Apr. 1991), 46-58. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. GAREY, M. AND JOHNSON, D. 1979. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman and Co., New York, NY. Google ScholarGoogle Scholar
  10. JHA, P. K. AND DUTT, N. D. 2000. High-level library mapping for memories. ACM Trans. Des. Autom. Electron. Syst. 5, 3 (July), 566-603. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. KARCHMER, D. AND ROSE, J. 1994. Definition and solution of the memory packing problem for field-programmable systems. In Proceedings of the IEEE /ACM International Conference on Computer Aided Design (Nov. 1994), 20-26. Google ScholarGoogle Scholar
  12. KIM, T. AND LIU, C. L. 1993. Utilization of multiport memories in data path synthesis. In Proceedings of the 30th ACM/IEEE International Conference on Design Automation (DAC '93, Dallas, TX, June 14-18), A. E. Dunlop, Ed. ACM Press, New York, NY, 298-302. Google ScholarGoogle Scholar
  13. LAM, M., ROTHBERG, E., AND WOLF, M. 1991. The cache performance and optimizations of blocked algorithms. In Proceedings of the 4th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IV, Santa Clara, CA, Apr. 8-11), D. A. Patterson, Ed. ACM Press, New York, NY, 63-74. Google ScholarGoogle Scholar
  14. LIAO, S., DEVADAS, S., KEUTZER, K., TJIANG, S., AND WANG, A. 1995. Code optimization techniques for embedded DSP microprocessors. In Proceedings of the 32nd ACM/IEEE Conference on Design Automation (DAC '95, San Francisco, CA, June 12-16, 1995), B. T. Preas, Ed. ACM Press, New York, NY, 599-604. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. LIAO, S., DEVADAS, S., KEUTZER, K., TJIANG, S., AND WANG, A. 1995. Storage assignment to decrease code size. In Proceedings of the Conference on Programming Language Design and Implementation (SIGPLAN '95, La Jolla, CA, June 18-21), D. W. Wall, Ed. ACM Press, New York, NY, 186-195. Google ScholarGoogle Scholar
  16. LIPPENS, P. E. R., VAN MEERBERGEN, J. L., VERHAEGH, W. F. J., AND VAN DER WERF, A. 1993. Allocation of multiport memories for hierarchical data stream. In Proceedings of the International Conference on Computer-Aided Design (ICCAD '93, Santa Clara, CA, Nov. 7-11), M. Lightner and J. A. G. Jess, Eds. IEEE Computer Society Press, Los Alamitos, CA, 728-735. Google ScholarGoogle Scholar
  17. LSI LOGIC CORPORATION. 1992. CW33000 MIPS Embedded Processor User's Manual. VLSI Technologies, Inc.Google ScholarGoogle Scholar
  18. MARGOLIN, B. 1997. Embedded systems to benefit from advances in dram technology. Comput. Des., 76-86.Google ScholarGoogle Scholar
  19. MARWEDEL, P. AND GOOSENS, J., Eds. 1995. Code Generation for Embedded Processors. Kluwer Academic Publishers, Hingham, MA. Google ScholarGoogle Scholar
  20. PANDA, P. R. 1998. Memory optimizations and exploration for embedded systems. Ph.D. Dissertation. University of California at Irvine, Irvine, CA. Google ScholarGoogle Scholar
  21. PANDA, P. R. AND DUTT, N. D. 1995. 1995 high level synthesis design repository. In Proceedings of the Eighth International Symposium on System Synthesis (Cannes, France, Sept. 13-15, 1995), P. G. Paulin and F. Mavaddat, Eds. ACM Press, New York, NY, 170-174. Google ScholarGoogle ScholarCross RefCross Ref
  22. PANDA, P. R., DUTT, N. D., AND NICOLAU, A. 1996. Memory organization for improved data cache performance in embedded processors. In Proceedings of the ACM/IEEE International Symposium on System Synthesis (Nov. 1996), ACM Press, New York, NY, 90-95. Google ScholarGoogle ScholarCross RefCross Ref
  23. PATTERSON, D. A. AND HENNESSY, J. L. 1994. Computer Organization & Design--The Hardware ~Software Interface. Morgan Kaufmann Publishers Inc., San Francisco, CA Google ScholarGoogle Scholar
  24. PRESS, W. H., FLANNERY, B. P., TEUKOLSKY, S. A., AND VETTERLING, W. T. 1988. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, New York, NY. Google ScholarGoogle Scholar
  25. RAMACHANDRAN, L., GAJSKI, D., AND CHAIYAKUL, V. 1994. An algorithm for array variable clustering. In Proceedings of the European Conference on Design Automation (Feb. 1994),Google ScholarGoogle ScholarCross RefCross Ref
  26. RAWAT, J. 1993. Static analysis of cache performance for real-time programming. Master's Thesis. Iowa State Univ., Ames, IA.Google ScholarGoogle Scholar
  27. SAGHIR, M. A. R., CHOW, P., AND LEE, C. G. 1996. Exploiting dual data-memory banks in digital signal processors. In Proceedings of the 7th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VII, Cambridge, MA, Oct. 1-5, 1996), B. Dally and S. Eggets, Eds. ACM Press, New York, NY, 234 -243. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. SCHMIT, H. AND THOMAS, D. E. 1995. Address generation for memories containing multiple arrays. In Proceedings of the 1995 IEEE /ACM International Conference on Computer-Aided Design (ICCAD-95, San Jose, CA, Nov. 5-9), R. Rudell, Ed. IEEE Computer Society Press, Los Alamitos, CA, 510-514. Google ScholarGoogle Scholar
  29. STOK, L. AND JESS, J. A. G. 1992. Foreground memory management in data path synthesis. Int. J. Circuits Theor. Appl. 20, 3, 235-255.Google ScholarGoogle ScholarCross RefCross Ref
  30. SUDARSANAM, A. AND MALIK, S. 1995. Memory bank and register allocation in software synthesis for ASIPs. In Proceedings of the 1995 IEEE/ACM International Conference on Computer-Aided Design (ICCAD-95, San Jose, CA, Nov. 5-9), R. Rudell, Ed. IEEE Computer Society Press, Los Alamitos, CA, 388-392. Google ScholarGoogle Scholar
  31. TOMIYAMA, H. AND YASUURA, H. 1996. Optimal code placement of embedded software for instruction caches. In Proceedings of the European Conference on Design and Test (Paris, France, Mar. 1996), 96-101. Google ScholarGoogle ScholarCross RefCross Ref
  32. TOMIYAMA, H. AND YASUURA, H. 1996. Size-constrained code placement for cache miss rate reduction. In Proceedings of the ACM/IEEE International Symposium on System Synthesis (Nov. 1996), ACM Press, New York, NY, 96-101. Google ScholarGoogle ScholarCross RefCross Ref
  33. TSENG, C. AND SIEWIOREK, D. P. 1986. Automated synthesis of data paths in digital systems. IEEE Trans. Comput.-Aided Des. 5, 3 (July 1986), 379-395.Google ScholarGoogle Scholar
  34. TURLEY, J. L. 1994. New processor families join embedded fray. Microprocessor Report 8, 17 (Dec.), 1-8.Google ScholarGoogle Scholar
  35. VANHOOF, g., BOLSENS, I., AND MAN, H. D. 1991. Compiling multi-dimensional data streams into distributed DSP ASIC memory. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD '91, Santa Clara, CA, Nov. 11-14, 1991), IEEE Computer Society Press, Los Alamitos, CA, 272-275.Google ScholarGoogle ScholarCross RefCross Ref
  36. VERBAUWHEDE, I. M., SCHEERS, C. J., AND RABAEY, J. M. 1994. Memory estimation for high level synthesis. In Proceedings of the 31st Annual Conference on Design Automation (DAC '94, San Diego, CA, June 6-10, 1994), M. Lorenzetti, Ed. ACM Press, New York, NY, 143-148. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. WILSON, R. 1997. Graphics IC vendors take a shot at embedded DRAM. Elec. Eng. Times 938 (Jan.), 41-57.Google ScholarGoogle Scholar
  38. WUYTACK, S., CATTHOOR, F., DE JONG, G., LIN, G. B., AND MAN, H. D. 1996. Flow graph balancing for minimizing the required memory bandwidth. In Proceedings of the ACM/ IEEE International Symposium on System Synthesis (Nov. 1996), ACM Press, New York, NY, 127-132. Google ScholarGoogle ScholarCross RefCross Ref

Index Terms

  1. On-chip vs. off-chip memory: the data partitioning problem in embedded processor-based systems

      Recommendations

      Reviews

      Neil Robert Karl

      An algorithm is outlined that allocates program variables to fast on-chip static random access memory (SRAM) in order to optimize program CPU execution time. The algorithm needs to be implemented in a precompiler for a target language like C++, which allocates variable addresses (the C language syntax & n = location of n ). Considerations in adjusting the precompiler algorithm are the size of the SRAM, the size of the cache, the use of bootstrap or operating system reserve of a portion of the SRAM, and the number of programs (one or more) to execute sequentially or concurrently sharing SRAM space. The memory assign algorithm (and SRAM partition for multiple programs) is developed and discussed in detail. The algorithm allocates scalar variables and constants to SRAM. Arrays smaller than SRAM are mapped to SRAM. Consideration is given to the lifetime of scalars and arrays, access frequency, various interference access counts, and nested and adjacent loops. By scanning the source language, the algorithm generates clusters of all combinations of compatible arrays that can share the same SRAM space. Then the arrays that can share the same SRAM space are grouped. Finally, a nonlinear program is used to rapidly allocate memory locations by variable value density, which is relative to the size of the largest array of the cluster. Statistics are compared for sample programs such as radar beamformer, MPEG decoder, fast Fourier transform, and matrix multiplication.

      Access critical reviews of Computing literature here

      Become a reviewer for Computing Reviews.

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      • Published in

        cover image ACM Transactions on Design Automation of Electronic Systems
        ACM Transactions on Design Automation of Electronic Systems  Volume 5, Issue 3
        July 2000
        483 pages
        ISSN:1084-4309
        EISSN:1557-7309
        DOI:10.1145/348019
        Issue’s Table of Contents

        Copyright © 2000 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 1 July 2000
        Published in todaes Volume 5, Issue 3

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • article

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader