research-article
A hardware/software framework for instruction and data scratchpad memory allocation

Published: 07 May 2010

Abstract

Previous research shows that a scratchpad memory (SPM) consumes less energy than a cache of the same capacity. In this article, we place the SPM at the top level of the memory hierarchy to reduce energy consumption. To take advantage of an SPM, we address two issues: improving the program's locality and managing the SPM. To tackle them, we present a hardware/software framework for dynamically allocating both instructions and data in the SPM. The software flow is divided into three phases: locality improvement, locality extraction, and runtime SPM management. We improve the locality of a program without modifying the original compiler or the source code. An optimization algorithm is proposed to extract the SPM allocations. At runtime, an SPM management program is employed. In hardware, address translation logic (ATL) is proposed to reduce the overhead of SPM management.
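The abstract does not spell out the optimization algorithm. As a rough illustration only, SPM allocation is often modeled as a 0/1 knapsack problem: each instruction or data block has a size and an estimated energy saving if it is placed in the SPM, and the goal is to maximize the total saving within the SPM capacity. The sketch below assumes that formulation. The `Block` type, the `allocate_spm` function, the block names, and all numbers are hypothetical and are not taken from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    name: str    # hypothetical label for an instruction or data block
    size: int    # bytes the block would occupy in the SPM
    saving: int  # estimated energy saved if placed in the SPM (arbitrary units)


def allocate_spm(blocks, capacity):
    """Pick the subset of blocks with maximum total saving that fits in capacity.

    Classic 0/1 knapsack dynamic program; an illustrative stand-in, not the
    article's optimization algorithm.
    """
    # best[c] = (total saving, chosen block names) using at most c bytes
    best = [(0, ())] * (capacity + 1)
    for block in blocks:
        # Iterate capacities downward so each block is used at most once.
        for c in range(capacity, block.size - 1, -1):
            candidate = best[c - block.size][0] + block.saving
            if candidate > best[c][0]:
                best[c] = (candidate, best[c - block.size][1] + (block.name,))
    return best[capacity]


# Hypothetical blocks and a hypothetical 1 KiB SPM.
blocks = [
    Block("hot_loop", size=256, saving=90),
    Block("lookup_table", size=512, saving=120),
    Block("init_code", size=384, saving=20),
    Block("stack_frame", size=128, saving=60),
]
total_saving, chosen = allocate_spm(blocks, capacity=1024)
```

With these made-up inputs, the rarely beneficial `init_code` block is left in main memory because including it would displace blocks with a higher saving per byte. A dynamic scheme such as the one the abstract describes would revisit this choice at runtime rather than fixing it once.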

The results show that the proposed framework reduces the energy delay product (EDP) by 63%, on average, compared with a traditional cache architecture. This reduction comes from properly allocating both instructions and data in the SPM. Allocating only instructions in the SPM reduces EDP by 45%, on average; allocating only data reduces it by 14%, on average.
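Energy delay product is the energy a workload consumes multiplied by the time it takes to run, so a design can lower EDP by saving energy, by running faster, or both. The snippet below only illustrates the arithmetic of a relative EDP reduction. The function names and the sample energy and delay figures are hypothetical and are not measurements from the article.

```python
def edp(energy_joules, delay_seconds):
    """Energy delay product: energy multiplied by execution time."""
    return energy_joules * delay_seconds


def edp_reduction(baseline_edp, new_edp):
    """Fractional EDP reduction relative to a baseline (0.25 means 25% lower)."""
    return 1.0 - new_edp / baseline_edp


# Hypothetical numbers chosen only to show the calculation.
cache_edp = edp(energy_joules=2.0, delay_seconds=1.0)  # baseline cache system
spm_edp = edp(energy_joules=1.0, delay_seconds=0.8)    # SPM-based system
reduction = edp_reduction(cache_edp, spm_edp)          # 1 - 0.8 / 2.0, about 0.6
```

Because EDP multiplies the two quantities, halving energy while cutting delay by 20% in this made-up case yields a reduction of about 60%, more than either change would give alone.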

        • Published in

          ACM Transactions on Architecture and Code Optimization  Volume 7, Issue 1
          April 2010
          151 pages
          ISSN:1544-3566
          EISSN:1544-3973
          DOI:10.1145/1736065
          Copyright © 2010 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 7 May 2010
          • Accepted: 1 December 2009
          • Revised: 1 November 2009
          • Received: 1 November 2009
          Published in TACO Volume 7, Issue 1

          Qualifiers

          • research-article
          • Research
          • Refereed
