ABSTRACT
Most programming languages support a call stack in the programming model and also in the runtime system.We show that for applications targeting low-power embedded microcontrollers (MCUs), RAM usage can be significantly decreased by partially or completely eliminating the runtime callstack. We present flattening, a transformation that absorbs a function into its caller, replacing function invocations and returns with jumps. Unlike inlining, flattening does not duplicate the bodies of functions that have multiple callsites. Applied aggressively, flattening results in stack elimination. Flattening is most useful in conjunction with a lifting transformation that moves global variables into a local scope.
Flattening and lifting can save RAM. However, even more benefit can be obtained by adapting the compiler to cope with properties of flattened code. First, we show that flattening adds false paths that confuse a standard live variables analysis. The resulting problems can be mitigated by breaking spurious live-range conflicts between variables using information from the unflattened callgraph. Second, we show that the impact of high register pressure due to flattened and lifted code, and consequent spills out of the register allocator, can be mitigated by improving a compiler's stack layout optimizations. We have implemented both of these improvements in GCC, and have implemented flattening and lifting as source-to-source transformations. On a collection of applications for the AVR family of 8-bit MCUs, we show that total RAM usage can be reduced by 20% by compiling flattened and lifted programs with our improved GCC.
- Martn Abadi, Mihai Budiu, Úlfar Erlingsson, and Jay Ligatti. Control flow integrity: Principles, implementations, and applications. In Proc. of the 12th ACM Conf. on Computer and Communications Security (CCS), Alexandria, VA, November 2005. Google ScholarDigital Library
- C. Scott Ananian and Martin Rinard. Data size optimizations for Java programs. In Proc. of the 2003 Conf. on Languages, Compilers, and Tools for Embedded Systems (LCTES), pages 59--68, San Diego, CA, June 2003. Google ScholarDigital Library
- Andrew W. Appel. Compiling with Continuations. Cambridge University Press, Cambridge, MA, 1992. Google ScholarDigital Library
- Lan S. Bai, Lei Yang, and Robert P. Dick. Automated compile-time and run-time techniques to increase usable memory in MMU-less embedded systems. In Proc. of the Intl. Conf. on Compilers, Architecture, and Synthesis for Embedded Systems (CASES), Seoul, Korea, October 2006. Google ScholarDigital Library
- Volker Barthelmann. Inter-task register-allocation for static operating systems. In Proc. of the Joint Conf. on Languages, Compilers, and Tools for Embedded Systems (LCTES) and Software and Compilers for Embedded Systems (SCOPES), pages 149--154, Berlin, Germany, June 2002. Google ScholarDigital Library
- Surupa Biswas, Matthew Simpson, and Rajeev Barua. Memory overflow protection for embedded systems using run-time checks, reuse and compression. In Proc. of the Conf. on Compilers, Architecture, and Synthesis for Embedded Systems (CASES), Washington, DC, September 2004. Google ScholarDigital Library
- Dennis Brylow, Niels Damgaard, and Jens Palsberg. Static checking of interrupt-driven software. In Proc. of the 23rd Intl. Conf. on Software Engineering (ICSE), pages 47--56, Toronto, Canada, May 2001. Google ScholarDigital Library
- Dominique Chanet, Bjorn De Sutter, Bruno De Bus, Ludo Van Put, and Koen De Bosschere. System-wide compaction and specialization of the Linux kernel. In Proc. of the 2005 Conf. on Languages, Compilers, and Tools for Embedded Systems (LCTES), Chicago, IL, June 2005. Google ScholarDigital Library
- Trishul M. Chilimbi, Bob Davidson, and James R. Larus. Cache-conscious structure definition. In Proc. of the ACM SIGPLAN 1999 Conf. on Programming Language Design and Implementation (PLDI), pages 13--24, Atlanta, GA, May 1999. Google ScholarDigital Library
- Keith D. Cooper, Mary W. Hall, and Linda Torczon. An experiment with inline substitution. Software-Practice and Experience, 21(6):581--601, June 1991. Google ScholarDigital Library
- Keith D. Cooper, Mary W. Hall, and Ken Kennedy. A methodology for procedure cloning. Computer Languages, 19(2):105--118, April 1993.Google ScholarDigital Library
- Nathan Cooprider and John Regehr. Offline compression for on-chip RAM. In Proc. of the ACM SIGPLAN 2007 Conf. on Programming Language Design and Implementation (PLDI), pages 363--372, San Diego, CA, June 2007. Google ScholarDigital Library
- Jakob Engblom. Static properties of commercial embedded real-time programs, and their implication for worst-case execution time analysis. In Proc. of the 5th IEEE Real-Time Technology and Applications Symp. (RTAS), Vancouver, Canada, June 1999. Google ScholarDigital Library
- Aurélien Francillon and Claude Castelluccia. Code injection attacks on Harvard--architecture devices. In Proc. of the 15th ACM Conf. on Computer and Communications Security (CCS), Alexandria, VA, October 2008. Google ScholarDigital Library
- Jack Ganssle. The Art of Designing Embedded Systems. Newnes, 1999. Google ScholarDigital Library
- David Gay, Phil Levis, Robert von Behren, Matt Welsh, Eric Brewer, and David Culler. The nesC language: A holistic approach to networked embedded systems. In Proc. of the ACM SIGPLAN 2003 Conf. on Programming Language Design and Implementation (PLDI), pages 1--11, San Diego, CA, June 2003. Google ScholarDigital Library
- Dirk Grunwald and Richard Neves. Whole-program optimization for time and space efficient threads. In Proc. of the 7th Intl. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 50--59, Cambridge, MA, October 1996. Google ScholarDigital Library
- Håkon A. Hjortland. Laser video projector. http://heim.ifi.uio.no/haakoh/avr/.Google Scholar
- Chris Lattner and Vikram Adve. Transparent pointer compression for linked data structures. In Proc. of the ACM Workshop on Memory System Performance (MSP), Chicago, IL, June 2005. Google ScholarDigital Library
- Vladimir N. Makarov. The integrated register allocator for GCC. In Proc. of the GCC Developers' Summit, pages 77--90, Ottawa, Canada, July 2007. URL http://ols.fedoraproject.org/GCC/Reprints-2007/makarov-reprint.pdf.Google Scholar
- William P. McCartney and Nigamanth Sridhar. Abstractions for safe concurrent programming in networked embedded systems. In Proc. of the 4th ACM Conf. on Embedded Networked Sensor Systems (SenSys 2006), pages 167--180, Boulder, Colorado, October 2006. Google ScholarDigital Library
- Bhuvan Middha, Matthew Simpson, and Rajeev Barua. MTSS: Multi task stack sharing for embedded systems. In Proc. of the Intl. Conf. on Compilers, Architecture, and Synthesis for Embedded Systems (CASES), San Francisco, CA, September 2005. Google ScholarDigital Library
- George C. Necula, Scott McPeak, S. P. Rahul, and Westley Weimer. CIL: Intermediate language and tools for analysis and transformation of C programs. In Proc. of the Intl. Conf. on Compiler Construction (CC), pages 213--228, Grenoble, France, April 2002. Google ScholarDigital Library
- Ozcan Ozturk, Mahmut Kandemir, and Mary Jane Irwin. Increasing on-chip memory space utilization for embedded chip multiprocessors through data compression. In Proc. of the 3rd Intl. Conf. on Hardware/Software Codesign and System Synthesis (CODES+ISSS), pages 87--92, Jersey City, NJ, September 2005. Google ScholarDigital Library
- Paparazzi. The Paparazzi project, 2006. http://www.nongnu.org/paparazzi.Google Scholar
- Rodric M. Rabbah and Krishna V. Palem. Data remapping for design space optimization of embedded memory systems. ACM Trans. Embedded Computing Systems (TECS), 2(2):1--32, May 2003. Google ScholarDigital Library
- John Regehr, Alastair Reid, and Kirk Webb. Eliminating stack overflow by abstract interpretation. In Proc. of the 3rd Intl. Conf. on Embedded Software (EMSOFT), pages 306--322, Philadelphia, PA, October 2003.Google ScholarCross Ref
- TinyOS. tinyos.net. http://www.tinyos.net.Google Scholar
- Ben L. Titzer. Virgil: Objects on the head of a pin. In Proc. of the ACM Conf. on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), pages 191--208, Portland, OR, October 2006. Google ScholarDigital Library
- Ben L. Titzer, Daniel Lee, and Jens Palsberg. Avrora: Scalable sensor network simulation with precise timing. In Proc. of the 4th Intl. Conf. on Information Processing in Sensor Networks (IPSN), pages 44--53, Los Angeles, CA, April 2005. Google ScholarDigital Library
- Christian Wawersich, Michael Stilkerich, and Wolfgang Schröder-Preikschat. An OSEK/VDX-based Multi-JVM for automotive appliances. In Proc. of the Intl. Embedded Systems Symposium, Irvine, CA, 2007.Google Scholar
- Lei Yang, Robert P. Dick, Haris Lekatsas, and Srimat Chakradhar. On-line memory compression for embedded systems. ACM Trans. Embedded Computing Systems (TECS), 2006.Google Scholar
- Youtao Zhang and Rajiv Gupta. Compressing heap data for improved memory performance. Software--Practice and Experience, 36(10):1081--1111, August 2006. Google ScholarDigital Library
Index Terms
- Eliminating the call stack to save RAM
Recommendations
Eliminating the call stack to save RAM
LCTES '09Most programming languages support a call stack in the programming model and also in the runtime system.We show that for applications targeting low-power embedded microcontrollers (MCUs), RAM usage can be significantly decreased by partially or ...
Tuning Compiler Optimizations for Simultaneous Multithreading
Special issue on the 30th annual ACM/IEEE international symposium on microarchitecture, part IISimultaneous Multithreading (SMT) is a processor architectural technique that promises to significantly improve the utilization and performance of modern wide-issue superscalar processors. An SM T processor is capable of issuing multiple instructions ...
Vectorization past dependent branches through speculation
PACT '13: Proceedings of the 22nd international conference on Parallel architectures and compilation techniquesModern architectures increasingly rely on SIMD vectorization to improve performance for floating point intensive scientific applications. However, existing compiler optimization techniques for automatic vectorization are inhibited by the presence of ...
Comments