research-article

Eliminating the call stack to save RAM

Authors:
Xuejun Yang

University of Utah, Salt Lake City, UT, USA

University of Utah, Salt Lake City, UT, USA
View Profile

,
Nathan Cooprider

BAE Systems AIT, Burlington, MA, USA

BAE Systems AIT, Burlington, MA, USA
View Profile

,
John Regehr

University of Utah, Salt Lake City, UT, USA

University of Utah, Salt Lake City, UT, USA
View Profile

LCTES '09: Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systemsJune 2009Pages 60–69https://doi.org/10.1145/1542452.1542461

Published:19 June 2009Publication History

LCTES '09: Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems

Pages 60–69

ABSTRACT

Most programming languages support a call stack in the programming model and also in the runtime system.We show that for applications targeting low-power embedded microcontrollers (MCUs), RAM usage can be significantly decreased by partially or completely eliminating the runtime callstack. We present flattening, a transformation that absorbs a function into its caller, replacing function invocations and returns with jumps. Unlike inlining, flattening does not duplicate the bodies of functions that have multiple callsites. Applied aggressively, flattening results in stack elimination. Flattening is most useful in conjunction with a lifting transformation that moves global variables into a local scope.

Flattening and lifting can save RAM. However, even more benefit can be obtained by adapting the compiler to cope with properties of flattened code. First, we show that flattening adds false paths that confuse a standard live variables analysis. The resulting problems can be mitigated by breaking spurious live-range conflicts between variables using information from the unflattened callgraph. Second, we show that the impact of high register pressure due to flattened and lifted code, and consequent spills out of the register allocator, can be mitigated by improving a compiler's stack layout optimizations. We have implemented both of these improvements in GCC, and have implemented flattening and lifting as source-to-source transformations. On a collection of applications for the AVR family of 8-bit MCUs, we show that total RAM usage can be reduced by 20% by compiling flattened and lifted programs with our improved GCC.

References

Martn Abadi, Mihai Budiu, Úlfar Erlingsson, and Jay Ligatti. Control flow integrity: Principles, implementations, and applications. In Proc. of the 12th ACM Conf. on Computer and Communications Security (CCS), Alexandria, VA, November 2005. Google ScholarDigital Library
C. Scott Ananian and Martin Rinard. Data size optimizations for Java programs. In Proc. of the 2003 Conf. on Languages, Compilers, and Tools for Embedded Systems (LCTES), pages 59--68, San Diego, CA, June 2003. Google ScholarDigital Library
Andrew W. Appel. Compiling with Continuations. Cambridge University Press, Cambridge, MA, 1992. Google ScholarDigital Library
Lan S. Bai, Lei Yang, and Robert P. Dick. Automated compile-time and run-time techniques to increase usable memory in MMU-less embedded systems. In Proc. of the Intl. Conf. on Compilers, Architecture, and Synthesis for Embedded Systems (CASES), Seoul, Korea, October 2006. Google ScholarDigital Library
Volker Barthelmann. Inter-task register-allocation for static operating systems. In Proc. of the Joint Conf. on Languages, Compilers, and Tools for Embedded Systems (LCTES) and Software and Compilers for Embedded Systems (SCOPES), pages 149--154, Berlin, Germany, June 2002. Google ScholarDigital Library
Surupa Biswas, Matthew Simpson, and Rajeev Barua. Memory overflow protection for embedded systems using run-time checks, reuse and compression. In Proc. of the Conf. on Compilers, Architecture, and Synthesis for Embedded Systems (CASES), Washington, DC, September 2004. Google ScholarDigital Library
Dennis Brylow, Niels Damgaard, and Jens Palsberg. Static checking of interrupt-driven software. In Proc. of the 23rd Intl. Conf. on Software Engineering (ICSE), pages 47--56, Toronto, Canada, May 2001. Google ScholarDigital Library
Dominique Chanet, Bjorn De Sutter, Bruno De Bus, Ludo Van Put, and Koen De Bosschere. System-wide compaction and specialization of the Linux kernel. In Proc. of the 2005 Conf. on Languages, Compilers, and Tools for Embedded Systems (LCTES), Chicago, IL, June 2005. Google ScholarDigital Library
Trishul M. Chilimbi, Bob Davidson, and James R. Larus. Cache-conscious structure definition. In Proc. of the ACM SIGPLAN 1999 Conf. on Programming Language Design and Implementation (PLDI), pages 13--24, Atlanta, GA, May 1999. Google ScholarDigital Library
Keith D. Cooper, Mary W. Hall, and Linda Torczon. An experiment with inline substitution. Software-Practice and Experience, 21(6):581--601, June 1991. Google ScholarDigital Library
Keith D. Cooper, Mary W. Hall, and Ken Kennedy. A methodology for procedure cloning. Computer Languages, 19(2):105--118, April 1993.Google ScholarDigital Library
Nathan Cooprider and John Regehr. Offline compression for on-chip RAM. In Proc. of the ACM SIGPLAN 2007 Conf. on Programming Language Design and Implementation (PLDI), pages 363--372, San Diego, CA, June 2007. Google ScholarDigital Library
Jakob Engblom. Static properties of commercial embedded real-time programs, and their implication for worst-case execution time analysis. In Proc. of the 5th IEEE Real-Time Technology and Applications Symp. (RTAS), Vancouver, Canada, June 1999. Google ScholarDigital Library
Aurélien Francillon and Claude Castelluccia. Code injection attacks on Harvard--architecture devices. In Proc. of the 15th ACM Conf. on Computer and Communications Security (CCS), Alexandria, VA, October 2008. Google ScholarDigital Library
Jack Ganssle. The Art of Designing Embedded Systems. Newnes, 1999. Google ScholarDigital Library
David Gay, Phil Levis, Robert von Behren, Matt Welsh, Eric Brewer, and David Culler. The nesC language: A holistic approach to networked embedded systems. In Proc. of the ACM SIGPLAN 2003 Conf. on Programming Language Design and Implementation (PLDI), pages 1--11, San Diego, CA, June 2003. Google ScholarDigital Library
Dirk Grunwald and Richard Neves. Whole-program optimization for time and space efficient threads. In Proc. of the 7th Intl. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 50--59, Cambridge, MA, October 1996. Google ScholarDigital Library
Håkon A. Hjortland. Laser video projector. http://heim.ifi.uio.no/haakoh/avr/.Google Scholar
Chris Lattner and Vikram Adve. Transparent pointer compression for linked data structures. In Proc. of the ACM Workshop on Memory System Performance (MSP), Chicago, IL, June 2005. Google ScholarDigital Library
Vladimir N. Makarov. The integrated register allocator for GCC. In Proc. of the GCC Developers' Summit, pages 77--90, Ottawa, Canada, July 2007. URL http://ols.fedoraproject.org/GCC/Reprints-2007/makarov-reprint.pdf.Google Scholar
William P. McCartney and Nigamanth Sridhar. Abstractions for safe concurrent programming in networked embedded systems. In Proc. of the 4th ACM Conf. on Embedded Networked Sensor Systems (SenSys 2006), pages 167--180, Boulder, Colorado, October 2006. Google ScholarDigital Library
Bhuvan Middha, Matthew Simpson, and Rajeev Barua. MTSS: Multi task stack sharing for embedded systems. In Proc. of the Intl. Conf. on Compilers, Architecture, and Synthesis for Embedded Systems (CASES), San Francisco, CA, September 2005. Google ScholarDigital Library
George C. Necula, Scott McPeak, S. P. Rahul, and Westley Weimer. CIL: Intermediate language and tools for analysis and transformation of C programs. In Proc. of the Intl. Conf. on Compiler Construction (CC), pages 213--228, Grenoble, France, April 2002. Google ScholarDigital Library
Ozcan Ozturk, Mahmut Kandemir, and Mary Jane Irwin. Increasing on-chip memory space utilization for embedded chip multiprocessors through data compression. In Proc. of the 3rd Intl. Conf. on Hardware/Software Codesign and System Synthesis (CODES+ISSS), pages 87--92, Jersey City, NJ, September 2005. Google ScholarDigital Library
Paparazzi. The Paparazzi project, 2006. http://www.nongnu.org/paparazzi.Google Scholar
Rodric M. Rabbah and Krishna V. Palem. Data remapping for design space optimization of embedded memory systems. ACM Trans. Embedded Computing Systems (TECS), 2(2):1--32, May 2003. Google ScholarDigital Library
John Regehr, Alastair Reid, and Kirk Webb. Eliminating stack overflow by abstract interpretation. In Proc. of the 3rd Intl. Conf. on Embedded Software (EMSOFT), pages 306--322, Philadelphia, PA, October 2003.Google ScholarCross Ref
TinyOS. tinyos.net. http://www.tinyos.net.Google Scholar
Ben L. Titzer. Virgil: Objects on the head of a pin. In Proc. of the ACM Conf. on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), pages 191--208, Portland, OR, October 2006. Google ScholarDigital Library
Ben L. Titzer, Daniel Lee, and Jens Palsberg. Avrora: Scalable sensor network simulation with precise timing. In Proc. of the 4th Intl. Conf. on Information Processing in Sensor Networks (IPSN), pages 44--53, Los Angeles, CA, April 2005. Google ScholarDigital Library
Christian Wawersich, Michael Stilkerich, and Wolfgang Schröder-Preikschat. An OSEK/VDX-based Multi-JVM for automotive appliances. In Proc. of the Intl. Embedded Systems Symposium, Irvine, CA, 2007.Google Scholar
Lei Yang, Robert P. Dick, Haris Lekatsas, and Srimat Chakradhar. On-line memory compression for embedded systems. ACM Trans. Embedded Computing Systems (TECS), 2006.Google Scholar
Youtao Zhang and Rajiv Gupta. Compressing heap data for improved memory performance. Software--Practice and Experience, 36(10):1081--1111, August 2006. Google ScholarDigital Library

Index Terms

Eliminating the call stack to save RAM
1. Computer systems organization
  1. Embedded and cyber-physical systems
  2. Real-time systems
2. Software and its engineering
  1. Software notations and tools
    1. Compilers

Recommendations

Eliminating the call stack to save RAM
LCTES '09

Most programming languages support a call stack in the programming model and also in the runtime system.We show that for applications targeting low-power embedded microcontrollers (MCUs), RAM usage can be significantly decreased by partially or ...
Read More
Tuning Compiler Optimizations for Simultaneous Multithreading
Special issue on the 30th annual ACM/IEEE international symposium on microarchitecture, part II

Simultaneous Multithreading (SMT) is a processor architectural technique that promises to significantly improve the utilization and performance of modern wide-issue superscalar processors. An SM T processor is capable of issuing multiple instructions ...
Read More
Vectorization past dependent branches through speculation
PACT '13: Proceedings of the 22nd international conference on Parallel architectures and compilation techniques

Modern architectures increasingly rely on SIMD vectorization to improve performance for floating point intensive scientific applications. However, existing compiler optimization techniques for automatic vectorization are inhibited by the presence of ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
LCTES '09: Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
June 2009
188 pages
ISBN:9781605583563
DOI:10.1145/1542452
General Chair:
Christoph Kirsch
University of Salzburg, Austria
,
Program Chair:
Mahmut Kandemir
Pennsylvania State University, USA
ACM SIGPLAN Notices Volume 44, Issue 7
LCTES '09
July 2009
176 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/1543136
Issue’s Table of Contents
Copyright © 2009 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 19 June 2009
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
compiler optimization
embedded software
memory allocation
memory optimizations
sensor networks
stack liveness
Qualifiers
- research-article
Conference

Acceptance Rates
LCTES '09 Paper Acceptance Rate18of81submissions,22%Overall Acceptance Rate116of438submissions,26%
More
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 12
  Total Citations
  View Citations
- 901
  Total Downloads
- Downloads (Last 12 months)9
- Downloads (Last 6 weeks)3
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Eliminating the call stack to save RAM

LCTES '09: Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems

ABSTRACT

References

Cited By

Index Terms

Recommendations

Eliminating the call stack to save RAM

Tuning Compiler Optimizations for Simultaneous Multithreading

Vectorization past dependent branches through speculation