Abstract
Energy transparency is a concept that makes a program’s energy consumption visible, from hardware up to software, through the different system layers. Such transparency can enable energy optimizations at each layer and between layers, as well as help both programmers and operating systems make energy-aware decisions. In this article, we focus on deeply embedded devices, typically used for Internet of Things (IoT) applications, and demonstrate how to enable energy transparency through existing static resource analysis (SRA) techniques and a new target-agnostic profiling technique, without hardware energy measurements. Our novel mapping technique enables software energy consumption estimations at a higher level than the Instruction Set Architecture (ISA), namely the LLVM intermediate representation (IR) level, and therefore introduces energy transparency directly to the LLVM optimizer. We apply our energy estimation techniques to a comprehensive set of benchmarks, including single- and multithreaded embedded programs from two commonly used concurrency patterns: task farms and pipelines. Using SRA, our LLVM IR results demonstrate a high accuracy with a deviation in the range of 1% from the ISA SRA. Our profiling technique captures the actual energy consumption at the LLVM IR level with an average error of 3%.
Supplemental Material
Available for Download
Slide deck associated with this paper
- E. Albert, P. Arenas, S. Genaim, and G. Puebla. 2011. Closed-form upper bounds in static cost analysis. Journal of Automated Reasoning 46, 2, 161--203. Google ScholarDigital Library
- D. E. Alonso-Blas and S. Genaim. 2012. On the limits of the classical approach to cost analysis. 7460, 405--421. Google ScholarDigital Library
- ARM. 2016. Cortex-M Series Family. Retrieved February 17, 2017, from http://www.arm.com/products/processors/cortex-m.Google Scholar
- C. Blackmore, O. Ray, and K. Eder. 2015. A logic programming approach to predict effective compiler settings for embedded software. Theory and Practice of Logic Programming 15, 4--5, 481--494.Google ScholarCross Ref
- A. Bogliolo, L. Benini, G. D. Micheli, and B. Ricc. 1997. Gate-level power and current simulation of CMOS integrated circuits. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 5, 4, 473--488. Google ScholarDigital Library
- C. Brandolese, S. Corbetta, and W. Fornaciari. 2011. Software energy estimation based on statistical characterization of intermediate compilation code. In Proceedings of the 2011 International Symposium on Low Power Electronics and Design (ISLPED’11). 333--338. Google ScholarDigital Library
- G. Brat, J. A. Navas, N. Shi, and A. Venet. 2014. IKOS: A framework for static analysis based on abstract interpretation. In Software Engineering and Formal Methods. Springer, 271--277.Google Scholar
- D. Brooks, V. Tiwari, and M. Martonosi. 2000. Wattch: A framework for architectural-level power analysis and optimizations. ACM SIGARCH Computer Architecture News 28, 2, 83--94. Google ScholarDigital Library
- G. Contreras and M. Martonosi. 2005. Power prediction for Intel XScale processors using performance monitoring unit events. In Proceedings of the 2005 International Symposium on Low Power Electronics and Design (ISLPED’05). IEEE, Los Alamitos, CA, 221--226. Google ScholarDigital Library
- S. K. Debray, P. López-García, M. Hermenegildo, and N.-W. Lin. 1997. Lower bound cost estimation for logic programs. In Proceedings of the 1997 International Logic Programming Symposium. 291--305. Google ScholarDigital Library
- DWARF. 2013. The DWARF Debugging Standard. Retrieved February 17, 2017, from http://dwarfstd.org/.Google Scholar
- K. Eder, J. P. Gallagher, P. López-García, H. Muller, Z. Banković, K. Georgiou, R. Haemmerlé, et al. 2016. ENTRA: Whole-systems energy transparency. Microprocessors and Microsystems, 278--286. Google ScholarDigital Library
- K. Eder, K. Georgiou, and N. Grech (Eds.). 2013. Common Assertion Language. ENTRA Project: Whole-Systems Energy Transparency (FET project 318337). Deliverable 2.1. Available at http://entraproject.eu.Google Scholar
- J. Engblom, A. Ermedahl, and F. Stappert. 2000. Comparing different worst-case execution time analysis methods. In Proceedings of the Work-in-Progress Session of the 21st IEEE Real-Time Systems Symposium (RTSS’00).Google Scholar
- M. Field. 2014. Binary division. Retrieved February 17, 2017, from https://github.com/kg8280/EnergyTransparency/blob/master/Benchmarks/radix4Division.zip.Google Scholar
- K. Georgiou. 2016a. Energy Transparency for Deeply Embedded Programs—Benchmarks. Retrieved February 17, 2017, from https://github.com/kg8280/EnergyTransparency.Google Scholar
- K. Georgiou. 2016b. Energy Transparency for Deeply Embedded Programs—FNOP Modeling. Retrieved February 17, 2017, from https://github.com/kg8280/EnergyTransparency/blob/master/FnopModeling.pdf.Google Scholar
- N. Grech, K. Georgiou, J. Pallister, S. Kerrison, J. Morse, and K. Eder. 2015. Static analysis of energy consumption for LLVM IR programs. In Proceedings of the 18th International Workshop on Software and Compilers for Embedded Systems (SCOPES’15). ACM, New York, NY. Google ScholarDigital Library
- J. Gustafsson, A. Betts, A. Ermedahl, and B. Lisper. 2010. The Mälardalen WCET benchmarks—past, present and future. In Proceedings of the 10th International Workshop on Worst-Case Execution Time Analysis.Google Scholar
- M. Hermenegildo, G. Puebla, F. Bueno, and P. López-García. 2005. Integrated program debugging, verification, and optimization using abstract interpretation (and the Ciao system preprocessor). Science of Computer Programming 58, 1--2, 115--140. Google ScholarDigital Library
- J. Hoffmann, K. Aehlig, and M. Hofmann. 2012. Multivariate amortized resource analysis. ACM Transactions on Programming Languages and Systems 34, 3, 14. Google ScholarDigital Library
- S. J. Hollis and S. Kerrison. 2016. Swallow: Building an energy-transparent many-core embedded real-time system. In Proceedings of the 2016 Design, Automation, and Test in Europe Conference and Exhibition (DATE’16). 73--78. Google ScholarDigital Library
- J. Hauser. 2014. Berkeley SoftFloat. Retrieved February 17, 2017, from http://www.jhauser.us/arithmetic/SoftFloat.html.Google Scholar
- M. E. A. Ibrahim, M. Rupp, and H. A. H. Fahmy. 2008. Power estimation methodology for VLIW digital signal processors. In Proceedings of the 2008 42nd Asilomar Conference on Signals, Systems, and Computers. IEEE, Los Alamitos, CA, 1840--1844.Google ScholarCross Ref
- R. Jayaseelan, T. Mitra, and X. Li. 2006. Estimating the worst-case energy consumption of embedded software. In Proceedings of the 12th IEEE Real-Time and Embedded Technology and Applications Symposium. 81--90. Google ScholarDigital Library
- S. Kerrison and K. Eder. 2015. Energy modeling of software for a hardware multithreaded embedded microprocessor. ACM Transactions on Embedded Computing Systems 14, 3, 56:1--56:25. Google ScholarDigital Library
- P. C. Kocher, J. Jaffe, and B. Jun. 1999. Differential power analysis. In Proceedings of the 19th Annual International Cryptology Conference on Advances in Cryptology (CRYPTO’99). 388--397. Google ScholarDigital Library
- C. Lattner and V. S. Adve. 2004. LLVM: A compilation framework for lifelong program analysis and transformation. In Proceedings of the International Symposium on Code Generation and Optimization: Feedback-Directed and Runtime Optimization (CGO’04). 75--88. Google ScholarDigital Library
- C. Lattner and D. Patel. 2014. Extensible Metadata in LLVM IR. Retrieved February 17, 2017, from http://blog.llvm.org/2010/04/extensible-metadata-in-llvm-ir.html.Google Scholar
- Y.-T. S. Li and S. Malik. 1995. Performance analysis of embedded software using implicit path enumeration. In Proceedings of the 32nd Annual ACM/IEEE Design Automation Conference (DAC’95). ACM, New York, NY, 456--461. Google ScholarDigital Library
- Y.-T. S. Li and S. Malik. 1997. Performance analysis of embedded software using implicit path enumeration. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 16, 12, 1477--1487. Google ScholarDigital Library
- C.-Y. Lin, C.-B. Kuan, W.-L. Shih, and J. K. Lee. 2015. Compilers for low power with design patterns on embedded multicore systems. Journal of Signal Processing Systems 80, 3, 277--293. Google ScholarDigital Library
- U. Liqat, S. Kerrison, A. Serrano, K. Georgiou, P. Lopez-Garcia, N. Grech, M. V. Hermenegildo, and K. Eder. 2014. Energy consumption analysis of programs based on XMOS ISA-level models. In Proceedings of the 23rd International Symposium on Logic-Based Program Synthesis and Transformation (LOPSTR’13).Google Scholar
- LLVMorg. 2014. The LLVM Compiler Infrastructure. Retrieved February 17, 2017, from http://www.llvm.org/.Google Scholar
- D. May. 2009. The XMOS XS1 Architecture. Retrieved February 17, 2017, from https://www.xmos.com/download/private/The-XMOS-XS1-Architecture%281.0%29.pdf.Google Scholar
- J. Navas, M. Méndez-Lojo, and M. Hermenegildo. 2008. Safe upper-bounds inference of energy consumption for Java bytecode applications. In Proceedings of the 6th NASA Langley Formal Methods Workshop (LFM’08).Google Scholar
- J. Navas, M. Méndez-Lojo, and M. Hermenegildo. 2009. User-definable resource usage bounds analysis for Java bytecode. Electronic Notes in Theoretical Computer Science 253, 5, 65--82. Google ScholarDigital Library
- J. Navas, E. Mera, P. López-García, and M. Hermenegildo. 2007. User-definable resource bounds analysis for logic programs. In Logic Programs. Lecture Notes in Computer Science, Vol. 4670. Springer, 348--363. Google ScholarDigital Library
- G. Ottosson and M. Sjodin. 1997. Worst-case execution time analysis for modern hardware architectures. In Proceedings of the ACM SIGPLAN Workshop on Languages, Compilers, and Tools for Real-Time Systems (LCT-RTS’97 ). 47--55.Google Scholar
- O. Ozturk, K. Mahmut, and C. Guangyu. 2013. Compiler-directed energy reduction using dynamic voltage scaling and voltage islands for embedded systems. IEEE Transactions on Computers 62, 2, 268--278. Google ScholarDigital Library
- J. Pallister, J. Bennett, and S. Hollis. 2013. The BEEBS Benchmark Suite. Retrieved February 17, 2017, from http://www.cs.bris.ac.uk/Research/Micro/beebs.jsp.Google Scholar
- J. Pallister, K. Eder, and S. J. Hollis. 2015a. Optimizing the flash-RAM energy trade-off in deeply embedded systems. In Proceedings of the 2015 IEEE/ACM International Symposium on Code Generation and Optimization (CGO’15). 115--124. Google Scholar
- J. Pallister, K. Eder, S. J. Hollis, and J. Bennett. 2014. A high-level model of embedded flash energy consumption. In Proceedings of the 2014 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES’14). ACM, New York, NY, Article No. 20. Google ScholarDigital Library
- J. Pallister, S. Kerrison, J. Morse, and K. Eder. 2015b. Data dependent energy modelling: A worst case perspective. arXiv:1505.03374.Google Scholar
- D. Potop-Butucaru and I. Puaut. 2013. Integrated worst-case execution time estimation of multicore applications. In Proceedings of the 13th International Workshop on Worst-Case Execution Time Analysis. 21--31.Google Scholar
- S. Rele, S. Pande, S. Onder, and R. Gupta. 2002. Optimizing Static Power Dissipation by Functional Units in Superscalar Processors. Springer, Berlin, Germany, 261--275. Google ScholarDigital Library
- M. Rosendahl. 1989. Automatic complexity analysis. In Proceedings of the 4th International Conference on Functional Programming Languages and Computer Architecture (FPCA’89). ACM, New York, NY, 144--156. Google ScholarDigital Library
- M. Sami, D. Sciuto, C. Silvano, and V. Zaccaria. 2002. An instruction-level energy model for embedded VLIW architectures. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 21, 9, 998--1010. Google ScholarDigital Library
- D. Sarta, D. Trifone, and G. Ascia. 1999. A data dependent approach to instruction level power estimation. In Proceedings of the 1999 IEEE Alessandro Volta Memorial Workshop on Low-Power Design. 182--190. Google ScholarDigital Library
- M. P. Schellekens. 2010. MOQA: Unlocking the potential of compositional static average-case analysis. Journal of Logic and Algebraic Programming 79, 1, 61--83.Google ScholarCross Ref
- S. Schubert, D. Kostic, W. Zwaenepoel, and K. G. Shin. 2012. Profiling software for energy consumption. In Proceedings of the 2012 IEEE International Conference on Green Computing and Communications (GreenCom’12). 515--522. Google ScholarDigital Library
- Y. S. Shao and D. Brooks. 2013. Energy characterization and instruction-level energy model of Intel’s Xeon Phi processor. In Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED’13). IEEE, Los Alamitos, CA, 389--394. Google ScholarDigital Library
- W. Shih, Y.-P. You, C.-W. Huang, and J. K. Lee. 2014. Compiler optimization for reducing leakage power in multithread BSP programs. ACM Transactions on Design Automation of Electronic Systems 20, 1, Article No. 9. Google ScholarDigital Library
- A. Srivastava and A. Eustace. 1994. ATOM: A system for building customized program analysis tools. In Proceedings of the ACM SIGPLAN 1994 Conference on Programming Language Design and Implementation (PLDI’94). ACM, New York, NY, 196--205. Google ScholarDigital Library
- S. Steinke, M. Knauer, L. Wehmeyer, and P. Marwedel. 2001. An accurate and fine grain instruction-level energy model supporting software optimizations. In Proc.eedings of the 2001 International Workshop on Power and Timing Modeling, Optimization, and Simulation (PATMOS’01).Google Scholar
- H. Theiling and C. Ferdinand. 1998. Combining abstract interpretation and ILP for microarchitecture modelling and program path analysis. In Proceedings of the 19th IEEE Real-Time Systems Symposium. 144--153. Google ScholarDigital Library
- V. Tiwari, S. Malik, A. Wolfe, and M. Tien-Chien Lee. 1996. Instruction level power analysis and optimization of software. Journal of VLSI Signal Processing Systems for Signal, Image, and Video Technology 13, 2--3, 223--238.Google ScholarCross Ref
- J. M. Townley. 2013. Practical Programming for Static Average-Case Analysis: The MOQA Investigation. Ph.D. Dissertation. University College Cork.Google Scholar
- P. Vasconcelos and K. Hammond. 2003. Inferring cost equations for recursive, polymorphic and higher-order functional programs. In Implementation of Functional Languages. Lecture Notes in Computer Science, Vol. 3145. Springer, 86--101. Google ScholarDigital Library
- P. Wägemann, T. Distler, T. Hönig, H. Janker, R. Kapitza, and W. Schröder-Preikschat. 2015. Worst-case energy consumption analysis for energy-constrained embedded systems. In Proceedings of the 2015 27th Euromicro Conference on Real-Time Systems (ECRTS’15). Google ScholarDigital Library
- D. Watt. 2009. Programming XC on XMOS Devices. XMOS Limited. http://books.google.co.uk/books?id=81klKQEACAAJ.Google Scholar
- V. M. Weaver, M. Johnson, K. Kasichayanula, J. Ralph, P. Luszczek, D. Terpstra, and S. Moore. 2012. Measuring energy and power with PAPI. In Proceedings of the 2012 41st International Conference on Parallel Processing Workshops (ICPPW’12). IEEE, Los Alamitos, CA, 262--268. Google ScholarDigital Library
- B. Wegbreit. 1975. Mechanical program analysis. Communications of the ACM 18, 9, 528--539. Google ScholarDigital Library
- T. Wei, J. Mao, W. Zou, and Y. Chen. 2007. A new algorithm for identifying loops in decompilation. In Static Analysis. Lecture Notes in Computer Science, Vol. 4634. Springer, 170--183. Google ScholarDigital Library
- R. Wilhelm, J. Engblom, A. Ermedahl, N. Holsti, S. Thesing, D. Whalley, G. Bernat, et al. 2008. The worst-case execution-time problem—overview of methods and survey of tools. ACM Transactions on Embedded Computing Systems 7, 3, Article No. 36. Google ScholarDigital Library
- XMOS. 2014a. Code to perform standard DSP functions, such as Biquads, FIRs, sample rate conversion. Retrieved February 17, 2017, from https://github.com/xcore/sc_dsp_filters.Google Scholar
- XMOS. 2014b. xTIMEcomposer. Retrieved February 17, 2017, from https://www.xmos.com/support/xtools.Google Scholar
- XMOS. 2016a. Debug with Printf in Real-Time. Retrieved February 17, 2017, from https://www.xmos.com/support/tools/other?subcategory=8component=14774.Google Scholar
- XMOS. 2016b. xCORE-Analog SliceKIT. Retrieved February 17, 2017, from https://www.xmos.com/support/boards?product=17554.Google Scholar
- F. Zuleger, S. Gulwani, M. Sinn, and H. Veith. 2012. Bound analysis of imperative programs with the size-change abstraction (extended version). arXiv:1203.5303. Google ScholarDigital Library
Index Terms
- Energy Transparency for Deeply Embedded Programs
Recommendations
Evaluation of processor code efficiency for embedded systems
ICS '01: Proceedings of the 15th international conference on SupercomputingThis paper evaluates the code efficiency of the ARM, Java, and x86 instruction sets by compiling the SPEC CPU95/ CPU2000/JVM98 and CaffeineMark benchmarks, in terms of code sizes, basic block sizes, instruction distributions, and average instruction ...
Register Allocation for Compressed ISAs in LLVM
CC 2023: Proceedings of the 32nd ACM SIGPLAN International Conference on Compiler ConstructionWe present an adaptation to the LLVM greedy register allocator to improve code density for compressed RISC ISAs.
Many RISC architectures have extensions defining smaller encodings for common instructions, typically 16 rather than 32 bits wide. However,...
WCET-aware re-scheduling register allocation for real-time embedded systems with clustered VLIW architecture
LCTES '12Worst-Case Execution Time (WCET) is one of the most important metrics in real-time embedded system design. For embedded systems with clustered VLIW architecture, register allocation, instruction scheduling, and cluster assignment are three key ...
Comments