Abstract
Multicore platforms are already prevalent in modern embedded systems. Future embedded platforms will have even higher processor core counts, forming many-core platforms. At the same time, applications are becoming more complex and dynamic, and they must use the resources available on these platforms efficiently. Efficient memory utilization is a key challenge for application developers, because memory is a scarce resource and often becomes the system's bottleneck. To cope with this dynamism and achieve a better memory footprint (low memory fragmentation), application developers resort to dynamic memory (heap) management techniques, allocating and deallocating data at runtime. Overall power consumption is another key challenge that must be taken into consideration. To address it, designers employ Dynamic Voltage and Frequency Scaling (DVFS) mechanisms, which adapt to the application's computational demands at runtime. In this article, we propose combining dynamic memory management techniques with DVFS. This is done by integrating, within the memory manager, runtime monitoring mechanisms that steer the DVFS mechanisms to adjust clock frequency and supply voltage based on heap performance. The proposed approach has been evaluated on a distributed shared-memory many-core platform composed of multiple LEON3 processors interconnected by a Network-on-Chip infrastructure that supports DVFS. Experimental results show that using the proposed method for monitoring and applying DVFS mechanisms reduced the power consumption of dynamic memory management by approximately 37%. In addition, we present the trade-offs of the proposed approach. Last, by combining the developed method with heap fragmentation-aware dynamic memory managers, we achieve low heap fragmentation together with low power consumption.
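The idea of a memory manager that monitors its own activity and steers DVFS can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the article: the operating points in `LEVELS`, the `low` and `high` thresholds, the window-based load metric, and the names `HeapDvfsMonitor`, `record`, and `end_window` are all hypothetical. The power estimate relies only on the standard first-order relation that dynamic CMOS power scales with frequency times the square of the supply voltage.

```python
# Hypothetical (frequency in MHz, voltage in V) operating points,
# ordered slowest to fastest. Illustrative values, not from the article.
LEVELS = [(25, 0.8), (50, 1.0), (100, 1.2)]


class HeapDvfsMonitor:
    """Tracks cycles spent in heap operations and picks a DVFS level."""

    def __init__(self, levels=LEVELS, low=0.2, high=0.6):
        self.levels = levels
        self.low, self.high = low, high  # assumed load thresholds
        self.level = len(levels) - 1     # start at the fastest point
        self.busy = 0                    # heap cycles in the current window

    def record(self, cycles):
        """Called by the allocator after each malloc/free with its cost."""
        self.busy += cycles

    def end_window(self, window_cycles):
        """Close a monitoring window and step the level up or down by one."""
        load = self.busy / window_cycles
        self.busy = 0
        if load > self.high and self.level < len(self.levels) - 1:
            self.level += 1   # heap is a bottleneck: speed up
        elif load < self.low and self.level > 0:
            self.level -= 1   # heap is mostly idle: save power
        return self.levels[self.level]


def relative_dynamic_power(freq_mhz, volt, ref=LEVELS[-1]):
    """First-order estimate: dynamic power ~ f * V^2, relative to ref."""
    return (freq_mhz * volt ** 2) / (ref[0] * ref[1] ** 2)
```

In this sketch, a window with little heap activity steps the operating point down by one level, and a window dominated by heap operations steps it back up. Stepping one level at a time is a design choice made here to avoid abrupt swings between the extremes.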
Index Terms
- Power-aware dynamic memory management on many-core platforms utilizing DVFS