ABSTRACT
Many domain-specific MPSoCs are heterogeneous and tiled by nature. Evaluating important architectural decisions for future designs with 100 to 1000 cores, such as the tile structure and the selection of cores within each tile, requires fast and flexible simulation approaches. Cycle-accurate simulation techniques and co-simulation approaches based on simulator coupling are therefore unsuitable. In this paper, we evaluate heterogeneous tiled MPSoCs using a timing-approximate simulation approach that specifically targets applications with highly dynamic thread and workload distributions and resource-aware program behavior. In such applications, the application itself may decide which set of resources to claim, depending on run-time status information of the resources (e.g., temperature or load). To verify performance goals of the heterogeneous MPSoC in addition to functional correctness, the approach combines a discrete-event host-compiled simulation with a time-warping mechanism that scales the execution times elapsed on the simulation host to the simulated target. This allows phases of thread (re-)distribution and resource-awareness to be investigated with appropriate accuracy. Selected case studies show how quickly architectural parameters can be varied, enabling the exploration of different designs with respect to cost, performance, and other design objectives.
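The idea of scaling host execution times to a simulated target can be illustrated with a small sketch. Everything here is an assumption made for illustration and not the paper's actual mechanism: the function names `warp` and `simulate`, the linear scaling model (clock-frequency ratio times a hypothetical per-core `cpi_ratio`), and the greedy "claim the least-loaded core" policy standing in for resource-aware behavior.

```python
import heapq

def warp(host_seconds, host_hz, target_hz, cpi_ratio=1.0):
    """Scale a time measured on the host to the simulated target core.

    Assumed model: target time = host time * (host clock / target clock)
    * cpi_ratio, where cpi_ratio is a hypothetical correction factor for
    differences in cycles per instruction.
    """
    return host_seconds * (host_hz / target_hz) * cpi_ratio

def simulate(tasks, cores, host_hz):
    """Tiny event-list simulation over heterogeneous cores.

    tasks: list of (label, host_seconds) in arrival order.
    cores: dict mapping core name -> (target_hz, cpi_ratio).
    Each task claims the core that becomes free earliest, a simple
    stand-in for a decision based on run-time load information.
    Returns (schedule, makespan) in simulated target seconds.
    """
    # Heap of (time the core becomes free, tie-breaker, core name).
    free = [(0.0, i, name) for i, name in enumerate(cores)]
    heapq.heapify(free)
    schedule = []
    for label, host_seconds in tasks:
        start, i, name = heapq.heappop(free)
        target_hz, cpi_ratio = cores[name]
        end = start + warp(host_seconds, host_hz, target_hz, cpi_ratio)
        schedule.append((label, name, start, end))
        heapq.heappush(free, (end, i, name))
    makespan = max(end for _, _, _, end in schedule)
    return schedule, makespan

if __name__ == "__main__":
    # Hypothetical tile with one fast and one slow core; 2 GHz host.
    cores = {"fast": (1e9, 1.0), "slow": (5e8, 1.0)}
    tasks = [("a", 0.001), ("b", 0.001), ("c", 0.001)]
    schedule, makespan = simulate(tasks, cores, host_hz=2e9)
    for label, core, start, end in schedule:
        print(f"{label} on {core}: {start:.4f}s to {end:.4f}s")
    print(f"makespan: {makespan:.4f}s")
```

Changing an architectural parameter in this sketch, such as a core's clock frequency or the number of cores in the `cores` dictionary, only requires rerunning `simulate`, which is the kind of fast design variation the abstract refers to.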
Index Terms
- Fast architecture evaluation of heterogeneous MPSoCs by host-compiled simulation