ABSTRACT
The design-space exploration of large-scale NoCs is complicated not only by the ever-increasing number of cores, but also by growing runtime uncertainty in both the scale and the task structure of emerging applications. Consequently, it is crucial to develop rigorous mathematical frameworks that capture the task dependencies of varied applications and foster the generation of realistic benchmarks to guide NoC design. However, current NoC benchmark suites either lack portability and scale poorly, because they require intensive development effort on specific architectures and long simulation times, or are synthesized from purely stochastic models that are disconnected from real applications, which can easily lead to biased and/or delayed design choices. To overcome these drawbacks, we propose a benchmark synthesis framework that i) extracts the dynamic task dependencies of an application and synthesizes traffic workloads that are spatio-temporally consistent with realistic traffic behavior, and ii) scales easily, through the proposed complex-network-inspired algorithm, to generate large benchmarks while preserving the key structural features that govern application communication behavior. We validate the proposed framework in a large-scale simulation environment by running a set of real applications. Experimental results show that the synthesized benchmarks respect the traffic patterns of the original applications and preserve key features of the application task structures.