ABSTRACT
Complex signal processing algorithms targeting architectures with ever larger numbers of parallel processing units require high-performance core interconnections (i.e., low latency, high throughput, and no pinch-offs or bottlenecks). Developing chips with massively parallel processing cores therefore calls for supporting techniques that explore the characteristics of diverse topologies of both common and innovative Networks-on-Chip (NoCs). In contrast to analytic NoC models, event-driven NoC simulations can handle even complex task graphs, but they suffer from long simulation times. To make the simulation of such complex task graphs practical, in this work we propose to use FPGA-accelerated simulation. We extend such a simulator to imitate cache-coherence communication behavior, and we present a translation of real measured profiles into task graphs for in-depth simulation of the communication behavior of an existing NoC-based manycore. This approach can thus not only handle synthetic scenarios but also analyze the communication behavior of real-world applications. A simulation of the Histograms of Oriented Gradients algorithm running on the Intel Xeon Phi manycore, which features a 70-stop ring bus, exemplifies the approach.