research-article

FPGA Accelerated NoC-Simulation: A Case Study on the Intel Xeon Phi Ringbus Topology

Authors:
Oliver Jakob Arndt, Christian Spindeldreier, Kevin Wohnrade, Daniel Pfefferkorn, Martin Neuenhahn, Holger Blume

Leibniz Universität Hannover, Institute of Microelectronic Systems, Hanover, Germany (all authors)

HEART '17: Proceedings of the 8th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies, June 2017, Article No.: 21, Pages 1–6, https://doi.org/10.1145/3120895.3120916

Published: 07 June 2017

ABSTRACT

Complex signal processing algorithms targeting architectures with increasingly high numbers of parallel processing units require high-performance core interconnections (i.e., low latencies, high throughput, no pinch-offs or bottlenecks). Techniques that explore the characteristics of diverse topologies of common as well as innovative Networks-on-Chip (NoCs) are therefore necessary for the development of chips with massively parallel processing cores. In contrast to analytic NoC models, event-driven NoC simulations can handle even complex task graphs, but they suffer from long simulation times. To make the simulation of complex task graphs practical, we propose in this work to use FPGA-accelerated simulation. We extend such a simulator to imitate cache-coherence communication behavior, and we present a translation of real measured profiles into task graphs for in-depth simulation of the communication behavior of an existing NoC-based manycore. This approach can therefore not only deal with synthetic scenarios, but also analyze the communication behavior of real-world applications. A simulation of the Histograms of Oriented Gradients algorithm running on the Intel Xeon Phi manycore, which features a 70-stop ring bus, exemplifies this approach.
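As a rough illustration of the idea of turning measured communication records into a graph that a ring-bus model can consume, the following minimal sketch may help. It is not the authors' simulator or profile format: the record layout `(src_stop, dst_stop, bytes)`, the function names, the sample data, and the assumption of shortest-path routing on a bidirectional ring are all hypothetical. Only the stop count of 70 comes from the abstract. The aggregated edge map is also a simplification of a task graph, since it ignores task dependencies and timing.

```python
from collections import defaultdict

RING_STOPS = 70  # stop count taken from the abstract


def ring_hops(src, dst, stops=RING_STOPS):
    """Hop count between two stops, assuming a bidirectional ring
    with shortest-path routing (an assumption, not from the paper)."""
    clockwise = (dst - src) % stops
    return min(clockwise, stops - clockwise)


def profile_to_task_graph(profile):
    """Aggregate hypothetical (src_stop, dst_stop, bytes) records
    into weighted edges of a simplified communication graph."""
    edges = defaultdict(int)
    for src, dst, nbytes in profile:
        edges[(src, dst)] += nbytes
    return dict(edges)


def total_hop_bytes(task_graph, stops=RING_STOPS):
    """Sum of bytes times hops over all edges: a crude load estimate."""
    return sum(ring_hops(s, d, stops) * b for (s, d), b in task_graph.items())


# Made-up profile records, for illustration only.
profile = [(0, 35, 64), (0, 35, 64), (69, 1, 128), (10, 12, 256)]
graph = profile_to_task_graph(profile)
load = total_hop_bytes(graph)
```

A static estimate like `total_hop_bytes` ignores contention and coherence traffic, which is the kind of behavior an event-driven simulation is meant to capture.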


Published in

HEART '17: Proceedings of the 8th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies
June 2017
172 pages
ISBN:9781450353168
DOI:10.1145/3120895

Copyright © 2017 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 22 of 50 submissions, 44%
