ABSTRACT
Routing of nets is one of the most time-consuming steps in the FPGA design flow. Existing works have described ways of accelerating the process through parallelization. However, only some of them are deterministic, and determinism is often achieved at the cost of speedup. In this paper, we propose ParaDRo, a parallel FPGA router based on spatial partitioning that achieves deterministic results while maintaining reasonable speedup. Existing routers based on spatial partitioning do not scale well, because the number of nets that can fully utilize all processors decreases as the number of processors increases. In addition, they route the nets within a spatial partition sequentially. ParaDRo mitigates both problems by scheduling nets within a spatial partition to be routed in parallel when their bounding boxes do not overlap. Further parallelism is extracted by decomposing multi-sink nets into single-sink nets, which minimizes bounding-box overlap and increases the number of nets that can be routed in parallel. These improvements enable ParaDRo to achieve an average speedup of 5.4X with 8 threads, with minimal impact on the quality of results.
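The scheduling idea in the abstract can be illustrated with a minimal sketch. It is not ParaDRo's actual scheduler, which the abstract does not specify: the greedy first-fit grouping, the tight source-to-sink bounding boxes, and every name below (`Box`, `overlaps`, `decompose`, `schedule`) are assumptions made for illustration. Nets whose bounding boxes do not overlap are placed in the same batch, and each batch could then be routed concurrently. Processing nets in a fixed input order makes the grouping repeatable from run to run, which is one simple way to keep such a schedule deterministic.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Inclusive axis-aligned bounding box on the FPGA grid (hypothetical type)."""
    xmin: int
    ymin: int
    xmax: int
    ymax: int


def overlaps(a: Box, b: Box) -> bool:
    """Return True if the two boxes share at least one grid location."""
    return (a.xmin <= b.xmax and b.xmin <= a.xmax
            and a.ymin <= b.ymax and b.ymin <= a.ymax)


def bounding_box(points) -> Box:
    """Smallest box containing all (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return Box(min(xs), min(ys), max(xs), max(ys))


def decompose(name, source, sinks):
    """Split a multi-sink net into single-sink connections, one box each."""
    return [(f"{name}.{i}", bounding_box([source, sink]))
            for i, sink in enumerate(sinks)]


def schedule(items):
    """Greedy first-fit: put each item in the first batch it does not overlap.

    Items are visited in input order, so the result is the same on every run.
    """
    batches = []
    for name, box in items:
        for batch in batches:
            if not any(overlaps(box, other) for _, other in batch):
                batch.append((name, box))
                break
        else:
            batches.append([(name, box)])
    return [[name for name, _ in batch] for batch in batches]


# Net A has two sinks far apart; net B sits inside A's overall bounding box.
a_source, a_sinks = (0, 0), [(9, 0), (0, 9)]
b_source, b_sinks = (4, 4), [(6, 6)]

# Scheduling whole nets: A's box covers B, so they land in separate batches.
whole = schedule([
    ("A", bounding_box([a_source, *a_sinks])),
    ("B", bounding_box([b_source, *b_sinks])),
])

# Scheduling single-sink connections: B no longer conflicts with A.0.
split = schedule(decompose("A", a_source, a_sinks)
                 + decompose("B", b_source, b_sinks))
```

In this toy case `whole` puts A and B in separate batches, while `split` lets B's connection share a batch with one of A's connections. Connections of the same net still overlap at their shared source in this sketch, so they remain in different batches; how ParaDRo treats that case is not described in the abstract.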
Index Terms
- ParaDRo: A Parallel Deterministic Router Based on Spatial Partitioning and Scheduling