ABSTRACT
How should we perform component-specific adaptation for FPGAs? Prior work has demonstrated that the negative effects of variation can be largely mitigated using complete knowledge of device characteristics and a full per-FPGA CAD flow. However, per-FPGA characterization and mapping could be prohibitively expensive. We explore lightweight options for per-FPGA mapping that avoid the need for a priori device characterization and perform less expensive per-FPGA customization work. We characterize the tradeoff between Quality-of-Results (energy, delay) and per-device mapping costs for 7 design points, ranging from complete mapping based on full device knowledge to no per-device mapping. We show that it is possible to get 48-77% of the component-specific mapping delay benefit or 57% of the energy benefit with a mapping that takes less than 20 seconds per FPGA. An incremental solution can start execution after a 21 ms bitstream load and converge to 77% of the delay benefit after 18 seconds of runtime.
Index Terms
- Quality-Time Tradeoffs in Component-Specific Mapping: How to Train Your Dynamically Reconfigurable Array of Gates with Outrageous Network-delays