Abstract
The aim of this article is to define a reliability-aware methodology for the design of embedded systems on multi-FPGA platforms. The designed system must be able to detect the occurrence of faults globally and autonomously, in order to recover from them or to mitigate their effects. Two categories of faults are identified, based on their impact on the device elements, namely (i) recoverable faults, transient problems that can be fixed without causing a lasting effect, and (ii) nonrecoverable faults, which cause a permanent problem and make the affected portion of the fabric unusable. While some aspects can be taken from previous solutions available in the literature, several open issues remain. In fact, no complete design methodology handling all the peculiar issues of the considered scenario has been proposed yet, a gap we aim to fill with this work. The resulting system exhibits reliability properties, with increased overall lifetime and availability.