ABSTRACT
This paper presents an overview of a major research project on dependable embedded systems that started in fall 2010 and is projected to run for six years. The aim is a 'dependability co-design' that spans multiple levels of abstraction in the embedded-system design process, from the gate level through the operating system and application software to the system architecture. In addition, we present a new classification of faults, errors, and failures.