ABSTRACT
Software fault tolerance mechanisms aim to improve the reliability of software systems. Their effectiveness (i.e., their impact on reliability) is highly application-specific and depends on the overall system architecture and usage profile. When multiple architecture configurations must be examined, as in software product lines, including fault tolerance mechanisms effectively is a complex and error-prone task. Existing approaches for reliability analysis of software architectures either do not support modelling fault tolerance mechanisms or are not designed for efficient evaluation of multiple architecture variants. We present a novel approach to analyse the effect of software fault tolerance mechanisms in varying architecture configurations. We have validated the approach in multiple case studies, including a large-scale industrial system, demonstrating its ability to support architecture design and its robustness against imprecise input data.
Index Terms
- Reliability prediction for fault-tolerant software architectures