DOI: 10.1145/1529282.1529506
research-article

Adaptive optimal checkpoint interval and its impact on system's overall quality in soft real-time applications

Published: 08 March 2009

Abstract

Soft real-time systems often have to satisfy both timing and probabilistic fault-tolerance requirements. When checkpointing is used for fault tolerance, the checkpointing frequency directly affects the system's overall quality, measured as an integrated value of system QoS properties such as availability, task execution time, and task deadline miss probability. In this paper, we first formally analyze, under a Poisson fault model, how the checkpoint interval relates to each of system availability, task execution time, and task deadline miss probability. We then define the system's overall quality as a weighted sum of these three QoS measures and formulate an optimization problem that selects the checkpoint interval maximizing it. We also present a prototype framework that supports adaptive checkpointing, together with a set of experiments on that framework that further validate our analytical results.
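The trade-off described in the abstract can be illustrated with a small numerical sketch. It does not reproduce the paper's model. It assumes a task of fault-free length T split into n equal segments (checkpoint interval tau = T/n), a checkpoint cost C after each segment, a rollback/recovery cost R per fault, Poisson faults with rate lam, and no faults during recovery. Under those assumptions, a standard renewal argument gives the expected time to finish one segment of length w = tau + C as (1/lam + R)(e^(lam*w) - 1). The three QoS proxies are hypothetical choices made for illustration: the useful-work fraction stands in for availability, expected time is normalised by the deadline D, and a conservative deadline-miss bound is based on the Poisson count of faults in [0, D]. The weights are also hypothetical, and a plain grid search stands in for whatever solver the paper uses.

```python
import math

# Hypothetical model parameters (not from the paper):
# T   fault-free execution time of the task (s)
# C   checkpoint overhead per checkpoint (s)
# R   recovery/rollback overhead per fault (s)
# lam Poisson fault rate (faults/s), must be > 0
# D   relative deadline (s)


def expected_segment_time(w, lam, R):
    """Expected time to complete one segment of length w (work + checkpoint)
    when each Poisson fault rolls back to the last checkpoint and costs R,
    assuming no fault strikes during recovery."""
    return (1.0 / lam + R) * math.expm1(lam * w)


def expected_total_time(n, T, C, R, lam):
    """Expected completion time with n equal segments, each followed by a checkpoint."""
    w = T / n + C
    return n * expected_segment_time(w, lam, R)


def poisson_cdf(k, mu):
    """P(N <= k) for N ~ Poisson(mu)."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return total


def miss_probability_bound(n, T, C, R, lam, D):
    """Conservative upper bound on the deadline miss probability: each fault
    wastes at most one segment plus R, so the task finishes by D whenever at
    most K faults occur in [0, D]."""
    w = T / n + C
    slack = D - n * w
    if slack < 0:
        return 1.0  # even a fault-free run misses the deadline
    K = math.floor(slack / (w + R))
    return 1.0 - poisson_cdf(K, lam * D)


def overall_quality(n, T, C, R, lam, D, weights=(0.4, 0.3, 0.3)):
    """Hypothetical weighted-sum quality (higher is better)."""
    w_avail, w_time, w_miss = weights
    e_total = expected_total_time(n, T, C, R, lam)
    availability = T / e_total   # fraction of expected time spent on useful work
    time_penalty = e_total / D   # expected execution time relative to the deadline
    p_miss = miss_probability_bound(n, T, C, R, lam, D)
    return w_avail * availability - w_time * time_penalty - w_miss * p_miss


def best_interval(T, C, R, lam, D, max_segments=200, weights=(0.4, 0.3, 0.3)):
    """Grid search over the number of segments; returns (interval, quality)."""
    best_n = max(range(1, max_segments + 1),
                 key=lambda n: overall_quality(n, T, C, R, lam, D, weights))
    return T / best_n, overall_quality(best_n, T, C, R, lam, D, weights)


if __name__ == "__main__":
    T, C, R, lam, D = 100.0, 0.5, 1.0, 0.01, 130.0
    tau, q = best_interval(T, C, R, lam, D)
    print(f"chosen checkpoint interval: {tau:.2f} s (quality {q:.3f})")
    print(f"Young's first-order interval sqrt(2C/lam): {math.sqrt(2 * C / lam):.2f} s")
```

Setting the miss-probability weight to zero reduces the search to minimizing expected execution time. That optimum should sit near Young's first-order approximation, tau ≈ sqrt(2C/lam). A positive miss weight can push the choice toward shorter intervals that leave more slack before the deadline.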

Published In

SAC '09: Proceedings of the 2009 ACM symposium on Applied Computing
March 2009
2347 pages
ISBN:9781605581668
DOI:10.1145/1529282
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. checkpoint rollback recovery
  2. optimization
  3. soft real-time systems
  4. system overall quality

Qualifiers

  • Research-article

Conference

SAC '09: The 2009 ACM Symposium on Applied Computing
March 8-12, 2009
Honolulu, Hawaii

Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 3
  • Downloads (last 6 weeks): 0
Reflects downloads up to 05 Mar 2025

Cited By

  • (2024) LEC-MiCs: Low-Energy Checkpointing in Mixed-Criticality Multicore Systems. ACM Transactions on Cyber-Physical Systems 9:1, pp. 1-29. DOI: 10.1145/3653720. Online publication date: 26-Mar-2024
  • (2023) IoT Service Runtime Fault Tolerance Mechanism Based on Flink Dynamic Checkpoint. Service Science, pp. 91-105. DOI: 10.1007/978-981-99-4402-6_7. Online publication date: 27-Jul-2023
  • (2019) An optimal checkpointing model with online OCI adjustment for stream processing applications. Concurrency and Computation: Practice and Experience 31:20. DOI: 10.1002/cpe.5347. Online publication date: 10-Jun-2019
  • (2018) An Optimal Checkpointing Model with Online OCI Adjustment for Stream Processing Applications. 2018 27th International Conference on Computer Communication and Networks (ICCCN), pp. 1-9. DOI: 10.1109/ICCCN.2018.8487327. Online publication date: Jul-2018
  • (2018) Quantification, Trade-off Analysis, and Optimal Checkpoint Placement for Reliability and Availability. 2018 IEEE 25th International Conference on High Performance Computing (HiPC), pp. 183-192. DOI: 10.1109/HiPC.2018.00029. Online publication date: Dec-2018
  • (2017) Analytic Models of Checkpointing for Concurrent Component-Based Software Systems. Proceedings of the 8th ACM/SPEC on International Conference on Performance Engineering, pp. 245-256. DOI: 10.1145/3030207.3030209. Online publication date: 17-Apr-2017
  • (2017) Toward an Optimal Online Checkpoint Solution under a Two-Level HPC Checkpoint Model. IEEE Transactions on Parallel and Distributed Systems 28:1, pp. 244-259. DOI: 10.1109/TPDS.2016.2546248. Online publication date: 1-Jan-2017
  • (2016) Efficient mode changes in multi-mode systems. 2016 IEEE 34th International Conference on Computer Design (ICCD), pp. 592-599. DOI: 10.1109/ICCD.2016.7753345. Online publication date: Oct-2016
  • (2015) Adaptive Checkpoint Interval Algorithm Considering Task Deadline and Lifetime Reliability for Real-Time System. Procedia Computer Science 70, pp. 821-828. DOI: 10.1016/j.procs.2015.10.124. Online publication date: 2015
  • (2013) Cloud Computing Towards Technological Convergence. Cloud Computing Advancements in Design, Implementation, and Technologies, pp. 263-279. DOI: 10.4018/978-1-4666-1879-4.ch019. Online publication date: 2013
