
On the completion time distribution for tasks that must restart from the beginning if a failure occurs

Published: 01 December 2006

Abstract

For many systems, failure is so common that the design choice of how to deal with it may have a significant impact on the performance of the system. There are many specific and distinct failure recovery schemes, but they can be grouped into three broad classes: RESUME, also referred to as preemptive resume (prs), or check-pointing; REPLACE, also referred to as preemptive repeat different (prd); and RESTART, also referred to as preemptive repeat identical (pri). The three recovery schemes work as follows: (1) RESUME: when a task fails, it knows exactly where it stopped and can continue from that point when allowed to resume; (2) REPLACE: when a task fails and later begins processing again, it starts with a brand new task sampled from the same task time distribution; and (3) RESTART: when a task fails, it loses all that it had acquired up to that point and must start anew upon continuing later. RESTART is distinctly different from REPLACE, since the restarted task must run at least as long as it did before it failed, whereas a new sample, selected at random, might run for a shorter or longer time.
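The difference between the three schemes can be illustrated with a small simulation. The sketch below is not taken from the paper; it assumes that repair (down) time is zero, so only processing time is counted, and that task lengths and times between failures are supplied by caller-provided sampling functions. The names `completion_time`, `draw_task`, and `draw_uptime` are hypothetical, chosen only for this illustration.

```python
import random

POLICIES = ("RESUME", "REPLACE", "RESTART")

def completion_time(policy, draw_task, draw_uptime):
    """Processing time spent until one task finishes under a recovery policy.

    Assumptions (not from the paper): repair time is ignored, draw_task()
    returns a task length, draw_uptime() returns the time to the next failure.
    """
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy}")
    need = draw_task()   # work required by the current task
    done = 0.0           # work retained across failures (RESUME only)
    elapsed = 0.0        # total processing time so far
    while True:
        up = draw_uptime()
        remaining = need - done
        if up >= remaining:
            # The task finishes before the next failure.
            return elapsed + remaining
        elapsed += up
        if policy == "RESUME":
            done += up            # continue from where the task stopped
        elif policy == "REPLACE":
            need = draw_task()    # brand new sample from the same distribution
        # RESTART: keep the same `need`, lose all progress (done stays 0).

if __name__ == "__main__":
    rng = random.Random(0)
    for policy in POLICIES:
        times = [
            completion_time(
                policy,
                lambda: rng.uniform(0.5, 1.5),   # bounded task lengths (assumed)
                lambda: rng.expovariate(1.0),    # exponential time to failure (assumed)
            )
            for _ in range(10_000)
        ]
        print(policy, sum(times) / len(times))
```

With repair time ignored, the RESUME result always equals the task length itself, while RESTART must wait for a single failure-free interval at least as long as the original task. Bounded task lengths are used in the demo so that the RESTART sample mean stays well behaved.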


Published In

ACM SIGMETRICS Performance Evaluation Review  Volume 34, Issue 3
December 2006
62 pages
ISSN:0163-5999
DOI:10.1145/1215956

Publisher

Association for Computing Machinery

New York, NY, United States
