On unreliable computing systems when heavy-tails appear as a result of the recovery procedure
Source: ACM SIGMETRICS Performance Evaluation Review
Volume 33, Issue 2 (September 2005)
Special issue on the workshop on MAthematical performance Modeling And Analysis (MAMA 2005)
Pages: 15-17
Year of Publication: 2005
ISSN: 0163-5999
Authors
Pierre M. Fiorini  University of Southern Maine, Portland, ME
Robert Sheahan  University of Connecticut, Storrs, CT
Lester Lipsky  University of Connecticut, Storrs, CT
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1101892.1101898

ABSTRACT

For some computing systems, failure is rare enough that it can be ignored. In other systems, failure is so common that how it is handled can significantly affect system performance. There are many different recovery schemes for tasks, but they can be classified into three broad categories: 1) Resume: when a task fails, it knows exactly where it stopped and can continue from that point when allowed to resume (i.e., preemptive resume - prs); 2) Replace: when a task fails, the processor begins a brand new task when it later continues (i.e., preemptive repeat different - prd); and 3) Restart: when a task fails, it loses all work done to that point and must start anew upon continuing later (i.e., preemptive repeat identical - pri).

In this paper, assuming a computing system is unreliable, we discuss how heavy-tail (hereafter referred to as power-tail - PT) distributions can appear in a job's task stream given the Restart recovery procedure. This is an important consideration since it is known that power-tails can lead to unstable systems [4]. We then demonstrate how to obtain performance and dependability measures for a class of computing systems composed of P unreliable processors and a finite number of tasks N given the above recovery procedures.
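The three recovery schemes can be illustrated with a small simulation. This is a minimal sketch under assumptions that are not taken from the paper: failures arrive as a Poisson process with rate `fail_rate`, repair takes zero time, and the names `completion_time`, `expected_restart_time`, and `draw_task` are hypothetical. It is not the authors' model, only one way to see how the schemes differ.

```python
import math
import random


def completion_time(scheme, task_len, fail_rate, rng, draw_task=None):
    """Simulate the time until one task completes on an unreliable processor.

    Assumptions (illustrative only): exponential times between failures
    with rate `fail_rate`, and instantaneous repair.
    """
    elapsed = 0.0
    remaining = task_len
    while True:
        time_to_failure = rng.expovariate(fail_rate)
        if time_to_failure >= remaining:
            # The task finishes before the next failure occurs.
            return elapsed + remaining
        elapsed += time_to_failure
        if scheme == "resume":      # prs: keep the work done so far
            remaining -= time_to_failure
        elif scheme == "replace":   # prd: begin a brand new task
            if draw_task is None:
                raise ValueError("replace needs a draw_task callable")
            remaining = draw_task()
        elif scheme == "restart":   # pri: same task, all work lost
            remaining = task_len
        else:
            raise ValueError(f"unknown scheme: {scheme}")


def expected_restart_time(task_len, fail_rate):
    """Mean completion time under Restart for a fixed task length.

    Under the assumptions above, E[X] = (exp(fail_rate * task_len) - 1) / fail_rate.
    """
    return (math.exp(fail_rate * task_len) - 1.0) / fail_rate


if __name__ == "__main__":
    rng = random.Random(0)
    n = 10000
    for scheme in ("resume", "replace", "restart"):
        total = sum(
            completion_time(scheme, 1.0, 1.0, rng,
                            draw_task=lambda: rng.expovariate(1.0))
            for _ in range(n)
        )
        print(scheme, total / n)
```

For a fixed task length, the mean Restart time in this sketch grows exponentially with the product of failure rate and task length, so long tasks are penalized far more heavily than under Resume. When task lengths are themselves random, averaging over that exponential growth is what can push the completion-time distribution toward a heavy tail.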


REFERENCES


 
[1] A. Bobbio and K. Trivedi, "Computation of the Distribution of the Completion Time When the Work Requirement is a PH Random Variable," Communications in Statistics - Stochastic Models, 1990.
[2]
[3] V. Kulkarni, V. Nicola, and K. Trivedi, "The Completion Time of a Job on a Multimode System," Advances in Applied Probability, 19:932-954, 1987.
[4] L. Lipsky, Queueing Theory: A Linear Algebraic Approach, MacMillan and Company, New York, 1992.

