research-article

Disk scrubbing versus intra-disk redundancy for high-reliability RAID storage systems

Authors:

Evangelos Eleftheriou

SIGMETRICS '08: Proceedings of the 2008 ACM SIGMETRICS international conference on Measurement and modeling of computer systems

Pages 241 - 252

https://doi.org/10.1145/1375457.1375485

Published: 02 June 2008

Abstract

Two schemes proposed to cope with unrecoverable or latent media errors and enhance the reliability of RAID systems are examined. The first scheme is the established, widely used disk scrubbing scheme, which operates by periodically accessing disk drives to detect media-related unrecoverable errors. These errors are subsequently corrected by rebuilding the affected sectors. The second scheme is the recently proposed intra-disk redundancy scheme, which uses a further level of redundancy inside each disk, in addition to the RAID redundancy across multiple disks. Analytic results are obtained assuming Poisson arrivals of random I/O requests. Our results demonstrate that the reliability improvement due to disk scrubbing depends on the scrubbing frequency and the workload of the system, and may not reach the reliability level achieved by a simple IPC-based intra-disk redundancy scheme, which is insensitive to the workload. In fact, the IPC-based intra-disk redundancy scheme achieves essentially the same reliability as that of a system operating without unrecoverable sector errors. For heavy workloads, the reliability achieved by the scrubbing scheme can be orders of magnitude lower than that of the intra-disk redundancy scheme.
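To make the idea of redundancy inside a single disk concrete, the following is a minimal sketch, not the scheme analyzed in the paper. It assumes the simplest possible code, one XOR parity sector per segment of data sectors, and uses hypothetical names (`add_parity`, `recover`, `SECTOR_SIZE`) and a toy sector size. It only illustrates how a single unreadable sector could be rebuilt locally, without involving the other disks of the array.

```python
from functools import reduce

SECTOR_SIZE = 8  # bytes; deliberately tiny for the example (an assumption)


def xor_sectors(sectors):
    """Byte-wise XOR of equally sized sectors."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*sectors))


def add_parity(data_sectors):
    """Return the stored segment: the data sectors followed by one parity sector."""
    return list(data_sectors) + [xor_sectors(data_sectors)]


def recover(segment, lost_index):
    """Rebuild one unreadable sector by XOR-ing all other sectors of the segment."""
    survivors = [s for i, s in enumerate(segment) if i != lost_index]
    return xor_sectors(survivors)


# Four data sectors protected by one parity sector on the same disk.
data = [bytes([i] * SECTOR_SIZE) for i in (1, 2, 3, 4)]
segment = add_parity(data)

# Pretend sector 2 hit an unrecoverable media error and rebuild it locally.
assert recover(segment, 2) == data[2]
```

With a single parity sector, a segment tolerates only one unreadable sector; two or more errors in the same segment would still have to be repaired from the RAID redundancy across disks.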

Published In

SIGMETRICS '08: Proceedings of the 2008 ACM SIGMETRICS international conference on Measurement and modeling of computer systems

June 2008

486 pages

ISBN:9781605580050

DOI:10.1145/1375457

General Chair: Zhen Liu (IBM T. J. Watson Research Center, USA)

Program Chairs: Vishal Misra (Columbia University, USA), Prashant Shenoy (University of Massachusetts, USA)

ACM SIGMETRICS Performance Evaluation Review Volume 36, Issue 1
SIGMETRICS '08
June 2008
469 pages
ISSN:0163-5999
DOI:10.1145/1384529

Copyright © 2008 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SIGMETRICS08: ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems

June 2 - 6, 2008

Annapolis, MD, USA

Acceptance Rates

Overall Acceptance Rate 459 of 2,691 submissions, 17%
