DOI: 10.1145/1375457.1375485

Disk scrubbing versus intra-disk redundancy for high-reliability RAID storage systems

Published: 02 June 2008

Abstract

This paper examines two schemes proposed to cope with unrecoverable or latent media errors and to enhance the reliability of RAID systems. The first is the established, widely used disk scrubbing scheme, which periodically accesses disk drives to detect media-related unrecoverable errors; these errors are then corrected by rebuilding the affected sectors. The second is the recently proposed intra-disk redundancy scheme, which adds a further level of redundancy inside each disk on top of the RAID redundancy across multiple disks. Analytic results are obtained assuming Poisson arrivals of random I/O requests. Our results demonstrate that the reliability improvement due to disk scrubbing depends on the scrubbing frequency and the workload of the system, and may not reach the reliability level achieved by a simple IPC-based intra-disk redundancy scheme, which is insensitive to the workload. In fact, the IPC-based intra-disk redundancy scheme achieves essentially the same reliability as a system operating without unrecoverable sector errors. For heavy workloads, the reliability achieved by the scrubbing scheme can be orders of magnitude lower than that of the intra-disk redundancy scheme.
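
The abstract does not spell out how the IPC-based scheme works. Assuming that IPC refers to an interleaved parity check, in which parity sectors stored on the same disk each protect every m-th data sector of a segment, the idea of redundancy inside a single disk can be sketched as below. The function names, the sector size, and the segment layout are illustrative assumptions and are not taken from the paper.

```python
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally sized sectors."""
    return bytes(x ^ y for x, y in zip(a, b))


def compute_interleaved_parity(data: List[bytes], num_interleaves: int) -> List[bytes]:
    """Parity sector j is the XOR of all data sectors i with i % num_interleaves == j."""
    sector_size = len(data[0])
    parity = [bytes(sector_size) for _ in range(num_interleaves)]
    for i, sector in enumerate(data):
        j = i % num_interleaves
        parity[j] = xor_bytes(parity[j], sector)
    return parity


def recover_segment(data: List[Optional[bytes]], parity: List[bytes],
                    num_interleaves: int) -> List[bytes]:
    """Rebuild unreadable sectors (marked None) from the on-disk parity.

    Each interleave can repair at most one lost sector; a second loss in
    the same interleave raises ValueError (hypothetical error handling).
    """
    repaired = list(data)
    for j in range(num_interleaves):
        members = range(j, len(data), num_interleaves)
        lost = [i for i in members if data[i] is None]
        if not lost:
            continue
        if len(lost) > 1:
            raise ValueError(f"interleave {j} has {len(lost)} lost sectors")
        acc = parity[j]
        for i in members:
            if data[i] is not None:
                acc = xor_bytes(acc, data[i])
        repaired[lost[0]] = acc
    return repaired


# Illustrative segment: 8 data sectors of 4 bytes each, 2 interleaves.
segment = [bytes([i] * 4) for i in range(8)]
segment_parity = compute_interleaved_parity(segment, 2)

# A burst of two consecutive unreadable sectors hits each interleave once.
damaged: List[Optional[bytes]] = list(segment)
damaged[3] = None
damaged[4] = None
restored = recover_segment(damaged, segment_parity, 2)
```

In this sketch a burst of up to `num_interleaves` consecutive unreadable sectors touches each interleave at most once, so it can be repaired inside the disk without a RAID-level rebuild. Two losses in the same interleave cannot be repaired this way and would have to fall back on the redundancy across disks.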

Published In

SIGMETRICS '08: Proceedings of the 2008 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
June 2008
486 pages
ISBN: 9781605580050
DOI: 10.1145/1375457
  • ACM SIGMETRICS Performance Evaluation Review, Volume 36, Issue 1
    SIGMETRICS '08
    June 2008
    469 pages
    ISSN: 0163-5999
    DOI: 10.1145/1384529
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. MTTDL
  2. RAID
  3. reliability analysis
  4. stochastic modeling
  5. unrecoverable or latent sector errors

Qualifiers

  • Research-article

Conference

SIGMETRICS '08

Acceptance Rates

Overall Acceptance Rate 459 of 2,691 submissions, 17%

Cited By

  • (2020) Building Reliable and Cost-Effective Storage Systems for High-Performance Computing Datacenters. DOI: 10.12794/metadc1707348. Online publication date: Aug-2020
  • (2019) Design tradeoffs for SSD reliability. Proceedings of the 17th USENIX Conference on File and Storage Technologies, pages 281-294. DOI: 10.5555/3323298.3323325. Online publication date: 25-Feb-2019
  • (2019) Scrub Unleveling: Achieving High Data Reliability at Low Scrubbing Cost. 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE), pages 1403-1408. DOI: 10.23919/DATE.2019.8715169. Online publication date: Mar-2019
  • (2019) PFP: Improving the Reliability of Deduplication-based Storage Systems with Per-File Parity. IEEE Transactions on Parallel and Distributed Systems, pages 1-1. DOI: 10.1109/TPDS.2019.2898942. Online publication date: 2019
  • (2018) Protecting Single Shingled Write Drives Against Latent Sector Failures. Proceedings of the 11th ACM International Systems and Storage Conference, pages 26-36. DOI: 10.1145/3211890.3211893. Online publication date: 4-Jun-2018
  • (2018) Improving Reliability of Deduplication-Based Storage Systems with Per-File Parity. 2018 IEEE 37th Symposium on Reliable Distributed Systems (SRDS), pages 171-180. DOI: 10.1109/SRDS.2018.00028. Online publication date: Oct-2018
  • (2017) Reliability Modeling of Mesh Storage Area Networks for Internet of Things. IEEE Internet of Things Journal, 4(6):2047-2057. DOI: 10.1109/JIOT.2017.2749375. Online publication date: Dec-2017
  • (2016) LoneStar RAID. ACM Transactions on Storage, 12(1):1-29. DOI: 10.1145/2840810. Online publication date: 7-Jan-2016
  • (2016) Workload interleaving with performance guarantees in data centers. NOMS 2016 - 2016 IEEE/IFIP Network Operations and Management Symposium, pages 967-972. DOI: 10.1109/NOMS.2016.7502934. Online publication date: Apr-2016
  • (2015) Proactive Data Migration for Improved Storage Availability in Large-Scale Data Centers. IEEE Transactions on Computers, 64(9):2637-2651. DOI: 10.1109/TC.2014.2366734. Online publication date: 1-Sep-2015
