
A new intra-disk redundancy scheme for high-reliability RAID storage systems in the presence of unrecoverable errors

Authors:

Ajay Dholakia,

Evangelos Eleftheriou,

Xiao-Yu Hu,

Ilias Iliadis,

Jai Menon,

K.K. Rao

ACM Transactions on Storage (TOS), Volume 4, Issue 1

Article No.: 1, Pages 1 - 42

https://doi.org/10.1145/1353452.1353453

Published: 28 May 2008

Abstract

Today's data storage systems are increasingly adopting low-cost disk drives that have higher capacity but lower reliability, leading to more frequent rebuilds and to a higher risk of unrecoverable media errors. We propose an efficient intradisk redundancy scheme to enhance the reliability of RAID systems. This scheme introduces an additional level of redundancy inside each disk, on top of the RAID redundancy across multiple disks. The RAID parity provides protection against disk failures, whereas the proposed scheme aims to protect against media-related unrecoverable errors. In particular, we consider an intradisk redundancy architecture that is based on an interleaved parity-check coding scheme, which incurs only negligible I/O performance degradation. A comparison between this coding scheme and schemes based on traditional Reed-Solomon codes and single-parity-check codes is conducted by analytical means. A new model is developed to capture the effect of correlated unrecoverable sector errors. The probability of an unrecoverable failure associated with these schemes is derived for the new correlated model, as well as for the simpler independent error model. We also derive closed-form expressions for the mean time to data loss of RAID-5 and RAID-6 systems in the presence of unrecoverable errors and disk failures. We then combine these results to characterize the reliability of RAID systems that incorporate the intradisk redundancy scheme. Our results show that in the practical case of correlated errors, the interleaved parity-check scheme provides the same reliability as the optimum, albeit more complex, Reed-Solomon coding scheme. Finally, the I/O and throughput performances are evaluated by means of analysis and event-driven simulation.
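To make the idea of an interleaved parity-check code concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a segment of data sectors protected by m parity sectors, where parity sector j is the XOR of every data sector whose index is congruent to j modulo m. The function names, the segment size, the sector size, and the simplification that the parity sectors themselves remain readable are all assumptions made for illustration. Under these assumptions, any burst of up to m consecutive unreadable data sectors touches each interleave at most once and can therefore be rebuilt.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length sectors."""
    return bytes(x ^ y for x, y in zip(a, b))


def ipc_parity(data_sectors, m):
    """Compute m parity sectors; parity j covers data sectors i with i % m == j."""
    size = len(data_sectors[0])
    parity = [bytes(size) for _ in range(m)]
    for i, sector in enumerate(data_sectors):
        parity[i % m] = xor_bytes(parity[i % m], sector)
    return parity


def ipc_recover(data_sectors, parity, m):
    """Rebuild unreadable sectors (marked None).

    Succeeds only if every interleave contains at most one unreadable
    sector; parity sectors are assumed intact (illustrative simplification).
    """
    recovered = list(data_sectors)
    for j in range(m):
        members = range(j, len(data_sectors), m)
        missing = [i for i in members if data_sectors[i] is None]
        if len(missing) > 1:
            raise ValueError(f"interleave {j} has more than one unreadable sector")
        if missing:
            # XOR the parity with all surviving sectors of this interleave.
            acc = parity[j]
            for i in members:
                if data_sectors[i] is not None:
                    acc = xor_bytes(acc, data_sectors[i])
            recovered[missing[0]] = acc
    return recovered


# Hypothetical example: 8 data sectors of 4 bytes each, m = 3 interleaves.
segment = [bytes([i] * 4) for i in range(8)]
parity = ipc_parity(segment, 3)

# A burst of 3 consecutive unreadable sectors (indices 2, 3, 4).
damaged = list(segment)
damaged[2:5] = [None, None, None]
restored = ipc_recover(damaged, parity, 3)
```

Because each parity sector is a plain XOR, encoding and recovery stay cheap compared with a Reed-Solomon code, which is the trade-off the abstract refers to. In this sketch, two unreadable sectors in the same interleave (for example indices 0 and 3 with m = 3) cannot be recovered and raise an error.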



