Reliability analysis of deduplicated and erasure-coded storage

Published: 3 January 2011

Abstract

Space efficiency and data reliability are two primary concerns for modern storage systems. Chunk-based deduplication, which breaks up data objects into single-instance chunks that can be shared across objects, is an effective method for saving storage space. However, deduplication affects data reliability: an object's constituent chunks are often spread across a large number of disks, potentially decreasing the object's reliability. Therefore, an important problem in deduplicated storage is how to achieve space efficiency while maintaining each object's original reliability. In this paper, we present initial results on the reliability analysis of HP-KVS, a deduplicated key-value store that allows each object to specify its own reliability level and that uses software erasure coding for data reliability. The combination of deduplication and erasure coding gives rise to several interesting research problems. We show how to compare the reliability of erasure codes with different parameters and how to analyze the reliability of a large data object given the reliabilities of its constituent parts. We also present a method for system designers to determine under what conditions deduplication will save space for erasure-coded data.
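As a rough illustration of the kind of calculation the abstract describes (a minimal sketch, not the paper's model), the following Python snippet assumes independent fragment failures: an (n, k) erasure-coded chunk survives if at least k of its n fragments survive, and a deduplicated object is readable only if every one of its constituent chunks is. The function names, the (9, 6) code, and the 0.999 per-fragment survival figure are illustrative assumptions, not values from the paper.

```python
from math import comb

def chunk_reliability(n: int, k: int, p_frag: float) -> float:
    # Probability that an (n, k) erasure-coded chunk survives,
    # i.e., that at least k of its n fragments remain available,
    # assuming independent fragment failures where each fragment
    # survives with probability p_frag (a binomial tail sum).
    return sum(comb(n, m) * p_frag**m * (1 - p_frag)**(n - m)
               for m in range(k, n + 1))

def object_reliability(chunk_reliabilities):
    # A deduplicated object survives only if all of its constituent
    # chunks survive; under the independence assumption, its
    # reliability is the product of the chunks' reliabilities.
    r = 1.0
    for p in chunk_reliabilities:
        r *= p
    return r

# Illustrative numbers (not from the paper): an object split into
# 100 chunks, each stored as a (9, 6) stripe whose fragments each
# survive with probability 0.999.
r_chunk = chunk_reliability(n=9, k=6, p_frag=0.999)
print(f"per-chunk reliability: {r_chunk:.12f}")
print(f"100-chunk object:      {object_reliability([r_chunk] * 100):.12f}")
```

Because each additional chunk multiplies in another survival probability, an object spread over many deduplicated chunks can end up less reliable than the same data stored contiguously, which is precisely the tension between space efficiency and per-object reliability that the paper analyzes.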



Published in

ACM SIGMETRICS Performance Evaluation Review, Volume 38, Issue 3
December 2010, 84 pages
ISSN: 0163-5999
DOI: 10.1145/1925019
Copyright © 2011 Authors

Publisher

Association for Computing Machinery, New York, NY, United States

