Abstract
Space efficiency and data reliability are two primary concerns for modern storage systems. Chunk-based deduplication, which breaks up data objects into single-instance chunks that can be shared across objects, is an effective method for saving storage space. However, deduplication affects data reliability because an object's constituent chunks are often spread across a large number of disks, potentially decreasing the object's reliability. Therefore, an important problem in deduplicated storage is how to achieve space efficiency yet maintain each object's original reliability. In this paper, we present initial results on the reliability analysis of HP-KVS, a deduplicated key-value store that allows each object to specify its own reliability level and that uses software erasure coding for data reliability. The combination of deduplication and erasure coding gives rise to several interesting research problems. We show how to compare the reliability of erasure codes with different parameters and how to analyze the reliability of a big data object given its constituent parts' reliabilities. We also present a method for system designers to determine under what conditions deduplication will save space for erasure-coded data.
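The two analyses mentioned above can be illustrated with a small sketch. The snippet below is not the paper's model; it is a standard first-order approximation assuming an (n, k) MDS erasure code with independent fragment failures, where a chunk is readable if at least k of its n fragments survive, and a deduplicated object is readable only if every one of its constituent chunks is.

```python
from math import comb

def chunk_survival(n: int, k: int, p: float) -> float:
    """Probability an (n, k) MDS-coded chunk is readable, assuming each
    of its n fragments independently survives with probability p.
    The chunk is recoverable iff at least k fragments survive."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def object_survival(chunk_probs) -> float:
    """A deduplicated object is readable only if all of its constituent
    chunks are readable (chunk failures assumed independent)."""
    r = 1.0
    for q in chunk_probs:
        r *= q
    return r

# Comparing codes with different parameters but equal storage overhead (1.5x):
p = 0.99
print(chunk_survival(6, 4, p))   # (6, 4): tolerates any 2 fragment losses
print(chunk_survival(9, 6, p))   # (9, 6): tolerates any 3 fragment losses

# Object reliability degrades with the number of chunks it is spread over:
q = chunk_survival(6, 4, p)
print(object_survival([q] * 100))
```

Under this simplified model, the wider code (9, 6) gives a more reliable chunk at the same overhead, and the product form makes explicit why spreading an object across many deduplicated chunks can lower its reliability below the per-chunk level.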