Abstract
Space efficiency and data reliability are two primary concerns for modern storage systems. Chunk-based deduplication, which breaks up data objects into single-instance chunks that can be shared across objects, is an effective method for saving storage space. However, deduplication affects data reliability because an object's constituent chunks are often spread across a large number of disks, potentially decreasing the object's reliability. Therefore, an important problem in deduplicated storage is how to achieve space efficiency yet maintain each object's original reliability. In this paper, we present initial results on the reliability analysis of HP-KVS, a deduplicated key-value store that allows each object to specify its own reliability level and that uses software erasure coding for data reliability. The combination of deduplication and erasure coding gives rise to several interesting research problems. We show how to compare the reliability of erasure codes with different parameters and how to analyze the reliability of a big data object given its constituent parts' reliabilities. We also present a method for system designers to determine under what conditions deduplication will save space for erasure-coded data.
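The two analyses mentioned above can be illustrated with a small sketch. The snippet below is not the paper's model; it is a standard first-order approximation assuming an (n, k) MDS erasure code with independent fragment failures, where a chunk is readable if at least k of its n fragments survive, and a deduplicated object is readable only if every one of its constituent chunks is.

```python
from math import comb

def chunk_survival(n: int, k: int, p: float) -> float:
    """Probability an (n, k) MDS-coded chunk is readable, assuming each
    of its n fragments independently survives with probability p.
    The chunk is recoverable iff at least k fragments survive."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def object_survival(chunk_probs) -> float:
    """A deduplicated object is readable only if all of its constituent
    chunks are readable (chunk failures assumed independent)."""
    r = 1.0
    for q in chunk_probs:
        r *= q
    return r

# Comparing codes with different parameters but equal storage overhead (1.5x):
p = 0.99
print(chunk_survival(6, 4, p))   # (6, 4): tolerates any 2 fragment losses
print(chunk_survival(9, 6, p))   # (9, 6): tolerates any 3 fragment losses

# Object reliability degrades with the number of chunks it is spread over:
q = chunk_survival(6, 4, p)
print(object_survival([q] * 100))
```

Under this simplified model, the wider code (9, 6) gives a more reliable chunk at the same overhead, and the product form makes explicit why spreading an object across many deduplicated chunks can lower its reliability below the per-chunk level.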