research-article

Proactive replication in distributed storage systems using machine availability estimation

Authors:

Alessandro Duminuco,

Ernst Biersack,

Taoufik En-Najjary

CoNEXT '07: Proceedings of the 2007 ACM CoNEXT conference

Article No.: 27, Pages 1 - 12

https://doi.org/10.1145/1364654.1364689

Published: 10 December 2007

Abstract

Distributed storage systems provide data availability by means of redundancy. To maintain a given level of availability in the presence of node failures, new redundant fragments must be created to replace the lost ones.
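As a rough illustration of how redundancy translates into availability, the sketch below assumes an erasure code in which an object is split into n fragments, any k of which suffice to rebuild it, stored on nodes that are online independently with probability a. These assumptions and the function names (`object_availability`, `fragments_needed`) are illustrative and not taken from the paper; independence in particular is optimistic for real deployments.

```python
from math import comb


def object_availability(n: int, k: int, a: float) -> float:
    """Probability that at least k of n fragments sit on online nodes.

    Illustrative assumption: each fragment is on a distinct node that is
    online independently with probability a.
    """
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= n")
    if not 0.0 <= a <= 1.0:
        raise ValueError("a must be a probability")
    # Sum the binomial probabilities of having i = k..n fragments online.
    return sum(comb(n, i) * a**i * (1 - a) ** (n - i) for i in range(k, n + 1))


def fragments_needed(k: int, a: float, target: float, n_max: int = 1000) -> int:
    """Smallest n >= k whose availability reaches the target."""
    for n in range(k, n_max + 1):
        if object_availability(n, k, a) >= target:
            return n
    raise ValueError("target not reachable with n_max fragments")


if __name__ == "__main__":
    # Replication is the special case k = 1: three replicas on nodes up 90% of the time.
    print(object_availability(3, 1, 0.9))
    print(fragments_needed(k=4, a=0.7, target=0.999))
```

Each fragment lost to a permanent failure lowers n and therefore the availability (for example, `object_availability(3, 2, 0.8)` is lower than `object_availability(4, 2, 0.8)`), which is why lost fragments have to be regenerated.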

Since node failures can be either transient or permanent, deciding when to generate new fragments is non-trivial. An additional difficulty is that the failure behavior, in terms of the rates of permanent and transient failures, may vary over time. To adapt to changes in the failure behavior, many systems adopt a reactive approach, in which new fragments are created as soon as a failure is detected. However, reactive approaches tend to produce spikes in bandwidth consumption.
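The tension between transient and permanent failures, and the resulting bandwidth spikes, can be seen in a minimal sketch of a timeout-based reactive policy. It assumes, hypothetically, that a node is declared failed once it has been offline longer than a fixed `timeout` and that each declared failure immediately triggers the regeneration of one fragment. The names `reactive_repairs` and `traffic_per_bin`, the event format, and the fragment size are illustrative, not from the paper.

```python
import math
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical failure event: (time the node went offline, how long it stayed offline).
# math.inf marks a permanent failure; finite durations are transient outages.
Failure = Tuple[float, float]


def reactive_repairs(failures: Iterable[Failure], timeout: float) -> list[float]:
    """Times at which a timeout-based reactive policy regenerates a fragment.

    An outage counts as a failure once it has lasted longer than `timeout`;
    shorter outages are ignored, but longer transient ones still trigger a repair.
    """
    return sorted(start + timeout for start, duration in failures if duration > timeout)


def traffic_per_bin(repair_times: Iterable[float], bin_width: float,
                    fragment_mb: float) -> dict[int, float]:
    """Repair traffic in megabytes per time bin (bin index -> MB)."""
    counts = Counter(int(t // bin_width) for t in repair_times)
    return {b: n * fragment_mb for b, n in sorted(counts.items())}


if __name__ == "__main__":
    # A correlated burst of 5 permanent failures, one transient outage,
    # and one later permanent failure (times in arbitrary units).
    events = [(10.0, math.inf)] * 5 + [(20.0, 2.0), (50.0, math.inf)]
    for timeout in (1.0, 4.0):
        repairs = reactive_repairs(events, timeout)
        print(timeout, traffic_per_bin(repairs, bin_width=10.0, fragment_mb=64.0))
```

With a timeout of 4, the five burst failures are all repaired in the same bin, producing the kind of spike described above; with a timeout of 1, the 2-unit transient outage also triggers an unnecessary repair.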

Proactive approaches create new fragments at a fixed rate that is either derived from knowledge of the failure behavior or set by the system administrator. However, existing proactive systems cannot adapt to changing failure behavior, which is common in the real world.
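A fixed-rate proactive policy keeps repair traffic flat but cannot follow a change in the failure rate. The sketch below is a toy model under stated assumptions, not the paper's simulator: a single object with k-of-n coding, `losses[t]` fragments lost per step to permanent failures, and `rate` new fragments created every step for as long as the object is still decodable. The function name and parameters are hypothetical.

```python
from typing import Sequence


def proactive_fragment_counts(initial: int, losses: Sequence[int],
                              rate: int, k: int) -> list[int]:
    """Fragment count after each step under a fixed-rate proactive policy.

    Toy model (illustrative assumptions): in every step, losses[t] fragments
    are lost to permanent failures, then `rate` new fragments are created,
    regardless of how many were lost. Once fewer than k fragments survive,
    the object cannot be decoded and the simulation stops.
    """
    counts: list[int] = []
    current = initial
    for lost in losses:
        current = max(current - lost, 0)
        if current < k:
            counts.append(current)  # object lost: nothing left to repair from
            break
        current += rate
        counts.append(current)
    return counts


if __name__ == "__main__":
    # The loss rate triples after three steps; the rate was tuned for the old regime.
    print(proactive_fragment_counts(initial=10, losses=[1, 1, 1, 3, 3, 3], rate=1, k=4))
```

When the loss rate triples from 1 to 3 fragments per step, a rate tuned for the old regime (rate = 1) lets the object drain from 10 fragments to 3, below k = 4, within three steps of the change. Conversely, if failures become rarer, the same fixed rate keeps accumulating surplus fragments.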

We propose a new technique based on an ongoing estimation of the failure behavior, obtained using a model that consists of a network of queues. This scheme combines the adaptiveness of reactive systems with the smooth bandwidth usage of proactive systems, generalizing both approaches. The reactive/proactive dichotomy thus becomes a special case of a broader approach that can be tuned to the dynamics of the failure behavior.
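The paper's estimator is built on a network of queues, and the sketch below does not reproduce it. As a deliberately simpler stand-in, it estimates the per-step loss rate with an exponentially weighted moving average (EWMA) and sets the proactive rate to the rounded-up estimate. It also assumes that each permanent fragment loss is observed directly, sidestepping the transient-versus-permanent distinction that the paper's model is designed to handle. The smoothing factor `alpha` plays the role of a tuning knob: `alpha = 1` regenerates exactly what was lost in each step (reactive-like), while `alpha = 0` never updates the initial estimate (fixed-rate proactive). All names are hypothetical.

```python
import math
from typing import Sequence


def adaptive_fragment_counts(initial: int, losses: Sequence[int], k: int,
                             alpha: float, initial_estimate: float
                             ) -> tuple[list[int], list[int]]:
    """Fragment counts and per-step repair rates under an EWMA-driven policy.

    EWMA is a stand-in, not the paper's queueing-network estimator.
    alpha = 1.0 repairs exactly the current step's losses (reactive-like);
    alpha = 0.0 keeps initial_estimate forever (fixed-rate proactive).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    estimate = initial_estimate
    counts: list[int] = []
    rates: list[int] = []
    current = initial
    for lost in losses:
        current = max(current - lost, 0)
        if current < k:
            counts.append(current)  # object no longer decodable
            break
        # Blend the new observation into the loss-rate estimate,
        # then repair at the rounded-up estimated rate.
        estimate = alpha * lost + (1 - alpha) * estimate
        rate = math.ceil(estimate)
        current += rate
        counts.append(current)
        rates.append(rate)
    return counts, rates


if __name__ == "__main__":
    losses = [1, 1, 1, 3, 3, 3]  # loss rate triples after three steps
    for alpha in (0.0, 0.5, 1.0):
        print(alpha, adaptive_fragment_counts(10, losses, k=4, alpha=alpha,
                                              initial_estimate=1.0))
```

For example, with 10 initial fragments, k = 4 and losses that triple from 1 to 3 fragments per step, `alpha = 0.5` raises the repair rate from 1 to 2 and then 3 and holds the object at 9 fragments, whereas `alpha = 0` loses it. Intermediate values trade reaction speed for smoother bandwidth.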

Cited By

Wu Y, Liu D, Tan Y, Duan M, Luo L, Wang W, Chen X (2023). LFPR: A Lazy Fast Predictive Repair Strategy for Mobile Distributed Erasure Coded Cluster. IEEE Internet of Things Journal 10:1 (704-719). Online publication date: 1-Jan-2023.
https://doi.org/10.1109/JIOT.2022.3203415
Li X, Cheng K, Shen Z, Lee P (2022). Fast Proactive Repair in Erasure-Coded Storage: Analysis, Design, and Implementation. IEEE Transactions on Parallel and Distributed Systems 33:12 (3400-3414). Online publication date: 1-Dec-2022.
https://doi.org/10.1109/TPDS.2022.3152817
Sacco A, Esposito F, Marchetto G (2021). Resource Inference for Sustainable and Responsive Task Offloading in Challenged Edge Networks. IEEE Transactions on Green Communications and Networking 5:3 (1114-1127). Online publication date: Sep-2021.
https://doi.org/10.1109/TGCN.2021.3091812

Index Terms
1. Software and its engineering
  1. Software organization and properties
    1. Software system structures

Recommendations

Proactive Fault Handling for System Availability Enhancement
IPDPS '05: Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 16 - Volume 17

Proactive fault handling combines prevention and repair actions with failure prediction techniques. We extend the standard availability formula by five key measures: (1) precision and (2) recall assess failure prediction while failure handling is gauged ...

High availability in cheap distributed key value storage
SoCC '20: Proceedings of the 11th ACM Symposium on Cloud Computing

Memory-based storage currently offers the highest-performance distributed storage, keeping the primary copy of all data in DRAM. Recent advances in non-volatile main memory (NVMM) technologies promise latency similar to DRAM at reduced cost and energy, ...

Reliability Equations for Cloud Storage Systems with Proactive Fault Tolerance
As cloud storage systems increase in scale, hard drive failures are becoming more frequent, which raises reliability issues. In addition to traditional reactive fault tolerance, proactive fault tolerance is used to improve a system's reliability. ...

Information

Published In

CoNEXT '07: Proceedings of the 2007 ACM CoNEXT conference

December 2007

448 pages

ISBN:9781595937704

DOI:10.1145/1364654

General Chairs:
Jim Kurose, University of Massachusetts
Henning Schulzrinne, Columbia University

Copyright © 2007 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Alcatel-Lucent
SIGCOMM: ACM Special Interest Group on Data Communication
Thomson
CISCO
IMDEA
IBM

Publisher

Association for Computing Machinery

New York, NY, United States

Acceptance Rates

Overall Acceptance Rate 198 of 789 submissions, 25%

Bibliometrics & Citations

Article Metrics

Total Citations: 50
Total Downloads: 347
Downloads (last 12 months): 4
Downloads (last 6 weeks): 3

Reflects downloads up to 07 Mar 2025

Citations

Sacco A, Esposito F, Marchetto G (2020). Resource Inference for Task Migration in Challenged Edge Networks with RITMO. 2020 IEEE 9th International Conference on Cloud Networking (CloudNet) (1-7). Online publication date: 9-Nov-2020.
https://doi.org/10.1109/CloudNet51028.2020.9335807
Sacco A, Flocco M, Esposito F, Marchetto G (2020). An architecture for adaptive task planning in support of IoT-based machine learning applications for disaster scenarios. Computer Communications. Online publication date: Jul-2020.
https://doi.org/10.1016/j.comcom.2020.07.011
Shen Z, Li X, Lee P (2019). Fast Predictive Repair in Erasure-Coded Storage. 2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN) (556-567). Online publication date: Jun-2019.
https://doi.org/10.1109/DSN.2019.00062
Ventrella A, Esposito F, Sacco A, Flocco M, Marchetto G, Gururajan S (2019). APRON: an Architecture for Adaptive Task Planning of Internet of Things in Challenged Edge Networks. 2019 IEEE 8th International Conference on Cloud Networking (CloudNet) (1-6). Online publication date: Nov-2019.
https://doi.org/10.1109/CloudNet47604.2019.9064091
Gaggero M, Di Paola D, Petitti A, Caviglione L (2019). When Time Matters: Predictive Mission Planning in Cyber-Physical Scenarios. IEEE Access 7 (11246-11257). Online publication date: 2019.
https://doi.org/10.1109/ACCESS.2019.2892310
Tay Y (2018). Analytical Performance Modeling for Computer Systems, Third Edition. Synthesis Lectures on Computer Science 7:1 (1-171). Online publication date: 23-Jul-2018.
https://doi.org/10.2200/S00859ED3V01Y201806CSL010
Jiang S, Lian M, Lu C, Gu Q, Ruan S, Xie X (2018). Ensemble Prediction Algorithm of Anomaly Monitoring Based on Big Data Analysis Platform of Open-Pit Mine Slope. Complexity 2018. Online publication date: 1-Aug-2018.
https://dl.acm.org/doi/10.1155/2018/1048756