DOI: 10.1145/1364654.1364689

Proactive replication in distributed storage systems using machine availability estimation

Published: 10 December 2007

Abstract

Distributed storage systems provide data availability by means of redundancy. To assure a given level of availability in case of node failures, new redundant fragments need to be introduced.
Since node failures can be either transient or permanent, deciding when to generate new fragments is non-trivial. A further difficulty is that the failure behavior, that is, the rates of permanent and transient failures, may vary over time. To adapt to such changes, many systems adopt a reactive approach, in which new fragments are created as soon as a failure is detected. However, reactive approaches tend to produce spikes in bandwidth consumption.
Proactive approaches create new fragments at a fixed rate that either is derived from knowledge of the failure behavior or is set by the system administrator. However, existing proactive systems cannot adapt to changing failure behavior, which is common in the real world.
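
The contrast between the two policies can be illustrated with a minimal Python sketch. It is not taken from the paper: the failure trace, the function names `reactive_repairs` and `proactive_repairs`, and the rate of one fragment per step are assumptions chosen for illustration, and the proactive policy is simplified to ignore any redundancy target.

```python
def reactive_repairs(failures_per_step):
    """Create one new fragment per failure, in the step the failure is detected."""
    return list(failures_per_step)


def proactive_repairs(failures_per_step, rate):
    """Create fragments at a fixed rate per step, regardless of observed failures."""
    return [rate for _ in failures_per_step]


# Hypothetical trace: number of failed fragments detected in each time step.
trace = [0, 0, 6, 0, 0, 0, 0, 2, 0, 0]

reactive = reactive_repairs(trace)
proactive = proactive_repairs(trace, rate=1)

# The peak stands in for the worst-case bandwidth demand in a single step.
print("reactive  peak:", max(reactive), "total:", sum(reactive))
print("proactive peak:", max(proactive), "total:", sum(proactive))
```

On this trace the reactive schedule concentrates its work in the steps where failures occur, while the fixed-rate schedule spreads a similar amount of work evenly but cannot follow a change in the trace.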
We propose a new technique based on an ongoing estimation of the failure behavior, obtained using a model that consists of a network of queues. This scheme combines the adaptiveness of reactive systems with the smooth bandwidth usage of proactive systems, generalizing the two previous approaches. The reactive/proactive dichotomy thus becomes a special case of a wider approach that is tunable with respect to the dynamics of the failure behavior.
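
One way to picture a scheme that sits between the two extremes is sketched below. It assumes an exponentially weighted moving average as a stand-in estimator; the queueing-network model mentioned in the abstract is not reproduced here, and `adaptive_repairs`, `alpha`, and `initial_rate` are hypothetical names introduced only for this illustration.

```python
def adaptive_repairs(failures_per_step, alpha, initial_rate):
    """Repair at a rate given by a running estimate of the failure rate.

    The estimate is an exponentially weighted moving average (an assumption,
    not the paper's queueing model). alpha in [0, 1] sets how quickly the
    estimate follows newly observed failures.
    """
    estimate = initial_rate
    schedule = []
    for failures in failures_per_step:
        estimate = alpha * failures + (1 - alpha) * estimate
        schedule.append(estimate)
    return schedule


# Hypothetical trace: number of failed fragments detected in each time step.
trace = [0, 0, 6, 0, 0, 0, 0, 2, 0, 0]

as_reactive = adaptive_repairs(trace, alpha=1.0, initial_rate=1)
as_proactive = adaptive_repairs(trace, alpha=0.0, initial_rate=1)
smoothed = adaptive_repairs(trace, alpha=0.5, initial_rate=1)

print("alpha=1.0 peak:", max(as_reactive))
print("alpha=0.0 peak:", max(as_proactive))
print("alpha=0.5 peak:", max(smoothed))
```

With `alpha = 1` the schedule follows each detected failure (the reactive case); with `alpha = 0` it never leaves the initial rate (the fixed-rate proactive case); intermediate values trade adaptiveness against smoothness of bandwidth usage.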

    Published In

    CoNEXT '07: Proceedings of the 2007 ACM CoNEXT conference
    December 2007
    448 pages
    ISBN: 9781595937704
    DOI: 10.1145/1364654

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Acceptance Rates

    Overall Acceptance Rate 198 of 789 submissions, 25%
