Towards Efficient Location and Placement of Dynamic Replicas for Geo-Distributed Data Stores

Published: 01 June 2016

Abstract

Large-scale scientific experiments increasingly rely on geo-distributed clouds to serve relevant data to scientists worldwide with minimal latency. State-of-the-art caching systems often require the client to access the data through a caching proxy, or to contact a metadata server to locate the closest available copy of the desired data. Also, such caching systems are inconsistent with the design of distributed hash-table databases such as Dynamo, which focus on allowing clients to locate data independently. We argue there is a gap between existing state-of-the-art solutions and the needs of geographically distributed applications, which require fast access to popular objects while not degrading access latency for the rest of the data. In this paper, we introduce a probabilistic algorithm allowing the user to locate the closest copy of the data efficiently and independently with minimal overhead, allowing low-latency access to non-cached data. Also, we propose a network-efficient technique to identify the most popular data objects in the cluster and trigger their replication close to the clients. Experiments with a real-world data set show that these principles allow clients to locate the closest available copy of data with small memory footprint and low error-rate, thus improving read-latency for non-cached data and allowing hot data to be read locally.
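
As a rough illustration of what locating the closest copy "independently" could look like, the sketch below has a client keep one compact membership summary per site and probe the summaries in order of latency. The abstract does not say which data structure the paper uses. The Bloom filter here, along with `BloomFilter`, `locate_closest`, the site names, the latencies and the `home_site` fallback, is an assumption made for illustration, not the authors' implementation.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, some false positives."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting a SHA-256 digest.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


def locate_closest(key, sites, home_site):
    """Return the lowest-latency site whose filter claims to hold key.

    sites is a list of (name, latency_ms, BloomFilter) tuples. When no
    filter matches, fall back to home_site, standing in for the site
    that the hash ring would assign to the key (hypothetical here).
    """
    for name, _latency, replica_filter in sorted(sites, key=lambda s: s[1]):
        if key in replica_filter:
            return name
    return home_site


# Hypothetical deployment: a hot object has been replicated to "tokyo".
paris, tokyo = BloomFilter(), BloomFilter()
tokyo.add("run-42/events.root")
sites = [("paris", 5, paris), ("tokyo", 120, tokyo)]
closest = locate_closest("run-42/events.root", sites, home_site="virginia")
```

A false positive in such a summary would send the client to a site that lacks the object, costing one extra round trip. This is the kind of error rate against memory footprint trade-off that the abstract refers to.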

Published In

ScienceCloud '16: Proceedings of the ACM 7th Workshop on Scientific Cloud Computing
June 2016
42 pages
ISBN: 9781450343534
DOI: 10.1145/2913712

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. availability
  2. cloud
  3. content distribution network
  4. data consistency
  5. data warehousing
  6. geo-replication
  7. metadata
  8. storage networks
  9. wide-area replication

Qualifiers

  • Research-article

Conference

HPDC'16
Acceptance Rates

ScienceCloud '16 Paper Acceptance Rate 4 of 8 submissions, 50%;
Overall Acceptance Rate 44 of 151 submissions, 29%

Article Metrics

  • Downloads (last 12 months): 7
  • Downloads (last 6 weeks): 2
Reflects downloads up to 05 Mar 2025.

Cited By

  • (2021) State Management for Cloud-Native Applications. Electronics 10:4 (423). DOI: 10.3390/electronics10040423. Online publication date: 9-Feb-2021
  • (2020) An Artificial Bee Colony Algorithm for Data Replication Optimization in Cloud Environments. IEEE Access 8 (51841-51852). DOI: 10.1109/ACCESS.2019.2957436. Online publication date: 2020
  • (2020) Reducing network cost of data repair in erasure-coded cross-datacenter storage. Future Generation Computer Systems 102:C (494-506). DOI: 10.1016/j.future.2019.08.027. Online publication date: 1-Jan-2020
  • (2019) Minimizing state access delay for cloud-native network functions. 2019 IEEE 8th International Conference on Cloud Networking (CloudNet) (1-6). DOI: 10.1109/CloudNet47604.2019.9064048. Online publication date: Nov-2019
  • (2018) TýrFS. Proceedings of the 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (452-461). DOI: 10.1109/CCGRID.2018.00072. Online publication date: 1-May-2018
  • (2018) Keeping up with storage. Future Generation Computer Systems 86:C (1093-1105). DOI: 10.1016/j.future.2017.06.009. Online publication date: 1-Sep-2018
  • (2017) Cloud IaaS for Mass Spectrometry and Proteomics. Proceedings of the 8th Workshop on Scientific Cloud Computing (17-24). DOI: 10.1145/3086567.3086571. Online publication date: 27-Jun-2017
