research-article

Open access

Replichard: Towards Tradeoff between Consistency and Performance for Metadata

Authors:

Lixiang AoAuthors Info & Claims

ICS '16: Proceedings of the 2016 International Conference on Supercomputing

Article No.: 25, Pages 1 - 11

https://doi.org/10.1145/2925426.2926292

Published: 01 June 2016 Publication History

Abstract

Metadata scalability is critical for distributed systems as the storage scale is growing rapidly. Because of the strict consistency requirement of metadata, many existing metadata services utilize a fundamentally unscalable design for the sake of easy management, while others provide improved scalability but lead to unacceptable latency and management complexity. Without delivering scalable performance, metadata will be the bottleneck of the entire system. Based on the observation that real file dependencies are few, and there are usually more idempotent than non-idempotent operations, we propose a practical strategy, Replichard, allowing a tradeoff between metadata consistency and scalable performance. Replichard provides metadata services through a cluster of metadata servers, in which a flexible consistency scheme is adopted: strict consistency for non-idempotent operations with dynamic write-lock sharding, and relaxed consistency with accuracy estimations of return values where consistency for idempotent requests is relaxed to achieve high throughput. Write-locks are dynamically created at subtree-level and designated to independent metadata servers in an application-oriented manner. A subtree metadata update that occurs on a particular server is replicated to all metadata servers conforming to the application "start-end" semantics, resulting in an eventually consistent namespace. An asynchronous notification mechanism is also devised to enable users to deal with potential stale reads from operations of relaxed consistency. A prototype was implemented based on HDFS, and the experimental results show promising scalability and performance for both micro benchmarks and various real-world applications written in Pig, Hive and MapReduce.

References

[1]

Hadepot: Repository of mapreduce applications. http://nuage.cs.washington.edu/repository.php/.

[2]

The TPC Benchmark H. http://www.tpc.org/tpch/.

[3]

Cristina L Abad, Nathan Roberts, Yi Lu, and Roy H Campbell. A storage-centric analysis of mapreduce workloads: File popularity, temporal locality and arrival patterns. In 2012 IEEE International Symposium on Workload Characterization (IISWC), pages 100--109. IEEE, 2012.

Digital Library

[4]

Manuel Bravo, Luís E. T. Rodrigues, and Peter Van Roy. Towards a scalable, distributed metadata service for causal consistency under partial geo-replication. In Ivan Beschastnikh and Wouter Joosen, editors, Proceedings of the Doctoral Symposium of the 16th International Middleware Conference, Middleware Doct Symposium 2015, Vancouver, BC, Canada, December 7--11, 2015, pages 1--4. ACM, 2015.

Digital Library

[5]

Philip H. Carns, Walter B. Ligon, III, Robert B. Ross, and Rajeev Thakur. Pvfs: A parallel file system for linux clusters. In Proceedings of the 4th Annual Linux Showcase and Conference, pages 317--327. USENIX Association, 2000.

Digital Library

[6]

Yanpei Chen, Sara Alspaugh, and Randy Katz. Interactive analytical processing in big data systems: A cross-industry study of mapreduce workloads. Proceedings of the VLDB Endowment, 5(12):1802--1813, 2012.

Digital Library

[7]

Jeffrey Dean and Sanjay Ghemawat. Mapreduce: simplified data processing on large clusters. Communications of the ACM, 51(1):107--113, 2008.

Digital Library

[8]

John R Douceur and Jon Howell. Distributed directory service in the farsite file system. In Proceedings of the 7th symposium on Operating systems design and implementation, pages 321--334. USENIX, 2006.

Digital Library

[9]

Lars George. HBase: The Definitive Guide. O'Reilly Media, 1 edition, 2011.

[10]

Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. The google file system. In Michael L. Scott and Larry L. Peterson, editors, Proceedings of the 19th ACM Symposium on Operating Systems Principles 2003, Bolton Landing, NY, USA, October 19--22, 2003, pages 29--43. ACM, 2003.

Digital Library

[11]

Jim Gray. Notes on data base operating systems. In Operating Systems, An Advanced Course, pages 393--481, London, UK, UK, 1978. Springer-Verlag.

Digital Library

[12]

Patrick Hunt, Mahadev Konar, Flavio P. Junqueira, and Benjamin Reed. Zookeeper: Wait-free coordination for internet-scale systems. In Proceedings of the 2010 USENIX Conference on USENIX Annual Technical Conference, volume 8 of USENIXATC'10, pages 11--11, Berkeley, CA, USA, 2010. USENIX Association.

Digital Library

[13]

Flavio P Junqueira, Benjamin C Reed, and Marco Serafini. Zab: High-performance broadcast for primary-backup systems. In 2011 IEEE/IFIP 41st International Conference on Dependable Systems & Networks (DSN), pages 245--256. IEEE, 2011.

Digital Library

[14]

Avinash Lakshman and Prashant Malik. Cassandra: a decentralized structured storage system. ACM SIGOPS Operating Systems Review, 44(2):35--40, 2010.

Digital Library

[15]

Leslie Lamport. Paxos made simple. ACM Sigact News, 32(4):18--25, 2001.

[16]

Lustre. Lustre file system. http://www.lustre.org/.

[17]

Ralph C Merkle. A digital signature based on a conventional encryption function. In Advances in Cryptology, pages 369--378. Springer, 1988.

Digital Library

[18]

Michael Mitzenmacher. The power of two choices in randomized load balancing. IEEE Transactions on Parallel and Distributed Systems, 12(10):1094--1104, 2001.

Digital Library

[19]

Henry Newman. Hpcs mission partner file i/o scenarios, revision 3. http://wiki.lustre.org/images/5/5a/NewmanMayLustreWorkshop.pdf, 2008.

[20]

Christopher Olston, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, and Andrew Tomkins. Pig latin: a not-so-foreign language for data processing. In 2008 ACM SIGMOD international conference on Management of data, pages 1099--1110. ACM, 2008.

Digital Library

[21]

Diego Ongaro and John Ousterhout. In search of an understandable consensus algorithm. In Proc. USENIX Annual Technical Conference, pages 305--320, 2014.

Digital Library

[22]

Swapnil Patil and Garth A. Gibson. Scale and concurrency of GIGA+: file system directories with millions of files. In Gregory R. Ganger and John Wilkes, editors, 9th USENIX Conference on File and Storage Technologies, San Jose, CA, USA, February 15--17, 2011, pages 177--190. USENIX, 2011.

Digital Library

[23]

Sanjay Radia. High Availability Framework for HDFS NN. https://issues.apache.org/jira/browse/HDFS-1623, Augest 2012.

[24]

Kai Ren, Qing Zheng, Swapnil Patil, and Garth Gibson. Indexfs: scaling file system metadata performance with stateless caching and bulk insertion. In High Performance Computing, Networking, Storage and Analysis, SC14: International Conference for, pages 237--248. IEEE, 2014.

Digital Library

[25]

Drew S Roselli, Jacob R Lorch, Thomas E Anderson, et al. A comparison of file system workloads. In USENIX annual technical conference, general track, pages 41--54, 2000.

Digital Library

[26]

Konstantin Shvachko. Coordinated replication of the namespace using ConsensusNode. https://issues.apache.org/jira/browse/HDFS-6469, February 2015.

[27]

Konstantin Shvachko, Hairong Kuang, Sanjay Radia, and Robert Chansler. The hadoop distributed file system. In Mohammed G. Khatib, Xubin He, and Michael Factor, editors, IEEE 26th Symposium on Mass Storage Systems and Technologies, MSST 2012, Lake Tahoe, Nevada, USA, May 3--7, 2010, pages 1--10. IEEE Computer Society, 2010.

Digital Library

[28]

Suresh Srinivas. Hdfs scalability with multiple namenodes. https://issues.apache.org/jira/browse/HDFS-1052, April 2011.

[29]

Alexander Thomson and Daniel J Abadi. Calvinfs: consistent wan replication and scalable metadata management for distributed file systems. In Proceedings of the 13th USENIX Conference on File and Storage Technologies, pages 1--14. USENIX Association, 2015.

Digital Library

[30]

Ashish Thusoo, Joydeep Sen Sarma, Namit Jain, Zheng Shao, Prasad Chakka, Suresh Anthony, Hao Liu, Pete Wyckoff, and Raghotham Murthy. Hive: a warehousing solution over a map-reduce framework. Proceedings of the VLDB Endowment, 2(2):1626--1629, 2009.

Digital Library

[31]

Sage A Weil, Scott A Brandt, Ethan L Miller, Darrell DE Long, and Carlos Maltzahn. Ceph: A scalable, high-performance distributed file system. In Proceedings of the 7th symposium on Operating systems design and implementation, pages 307--320. USENIX Association, 2006.

Digital Library

[32]

Brent Welch, Marc Unangst, Zainul Abbasi, Garth A Gibson, Brian Mueller, Jason Small, Jim Zelenka, and Bin Zhou. Scalable performance of the panasas parallel file system. In FAST, volume 8, pages 1--17, 2008.

Digital Library

[33]

Lin Xiao, Kai Ren, Qing Zheng, and Garth A Gibson. Shardfs vs. indexfs: replication vs. caching strategies for distributed metadata management in cloud storage systems. In Proceedings of the Sixth ACM Symposium on Cloud Computing, pages 236--249. ACM, 2015.

Digital Library

[34]

Ruini Xue, Lixiang Ao, Shengli Gao, Zhongyang Guan, and Lupeng Lian. Partitioner: A distributed hdfs metadata server cluster. In 2014 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC), pages 167--174. IEEE, 2014.

Digital Library

[35]

Ruini Xue, Lixiang Ao, and Zhongyang Guan. Comet: Client-oriented metadata service for highly available distributed file systems. In 2015 IEEE 27th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), pages 157--164. IEEE, 2015.

Digital Library

[36]

Ruini Xue, Zhongyang Guan, Shengli Gao, and Lixiang Ao. NM2H: design and implementation of nosql extension for HDFS metadata management. In 17th IEEE International Conference on Computational Science and Engineering, CSE 2014, Chengdu, China, December 19--21, 2014, pages 1282--1289. IEEE Computer Society, 2014.

Digital Library

[37]

ZeroMQ. 0MQ: Distributed Messaging. http://zeromq.org/.

Cited By

Tripathi RSingh PSingh S(2024)Revisiting Cuckoo Hashing: re-addressing the challenges of Cuckoo HashingInternational Journal of Information Technology10.1007/s41870-024-02274-217:1(495-512)Online publication date: 20-Nov-2024
https://doi.org/10.1007/s41870-024-02274-2
Dai HWang YKent KZeng LXu C(2022)The State of the Art of Metadata Managements in Large-Scale Distributed File Systems — Scalability, Performance and AvailabilityIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2022.317057433:12(3850-3869)Online publication date: 1-Dec-2022
https://doi.org/10.1109/TPDS.2022.3170574
Bakshi AChatterjee SChaki N(2019)A Novel Approach for Maintaining Consistency in Distributed File System2019 Second International Conference on Advanced Computational and Communication Paradigms (ICACCP)10.1109/ICACCP.2019.8882935(1-6)Online publication date: Feb-2019
https://doi.org/10.1109/ICACCP.2019.8882935
Show More Cited By

Replichard: Towards Tradeoff between Consistency and Performance for Metadata
1. Software and its engineering
  1. Software organization and properties
    1. Software system structures

Recommendations

Towards a New Model of Storage and Access to Data in Big Data and Cloud Computing

The technological revolution integrating multiple information sources and extension of computer science in different sectors led to the explosion of the data quantities, which reflects the scaling of vo-lumes, numbers and types. These massive increases ...
Classification Based Metadata Management for HDFS
HPCC '12: Proceedings of the 2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems

The way data storage is viewed has been changing consistently. The current trend in data storage is Hadoop which provides a scalable data storage mechanism for storing extremely large amount of data and to handle data intensive scientific applications. ...
UGSD: Scalable and Efficient Metadata Management for EB-Scale File Systems
ICCDA '17: Proceedings of the International Conference on Compute and Data Analysis

In existing ultra-large-scale file systems in HPC environment, file data and metadata are stored and processed separately. Efficient and scalable metadata management is one of the most critical aspects that may affect overall performance of file systems,...

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences

ICS '16: Proceedings of the 2016 International Conference on Supercomputing

June 2016

547 pages

ISBN:9781450343619

DOI:10.1145/2925426

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 June 2016

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article
Research
Refereed limited

Conference

ICS '16

Sponsor:

SIGARCH

ICS '16: 2016 International Conference on Supercomputing

June 1 - 3, 2016

Istanbul, Turkey

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

5
Total Citations
View Citations
735
Total Downloads

Downloads (Last 12 months)148
Downloads (Last 6 weeks)30

Reflects downloads up to 08 Mar 2025

Other Metrics

View Author Metrics

Citations

Cited By

Tripathi RSingh PSingh S(2024)Revisiting Cuckoo Hashing: re-addressing the challenges of Cuckoo HashingInternational Journal of Information Technology10.1007/s41870-024-02274-217:1(495-512)Online publication date: 20-Nov-2024
https://doi.org/10.1007/s41870-024-02274-2
Dai HWang YKent KZeng LXu C(2022)The State of the Art of Metadata Managements in Large-Scale Distributed File Systems — Scalability, Performance and AvailabilityIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2022.317057433:12(3850-3869)Online publication date: 1-Dec-2022
https://doi.org/10.1109/TPDS.2022.3170574
Bakshi AChatterjee SChaki N(2019)A Novel Approach for Maintaining Consistency in Distributed File System2019 Second International Conference on Advanced Computational and Communication Paradigms (ICACCP)10.1109/ICACCP.2019.8882935(1-6)Online publication date: Feb-2019
https://doi.org/10.1109/ICACCP.2019.8882935
Liao JTrahay FCai ZXiong HChen SIshikawa Y(2018)Fine Granularity and Adaptive Cache Update Mechanism for Client CachingIEEE Systems Journal10.1109/JSYST.2018.2866905(1-12)Online publication date: 2018
https://doi.org/10.1109/JSYST.2018.2866905
Patgiri R(2016)MDS: In-Depth Insight2016 International Conference on Information Technology (ICIT)10.1109/ICIT.2016.048(193-199)Online publication date: Dec-2016
https://doi.org/10.1109/ICIT.2016.048

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Figures

Tables

Media

View Table of Conten