DOI: 10.1145/2925426.2926292
research-article
Open access

Replichard: Towards Tradeoff between Consistency and Performance for Metadata

Published: 01 June 2016

Abstract

Metadata scalability is critical for distributed systems as storage scale grows rapidly. Because metadata requires strict consistency, many existing metadata services adopt a fundamentally unscalable design for the sake of easy management, while others improve scalability at the cost of unacceptable latency and management complexity. Without scalable performance, the metadata service becomes the bottleneck of the entire system. Based on the observation that real file dependencies are few, and that idempotent operations usually outnumber non-idempotent ones, we propose a practical strategy, Replichard, that allows a tradeoff between metadata consistency and scalable performance. Replichard provides metadata services through a cluster of metadata servers that adopts a flexible consistency scheme: strict consistency for non-idempotent operations, enforced with dynamic write-lock sharding, and relaxed consistency for idempotent requests, with accuracy estimations of return values, to achieve high throughput. Write-locks are created dynamically at subtree level and assigned to independent metadata servers in an application-oriented manner. A subtree metadata update that occurs on a particular server is replicated to all metadata servers in conformance with the application's "start-end" semantics, resulting in an eventually consistent namespace. An asynchronous notification mechanism is also devised so that users can deal with potential stale reads from relaxed-consistency operations. A prototype was implemented on top of HDFS, and the experimental results show promising scalability and performance for both micro benchmarks and various real-world applications written in Pig, Hive and MapReduce.
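
The split between strict and relaxed consistency described in the abstract can be pictured with a small routing sketch. This is an illustration under stated assumptions, not the authors' implementation: the `Router` class, the operation names, the depth-1 subtree rule, and the hash-based lock placement are all hypothetical stand-ins (the abstract says locks are assigned in an application-oriented manner, which is not modeled here). Replication, accuracy estimation, and notifications are omitted.

```python
import zlib
from dataclasses import dataclass, field

# Hypothetical operation classes; the abstract only distinguishes
# idempotent from non-idempotent operations.
IDEMPOTENT_OPS = {"stat", "list", "exists"}
NON_IDEMPOTENT_OPS = {"create", "delete", "rename", "mkdir"}


@dataclass
class Router:
    """Toy request router; every name here is illustrative."""
    servers: list
    # Subtree root -> server that holds that subtree's write-lock.
    lock_owner: dict = field(default_factory=dict)
    _next: int = 0

    def _subtree(self, path, depth=1):
        # Assumption: a subtree is named by its first `depth` path components.
        parts = [p for p in path.split("/") if p]
        return "/" + "/".join(parts[:depth])

    def route(self, op, path):
        if op in IDEMPOTENT_OPS:
            # Relaxed consistency: any server may answer (possibly stale),
            # so requests are spread round-robin.
            server = self.servers[self._next % len(self.servers)]
            self._next += 1
            return server
        if op in NON_IDEMPOTENT_OPS:
            # Strict consistency: every write to a subtree goes to the
            # single server that owns the subtree's write-lock.
            root = self._subtree(path)
            if root not in self.lock_owner:
                # The lock is created on first write. Hash placement is a
                # placeholder for the paper's application-oriented policy.
                idx = zlib.crc32(root.encode()) % len(self.servers)
                self.lock_owner[root] = self.servers[idx]
            return self.lock_owner[root]
        raise ValueError(f"unknown operation: {op}")
```

With three servers, two writes under `/jobA` land on the same lock owner, while consecutive `stat` calls rotate across servers. That is the intended contrast: writes serialize per subtree, reads scale with the cluster size.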

    Published In

    ICS '16: Proceedings of the 2016 International Conference on Supercomputing
    June 2016
    547 pages
    ISBN:9781450343619
    DOI:10.1145/2925426
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Distributed Storage
    2. File System
    3. HDFS
    4. Hadoop
    5. High Performance
    6. Metadata

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICS '16

    Acceptance Rates

    Overall Acceptance Rate 629 of 2,180 submissions, 29%

    Article Metrics

    • Downloads (last 12 months): 148
    • Downloads (last 6 weeks): 30
    Download counts reflect downloads up to 08 Mar 2025.

    Cited By

    • (2024) Revisiting Cuckoo Hashing: re-addressing the challenges of Cuckoo Hashing. International Journal of Information Technology 17:1 (495-512). DOI: 10.1007/s41870-024-02274-2. Online publication date: 20-Nov-2024
    • (2022) The State of the Art of Metadata Managements in Large-Scale Distributed File Systems — Scalability, Performance and Availability. IEEE Transactions on Parallel and Distributed Systems 33:12 (3850-3869). DOI: 10.1109/TPDS.2022.3170574. Online publication date: 1-Dec-2022
    • (2019) A Novel Approach for Maintaining Consistency in Distributed File System. 2019 Second International Conference on Advanced Computational and Communication Paradigms (ICACCP) (1-6). DOI: 10.1109/ICACCP.2019.8882935. Online publication date: Feb-2019
    • (2018) Fine Granularity and Adaptive Cache Update Mechanism for Client Caching. IEEE Systems Journal (1-12). DOI: 10.1109/JSYST.2018.2866905. Online publication date: 2018
    • (2016) MDS: In-Depth Insight. 2016 International Conference on Information Technology (ICIT) (193-199). DOI: 10.1109/ICIT.2016.048. Online publication date: Dec-2016
