DOI: 10.1145/2925426.2926290

High Performance Design for HDFS with Byte-Addressability of NVM and RDMA

Published: 01 June 2016

ABSTRACT

Non-Volatile Memory (NVM) offers byte-addressability and DRAM-like performance along with persistence. NVMs thus provide the opportunity to build high-throughput storage systems for data-intensive applications. HDFS (Hadoop Distributed File System) is the primary storage engine for MapReduce, Spark, and HBase. Although HDFS was initially designed for commodity hardware, it is increasingly being deployed on HPC (High Performance Computing) clusters. The demanding performance requirements of HPC systems make the I/O bottlenecks of HDFS a critical issue and motivate rethinking its storage architecture over NVMs. In this paper, we present a novel design for HDFS that leverages the byte-addressability of NVM for RDMA (Remote Direct Memory Access)-based communication. We analyze the performance potential of using NVM for HDFS and re-design HDFS I/O with memory semantics to fully exploit byte-addressability. We call this design NVFS (NVM- and RDMA-aware HDFS). We also present cost-effective acceleration techniques for HBase and Spark that utilize the NVM-based design of HDFS by storing only the HBase Write-Ahead Logs and Spark job outputs, respectively, to NVM. We further propose enhancements that use the NVFS design as a burst buffer for running Spark jobs on top of parallel file systems like Lustre. Performance evaluations show that our design can improve the write and read throughputs of HDFS by up to 4x and 2x, respectively. The execution times of data generation benchmarks are reduced by up to 45%. The proposed design also reduces the overall execution time of the SWIM workload by up to 18% over HDFS, with a maximum benefit of 37% for job-38. For Spark TeraSort, our proposed scheme yields a performance gain of up to 11%. The performance of HBase insert, update, and read operations improves by 21%, 16%, and 26%, respectively. Our NVM-based burst buffer can improve the I/O performance of Spark PageRank by up to 24% over Lustre. To the best of our knowledge, this paper is the first attempt to incorporate NVM with RDMA for HDFS.
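To make the "memory semantics" idea concrete, the sketch below (our illustration, not code from the paper) persists a packet of block data through a memory-mapped file using plain loads and stores instead of the block-oriented write()/sync path. The DAX-style mount point /mnt/nvm, the block file name, and the NvmPacketWriter class are illustrative assumptions, not NVFS APIs.

import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch of byte-addressable I/O: map an NVM-backed file once,
// then persist packets with memory stores plus an explicit flush, rather than
// issuing a system call per packet as in the block-based HDFS write path.
public class NvmPacketWriter {
    public static void main(String[] args) throws IOException {
        byte[] packet = "block data received over RDMA".getBytes(StandardCharsets.UTF_8);

        try (FileChannel ch = FileChannel.open(
                Paths.get("/mnt/nvm/blk_0001"),   // assumed DAX mount of an NVM device
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // Map the region once; subsequent writes are plain stores.
            MappedByteBuffer region =
                ch.map(FileChannel.MapMode.READ_WRITE, 0, packet.length);
            region.put(packet);
            // force() flushes the dirtied range toward the persistence domain,
            // the analogue of cache-line writeback instructions on real NVM.
            region.force();
        }
    }
}

In the paper's design, RDMA additionally lets a remote client place data directly into such NVM-resident buffers on the DataNode, so the network transfer and the persistent store become one memory-to-memory path; the sketch above only illustrates the local memory-semantics half of that idea.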


Published in

ICS '16: Proceedings of the 2016 International Conference on Supercomputing
June 2016, 547 pages
ISBN: 9781450343619
DOI: 10.1145/2925426

Copyright © 2016 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States


Qualifiers

• Research article (refereed limited)

Acceptance Rates

Overall acceptance rate: 584 of 2,055 submissions, 28%
