ABSTRACT
Non-Volatile Memory (NVM) offers byte-addressability and persistence with DRAM-like performance. NVM therefore provides an opportunity to build high-throughput storage systems for data-intensive applications. HDFS (Hadoop Distributed File System) is the primary storage engine for MapReduce, Spark, and HBase. Although HDFS was initially designed for commodity hardware, it is increasingly being used on HPC (High Performance Computing) clusters. The stringent performance requirements of HPC systems make the I/O bottlenecks of HDFS a critical issue and motivate rethinking its storage architecture over NVM. In this paper, we present a novel design for HDFS that leverages the byte-addressability of NVM for RDMA (Remote Direct Memory Access)-based communication. We analyze the performance potential of using NVM for HDFS and re-design HDFS I/O with memory semantics to fully exploit byte-addressability. We call this design NVFS (NVM- and RDMA-aware HDFS). We also present cost-effective acceleration techniques that let HBase and Spark use the NVM-based design of HDFS by storing only the HBase Write Ahead Logs and the Spark job outputs, respectively, in NVM. In addition, we propose enhancements that use the NVFS design as a burst buffer for running Spark jobs on top of parallel file systems such as Lustre. Performance evaluations show that our design can improve the write and read throughput of HDFS by up to 4x and 2x, respectively. The execution times of data generation benchmarks are reduced by up to 45%. The proposed design also reduces the overall execution time of the SWIM workload by up to 18% over HDFS, with a maximum benefit of 37% for job-38. For Spark TeraSort, our proposed scheme yields a performance gain of up to 11%. The performance of HBase insert, update, and read operations improves by 21%, 16%, and 26%, respectively. Our NVM-based burst buffer can improve the I/O performance of Spark PageRank by up to 24% over Lustre.
To the best of our knowledge, this paper is the first attempt to incorporate NVM with RDMA for HDFS.