Abstract
Storage disaggregation separates compute and storage onto different nodes, allowing each resource to scale independently and thus improving hardware resource utilization. While disaggregating hard-drive storage is common practice, disaggregating NVMe SSDs (i.e., PCIe-based SSDs) is considered more challenging. Because SSDs are significantly faster than hard drives, the latency overheads of remote access (due to both network and CPU processing), as well as the extra compute cycles consumed by the offloading stack, become much more pronounced.
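To see why a fixed per-I/O remote-access cost matters far more for SSDs than for hard drives, the sketch below divides an assumed fixed overhead by an assumed device latency. All three latency values are hypothetical round numbers chosen for illustration; they are not measurements from this work.

```python
# Illustrative only: every latency value below is an assumed round number,
# not a measurement reported in this paper.

def relative_overhead(device_latency_us: float, remote_overhead_us: float) -> float:
    """Fraction by which a fixed remote-access cost inflates the latency of one I/O."""
    return remote_overhead_us / device_latency_us

HDD_READ_US = 5000.0   # hypothetical ~5 ms hard-drive random read
SSD_READ_US = 100.0    # hypothetical ~100 us NVMe SSD random read
OVERHEAD_US = 20.0     # hypothetical fixed network + CPU cost per remote I/O

for name, latency_us in [("HDD", HDD_READ_US), ("NVMe SSD", SSD_READ_US)]:
    # Same absolute overhead, very different relative impact.
    print(f"{name}: +{relative_overhead(latency_us, OVERHEAD_US):.1%} latency when remote")
```

With these assumed inputs the same 20 us overhead adds 0.4% to a hard-drive access but 20% to an SSD access, which is the amplification the paragraph above describes.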
In this work, we characterize the overheads of NVMe-SSD disaggregation. We show that NVMe-over-Fabrics (NVMe-oF), a recently released remote storage protocol specification, reduces the overheads of remote access to a bare minimum, thus greatly increasing the cost-efficiency of Flash disaggregation. Specifically, while recent work showed that SSD storage disaggregation via iSCSI degrades application-level throughput by 20%, we observe negligible performance degradation with NVMe-oF, both in stress tests and with a more realistic KV-store workload.
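The "degrades throughput by 20%" comparison is a relative loss against local (direct-attached) storage. A minimal sketch of that calculation follows; the function name and the throughput figures are hypothetical and only illustrate the arithmetic, not results from this work.

```python
def throughput_degradation(local_throughput: float, remote_throughput: float) -> float:
    """Relative throughput lost when the same workload runs against remote storage."""
    if local_throughput <= 0:
        raise ValueError("local throughput must be positive")
    return 1.0 - remote_throughput / local_throughput

# Hypothetical ops/s values mirroring the two outcomes contrasted in the abstract.
print(f"{throughput_degradation(100_000, 80_000):.0%}")  # 20%
print(f"{throughput_degradation(100_000, 99_500):.1%}")  # 0.5%
```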
Performance Characterization of NVMe-over-Fabrics Storage Disaggregation