Research Article | Open Access

Performance Characterization of NVMe-over-Fabrics Storage Disaggregation

Published: 04 December 2018

Abstract

Storage disaggregation separates compute and storage onto different nodes, allowing independent resource scaling and, thus, better hardware utilization. While disaggregation of hard-drive storage is common practice, disaggregating NVMe SSDs (i.e., PCIe-based SSDs) is considered more challenging. This is because SSDs are significantly faster than hard drives, so the latency overheads (due to both network and CPU processing), as well as the extra compute cycles needed for the offloading stack, become much more pronounced.

In this work, we characterize the overheads of NVMe-SSD disaggregation. We show that NVMe-over-Fabrics (NVMe-oF)—a recently released remote-storage protocol specification—reduces the overheads of remote access to a bare minimum, greatly increasing the cost-efficiency of Flash disaggregation. Specifically, while recent work showed that SSD disaggregation via iSCSI degrades application-level throughput by 20%, we observe negligible performance degradation with NVMe-oF—both in stress tests and in a more realistic key-value store workload.
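As an illustration of the kind of stress test such a characterization relies on (a sketch, not the paper's actual configuration), a fio job file can drive 4KiB random reads at a fixed queue depth against either a local NVMe SSD or the block device exposed by an NVMe-oF (or iSCSI) initiator, so that latency and throughput of the two paths can be compared. The device path and all parameters below are assumptions:

```ini
; Hypothetical fio job: 4KiB random-read stress test against an NVMe device.
; Point 'filename' at a local NVMe SSD, or at the block device that the
; NVMe-oF/iSCSI initiator exposes, to compare local vs. remote access.
[global]
ioengine=libaio        ; asynchronous kernel I/O submission
direct=1               ; bypass the page cache (raw device latency)
rw=randread            ; random reads
bs=4k                  ; 4KiB request size
iodepth=32             ; outstanding requests per job
numjobs=4              ; parallel submitting threads
runtime=60
time_based=1
group_reporting=1

[nvme-stress]
filename=/dev/nvme0n1  ; placeholder device path
```

Running `fio job.fio` against each device and comparing the reported completion-latency percentiles gives the kind of latency breakdown discussed in the abstract.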



        Reviews

        Dominik Strzalka

When it comes to mass data processing, one of the most challenging issues is finding solutions for maximal resource utilization (processor, memory, storage, network, and so on). The underutilization of resources (overprovisioning of disk capacity and low processor workloads) in data centers is a well-known phenomenon "leading to an increased total cost of ownership." One frequently used and flexible approach is resource disaggregation, which "decouples compute and storage to different nodes" and reduces resource waste. But resources are then spread over a network, and every access incurs additional latency. If hard disk drives (HDDs) are used, the performance of the solution is low due to high access latencies. Disaggregation based on non-volatile memory express solid-state drives (NVMe-SSDs) is even more challenging because network latency also becomes important: SSDs are orders of magnitude faster than HDDs, so new communication protocols are required. This paper is about NVMe-SSD disaggregation with the NVMe-over-Fabrics (NVMe-oF) remote storage protocol. The presented NVMe-oF performance analysis and experiments (the methodology is given in Section 3) show that this protocol minimizes the degradation of system performance. Section 4 presents a stress test of NVMe-oF latency breakdowns in comparison to Internet Small Computer Systems Interface (iSCSI) solutions. Section 5's "real-world input/output (I/O) intensive workloads" are based on RocksDB and MySQL databases. When the NVMe-oF protocol is used, low performance degradation is observed, and the storage server scales better than with iSCSI. Section 6 presents interesting results related to server storage, processing efficiency, and scalability. The authors show, in a convincing manner, how the NVMe-oF protocol significantly supports NVMe-SSD disaggregation while also "preserv[ing] all the advantages discussed in previous literature."
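For context on what "NVMe-SSD disaggregation with NVMe-oF" looks like on the initiator side, the following shell commands sketch how a remote namespace is typically attached using the standard nvme-cli tool over RDMA. This is an illustrative setup recipe, not the paper's configuration: the address, port, and subsystem NQN are placeholders, and the commands require RDMA-capable hardware on both ends.

```
# Load the NVMe-oF RDMA initiator module (illustrative; requires RDMA NICs)
modprobe nvme-rdma

# Discover subsystems exported by the remote storage server (placeholder address)
nvme discover -t rdma -a 192.0.2.10 -s 4420

# Connect to a subsystem by its NQN; the remote SSD then appears
# locally as a regular /dev/nvmeXnY block device
nvme connect -t rdma -a 192.0.2.10 -s 4420 -n nqn.2016-06.io.example:ssd1
```

Once connected, the remote device is indistinguishable from a local NVMe drive to applications, which is what makes transparent disaggregation (and apples-to-apples benchmarking against local Flash) possible.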


• Published in

  ACM Transactions on Storage, Volume 14, Issue 4
  Special Section on Systor 2017 and Regular Papers
  November 2018
  175 pages
  ISSN: 1553-3077
  EISSN: 1553-3093
  DOI: 10.1145/3297750
  Editor: Sam H. Noh

          Copyright © 2018 Owner/Author

          Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 4 December 2018
          • Revised: 1 July 2018
          • Accepted: 1 July 2018
          • Received: 1 April 2018
Published in ACM Transactions on Storage (TOS), Volume 14, Issue 4


          Qualifiers

          • research-article
          • Research
          • Refereed
