DOI: 10.1145/2901318.2901337
research-article

Flash storage disaggregation

Published: 18 April 2016

ABSTRACT

PCIe-based Flash is commonly deployed to provide datacenter applications with high IO rates. However, its capacity and bandwidth are often underutilized, as it is difficult to design servers with the right balance of CPU, memory, and Flash resources over time and for multiple applications. This work examines Flash disaggregation as a way to deal with Flash overprovisioning. We tune remote access to Flash over commodity networks and analyze its impact on workloads sampled from real datacenter applications. We show that, while remote Flash access introduces a 20% throughput drop at the application level, disaggregation allows us to make up for these overheads through resource-efficient scale-out. Hence, we show that Flash disaggregation allows scaling CPU and Flash resources independently in a cost-effective manner. We use our analysis to draw conclusions about data and control plane issues in remote storage.
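
The trade-off described in the abstract can be illustrated with a small sizing sketch. This is not the paper's model: only the 20% application-level throughput drop comes from the abstract, while the function names, the workload numbers, and the per-server capacities are hypothetical values chosen for illustration. With local Flash, the fleet is sized by whichever resource runs out first; with disaggregation, compute and Flash servers are counted separately and each compute server is assumed to deliver 20% less throughput.

```python
REMOTE_DROP_PERCENT = 20  # application-level throughput drop, from the abstract


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def local_flash_servers(peak_qps: int, flash_tb: int,
                        qps_per_server: int, tb_per_server: int) -> int:
    """Servers needed when every server carries its own Flash.

    The fleet must satisfy both demands, so it is sized by the larger
    of the two counts and the other resource is overprovisioned.
    """
    for_compute = ceil_div(peak_qps, qps_per_server)
    for_flash = ceil_div(flash_tb, tb_per_server)
    return max(for_compute, for_flash)


def disaggregated_servers(peak_qps: int, flash_tb: int,
                          qps_per_server: int,
                          tb_per_flash_server: int) -> tuple[int, int]:
    """(compute servers, Flash servers) when the two scale independently.

    Each compute server is assumed to lose REMOTE_DROP_PERCENT of its
    throughput because Flash is accessed over the network.
    """
    compute = ceil_div(peak_qps * 100,
                       qps_per_server * (100 - REMOTE_DROP_PERCENT))
    flash = ceil_div(flash_tb, tb_per_flash_server)
    return compute, flash


if __name__ == "__main__":
    # Hypothetical workload: 1M requests/s at peak, 400 TB of Flash-resident data.
    print(local_flash_servers(1_000_000, 400, qps_per_server=50_000, tb_per_server=2))
    print(disaggregated_servers(1_000_000, 400, qps_per_server=50_000,
                                tb_per_flash_server=8))
```

In this hypothetical capacity-bound case, the local design needs far more servers than the compute demand alone would require, whereas the disaggregated design pays the throughput penalty only on the compute tier. Whether that is cheaper in practice depends on server and network costs, which this sketch does not model.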

Published in: EuroSys '16: Proceedings of the Eleventh European Conference on Computer Systems, April 2016, 605 pages
ISBN: 9781450342407
DOI: 10.1145/2901318

Copyright © 2016 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance rates: EuroSys '16 paper acceptance rate 38 of 180 submissions, 21%. Overall acceptance rate 241 of 1,308 submissions, 18%.
