DOI: 10.1145/3131704.3131708
research-article

Application-centric SSD Cache Allocation for Hadoop Applications

Published: 23 September 2017

ABSTRACT

Flash-based Solid State Drives (SSDs) are widely used in virtualized environments, usually as a cache for hard disk drive-based Virtual Machine (VM) storage, to improve IO performance. Existing SSD caching schemes are mainly driven by VM-centric metrics: they treat VMs as independent units and focus on critical low-level performance metrics of individual VMs, such as the working set, the IO latency, or the throughput. However, for elastic Hadoop applications consisting of multiple VMs, the workload changes rapidly, and different VMs may differ in importance even if they have the same low-level IO pattern. In this situation, VM-centric SSD caching schemes may not lead to the best performance, i.e., the shortest job completion time. Considering the importance of the VMs and the relationships among the VMs inside the application, which we regard as application-centric metrics, may improve performance further. We propose Application-Centric SSD caching for Hadoop applications (AC-SSD), which reduces the job completion time at the application level. AC-SSD uses a genetic algorithm-based approach to calculate near-optimal weights for the virtual machines, and uses these weights to allocate SSD cache space and control the I/O Operations Per Second (IOPS) based on the importance of the VMs. Moreover, AC-SSD introduces closed-loop adaptation to cope with the rapidly changing workload. The evaluation shows that AC-SSD reduces the job completion time by up to 39% for IO-sensitive workloads and by up to 29% for rapidly changing workloads.
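
To make the idea of weight-driven allocation concrete, the following is a minimal sketch of how per-VM weights might be searched with a genetic algorithm and then turned into cache shares. The abstract does not specify the chromosome encoding, the fitness function, or the genetic operators, so everything here is an assumption for illustration: the function names (`allocate_cache`, `toy_completion_time`, `evolve_weights`), the toy cost model, and all parameter values are hypothetical and are not taken from AC-SSD.

```python
import random


def allocate_cache(weights, total_cache_gb):
    """Split the SSD cache across VMs in proportion to their weights."""
    total = sum(weights)
    return [total_cache_gb * w / total for w in weights]


def toy_completion_time(alloc_gb, working_sets_gb, importance):
    """Hypothetical cost model (an assumption, not the paper's model).

    Each VM's time shrinks as more of its working set fits in cache,
    and the job finishes when the slowest VM finishes.
    """
    times = []
    for cache, working_set, imp in zip(alloc_gb, working_sets_gb, importance):
        hit_ratio = min(1.0, cache / working_set)
        times.append(imp * (1.0 - 0.8 * hit_ratio))
    return max(times)


def evolve_weights(fitness, n_vms, pop_size=30, generations=60, seed=0):
    """Search for VM weights that minimize `fitness` (requires n_vms >= 2)."""
    rng = random.Random(seed)
    # Start from equal weights plus random candidates in [0.05, 1.0].
    pop = [[1.0] * n_vms]
    pop += [[rng.uniform(0.05, 1.0) for _ in range(n_vms)]
            for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=fitness)
        elite = pop[: pop_size // 2]          # keep the better half unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_vms)     # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(n_vms)          # mutate one gene, clamp to range
            child[i] = min(1.0, max(0.05, child[i] + rng.gauss(0.0, 0.1)))
            children.append(child)
        pop = elite + children
    return min(pop, key=fitness)


if __name__ == "__main__":
    working_sets = [40.0, 40.0, 40.0]   # GB per VM, made-up values
    importance = [3.0, 1.0, 1.0]        # VM 0 is assumed to be on the critical path
    cost = lambda w: toy_completion_time(
        allocate_cache(w, 60.0), working_sets, importance)
    best = evolve_weights(cost, n_vms=3)
    print("weights:", best)
    print("cache shares (GB):", allocate_cache(best, 60.0))
```

Because the better half of the population is carried over unchanged and the initial population contains the equal-weight candidate, the returned weights never score worse than an equal split under the chosen cost model. A closed-loop variant could rerun `evolve_weights` whenever the measured workload changes, although the abstract does not describe how AC-SSD triggers its adaptation.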

        Published in

          Internetware '17: Proceedings of the 9th Asia-Pacific Symposium on Internetware
          September 2017
          172 pages
          ISBN: 9781450353137
          DOI: 10.1145/3131704

          Copyright © 2017 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Qualifiers

          • research-article
          • Research
          • Refereed limited

          Acceptance Rates

          Overall Acceptance Rate: 55 of 111 submissions, 50%
