ABSTRACT
Flash-based Solid State Drives (SSDs) are widely used in virtualization environments, usually as a cache in front of hard disk drive-based Virtual Machine (VM) storage, to improve I/O performance. Existing SSD caching schemes are mainly driven by VM-centric metrics: they treat VMs as independent units and focus on low-level performance metrics of individual VMs, such as the working set, I/O latency, or throughput. However, for elastic Hadoop applications consisting of multiple VMs, the workload changes rapidly, and VMs may differ in importance even when they show the same low-level I/O pattern. In this situation, VM-centric SSD caching schemes may not lead to the best performance, i.e., the shortest job completion time. Taking into account the importance of VMs and the relationships among VMs inside the application, which we regard as application-centric metrics, may improve performance further. We propose Application-Centric SSD caching for Hadoop applications (AC-SSD), which reduces job completion time at the application level. AC-SSD uses a genetic algorithm-based approach to calculate near-optimal weights for the virtual machines; these weights drive SSD cache space allocation and I/O Operations Per Second (IOPS) control according to the importance of each VM. Moreover, AC-SSD introduces closed-loop adaptation to cope with rapidly changing workloads. The evaluation shows that AC-SSD reduces job completion time by up to 39% for I/O-sensitive workloads and by up to 29% for rapidly changing workloads.
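To make the weight-search idea concrete, the following is a minimal Python sketch of a genetic algorithm that searches for per-VM weights and then splits a shared budget in proportion to them. It is not the AC-SSD implementation: the cost model in `job_time` (the job finishes when the slowest VM finishes, and cached I/O is assumed to be ten times faster), the function names, and all parameters are hypothetical and chosen only for illustration.

```python
import random


def allocate(weights, total):
    """Split a shared budget (cache space or IOPS) in proportion to the weights."""
    s = sum(weights)
    return [total * w / s for w in weights]


def job_time(weights, demands, cache_gb):
    """Hypothetical cost model, NOT taken from the paper.

    Each VM has an I/O demand (also used as its working-set size). The job
    completes when the slowest VM completes, and the cached fraction of a
    VM's working set is assumed to be served ten times faster.
    """
    shares = allocate(weights, cache_gb)
    times = []
    for demand, share in zip(demands, shares):
        hit = min(1.0, share / demand)           # cached fraction of the working set
        times.append(demand * (1.0 - 0.9 * hit)) # assumed 10x speedup on hits
    return max(times)


def evolve_weights(demands, cache_gb, pop_size=30, generations=60, seed=0):
    """Search for near-optimal VM weights with a simple elitist genetic algorithm."""
    rng = random.Random(seed)
    n = len(demands)  # requires at least two VMs for one-point crossover
    # Start from equal weights plus random candidates.
    pop = [[1.0] * n]
    pop += [[rng.random() + 1e-6 for _ in range(n)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=lambda w: job_time(w, demands, cache_gb))
        elite = pop[: pop_size // 2]             # keep the better half unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)            # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(n)                 # mutate one gene, keep it positive
            child[i] = max(1e-6, child[i] + rng.gauss(0, 0.1))
            children.append(child)
        pop = elite + children
    best = min(pop, key=lambda w: job_time(w, demands, cache_gb))
    s = sum(best)
    return [w / s for w in best]                 # normalize so the weights sum to 1


if __name__ == "__main__":
    demands = [40.0, 10.0, 10.0]                 # made-up per-VM demands
    weights = evolve_weights(demands, cache_gb=30.0)
    print("weights:", weights)
    print("cache shares (GB):", allocate(weights, 30.0))
    print("IOPS shares:", allocate(weights, 6000))
```

The same weight vector is reused for both cache space and IOPS here, mirroring the abstract's description of weights driving both controls. A closed-loop variant could, under the same assumptions, rerun `evolve_weights` whenever the measured demands drift.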
Index Terms
- Application-centric SSD Cache Allocation for Hadoop Applications