Application-centric SSD Cache Allocation for Hadoop Applications

Authors:
Zhen Tang
State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, University of Chinese Academy of Sciences

Wei Wang
State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, University of Chinese Academy of Sciences

Yu Huang
State Key Laboratory for Novel Software Technology, Department of Computer Science and Technology, Nanjing University

Heng Wu
State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, University of Chinese Academy of Sciences

Jun Wei
State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, University of Chinese Academy of Sciences

Tao Huang
State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, University of Chinese Academy of Sciences

Internetware '17: Proceedings of the 9th Asia-Pacific Symposium on Internetware, September 2017, Article No.: 5, Pages 1–10. https://doi.org/10.1145/3131704.3131708

Published: 23 September 2017

ABSTRACT

Flash-based Solid State Drives (SSDs) are widely used in virtualized environments, usually as a cache for hard disk drive-based Virtual Machine (VM) storage, to improve IO performance. Existing SSD caching schemes are mainly driven by VM-centric metrics: they treat VMs as independent units and focus on critical low-level performance metrics of individual VMs, such as the working set, the IO latency, or the throughput. However, for elastic Hadoop applications consisting of multiple VMs, the workload changes rapidly, and VMs may differ in importance even if they have the same low-level IO pattern. In this situation, VM-centric SSD caching schemes may not lead to the best performance, i.e., the shortest job completion time. Considering the importance of VMs and the relationships among VMs inside the application, which we regard as application-centric metrics, may improve performance further. We propose Application-Centric SSD caching for Hadoop applications (AC-SSD), which reduces the job completion time at the application level. AC-SSD uses a genetic algorithm-based approach to calculate nearly optimal weights of virtual machines for allocating SSD cache space and controlling the I/O Operations Per Second (IOPS) based on the importance of the VMs. Moreover, AC-SSD introduces closed-loop adaptation to cope with the rapidly changing workload. The evaluation shows that AC-SSD reduces the job completion time by up to 39% for IO-sensitive workloads, and by up to 29% for rapidly changing workloads.
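
To make the idea of searching for per-VM weights more concrete, the following is a minimal sketch of a genetic algorithm that evolves a weight vector and then splits a cache size and an IOPS budget in proportion to it. It is not the AC-SSD implementation: the abstract does not specify the encoding, the fitness function, or the genetic operators, so the cost model `estimated_completion_time`, the `importance` scores, and all parameter values below are assumptions chosen only for illustration.

```python
import random


def normalize(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights)
    if total <= 0:
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


def estimated_completion_time(weights, importance):
    """Hypothetical cost model (an assumption, not taken from the paper):
    a VM's delay shrinks as its share grows, and the job finishes when
    the most-delayed VM finishes."""
    return max(imp / (0.05 + w) for imp, w in zip(importance, weights))


def evolve_weights(importance, pop_size=30, generations=60, seed=0):
    """Search for a weight vector with a low estimated completion time."""
    rng = random.Random(seed)
    n = len(importance)
    # Include the even split so the result is never worse than it.
    population = [[1.0 / n] * n]
    while len(population) < pop_size:
        population.append(normalize([rng.random() for _ in range(n)]))
    for _ in range(generations):
        population.sort(key=lambda w: estimated_completion_time(w, importance))
        elite = population[: pop_size // 2]  # elitism: best half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = [(x + y) / 2 for x, y in zip(a, b)]  # arithmetic crossover
            i = rng.randrange(n)
            child[i] = max(0.0, child[i] + rng.gauss(0, 0.05))  # point mutation
            children.append(normalize(child))
        population = elite + children
    return min(population, key=lambda w: estimated_completion_time(w, importance))


def allocate(weights, cache_gb, iops_budget):
    """Split cache space and IOPS among VMs in proportion to their weights."""
    return [(w * cache_gb, w * iops_budget) for w in weights]


if __name__ == "__main__":
    importance = [3.0, 1.0, 1.0]  # hypothetical per-VM importance scores
    best = evolve_weights(importance)
    for vm, (gb, iops) in enumerate(allocate(best, cache_gb=100, iops_budget=20000)):
        print(f"VM {vm}: {gb:.1f} GB cache, {iops:.0f} IOPS")
```

In a real system the fitness would have to come from measured or predicted job completion times rather than a closed-form formula. Re-running the search periodically with fresh measurements would be one crude stand-in for the closed-loop adaptation the abstract mentions.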

Index Terms

1. Information systems
  1. Information storage systems

Published in
Internetware '17: Proceedings of the 9th Asia-Pacific Symposium on Internetware
September 2017
172 pages
ISBN:9781450353137
DOI:10.1145/3131704
Conference Chairs:
Hong Mei,
Jian Lyu,
Zhi Jin,
Wenyun Zhao
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Hadoop
SSD
cache
virtualization
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 55 of 111 submissions, 50%