ABSTRACT
Hybrid CPU-GPU clusters have become a promising computing paradigm for large-scale video analytics. However, the uncertainty and variability of workloads, together with the heterogeneity of cluster resources, can lead to unbalanced use of the hybrid computing resources and thereby degrade the performance of the computing platform. The problem becomes more challenging given the computational complexity of video tasks and the dependencies among them. In this paper, we address the video workload parallelization problem with fine-grained task division and feature description in a hybrid CPU-GPU cluster. First, to achieve high resource utilization and task throughput, we propose a two-stage video task scheduling approach based on deep reinforcement learning. In our approach, a cluster-level scheduler selects an execution node for each mutually independent video task, and a node-level scheduler then assigns the interrelated video subtasks to appropriate computing units. Using a deep Q-network, the two-stage scheduling model is learned online to perform the currently optimal scheduling actions according to the runtime status of the cluster environment, the characteristics of the video tasks, and the dependencies between them. Second, based on transfer learning, we propose a scheduling strategy generalization method that efficiently rebuilds the task scheduling model by referring to an existing model. Finally, we conduct extensive experiments to analyze the impact of the model parameters on the scheduling actions; the experimental results also validate that our learning-based task scheduling approach outperforms other widely used methods.
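As a rough illustration of the cluster-level stage described above, the following sketch shows an epsilon-greedy Q-learning scheduler that picks an execution node for each incoming independent task. This is a minimal toy, not the paper's model: the class name, the linear per-action Q approximation (the paper uses a deep Q-network), the load-vector state, and the imbalance-penalty reward are all illustrative assumptions.

```python
import random


class ClusterSchedulerSketch:
    """Toy cluster-level scheduler (illustrative only): chooses an
    execution node for an independent task via epsilon-greedy
    Q-learning with a linear Q approximation per action."""

    def __init__(self, num_nodes, lr=0.1, gamma=0.9, eps=0.1, seed=0):
        self.num_nodes = num_nodes
        self.lr, self.gamma, self.eps = lr, gamma, eps
        self.rng = random.Random(seed)
        # One weight vector per action (node); the state is the
        # vector of per-node queue lengths.
        self.w = [[0.0] * num_nodes for _ in range(num_nodes)]

    def q(self, state, a):
        return sum(wi * si for wi, si in zip(self.w[a], state))

    def act(self, state):
        # Epsilon-greedy action selection over candidate nodes.
        if self.rng.random() < self.eps:
            return self.rng.randrange(self.num_nodes)
        return max(range(self.num_nodes), key=lambda a: self.q(state, a))

    def update(self, state, a, reward, next_state):
        # One temporal-difference step toward the bootstrapped target.
        target = reward + self.gamma * max(
            self.q(next_state, b) for b in range(self.num_nodes))
        td = target - self.q(state, a)
        for i in range(self.num_nodes):
            self.w[a][i] += self.lr * td * state[i]


def simulate(steps=500, num_nodes=3):
    """Drive the scheduler on a toy workload: one task arrives per
    step, and every node finishes at most one queued task per step."""
    sched = ClusterSchedulerSketch(num_nodes)
    loads = [0] * num_nodes
    for _ in range(steps):
        state = list(loads)
        a = sched.act(state)
        loads[a] += 1                       # dispatch task to node a
        reward = -max(loads)                # penalize load imbalance
        sched.update(state, a, reward, list(loads))
        loads = [max(0, l - 1) for l in loads]  # service completions
    return loads


final_loads = simulate()
```

In a fuller version of this sketch, the node-level stage would apply the same pattern a second time, with a state encoding subtask dependencies and an action space over CPU/GPU units rather than nodes.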