ABSTRACT
Hybrid CPU-GPU clusters have become a promising computing paradigm for large-scale video analytics. However, the uncertainty and variability of workloads, together with the heterogeneity of cluster resources, can lead to unbalanced use of the hybrid computing resources and thereby degrade the performance of the computing platform. The problem becomes more challenging given the computational complexity of video tasks and the dependencies among them. In this paper, we address the video workload parallelization problem with fine-grained task division and feature description in a hybrid CPU-GPU cluster. First, to achieve high resource utilization and task throughput, we propose a two-stage video task scheduling approach based on deep reinforcement learning. In our approach, a cluster-level scheduler selects an execution node for each mutually independent video task, and a node-level scheduler then assigns the interrelated video subtasks to appropriate computing units. Using a deep Q-network, the two-stage scheduling model is learned online to perform the currently optimal scheduling actions according to the runtime status of the cluster environment, the characteristics of the video tasks, and the dependencies between them. Second, based on transfer learning, we propose a scheduling strategy generalization method that efficiently rebuilds the task scheduling model by referring to an existing model. Finally, we conduct extensive experiments to analyze the impact of the model parameters on the scheduling actions; the experimental results also validate that our learning-based task scheduling approach outperforms other widely used methods.
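As a rough illustration of the cluster-level stage described above, the following sketch shows an epsilon-greedy Q-learning scheduler that picks an execution node for each incoming independent task. This is a minimal toy, not the paper's model: the class name, the linear per-action Q approximation (the paper uses a deep Q-network), the load-vector state, and the imbalance-penalty reward are all illustrative assumptions.

```python
import random


class ClusterSchedulerSketch:
    """Toy cluster-level scheduler (illustrative only): chooses an
    execution node for an independent task via epsilon-greedy
    Q-learning with a linear Q approximation per action."""

    def __init__(self, num_nodes, lr=0.1, gamma=0.9, eps=0.1, seed=0):
        self.num_nodes = num_nodes
        self.lr, self.gamma, self.eps = lr, gamma, eps
        self.rng = random.Random(seed)
        # One weight vector per action (node); the state is the
        # vector of per-node queue lengths.
        self.w = [[0.0] * num_nodes for _ in range(num_nodes)]

    def q(self, state, a):
        return sum(wi * si for wi, si in zip(self.w[a], state))

    def act(self, state):
        # Epsilon-greedy action selection over candidate nodes.
        if self.rng.random() < self.eps:
            return self.rng.randrange(self.num_nodes)
        return max(range(self.num_nodes), key=lambda a: self.q(state, a))

    def update(self, state, a, reward, next_state):
        # One temporal-difference step toward the bootstrapped target.
        target = reward + self.gamma * max(
            self.q(next_state, b) for b in range(self.num_nodes))
        td = target - self.q(state, a)
        for i in range(self.num_nodes):
            self.w[a][i] += self.lr * td * state[i]


def simulate(steps=500, num_nodes=3):
    """Drive the scheduler on a toy workload: one task arrives per
    step, and every node finishes at most one queued task per step."""
    sched = ClusterSchedulerSketch(num_nodes)
    loads = [0] * num_nodes
    for _ in range(steps):
        state = list(loads)
        a = sched.act(state)
        loads[a] += 1                       # dispatch task to node a
        reward = -max(loads)                # penalize load imbalance
        sched.update(state, a, reward, list(loads))
        loads = [max(0, l - 1) for l in loads]  # service completions
    return loads


final_loads = simulate()
```

In a fuller version of this sketch, the node-level stage would apply the same pattern a second time, with a state encoding subtask dependencies and an action space over CPU/GPU units rather than nodes.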