DOI: 10.1145/3225058.3225070
research-article

Learning Driven Parallelization for Large-Scale Video Workload in Hybrid CPU-GPU Cluster

Published: 13 August 2018

ABSTRACT

Hybrid CPU-GPU clusters have become a promising computing paradigm for large-scale video analytics. However, the uncertainty and variability of workloads and the heterogeneity of cluster resources can lead to unbalanced use of the hybrid computing resources and, in turn, degrade the performance of the computing platform. The problem becomes more challenging given the computational complexity of video tasks and the dependencies between them. In this paper, we focus on the video workload parallelization problem in a hybrid CPU-GPU cluster, using fine-grained task division and feature description. First, to achieve high resource utilization and task throughput, we propose a two-stage video task scheduling approach based on deep reinforcement learning. In our approach, the cluster-level scheduler selects an execution node for each of the mutually independent video tasks, and the node-level scheduler then assigns the interrelated video subtasks to appropriate computing units. Using a deep Q-network, the two-stage scheduling model is learned online to take the currently optimal scheduling actions according to the runtime status of the cluster environment, the characteristics of the video tasks, and the dependencies between video tasks. Second, based on transfer learning, we propose a scheduling strategy generalization method that efficiently rebuilds the task scheduling model by reusing an existing model. Finally, we conduct extensive experiments to analyze the impact of the model parameters on the scheduling actions; the experimental results also show that our learning-based task scheduling approach outperforms other widely used methods.
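The two-stage decision flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the state is reduced to plain load vectors, the deep Q-networks are replaced by a stand-in function `least_loaded_q`, and all names (`epsilon_greedy`, `schedule_task`, the load values) are hypothetical. Only the structure is taken from the text: a cluster-level choice of node, followed by a node-level choice of computing unit for each subtask.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical type: a Q-function maps (state features, action index) to a value estimate.
QFunction = Callable[[Sequence[float], int], float]


def epsilon_greedy(q: QFunction, state: Sequence[float], n_actions: int,
                   epsilon: float, rng: random.Random) -> int:
    """Explore with probability epsilon, otherwise take the highest-Q action."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    return max(range(n_actions), key=lambda a: q(state, a))


def least_loaded_q(state: Sequence[float], action: int) -> float:
    """Stand-in for a trained Q-network (assumption): prefer the least loaded option."""
    return -state[action]


def schedule_task(node_loads: Sequence[float],
                  unit_loads_per_node: Sequence[Sequence[float]],
                  subtask_costs: Sequence[float],
                  cluster_q: QFunction = least_loaded_q,
                  node_q: QFunction = least_loaded_q,
                  epsilon: float = 0.0,
                  rng: Optional[random.Random] = None) -> Tuple[int, List[int]]:
    """Stage 1 picks a node for the task; stage 2 picks a unit for each subtask."""
    rng = rng or random.Random(0)
    # Stage 1: the cluster-level scheduler selects the execution node.
    node = epsilon_greedy(cluster_q, node_loads, len(node_loads), epsilon, rng)
    # Stage 2: the node-level scheduler places subtasks one by one.
    # Subtasks are assumed to arrive already sorted in dependency order.
    unit_loads = list(unit_loads_per_node[node])
    placement = []
    for cost in subtask_costs:
        unit = epsilon_greedy(node_q, unit_loads, len(unit_loads), epsilon, rng)
        unit_loads[unit] += cost  # the chosen unit becomes busier
        placement.append(unit)
    return node, placement


if __name__ == "__main__":
    node, placement = schedule_task(
        node_loads=[0.7, 0.2, 0.5],
        unit_loads_per_node=[[0.1, 0.1], [0.3, 0.0], [0.2, 0.2]],
        subtask_costs=[0.5, 0.1, 0.1],
    )
    print(node, placement)  # prints: 1 [1, 0, 0]
```

In a learning-based scheduler, the stand-in Q-function would be replaced by a network trained from observed rewards such as utilization or throughput, and `epsilon` would be set above zero during training to allow exploration.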


  • Published in

    ICPP '18: Proceedings of the 47th International Conference on Parallel Processing
    August 2018
    945 pages
    ISBN:9781450365109
    DOI:10.1145/3225058

    Copyright © 2018 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • research-article
    • Research
    • Refereed limited

    Acceptance Rates

    ICPP '18 Paper Acceptance Rate: 91 of 313 submissions, 29%. Overall Acceptance Rate: 91 of 313 submissions, 29%.
