
Load balancing in a changing world: dealing with heterogeneity and performance variability

Published: 14 May 2013

ABSTRACT

Fully utilizing the power of modern heterogeneous systems requires judiciously dividing work across all of the available computational devices. Existing approaches for partitioning work require offline training and generate fixed partitions that fail to respond to fluctuations in device performance that occur at run time. We present a novel dynamic approach to work partitioning that requires no offline training and responds automatically to performance variability to provide consistently good performance. Using six diverse OpenCL™ applications, we demonstrate the effectiveness of our approach in scenarios both with and without run-time performance variability, as well as in more extreme scenarios in which one device is non-functional.
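The partitioning scheme itself is not detailed on this page. As a rough illustration of the general idea behind dynamic self-scheduling (each device repeatedly claims a small chunk of remaining work, so faster or less-loaded devices naturally end up with a larger share, with no offline training), here is a hypothetical simulation sketch. The device names, speeds, and chunk size below are illustrative assumptions, not the authors' method.

```python
import heapq

def dynamic_partition(total_items, chunk, device_speeds):
    """Simulate chunk-based self-scheduling: whenever a device becomes
    idle, it grabs the next fixed-size chunk of work. Returns how many
    items each device ended up processing."""
    next_item = 0
    done = {dev: 0 for dev in device_speeds}
    # Event queue of (finish_time, device); every device starts idle at t=0.
    events = [(0.0, dev) for dev in device_speeds]
    heapq.heapify(events)
    while events:
        t, dev = heapq.heappop(events)
        if next_item >= total_items:
            continue  # no work left; device retires
        size = min(chunk, total_items - next_item)
        next_item += size
        done[dev] += size
        # A chunk of `size` items takes size/speed time units on this device.
        heapq.heappush(events, (t + size / device_speeds[dev], dev))
    return done

# Hypothetical setup: a "gpu" four times faster than a "cpu", 1000 items,
# handed out in chunks of 50.
shares = dynamic_partition(1000, 50, {"gpu": 4.0, "cpu": 1.0})
```

With these illustrative speeds, the faster device claims chunks four times as often and ends up with 800 of the 1000 items, even though no device speed was known in advance; that is the intuition behind run-time adaptive partitioning, and the same pull-based loop keeps working if a device slows down mid-run or never shows up at all.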



Published in

    CF '13: Proceedings of the ACM International Conference on Computing Frontiers
    May 2013
    302 pages
    ISBN: 9781450320535
    DOI: 10.1145/2482767

    Copyright © 2013 ACM


    Publisher

    Association for Computing Machinery, New York, NY, United States


    Qualifiers

    • research-article

    Acceptance Rates

    CF '13 paper acceptance rate: 26 of 49 submissions (53%). Overall acceptance rate: 240 of 680 submissions (35%).
