ABSTRACT
Fully utilizing the power of modern heterogeneous systems requires judiciously dividing work across all of the available computational devices. Existing approaches for partitioning work require offline training and generate fixed partitions that fail to respond to fluctuations in device performance that occur at run time. We present a novel dynamic approach to work partitioning that requires no offline training and responds automatically to performance variability to provide consistently good performance. Using six diverse OpenCL™ applications, we demonstrate the effectiveness of our approach in scenarios both with and without run-time performance variability, as well as in more extreme scenarios in which one device is non-functional.
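The abstract describes an online, measurement-driven partitioning scheme: work is handed out in chunks, and each device's share adapts to its observed speed rather than to an offline profile. Below is a minimal illustrative sketch of that general idea, not the paper's exact algorithm. All names (`Device`, `process_chunk`, the target chunk time, the simulated device speeds) are assumptions made for the example; a real implementation would launch OpenCL kernels instead of the simulated work loop.

```cpp
// Sketch of dynamic, self-adapting work partitioning across heterogeneous devices.
// Each worker grabs chunks from a shared counter and resizes its chunk based on
// the throughput it actually measures, so no offline training is needed and a
// device that slows down (or stops responding quickly) naturally receives less work.
#include <algorithm>
#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>
#include <vector>

struct Device {
    const char* name;
    double work_per_ms;   // simulated processing speed; may fluctuate at run time
};

// Simulate executing `count` work items on a device; returns elapsed milliseconds.
// In a real system this would enqueue an OpenCL kernel over the given index range
// and time its completion.
static double process_chunk(const Device& d, std::size_t count) {
    double ms = count / d.work_per_ms;
    std::this_thread::sleep_for(std::chrono::duration<double, std::milli>(ms));
    return ms;
}

int main() {
    const std::size_t total_items = 1'000'000;
    const std::size_t min_chunk   = 10'000;
    const double      target_ms   = 20.0;   // desired time per chunk (assumed tuning knob)

    std::vector<Device> devices = { {"CPU", 50.0}, {"GPU", 400.0} };
    std::atomic<std::size_t> next_item{0};

    std::vector<std::thread> workers;
    for (const Device& dev : devices) {
        workers.emplace_back([&, dev]() {
            std::size_t chunk = min_chunk;   // start small, then adapt
            std::size_t done  = 0;
            for (;;) {
                std::size_t start = next_item.fetch_add(chunk);
                if (start >= total_items) break;
                std::size_t count = std::min(chunk, total_items - start);
                double ms = process_chunk(dev, count);
                done += count;
                // Resize the next chunk from the *observed* rate, so partitioning
                // tracks run-time performance variability automatically.
                double rate = count / std::max(ms, 1e-6);   // items per ms
                chunk = std::max<std::size_t>(
                    min_chunk, static_cast<std::size_t>(rate * target_ms));
            }
            std::printf("%s processed %zu items\n", dev.name, done);
        });
    }
    for (auto& t : workers) t.join();
    return 0;
}
```

Because every device pulls work on demand, a device that is busy, throttled, or non-functional simply claims fewer (or no) chunks, which mirrors the failure scenarios the abstract mentions.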