Seamlessly portable applications: Managing the diversity of modern heterogeneous systems

Published: 26 January 2012

Abstract

Heterogeneous systems now exist in many possible configurations, which poses several new challenges for application development: different types of processing units usually require individual programming models with dedicated runtime systems and accompanying libraries. If these are absent on an end-user system, e.g., because the respective hardware is not present, an application linked against them will break. This hampers the portability of applications that are developed on one system and executed on other, differently configured heterogeneous systems. Moreover, the benefit that each type of processing unit provides is normally not known in advance.
In this work, we propose a technique to effectively decouple applications from their accelerator-specific code. These parts are linked only on demand, which makes an application portable across systems with different accelerators. As there are usually multiple hardware-specific implementations of a given task, e.g., a CPU and a GPU version, a method is required to determine which of them are usable at all and which one is most suitable for execution on the current system. With our approach, application and hardware programmers can express the requirements and abilities of the application and of the hardware-specific implementations in a simplified manner. At runtime, the requirements and abilities are compared with regard to the hardware present in order to determine the usable implementations of a task. If multiple implementations are usable, an online-learning, history-based selector is employed to determine the most efficient one.
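
The following is a minimal Python sketch of the general idea, not the authors' implementation. It assumes that abilities and requirements can be modeled as plain sets of strings, that "on demand" linking can be approximated by attempting to load a shared library at runtime, and that the history-based selector simply tries each usable implementation once and afterwards prefers the lowest mean measured runtime. All names (`Implementation`, `library_available`, `usable`, `select`, `execute`) are hypothetical and chosen for illustration only.

```python
import ctypes
import time
from dataclasses import dataclass, field
from typing import Callable, List, Set


def library_available(name: str) -> bool:
    """Probe for a shared library at runtime instead of linking against it."""
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        # Library (and presumably the hardware behind it) is not present.
        return False


@dataclass
class Implementation:
    name: str
    run: Callable[[list], list]
    requires: Set[str]  # abilities this implementation needs (assumed model)
    history: List[float] = field(default_factory=list)  # measured runtimes in s

    def mean_runtime(self) -> float:
        return sum(self.history) / len(self.history)


def usable(impls: List[Implementation], abilities: Set[str]) -> List[Implementation]:
    """Keep only implementations whose requirements the system satisfies."""
    return [impl for impl in impls if impl.requires <= abilities]


def select(candidates: List[Implementation]) -> Implementation:
    """History-based choice: measure untried candidates first, then pick the fastest."""
    for impl in candidates:
        if not impl.history:
            return impl
    return min(candidates, key=Implementation.mean_runtime)


def execute(impls: List[Implementation], abilities: Set[str], data: list):
    """Run the selected implementation and record its runtime for later calls."""
    candidates = usable(impls, abilities)
    if not candidates:
        raise RuntimeError("no usable implementation on this system")
    impl = select(candidates)
    start = time.perf_counter()
    result = impl.run(data)
    impl.history.append(time.perf_counter() - start)
    return impl.name, result
```

In such a sketch, the set of abilities would be built once at startup, for example by adding a hypothetical `"gpu"` entry only if `library_available` succeeds for the corresponding runtime library, so that an application never depends on a library that is missing from the end-user system.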
We show that our approach dynamically chooses the fastest usable implementation on several systems while itself introducing only negligible overhead. Applied to an MPI application, our mechanism enables the exploitation of local accelerators on different heterogeneous hosts without prior knowledge or modification of the application.


Published In

ACM Transactions on Architecture and Code Optimization, Volume 8, Issue 4
Special Issue on High-Performance Embedded Architectures and Compilers
January 2012
765 pages
ISSN: 1544-3566
EISSN: 1544-3973
DOI: 10.1145/2086696
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 26 January 2012
Accepted: 01 November 2011
Revised: 01 October 2011
Received: 01 July 2011
Published in TACO Volume 8, Issue 4

Author Tags

  1. Heterogeneity
  2. adaptive systems
  3. programming models

Qualifiers

  • Research-article
  • Research
  • Refereed

Article Metrics

  • Downloads (Last 12 months): 85
  • Downloads (Last 6 weeks): 11
Reflects downloads up to 05 Mar 2025

Cited By

  • (2023) Umpalumpa: a framework for efficient execution of complex image processing workloads on heterogeneous nodes. Computing 105:11 (2389-2417). DOI: 10.1007/s00607-023-01190-w. Online publication date: 1-Nov-2023
  • (2021) On the Inevitability of Integrated HPC Systems and How they will Change HPC System Operations. Proceedings of the 11th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies (1-6). DOI: 10.1145/3468044.3468046. Online publication date: 21-Jun-2021
  • (2020) Architecturally truly diverse systems: A review. Future Generation Computer Systems. DOI: 10.1016/j.future.2020.03.061. Online publication date: Apr-2020
  • (2020) Evaluating Dynamic Task Scheduling with Priorities and Adaptive Aging in a Task-Based Runtime System. Architecture of Computing Systems – ARCS 2020 (17-31). DOI: 10.1007/978-3-030-52794-5_2. Online publication date: 25-May-2020
  • (2016) Workload Partitioning for Accelerating Applications on Heterogeneous Platforms. IEEE Transactions on Parallel and Distributed Systems 27:9 (2766-2780). DOI: 10.1109/TPDS.2015.2509972. Online publication date: 1-Sep-2016
  • (2016) Smart Containers and Skeleton Programming for GPU-Based Systems. International Journal of Parallel Programming 44:3 (506-530). DOI: 10.1007/s10766-015-0357-6. Online publication date: 1-Jun-2016
  • (2015) Automatic task mapping and heterogeneity-aware fault tolerance. Journal of Systems Architecture: the EUROMICRO Journal 61:10 (628-638). DOI: 10.1016/j.sysarc.2015.10.001. Online publication date: 1-Nov-2015
  • (2015) Performance-aware composition framework for GPU-based systems. The Journal of Supercomputing 71:12 (4646-4662). DOI: 10.1007/s11227-014-1105-1. Online publication date: 1-Dec-2015
  • (2014) Global Optimization of Execution Mode Selection for the Reconfigurable PRAM-NUMA Multicore Architecture REPLICA. Proceedings of the 2014 Second International Symposium on Computing and Networking (322-328). DOI: 10.1109/CANDAR.2014.72. Online publication date: 10-Dec-2014
  • (2014) The PEPPHER composition tool. Computing 96:12 (1195-1211). DOI: 10.1007/s00607-013-0371-8. Online publication date: 1-Dec-2014
