Open access

Seamlessly portable applications: Managing the diversity of modern heterogeneous systems

Authors:

Mario Kicherer,

Fabian Nowak,

Rainer Buchty,

Wolfgang Karl

ACM Transactions on Architecture and Code Optimization (TACO), Volume 8, Issue 4

Article No.: 42, Pages 1 - 20

https://doi.org/10.1145/2086696.2086721

Published: 26 January 2012

Abstract

Heterogeneous systems now come in many possible configurations, which poses several new challenges for application development: different types of processing units usually require their own programming models, each with a dedicated runtime system and accompanying libraries. If these are absent on an end-user system, e.g., because the respective hardware is not present, an application linked against them will break. This hampers the portability of applications that are developed on one system and executed on other, differently configured heterogeneous systems. Moreover, the benefit offered by each individual processing unit is normally not known in advance.

In this work, we propose a technique to effectively decouple applications from their accelerator-specific code. These parts are linked only on demand, which makes an application portable across systems with different accelerators. As there are usually multiple hardware-specific implementations of a given task, e.g., a CPU and a GPU version, a method is required to determine which of them are usable at all and which one is most suitable for execution on the current system. With our approach, application and hardware programmers can express the requirements and abilities of the application and of the hardware-specific implementations in a simplified manner. At runtime, these requirements and abilities are compared against the hardware that is present in order to determine the usable implementations of a task. If multiple implementations are usable, an online-learning, history-based selector determines the most efficient one.
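The two runtime steps described above (filtering implementations by requirements and abilities, then choosing among the usable ones from observed history) can be illustrated with a minimal Python sketch. This is not the authors' mechanism: the names `Implementation`, `usable`, `select` and `run_task` are hypothetical, abilities are modelled as plain string sets, and the selector is a simple "try each variant once, then take the lowest mean run time" rule standing in for the online-learning selector the abstract mentions.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, FrozenSet, List


@dataclass
class Implementation:
    """One hardware-specific variant of a task (hypothetical structure)."""
    name: str
    func: Callable[[List[float]], float]
    requires: FrozenSet[str]  # abilities this variant needs from the host
    timings: List[float] = field(default_factory=list)  # observed run times

    def mean_time(self) -> float:
        return sum(self.timings) / len(self.timings)


def usable(impls, abilities):
    """Keep only variants whose requirements the present hardware satisfies."""
    return [impl for impl in impls if impl.requires <= abilities]


def select(candidates):
    """Assumed selection rule: run each untested variant once,
    afterwards prefer the variant with the lowest mean run time."""
    for impl in candidates:
        if not impl.timings:
            return impl
    return min(candidates, key=Implementation.mean_time)


def run_task(impls, abilities, data):
    """Filter, select, execute, and record the run time for later choices."""
    candidates = usable(impls, abilities)
    if not candidates:
        raise RuntimeError("no usable implementation on this system")
    impl = select(candidates)
    start = time.perf_counter()
    result = impl.func(data)
    impl.timings.append(time.perf_counter() - start)
    return impl.name, result
```

On a host that reports only the ability `"cpu"`, a variant that requires `"gpu"` is filtered out before selection, so the application still runs with the remaining variant. A real system would additionally load the accelerator-specific code on demand, which this sketch leaves out.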

We show that our approach dynamically chooses the fastest usable implementation on several systems while introducing only negligible overhead. Applied to an MPI application, our mechanism enables the exploitation of local accelerators on different heterogeneous hosts without prior knowledge of, or modification to, the application.
