poster

Automating and optimizing data transfers for many-core coprocessors

Authors:
Bin Ren, Ohio State University, Columbus, OH, USA
Nishkam Ravi, NEC Laboratories America, Princeton, NJ, USA
Yi Yang, NEC Laboratories America, Princeton, NJ, USA
Min Feng, NEC Laboratories America, Princeton, NJ, USA
Gagan Agrawal, Ohio State University, Columbus, OH, USA
Srimat Chakradhar, NEC Laboratories America, Princeton, NJ, USA

ICS '14: Proceedings of the 28th ACM international conference on Supercomputing, June 2014, Pages 177, https://doi.org/10.1145/2597652.2600114

Published: 10 June 2014

ABSTRACT

Orchestrating data transfers between CPUs and a coprocessor manually is cumbersome, particularly for multi-dimensional arrays and other data structures with multi-level pointers, which are common in scientific computations. This work describes a system that includes both compile-time and runtime solutions for this problem, with the overarching goal of improving programmer productivity while maintaining performance.

We implemented our best compile-time solution, partial linearization with pointer reset, as a source-to-source transformation and evaluated it on multiple C benchmarks. Our experimental results show that this solution runs 2.5x-5x faster than the original runtime solution, and that CPU-coprocessor code using it achieves a 1.5x-2.5x speedup over the 16-thread CPU version.

Index Terms

Automating and optimizing data transfers for many-core coprocessors
1. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel programming languages
2. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language types
        Parallel programming languages


Published in
ICS '14: Proceedings of the 28th ACM international conference on Supercomputing
June 2014
378 pages
ISBN:9781450326421
DOI:10.1145/2597652
General Chairs:
Arndt Bode, Technische Universität München and Leibniz Rechenzentrum, Germany
Michael Gerndt, Technische Universität München, Germany
Program Chairs:
Per Stenström, Chalmers University of Technology, Sweden
Lawrence Rauchwerger, Texas A&M University, USA
Barton Miller, University of Wisconsin, USA
Martin Schulz, Lawrence Livermore National Laboratory, USA
Copyright © 2014 Owner/Author
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
coprocessors
offloading
runtime analysis
static analysis
Qualifiers
- poster
Acceptance Rates
ICS '14 paper acceptance rate: 34 of 160 submissions, 21%
Overall acceptance rate: 584 of 2,055 submissions, 28%