DOI: 10.1145/2016039.2016077
research-article

Performance-aware multicore programming

Published: 24 March 2011

Abstract

Multicore processors have become the dominant CPU trend because performance can no longer be gained simply by increasing clock rates, as had been possible over the past decades in the computer industry. Yet multicore programming is still in its infancy: programmers are not trained to write parallel programs, and technology constraints require manual tuning to achieve high performance. We report our multicore programming experience with optimization techniques such as global memory coalescing and thread divergence avoidance, together with a detailed performance evaluation on a classical dot product application. After applying these optimization techniques, the dot product application achieves a speedup of 3.57 over its unoptimized counterpart. Because the dot product is used in many scientific applications, these techniques can be applied directly to other applications.
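The paper's own kernels are not reproduced on this page, so the following is only a minimal CUDA sketch of how the two named techniques are commonly applied to a dot product. It is not the authors' implementation. The names `dot_kernel`, `dot_gpu`, and `kThreads`, the cap of 64 blocks, and the use of `atomicAdd` on `float` (which needs compute capability 2.0 or later) are all assumptions made for illustration, and error checking of the CUDA calls is omitted for brevity.

```cuda
#include <cassert>
#include <cstddef>
#include <cuda_runtime.h>

constexpr int kThreads = 256;  // threads per block (assumed); must be a power of two

// Each block accumulates a partial dot product and adds it to *result.
__global__ void dot_kernel(const float* a, const float* b, float* result, int n) {
    __shared__ float partial[kThreads];

    // Grid-stride loop: in each iteration, neighbouring threads read
    // neighbouring elements, so a warp touches consecutive global addresses
    // (coalesced access).
    float sum = 0.0f;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += blockDim.x * gridDim.x) {
        sum += a[i] * b[i];
    }
    partial[threadIdx.x] = sum;
    __syncthreads();

    // Tree reduction with sequential addressing: the active threads are always
    // 0..stride-1, so whole warps are either fully active or fully idle until
    // stride drops below the warp size, which limits thread divergence.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride) {
            partial[threadIdx.x] += partial[threadIdx.x + stride];
        }
        __syncthreads();
    }

    // One atomic update per block combines the per-block partial sums.
    if (threadIdx.x == 0) {
        atomicAdd(result, partial[0]);
    }
}

// Hypothetical host wrapper: copies the inputs to the device, launches the
// kernel, and returns the dot product.
float dot_gpu(const float* a, const float* b, int n) {
    assert(n > 0);
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);

    float* d_a = nullptr;
    float* d_b = nullptr;
    float* d_result = nullptr;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_result, sizeof(float));

    cudaMemcpy(d_a, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b, bytes, cudaMemcpyHostToDevice);
    cudaMemset(d_result, 0, sizeof(float));

    int blocks = (n + kThreads - 1) / kThreads;
    if (blocks > 64) blocks = 64;  // assumed cap; the grid-stride loop covers the rest

    dot_kernel<<<blocks, kThreads>>>(d_a, d_b, d_result, n);

    float result = 0.0f;
    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(&result, d_result, sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_result);
    return result;
}
```

Calling `dot_gpu(x, y, n)` on two host arrays of length `n` returns their dot product. A less careful kernel might give each thread a contiguous chunk of the input (so the threads of a warp read addresses far apart) or reduce with a test such as `threadIdx.x % (2 * stride) == 0` (so active and idle threads are interleaved within a warp). Those are the kinds of patterns that coalescing and divergence avoidance are meant to replace.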

Published In

ACMSE '11: Proceedings of the 49th annual ACM Southeast Conference
March 2011
399 pages
ISBN:9781450306867
DOI:10.1145/2016039

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. GPU
  2. dot product
  3. multicore
  4. multicore programming
  5. programming

Conference

ACM SE '11: ACM Southeast Regional Conference
March 24-26, 2011
Kennesaw, Georgia

Acceptance Rates

Overall Acceptance Rate 502 of 1,023 submissions, 49%
