A comparison of programming models for multiprocessors with explicitly managed memory hierarchies

Authors:

Scott Schneider,

Jae-Seung Yeom,

John C. Linford,

Dimitrios S. Nikolopoulos

ACM SIGPLAN Notices, Volume 44, Issue 4

Pages 131 - 140

https://doi.org/10.1145/1594835.1504197

Published: 14 February 2009

Abstract

On multiprocessors with explicitly managed memory hierarchies (EMM), software is responsible for moving data in and out of fast local memories. This task can be complex and error-prone even for expert programmers. Before we can allow compilers to handle this complexity for us, we must identify the abstractions that are general enough to allow us to write applications with reasonable effort, yet specific enough to exploit the vast on-chip memory bandwidth of EMM multiprocessors. To this end, we compare two programming models against hand-tuned codes on the STI Cell, paying attention to programmability and performance. The first programming model, Sequoia, abstracts the memory hierarchy as private address spaces, each corresponding to a parallel task. The second, Cellgen, is a new framework that provides OpenMP-like semantics and the abstraction of a shared address space divided into private and shared data. We compare three applications programmed using these models against their hand-optimized counterparts in terms of abstractions, programming complexity, and performance.
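To make concrete what "moving data in and out of fast local memories" asks of the programmer, the following is a minimal toy sketch in Python. It is not Cell SDK, Sequoia, or Cellgen code: the function name `scale_through_local_store` and the capacity constant `LOCAL_STORE_ELEMS` are hypothetical, and plain list slicing stands in for the DMA transfers a real EMM machine would issue.

```python
"""Toy model of an explicitly managed memory hierarchy (illustrative only)."""

# Hypothetical capacity of the fast local memory, counted in elements.
LOCAL_STORE_ELEMS = 4


def scale_through_local_store(main_memory, factor, chunk=LOCAL_STORE_ELEMS):
    """Scale every element, staging the data through a small local buffer.

    Each iteration copies one chunk in, computes on the local copy only,
    and copies the result back out, mimicking an explicit get/compute/put
    cycle. Returns the scaled data and the number of transfers performed.
    """
    if chunk <= 0 or chunk > LOCAL_STORE_ELEMS:
        raise ValueError("chunk must be positive and fit in the local store")

    result = list(main_memory)  # stands in for the output array in main memory
    transfers = 0
    for start in range(0, len(main_memory), chunk):
        end = min(start + chunk, len(main_memory))

        local = main_memory[start:end]       # "get": main memory -> local store
        transfers += 1

        local = [x * factor for x in local]  # compute touches only local data

        result[start:end] = local            # "put": local store -> main memory
        transfers += 1
    return result, transfers


if __name__ == "__main__":
    scaled, transfers = scale_through_local_store(list(range(10)), 3)
    print(scaled, transfers)
```

Even in this toy form, the chunking, bounds handling, and transfer bookkeeping all sit in application code; that is the kind of burden the abstract describes and that the programming models under comparison aim to reduce.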

Index Terms

1. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel programming languages
2. Software and its engineering
  1. Software notations and tools
    1. Compilers
    2. General programming languages
      1. Language types
        1. Parallel programming languages

Published In

ACM SIGPLAN Notices Volume 44, Issue 4

PPoPP '09

April 2009

294 pages

ISSN:0362-1340

EISSN:1558-1160

DOI:10.1145/1594835

PPoPP '09: Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
February 2009
322 pages
ISBN:9781605583976
DOI:10.1145/1504176
General Chair: Daniel Reed, Microsoft Research, USA
Program Chair: Vivek Sarkar, Rice University, USA

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 February 2009

Published in SIGPLAN Volume 44, Issue 4

Qualifiers

Research-article
