research-article

A comparison of programming models for multiprocessors with explicitly managed memory hierarchies

Published: 14 February 2009

Abstract

On multiprocessors with explicitly managed memory hierarchies (EMM), software has the responsibility of moving data in and out of fast local memories. This task can be complex and error-prone even for expert programmers. Before we can allow compilers to handle this complexity for us, we must identify the abstractions that are general enough to allow us to write applications with reasonable effort, yet specific enough to exploit the vast on-chip memory bandwidth of EMM multiprocessors. To this end, we compare two programming models against hand-tuned codes on the STI Cell, paying attention to programmability and performance. The first programming model, Sequoia, abstracts the memory hierarchy as private address spaces, each corresponding to a parallel task. The second, Cellgen, is a new framework that provides OpenMP-like semantics and the abstraction of a shared address space divided into private and shared data. We compare three applications programmed using these models against their hand-optimized counterparts in terms of abstractions, programming complexity, and performance.
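To make concrete what "moving data in and out of fast local memories" involves when done by hand, here is a minimal sketch. It is not code from the paper, nor the Sequoia, Cellgen, or Cell SDK API. It assumes a hypothetical local store of `LOCAL_STORE_WORDS` elements and uses hypothetical `dma_get`/`dma_put` helpers, with Python list slicing standing in for explicit DMA transfers. The loop uses double buffering: the next chunk is fetched before the current one is processed, which on real hardware would let the transfer overlap the computation.

```python
# Minimal sketch of software-managed data staging on an EMM-style machine.
# Assumptions (hypothetical, for illustration only): a local store holding
# LOCAL_STORE_WORDS elements, and dma_get/dma_put as stand-ins for explicit
# DMA transfers; Python list slicing plays the role of the transfer engine.

LOCAL_STORE_WORDS = 8  # hypothetical local-store capacity, in elements


def dma_get(main_memory, offset, count):
    """Copy main_memory[offset:offset + count] into a new local buffer."""
    return main_memory[offset:offset + count]  # list slicing makes a copy


def dma_put(main_memory, offset, local_buf):
    """Write a local buffer back to main memory starting at offset."""
    main_memory[offset:offset + len(local_buf)] = local_buf


def scale_in_place(main_memory, factor, chunk=LOCAL_STORE_WORDS // 2):
    """Multiply every element by factor, touching main memory only through
    dma_get/dma_put. Two chunk-sized buffers are live at once (double
    buffering): the next chunk is fetched before the current one is
    processed, so on real hardware the transfer could overlap computation."""
    if chunk <= 0 or 2 * chunk > LOCAL_STORE_WORDS:
        raise ValueError("two chunk buffers must fit in the local store")
    n = len(main_memory)
    next_buf = dma_get(main_memory, 0, chunk)  # prime the pipeline
    offset = 0
    while offset < n:
        cur = next_buf
        next_offset = offset + chunk
        if next_offset < n:
            next_buf = dma_get(main_memory, next_offset, chunk)  # prefetch
        for i in range(len(cur)):  # compute on data held in the local store
            cur[i] *= factor
        dma_put(main_memory, offset, cur)  # write results back
        offset = next_offset


if __name__ == "__main__":
    data = list(range(10))
    scale_in_place(data, 2)
    print(data)  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

Splitting the local store between two buffers halves the usable chunk size in exchange for a chance to hide transfer latency. Tracking offsets, buffer sizes, and write-backs by hand like this is the kind of bookkeeping that higher-level programming models such as the two compared here are meant to abstract away.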

Published In

ACM SIGPLAN Notices, Volume 44, Issue 4
PPoPP '09
April 2009
294 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/1594835
  • ACM Conferences
    PPoPP '09: Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
    February 2009
    322 pages
    ISBN:9781605583976
    DOI:10.1145/1504176
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 February 2009
Published in SIGPLAN Volume 44, Issue 4


Author Tags

  1. Cell BE
  2. explicitly managed memory hierarchies
  3. programming models

Article Metrics

  • Downloads (last 12 months): 8
  • Downloads (last 6 weeks): 0
Reflects downloads up to 14 Feb 2025.

Cited By

  • (2019) Smart Scheduler for CUDA Programming in Heterogeneous CPU/GPU Environment. Proceedings of the 11th International Conference on Computer Modeling and Simulation, pp. 250-253. DOI: 10.1145/3307363.3307377. Online publication date: 16-Jan-2019.
  • (2013) Optimizing two-dimensional DMA transfers for scratchpad Based MPSoCs platforms. Microprocessors & Microsystems, 37(8), pp. 848-857. DOI: 10.1016/j.micpro.2013.04.006. Online publication date: 1-Nov-2013.
  • (2012) The migration prefetcher. ACM Transactions on Architecture and Code Optimization, 8(4), pp. 1-20. DOI: 10.1145/2086696.2086724. Online publication date: 26-Jan-2012.
  • (2012) Optimizing explicit data transfers for data parallel applications on the cell architecture. ACM Transactions on Architecture and Code Optimization, 8(4), pp. 1-20. DOI: 10.1145/2086696.2086716. Online publication date: 26-Jan-2012.
  • (2011) Efficient hole bypass routing scheme using observer packets for geographic routing in wireless sensor networks. ACM SIGAPP Applied Computing Review, 11(4), pp. 7-16. DOI: 10.1145/2107756.2107757. Online publication date: 1-Dec-2011.
  • (2011) Real-time GPU color-based segmentation of football players. Journal of Real-Time Image Processing, 7(4), pp. 267-279. DOI: 10.1007/s11554-011-0194-9. Online publication date: 3-Feb-2011.
  • (2010) Tagged procedure calls (TPC). Proceedings of the 5th International Conference on High Performance Embedded Architectures and Compilers, pp. 307-321. DOI: 10.1007/978-3-642-11515-8_23. Online publication date: 25-Jan-2010.
  • (2009) Automatic parallelization experiments on 16PE NOC based MPSOC. 2009 IEEE 8th International Conference on ASIC, pp. 967-970. DOI: 10.1109/ASICON.2009.5351532. Online publication date: Oct-2009.
  • (2014) Design Space Exploration of Memory Model for Heterogeneous Computing. Proceedings of the 2014 IEEE 26th International Symposium on Computer Architecture and High Performance Computing, pp. 160-167. DOI: 10.1109/SBAC-PAD.2014.9. Online publication date: 22-Oct-2014.
  • (2012) Programming the Cell Processor. Fundamentals of Multicore Software Development, pp. 155-198. DOI: 10.1201/b11417-12. Online publication date: 9-Jan-2012.
