Implicit and explicit optimizations for stencil computations
Source: Memory System Performance archive
Proceedings of the 2006 workshop on Memory system performance and correctness
San Jose, California
SESSION: Workload optimization
Pages: 51-60
Year of Publication: 2006
ISBN: 1-59593-578-9
Authors
Shoaib Kamil  Lawrence Berkeley National Laboratory, Berkeley, CA
Kaushik Datta  University of California, Berkeley, CA
Samuel Williams  University of California, Berkeley, CA
Leonid Oliker  Lawrence Berkeley National Laboratory, Berkeley, CA
John Shalf  Lawrence Berkeley National Laboratory, Berkeley, CA
Katherine Yelick  Lawrence Berkeley National Laboratory, Berkeley, CA and University of California, Berkeley, CA
Sponsor
SIGPLAN: ACM Special Interest Group on Programming Languages
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1178597.1178605

ABSTRACT

Stencil-based kernels constitute the core of many scientific applications on block-structured grids. Unfortunately, these codes achieve a low fraction of peak performance, due primarily to the disparity between processor and main memory speeds. We examine several optimizations on the conventional cache-based memory systems of the Itanium 2, Opteron, and Power5, as well as on the heterogeneous multicore design of the Cell processor. The optimizations target cache reuse across stencil sweeps, including both an implicit cache-oblivious approach and a cache-aware algorithm blocked to match the cache structure. Finally, we consider stencil computations on a machine with an explicitly managed memory hierarchy, the Cell processor. Overall, results show that a cache-aware approach is significantly faster than a cache-oblivious approach and that the explicitly managed memory on Cell is more efficient: relative to the Power5, it has almost 2x more memory bandwidth and is 3.7x faster.
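
To make "cache reuse across stencil sweeps" concrete, the following is a minimal Python sketch, not the implementation evaluated in the paper. It assumes a 1D three-point averaging stencil with fixed boundary values, and the function names (naive_sweeps, cache_oblivious_sweeps, walk, kernel) are hypothetical. The recursive version follows the style of the trapezoidal space-time decomposition described by Frigo and Strumpen: it cuts the space-time region into pieces that each advance several time steps before moving on, so a piece that fits in cache is reused across sweeps without the code knowing the cache size.

```python
def naive_sweeps(u0, steps):
    """Reference version: one full pass over the grid per time step."""
    cur = list(u0)
    for _ in range(steps):
        nxt = list(cur)  # boundary values are carried over unchanged
        for x in range(1, len(cur) - 1):
            nxt[x] = (cur[x - 1] + cur[x] + cur[x + 1]) / 3.0
        cur = nxt
    return cur


def cache_oblivious_sweeps(u0, steps):
    """Same result, computed by recursive space-time (trapezoid) cuts."""
    n = len(u0)
    # buf[t % 2] holds time level t; both copies carry the fixed boundaries.
    buf = [list(u0), list(u0)]

    def kernel(t, x):
        # Advance point x from time level t to t + 1.
        src, dst = buf[t % 2], buf[(t + 1) % 2]
        dst[x] = (src[x - 1] + src[x] + src[x + 1]) / 3.0

    def walk(t0, t1, x0, dx0, x1, dx1):
        # Trapezoid: at time t the points are
        # x0 + dx0 * (t - t0) <= x < x1 + dx1 * (t - t0), for t0 <= t < t1.
        dt = t1 - t0
        if dt == 1:
            for x in range(x0, x1):
                kernel(t0, x)
        elif dt > 1:
            if 2 * (x1 - x0) + (dx1 - dx0) * dt >= 4 * dt:
                # Wide trapezoid: cut in space along a line of slope -1.
                # The left piece depends only on itself, so it runs first.
                xm = (2 * (x0 + x1) + (2 + dx0 + dx1) * dt) // 4
                walk(t0, t1, x0, dx0, xm, -1)
                walk(t0, t1, xm, -1, x1, dx1)
            else:
                # Tall trapezoid: cut in time, lower half before upper half.
                s = dt // 2
                walk(t0, t0 + s, x0, dx0, x1, dx1)
                walk(t0 + s, t1, x0 + dx0 * s, dx0, x1 + dx1 * s, dx1)

    walk(0, steps, 1, 0, n - 1, 0)  # interior points only
    return buf[steps % 2]
```

Both functions perform the same per-point arithmetic; only the traversal order differs. A cache-aware variant would instead fix the block extents explicitly to match a known cache size, which is the trade-off the abstract compares.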


