Implicit and explicit optimizations for stencil computations
Source
Memory System Performance
Proceedings of the 2006 workshop on Memory system performance and correctness
San Jose, California
SESSION: Workload optimization
Pages: 51 - 60
Year of Publication: 2006
ISBN: 1-59593-578-9
Authors
Shoaib Kamil, Lawrence Berkeley National Laboratory, Berkeley, CA
Kaushik Datta, University of California, Berkeley, CA
Samuel Williams, University of California, Berkeley, CA
Leonid Oliker, Lawrence Berkeley National Laboratory, Berkeley, CA
John Shalf, Lawrence Berkeley National Laboratory, Berkeley, CA
Katherine Yelick, Lawrence Berkeley National Laboratory, Berkeley, CA and University of California, Berkeley, CA
ABSTRACT
Stencil-based kernels constitute the core of many scientific applications on block-structured grids. Unfortunately, these codes achieve a low fraction of peak performance, due primarily to the disparity between processor and main memory speeds. We examine several optimizations on both the conventional cache-based memory systems of the Itanium 2, Opteron, and Power5, as well as the heterogeneous multicore design of the Cell processor. The optimizations target cache reuse across stencil sweeps, including both an implicit cache-oblivious approach and a cache-aware algorithm blocked to match the cache structure. Finally, we consider stencil computations on a machine with an explicitly managed memory hierarchy, the Cell processor. Overall, results show that a cache-aware approach is significantly faster than a cache-oblivious approach and that the explicitly managed memory on Cell is more efficient: relative to the Power5, it has almost 2x more memory bandwidth and is 3.7x faster.
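To make "cache reuse across stencil sweeps" concrete, the following is a minimal sketch, not the algorithm evaluated in the paper. It uses overlapped (ghost-zone) time tiling on a 1D three-point averaging stencil, a simpler relative of cache-aware time blocking, swapped in here only for illustration. The function names, the stencil weights, and the `tile` parameter are all assumptions. Each tile is loaded once with a halo of `steps` cells on each side, advanced through every sweep while it would still be cache resident, and then written back. In Python this shows only the data-access structure, not any real speedup.

```python
def naive_sweeps(grid, steps):
    """Reference version: one full pass over the grid per time step."""
    cur = list(grid)
    for _ in range(steps):
        nxt = list(cur)  # boundary cells keep their values
        for i in range(1, len(cur) - 1):
            nxt[i] = (cur[i - 1] + cur[i] + cur[i + 1]) / 3.0
        cur = nxt
    return cur


def time_blocked_sweeps(grid, steps, tile):
    """Overlapped time tiling (illustrative, hypothetical helper).

    For each tile, copy the tile plus a halo of `steps` cells per side,
    run all `steps` sweeps on that small working set, then store only
    the tile interior. Halo cells are recomputed redundantly by
    neighbouring tiles; that is the price of touching each tile once.
    """
    n = len(grid)
    out = list(grid)
    for start in range(0, n, tile):
        end = min(start + tile, n)
        lo = max(start - steps, 0)   # halo clipped at the true boundary
        hi = min(end + steps, n)
        local = list(grid[lo:hi])
        for _ in range(steps):
            nxt = list(local)
            for i in range(1, len(local) - 1):
                nxt[i] = (local[i - 1] + local[i] + local[i + 1]) / 3.0
            local = nxt
        # Stale values creep inward one cell per sweep from an artificial
        # halo edge, so after `steps` sweeps the tile itself is still exact.
        out[start:end] = local[start - lo:end - lo]
    return out


if __name__ == "__main__":
    data = [float((i * i) % 7) for i in range(50)]
    ref = naive_sweeps(data, 4)
    blocked = time_blocked_sweeps(data, 4, 8)
    print(max(abs(x - y) for x, y in zip(ref, blocked)))
```

The halo width equals the number of fused sweeps, so larger `steps` trades more redundant arithmetic for fewer passes over main memory. The paper's cache-aware and cache-oblivious algorithms pursue the same reuse without this exact form of recomputation.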
CITED BY 4
Samuel Williams, John Shalf, Leonid Oliker, Shoaib Kamil, Parry Husbands, Katherine Yelick, Scientific computing kernels on the Cell processor, International Journal of Parallel Programming, v.35 n.3, p.263-298, June 2007