skip to main content
10.1145/1989493.1989508acmconferencesArticle/Chapter ViewAbstractPublication PagesspaaConference Proceedingsconference-collections
research-article

The pochoir stencil compiler

Published:04 June 2011Publication History

ABSTRACT

A stencil computation repeatedly updates each point of a d-dimensional grid as a function of itself and its near neighbors. Parallel cache-efficient stencil algorithms based on "trapezoidal decompositions" are known, but most programmers find them difficult to write. The Pochoir stencil compiler allows a programmer to write a simple specification of a stencil in a domain-specific stencil language embedded in C++ which the Pochoir compiler then translates into high-performing Cilk code that employs an efficient parallel cache-oblivious algorithm. Pochoir supports general d-dimensional stencils and handles both periodic and aperiodic boundary conditions in one unified algorithm. The Pochoir system provides a C++ template library that allows the user's stencil specification to be executed directly in C++ without the Pochoir compiler (albeit more slowly), which simplifies user debugging and greatly simplified the implementation of the Pochoir compiler itself. A host of stencil benchmarks run on a modern multicore machine demonstrates that Pochoir outperforms standard parallelloop implementations, typically running 2-10 times faster. The algorithm behind Pochoir improves on prior cache-efficient algorithms on multidimensional grids by making "hyperspace" cuts, which yield asymptotically more parallelism for the same cache efficiency.

References

  1. T. Akutsu. Dynamic programming algorithms for RNA secondary structure prediction with pseudoknots. Discrete Applied Mathematics, 104:45--62, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. R. Bleck, C. Rooth, D. Hu, and L. T. Smith. Salinity-driven Thermocline Transients in a wind- and Thermohaline-forced Isopycnic Coordinate Model of the North Atlantic. Journal of Physical Oceanography, 22(12):1486--1505, 1992.Google ScholarGoogle ScholarCross RefCross Ref
  3. R. G. Brickner, W. George, S. L. Johnsson, and A. Ruttenberg. A stencil compiler for the Connection Machine models CM-2/200. In Proceedings of the Fourth Workshop on Compilers for Parallel Computers, 1993.Google ScholarGoogle Scholar
  4. M. Bromley, S. Heller, T. McNerney, and G. L. Steele Jr. Fortran at ten Gigaflops: The Connection Machine convolution compiler. In PLDI, pages 145--156, Toronto, Ontario, Canada, June 26-28 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. C++ Standards Committee. Working draft, standard for programming language C++. available from http://www.open-std.org/jtc1/ sc22/wg21/docs/papers/2011/n3242.pdf, 2011. ISO/IEC Document Number N3242=11-0012.Google ScholarGoogle Scholar
  6. R. A. Chowdhury, H.-S. Le, and V. Ramachandran. Cache-oblivious dynamic programming for bioinformatics. TCBB, 7(3):495--510, July-Sept. 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms. The MIT Press, third edition, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. K. Datta. Auto-tuning Stencil Codes for Cache-Based Multicore Platforms. PhD thesis, EECS Department, University of California, Berkeley, Dec 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. K. Datta, M. Murphy, V. Volkov, S. Williams, J. Carter, L. Oliker, D. Patterson, J. Shalf, and K. Yelick. Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures. In SC, pages 4:1--4:12, Austin, TX, Nov. 15-18 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. H. Dursun, K.-i. Nomura, L. Peng, R. Seymour, W. Wang, R. K. Kalia, A. Nakano, and P. Vashishta. A multilevel parallelization framework for high-order stencil computations. In Euro-Par, pages 642--653, Delft, The Netherlands, Aug. 25-28 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. H. Dursun, K.-i. Nomura, W. Wang, M. Kunaseth, L. Peng, R. Seymour, R. K. Kalia, A. Nakano, and P. Vashishta. In-core optimization of high-order stencil computations. In PDPTA, pages 533--538, Las Vegas, NV, July13-16 2009.Google ScholarGoogle Scholar
  12. J. F. Epperson. An Introduction to Numerical Methods and Analysis. Wiley-Interscience, 2007.Google ScholarGoogle Scholar
  13. H. Feshbach and P. Morse. Methods of Theoretical Physics. Feshbach Publishing, 1981.Google ScholarGoogle Scholar
  14. M. Frigo, C. E. Leiserson, H. Prokop, and S. Ramachandran. Cache-oblivious algorithms. In FOCS, pages 285--297, New York, NY, Oct. 17-19 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. M. Frigo and V. Strumpen. Cache oblivious stencil computations. In ICS, pages 361--366, Cambridge, MA, June 20-22 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. M. Frigo and V. Strumpen. The cache complexity of multithreaded cache oblivious algorithms. Theory of Computing Systems, 45(2):203--233, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. M. Gardner. Mathematical Games. Scientific American, 223(4):120--123, 1970.Google ScholarGoogle ScholarCross RefCross Ref
  18. O. Gotoh. An improved algorithm for matching biological sequences. Journal of Molecular Biology, 162:705--708, 1982.Google ScholarGoogle ScholarCross RefCross Ref
  19. Y. He, C. E. Leiserson, and W. M. Leiserson. The Cilkview scalability analyzer. In SPAA, pages 145--156, Santorini, Greece, June 13--15 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. P. Hudak. Building domain-specific embedded languages. ACM Computing Surveys, 28(4), December 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Intel software autotuning tool. http://software.intel.com/en- us/articles/intel-software-autotuning-tool/, 2010.Google ScholarGoogle Scholar
  22. Intel Corporation. Intel Cilk Plus Language Specification, 2010. Document Number: 324396-001US. Available from http://software.intel.com/sites/products/cilk-plus/ cilk_plus_language_specification.pdf.Google ScholarGoogle Scholar
  23. C. John. Options, futures, and other derivatives. Prentice Hall, 2006.Google ScholarGoogle Scholar
  24. S. Kamil, C. Chan, L. Oliker, J. Shalf, and S. Williams. An auto-tuning framework for parallel multicore stencil computations. In IPDPS, pages 1--12, 2010.Google ScholarGoogle ScholarCross RefCross Ref
  25. S. Kamil, K. Datta, S. Williams, L. Oliker, J. Shalf, and K. Yelick. Implicit and explicit optimizations for stencil computations. In MSPC, pages 51--60, San Jose, CA, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. S. Kamil, P. Husbands, L. Oliker, J. Shalf, and K. Yelick. Impact of modern memory subsystems on cache optimizations for stencil computations. In MSP, pages 36--43, Chicago, IL, June 12 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. S. Krishnamoorthy, M. Baskaran, U. Bondhugula, J. Ramanujam, A. Rountev, and P. Sadayappan. Effective automatic parallelization of stencil computations. In PLDI, San Diego, CA, June 10-13 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. https://perf.wiki.kernel.org/index.php/Main_Page.Google ScholarGoogle Scholar
  29. R. Mei, W. Shyy, D. Yu, and L. Luo. Lattice Boltzmann method for 3-D flows with curved boundary. J. of Comput. Phys, 161(2):680--699, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. M. Mernik, J. Heering, and A. M. Sloane. When and how to develop domain-specific languages. ACM Computing Surveys, 37:316--344, December 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. P. Micikevicius. 3D finite difference computation on GPUs using CUDA. In GPPGPU, pages 79--84, Washington, DC, Mar. 8 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. A. Nakano, R. Kalia, and P. Vashishta. Multiresolution molecular dynamics algorithm for realistic materials modeling on parallel computers. Computer Physics Communications, 83(2-3):197--214, 1994.Google ScholarGoogle ScholarCross RefCross Ref
  33. A. Nitsure. Implementation and optimization of a cache oblivious lattice boltzmann algorithm. Master's thesis, Institut für Informatic, Friedrich-Alexander-Universität Erlangen-Nürnberg, July 2006.Google ScholarGoogle Scholar
  34. OpenMP application program interface, version 2.5. OpenMP specification, May 2005.Google ScholarGoogle Scholar
  35. L. Peng, R. Seymour, K.-i. Nomura, R. K. Kalia, A. Nakano, P. Vashishta, A. Loddoch, M. Netzband, W. R. Volz, and C. C. Wong. High-order stencil computations on multicore clusters. In IPDPS, pages 1--11, Rome, Italy, May 23-29 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. S. Peyton Jones. Haskell 98 Language and Libraries: The Revised Report. Cambridge University Press, 1998.Google ScholarGoogle Scholar
  37. H. Prokop. Cache-oblivious algorithms. Master's thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, June 1999.Google ScholarGoogle Scholar
  38. G. Roth, J. Mellor-Crummey, K. Kennedy, and R. G. Brickner. Compiling stencils in high performance Fortran. In SC, pages 1--20, San Jose, CA, Nov. 16-20 1997. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. A. Taflove and S. Hagness. Computational Electrodynamics: The Finite-Difference Time-Domain Method. Artech House Norwood, MA, 2000.Google ScholarGoogle Scholar
  40. A. van Deursen, P. Klint, and J. Visser. Domain-specific languages: An annotated bibliography. SIGPLAN Not., 35(6):26--36, June 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. S. Williams, J. Carter, L. Oliker, J. Shalf, and K. Yelick. Lattice Boltzmann simulation optimization on leading multicore platforms. In IPDPS, pages 1--14, Miami, FL, Apr. 14-18 2008.Google ScholarGoogle ScholarCross RefCross Ref

Index Terms

  1. The pochoir stencil compiler

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader