skip to main content
10.1145/1088149.1088195acmconferencesArticle/Chapter ViewAbstractPublication PagesicsConference Proceedingsconference-collections
Article

Improving the computational intensity of unstructured mesh applications

Published: 20 June 2005 Publication History

Abstract

Although unstructured mesh algorithms are a popular means of solving problems across a broad range of disciplines---from texture mapping to computational fluid dynamics---they are often dominated not by computation, but by mesh overhead. Our study of an object-oriented mesh-based benchmark reveals that 72% of its execution time is spent on mesh-related operations, such as iterating over faces or chasing pointers. We report a series of optimizations---some traditional, some novel---that dramatically improve the benchmark's computational intensity---the ratio of floating point operations to memory accesses. This improvement is attributable to an eight-fold reduction in memory operations and results in a 4.7x speedup in execution time.Our work demonstrates that common subexpression elimination and code motion are important optimizations for mesh-based codes. However, conservative analysis prevents their application. We discuss these barriers to analysis and argue that an understanding of mesh semantics complements more traditional analyses, such as pointer alias analysis, and certifies the correctness of these optimizations. Our identification of overheads in mesh-based codes, optimizations that address them, and limitations of current compiler analyses are required for our eventual goal of automating these optimizations in a semantics-aware compiler.

References

[1]
N. Ahmed, N. Mateev, and K. Pingali, Synthesizing transformations for locality enhancement of imperfectly-nested loop nests. In Proceedings of the International Conference on Supercomputing, pages 141--152, May 2000.]]
[2]
W. Anderson, W. Gropp, D. Kaushik, D. Keyes, and B. Smith. Achieving high sustained performance in an unstructured mesh CFD application. In Proceedings of IEEE/ACM Supercomputing '99, Nov. 1999.]]
[3]
P. V. Artigas, M. Gupta, S. P. Midkiff, and J. E. Moreira. Automatic loop transformations and parallelization of java. In Proceedings of the International Conference on Supercomputing, pages 1--10, May 2000.]]
[4]
M. W. Beall and M. S. Shephard. An object-oriented framework for reliable numerical simulations. Engineering with Computers, 15(1):61--72, Apr. 1999.]]
[5]
I. Corporation. HPM Tool Kit --- Home Page. http://www.alphaworks.bim.com/tech/hpmtoolkit, Jan. 2005.]]
[6]
R. Das, M. Uysal, J. H. Saltz, and Y.-S. Hwang. Communication optimizations for irregular scientific computations on distributed memory architectures. Journal of Parallel and Distributed Computing, 22(3):462--479, 1994.]]
[7]
C. Ding and K. Kennedy. Improving cache performance in dynamic applications through data and computation reorganization at run time. In Proceedings of the 1999 ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 229--241, May 1999.]]
[8]
D. R. Engler. Incorporating application semantics and control into compilation. In Proceedings of the 1st Conference on Domain-specific Languages, pages 103--118, 1997.]]
[9]
W. D. Gropp, D. K. Kaushik, D. E. Keyes, and B. F. Smith. Toward realistic performance bounds for implicit cfd codes. In Proceedings of Parallel CFD'99, May 1999.]]
[10]
W. D. Gropp, D. K. Kaushik, D. E. Keyes, and B. F. Smith. Performance modeling and tuning of an unstructured mesh cfd application. In Proceedings of IEEE/ACM Supercomputing '00, Nov. 2000.]]
[11]
S. Z. Guyer and C. Lin. An annotation language for optimizing software libraries. In Proceedings of the 2nd Conference on Domain-specific Languages, pages 39--52, 1999.]]
[12]
S. Z. Guyer and C. Lin. Broadway: A compiler for exploiting the domain-specific semantics of software libraries. Proceedings of the IEEE, 93(2), Feb, 2005. Special Issue on Program Generation, Optimization, and Platform Adaption.]]
[13]
H. Han and C.-W. Tseng. A comparison of locality transformations for irregular codes. In Proceedings of Workshop on Languages, Compilers, and Runtime-Systems for Scalable Computers, pages 70--84, 2000.]]
[14]
G. Jin and J. Mellor-Crummey. Experiences tuning smg98---a semicoarsening multigrid benchmark based on the hypre library. In Proceedings of the International Conference on Supercomputing, pages 305--314, June 2002.]]
[15]
K. Kennedy, B. Broom, K. D. Cooper, J. Dongarra, R. J. Fowler, D. Gannon, S. L. Johnsson, J. M. Mellor-Crummey, and L. Torczon. Telescoping languages: A strategy for automatic generation of scientific problem-solving systems from annotated libraries. Journal of Parallel and Distributed Computing, 61(12):1803--1826, Dec. 2001.]]
[16]
J. Mellor-Crummey, D. Whalley, and K. Kennedy. Improving memory hierarchy performance for irregular applications using data and computation reorderings. International Journal of Parallel Programming, 28(3), June 2001.]]
[17]
V. Menon and K. Pingali. High-level semantic optimization of numerical codes. In Proceedings of the International Conference on Supercomputing, pages 434--443, June 1999.]]
[18]
N. Mitchell, L. Carter, and J. Ferrante. Localizing non-affine array references. In Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques, pages 192--202, 1999.]]
[19]
T. Mohan, B. de Supinski, S. McKee, F. Mueller, A. Yoo, and M. Schulz. Identifying and exploiting spatial regularity in data memory references. In Proceedings of IEEE/ACM Supercomputing '03, Nov. 2003.]]
[20]
P. J. Moran. Field model: An object-oriented data model for fields. Technical Report NAS-01-005, NASA Ames Research Center, 2001.]]
[21]
D. J. Quinlan, M. Schordan, B. Miller, and M. Kowarschik. Parallel object-oriented framework optimization. Concurrency and Computation: Practice and Experience, 16(2--3):293--302, Feb. 2004.]]
[22]
S. Rus, L. Rauchwerger, and J. Hoeflinger. Hybrid analysis: static & dynamic memory reference analysis. In Proceedings of the International Conference on Supercomputing, pages 274--284, June 2002.]]
[23]
M. Schordan and D. Quinlan. A source-to-source architecture for user-defined optimizations. In Proceedings of the Joint Modular Languages Conference (JMLC'03), Lecture Notes in Computer Science, pages 214--223, Aug. 2003. vol. 2789, Springer-Verlag.]]
[24]
M. M. Strout, L. Carter, and J. Ferrante. Compile-time composition of run-time data and iteration reorderings. In Proceedings of the 2003 ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 91--102, June 2003.]]
[25]
N. Trickett, T. Nakagawa, R. Mani, and D. Gfroerer. Understanding IBM e-server pSeries Performance and Sizing. IBM, Feb. 2001. POWER3 node specs.]]
[26]
J. S. Vetter and A. Yoo. An empirical performance evaluation of scalable scientific applications. In Proceedings of IEEE/ACM Supercomputing '02, Nov. 2002.]]
[27]
K. A. Yelick, L. Semenzato, G. Pike, C. Miyamoto, B. Liblit, A. Krishnamurthy, P. N. Hilfinger, S. L. Graham, D. Gay, P. Colella, and A. Aiken. Titanium: A high-performance java dialect. Concurrency: Practice and Experience, 10(11--13):825--836, Sept. 1998.]]
[28]
Q. Yi and D. Quinlan. Applying loop optimizations to object-oriented abstractions through general classification of array semantics. In Proceedings of the 17th International Workshop on Languages and Compilers for Parallel Computing, Sept. 2004.]]

Cited By

View all
  • (2022)An estimation of water levels associated with a storm along the coast of Bangladesh using a non-central difference method of linesOcean Engineering10.1016/j.oceaneng.2022.110776248(110776)Online publication date: Mar-2022
  • (2020)Generation of Block Structured Grids on Complex Domains for High Performance SimulationComputational Mathematics and Mathematical Physics10.1134/S096554251912018259:12(2108-2123)Online publication date: 18-Feb-2020
  • (2019)Generation of Block Structured Grids on Complex Domains for High Performance SimulationNumerical Geometry, Grid Generation and Scientific Computing10.1007/978-3-030-23436-2_6(87-99)Online publication date: 11-Oct-2019
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
ICS '05: Proceedings of the 19th annual international conference on Supercomputing
June 2005
414 pages
ISBN:1595931678
DOI:10.1145/1088149
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 June 2005

Permissions

Request permissions for this article.

Check for updates

Qualifiers

  • Article

Conference

ICS05
Sponsor:
ICS05: International Conference on Supercomputing 2005
June 20 - 22, 2005
Massachusetts, Cambridge

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)4
  • Downloads (Last 6 weeks)1
Reflects downloads up to 08 Mar 2025

Other Metrics

Citations

Cited By

View all
  • (2022)An estimation of water levels associated with a storm along the coast of Bangladesh using a non-central difference method of linesOcean Engineering10.1016/j.oceaneng.2022.110776248(110776)Online publication date: Mar-2022
  • (2020)Generation of Block Structured Grids on Complex Domains for High Performance SimulationComputational Mathematics and Mathematical Physics10.1134/S096554251912018259:12(2108-2123)Online publication date: 18-Feb-2020
  • (2019)Generation of Block Structured Grids on Complex Domains for High Performance SimulationNumerical Geometry, Grid Generation and Scientific Computing10.1007/978-3-030-23436-2_6(87-99)Online publication date: 11-Oct-2019
  • (2014)Real-time non-rigid reconstruction using an RGB-D cameraACM Transactions on Graphics10.1145/2601097.260116533:4(1-12)Online publication date: 27-Jul-2014
  • (2014)Computer performance analysis and the Pi TheoremComputer Science - Research and Development10.1007/s00450-010-0147-829:1(45-71)Online publication date: 1-Feb-2014
  • (2013)Simulation of tropical-cyclone-like vortices in shallow-water ICON-hex using goal-oriented r-adaptivityTheoretical and Computational Fluid Dynamics10.1007/s00162-013-0303-428:1(107-128)Online publication date: 30-May-2013
  • (2013)Heterogeneous combinatorial candidate generationProceedings of the 19th international conference on Parallel Processing10.1007/978-3-642-40047-6_75(751-762)Online publication date: 26-Aug-2013
  • (2012)Developing an ordering-based renumbering approach for triangular unstructured gridsEngineering with Computers10.1007/s00366-012-0259-929:2(225-243)Online publication date: 22-Feb-2012
  • (2009)A component model of spatial localityProceedings of the 2009 international symposium on Memory management10.1145/1542431.1542446(99-108)Online publication date: 19-Jun-2009
  • (2009)CLOMPInternational Journal of Parallel Programming10.1007/s10766-009-0096-737:3(250-265)Online publication date: 1-Jun-2009
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media