research-article

Public Access

WorkStream -- A Design Pattern for Multicore-Enabled Finite Element Computations

Authors:

Bruno Turcksin,

Martin Kronbichler,

Wolfgang BangerthAuthors Info & Claims

ACM Transactions on Mathematical Software (TOMS), Volume 43, Issue 1

Article No.: 2, Pages 1 - 29

https://doi.org/10.1145/2851488

Published: 29 August 2016 Publication History

Abstract

Many operations that need to be performed in modern finite element codes can be described as an operation that needs to be done independently on every cell, followed by a reduction of these local results into a global data structure. For example, matrix assembly, estimating discretization errors, or converting nodal values into data structures that can be output in visualization file formats all fall into this class of operations. Using this realization, we identify a software design pattern that we call WorkStream and that can be used to model such operations and enables the use of multicore shared memory parallel processing. We also describe in detail how this design pattern can be efficiently implemented, and we provide numerical scalability results from its use in the deal.II software library.

References

[1]

G. Amdahl. 1967. Validity of the single processor approach to achieving large scale computing capabilities. A FIPS Conference Proceedings 30 (1967), 483--485.

Digital Library

[2]

Krzysztof Banaś. 2004. A modular design for parallel adaptive finite element computational kernels. In Computational Science -- ICCS 2004, Marian Bubak, GeertDick van Albada, Peter M. A. Sloot, and Jack Dongarra (Eds.). Lecture Notes in Computer Science, Vol. 3037. Springer, 155--162.

[3]

W. Bangerth, R. Hartmann, and G. Kanschat. 2007. deal.II -- A general purpose object oriented finite element library. ACM Transactions on Mathematical Software 33, 4 (2007), 24/1--24/27.

Digital Library

[4]

Andrew C. Bauer and Abani K. Patra. 2004. Robust and efficient domain decomposition preconditioners for adaptive hp finite element approximations of linear elasticity with and without discontinuous coefficients. International Journal for Numerical Methods in Engineering 59 (2004), 337--364.

[5]

B. Bergen, F. Hülsemann, and U. Rüde. 2005. Is 1.7 × 10¹⁰ unknowns the largest finite element system that can be solved today? In Proceedings of the ACM/IEEE Conference on Supercomputing (SC’05). 5:1--5:14.

Digital Library

[6]

P. Berger, P. Brouaye, and J. C. Syre. 1982. A mesh coloring method for efficient MIMD processing in finite element problems. In Proceedings of the International Conference on Parallelism (ICPP’82) August 24--27. Bellaire, Michigan, 41--46.

[7]

Daniel Brélaz. 1979. New methods to color the vertices of a graph. Communication of the ACM 22, 4 (April 1979), 251--256.

Digital Library

[8]

G. F. Carey, E. Barragy, R. McLay, and M. Sharma. 1988. Element-by-element vector and parallel computations. Communications in Applied Numerical Methods 4 (1988), 299--307.

[9]

Cris Cecka, Adrian J. Lew, and Eric Darve. 2011. Assembly of finite element methods on graphics processors. International Journal for Numerical Methods in Engineering 85, 5 (2011), 640--669.

[10]

T. Y. P. Chang, J. A. Hulzen, and P. S. Wang. 1984. Code generation and optimization for finite element analysis. In EUROSAM 84, John Fitch (Ed.). Lecture Notes in Computer Science, Vol. 174. Springer, 237--247.

Digital Library

[11]

Cheng-Tao Chu, Sang Kyun Kim, Yi-An Lin, Yuan Yuan Yu, Gary Bradski, Andrew Y. Ng, and Kunle Olukotun. 2006. Map-reduce for machine learning on multicore. In Proceedings of Neural Information Processing Systems Conference (NIPS). Vancouver, Canada.

Digital Library

[12]

Andrew Corrigan, Fernando F. Camelli, Rainald Löhner, and John Wallin. 2011. Running unstructured grid-based CFD solvers on modern graphics hardware. International Journal for Numerical Methods in Fluids 66, 2 (2011), 221--229.

[13]

Jeffrey Dean and Sanjay Ghemawat. 2008. MapReduce: Simplified data processing on large clusters. Communications of the ACM 51, 1 (2008), 107--113.

Digital Library

[14]

Karen D. Devine and Joseph E. Flaherty. 1996. Parallel adaptive hp-refinement techniques for conservation laws. Applied Numerical Methods. 20 (1996), 367--386.

Digital Library

[15]

Charbel Farhat. 1988. A simple and efficient automatic fem domain decomposer. Computers & Structures 28, 5 (1988), 579--602.

[16]

C. Farhat and L. Crivelli. 1989. A general approach to nonlinear finite-element computations on shared-memory multiprocessors. Computer Methods in Applied Mechanics and Engineering 72, 2 (1989), 153--171.

Digital Library

[17]

Mike Folk, Albert Cheng, and Kim Yates. 1999. HDF5: A file format and I/O library for high performance computing applications. In Proceedings of the ACM/IEEE Conference on Supercomputing (SC’99).

[18]

J. Frohne, T. Heister, and W. Bangerth. 2015. Efficient numerical methods for the large-scale, parallel solution of elastoplastic contact problems. Submitted (2015).

[19]

E. Gamma, R. Helm, R. Johnson, and J. Vlissides. 1994. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley Professional.

Digital Library

[20]

M. W. Gee, C. M. Siefert, J. J. Hu, R. S. Tuminaro, and M. G. Sala. 2006. ML 5.0 Smoothed Aggregation User’s Guide. Technical Report SAND2006-2649. Sandia National Laboratories.

[21]

Alan George. 1971. Computer Implementation of the Finite Element Method. Technical Report STAN-CS-7J-208. Stanford University.

[22]

Alan George. 1973. Nested dissection of a regular finite element mesh. SIAM Journal on Numerical Analysis 10, 2 (1973), 345--363.

Digital Library

[23]

Konstantinos M. Giannoutakis and George A. Gravvanis. 2009. Design and implementation of parallel approximate inverse classes using OpenMP. Concurrency and Computation Practice E. 21, 2 (2009), 115--131.

Digital Library

[24]

D. Göddeke, R. Strzodka, J. Mohd-Yusof, P. McCormick, H. Wobker, C. Becker, and S. Turek. 2008. Using GPUs to improve multigrid solver performance on a cluster. International Journal of Computational Science and Engineering 4, 1 (2008), 36--55.

Digital Library

[25]

J. L. Henning. 2007. Performance counters and development of SPEC CPU2006. ACM SIGARCH Computer Architecture News 35 (2007), 118--121.

Digital Library

[26]

M. A. Heroux and others. 2014. Trilinos web page. Retrieved from http://trilinos.sandia.gov.

[27]

M. A. Heroux, R. A. Bartlett, V. E. Howle, R. J. Hoekstra, J. J. Hu, T. G. Kolda, R. B. Lehoucq, K. R. Long, R. P. Pawlowski, E. T. Phipps, A. G. Salinger, H. K. Thornquist, R. S. Tuminaro, J. M. Willenbring, A. Williams, and K. S. Stanley. 2005. An overview of the Trilinos project. ACM Transactions on Mathematical Software 31 (2005), 397--423.

Digital Library

[28]

Grand Roman Joldes, Adam Wittek, and Karol Miller. 2010. Real-time nonlinear finite element computations on GPU application to neurosurgical simulation. Computer Methods in Applied Mechanics and Engineering 199, 49--52 (2010), 3305--3314.

[29]

A. Klöckner, T. Warburton, J. Bridge, and J. S. Hesthaven. 2009. Nodal discontinuous Galerkin methods on graphics processors. Journal of Computational Physics 228, 21 (2009), 7863--7882.

Digital Library

[30]

Matthew G. Knepley and Andy R. Terrel. 2013. Finite element integration on GPUs. ACM Transactions on Mathematical Software 39, 2 (2013), 10:1--10:13.

Digital Library

[31]

D. Komatitsch, G. Erlebacher, D. Göddeke, and D. Michéa. 2010. High-order finite-element seismic wave propagation modeling with MPI on a large GPU cluster. Journal of Computational Physics 229 (2010), 7692--7714.

Digital Library

[32]

D. Komatitsch, D. Michéa, and G. Erlebacher. 2009. Porting a high-order finite-element earthquake modeling application to NVIDIA graphics cards using CUDA. Journal of Parallel and Distributed Computing 69, 5 (2009), 451--460.

Digital Library

[33]

K. Kormann and M. Kronbichler. 2011. Parallel finite element operator application: Graph partitioning and coloring. In Proceedings of the 7th IEEE International Conference on eScience. 332--339.

Digital Library

[34]

M. Kronbichler, W. Bangerth, and T. Heister. 2013. The deal.II Tutorial: Step-32. http://www.dealii.org/ developer/doxygen/deal.II/step_32.html.

[35]

M. Kronbichler, T. Heister, and W. Bangerth. 2012. High accuracy mantle convection simulation through modern numerical methods. Geophysics Journal International 191 (2012), 12--29.

[36]

M. Kronbichler and K. Kormann. 2012. A generic interface for parallel cell-based finite element operator application. Computers & Fluids 63 (2012), 135--147.

[37]

Marek Kubale. 2004. Graph Colorings. American Mathematical Society.

[38]

Andras Laszloffy, Jingping Long, and Abani K. Patra. 2000. Simple data management, scheduling and solution strategies for managing the irregularities in parallel adaptive hp finite element simulations. Parallel Computing 26 (2000), 1765--1788.

Digital Library

[39]

Kincho H. Law. 1986. A parallel finite element solution method. Computers & Structures 23, 6 (1986), 845--858.

[40]

Jimmy Lin and Chris Dyer. 2010. Data-Intensive Text Processsing with MapReduce. Morgan & Claypool.

Digital Library

[41]

Anders Logg, Kent-Andre Mardal, and Garth Wells. 2012. Automated Solution of Differential Equations by the Finite Element Method: The FEniCS Book. Lecture Notes in Computational Science and Engineering, Vol. 84. Springer.

[42]

Rainald Löhner and Joseph D. Baum. 2013. Handling tens of thousands of cores with industrial/legacy codes: Approaches, implementation and timings. Computers & Fluids 85 (2013), 53--62.

[43]

Rainald Löhner and Martin Galle. 2002. Minimization of indirect addressing for edge-based field solvers. Communications in Numerical Methods in Engineering 18, 5 (2002), 335--343.

[44]

G. Mahinthakumar and F. Saied. 2002. A hyrbid MPI-OpenMP implementation of an implicit finite-element code on parallel architectures. International Journal of High Performance Computing 16, 4 (2002), 371--393.

[45]

G. R. Markall, A. Slemmer, D. A. Ham, P. H. J. Kelly, C. D. Cantwell, and S. J. Sherwin. 2013. Finite element assembly strategies on multi-core and many-core architectures. International Journal for Numerical Methods in Fluids 71, 1 (2013), 80--97.

[46]

Timothy G. Mattson, Beverly A. Sanders, and Berna L. Massingill. 2004. Patterns for Parallel Programming. Addison-Wesley.

Digital Library

[47]

Francis Thomas McKenna. 1997. Object-oriented finite element programming: frameworks for analysis, algorithms and parallel computing. (1997). Ph. D. Thesis, University of California.

[48]

J. M. Melenk, K. Gerdes, and C. Schwab. 2001. Fully discrete hp-finite elements: Fast quadrature. Computer Methods in Applied Mechanics and Engineering 190 (2001), 4339--4364.

[49]

Message Passing Interface Forum. 2012. MPI: A Message-Passing Interface Standard (Version 3.0). Technical Report. http://www.mpi-forum.org/.

[50]

Kengo Nakajima. 2003. OpenMP/MPI hybrid vs. flat MPI on the earth simulator: Parallel iterative solvers for finite element method. In High Performance Computing. Lecture Notes in Computer Science, Vol. 2858. Springer, 486--499.

[51]

Kengo Nakajima. 2005. Parallel iterative solvers for finite-element methods using an OpenMP/MPI hybrid programming model on the earth simulator. Parallel Computing 31, 10--12 (2005), 1048--1065.

Digital Library

[52]

L. Oliker, X. Li, P. Husbands, and R. Biswas. 2002. Effects of ordering strategies and programming paradigms on sparse matrix computations. SIAM Review 44, 3 (2002), 373--393.

Digital Library

[53]

OpenACC. 2014. OpenACC web page. Retrieved from http://www.openacc.org.

[54]

OpenMP. 2014. OpenMP Architecture Review Board. Retrieved from http://www.openmp.org.

[55]

O. Pantalé. 2005. Parallelization of an object-oriented FEM dynamics code: Influence of the strategies on the Speedup. Advances in Engineering Software 36 (2005), 361--373.

Digital Library

[56]

M. Paszyński and L. Demkowicz. 2006. Parallel, fully automatic hp-adaptive 3D finite element package. Engineering with Computers 22, 3--4 (2006), 255--276.

Digital Library

[57]

M. Paszyński, J. Kurtz, and L. Demkowicz. 2006. Parallel, fully automatic hp-adaptive 2d finite element package. Computer Methods in Applied Mechanics and Engineering 195 (2006), 711--741.

[58]

Maciej Paszyński, David Pardo, Carlos Torres-Verdín, Leszek Demkowicz, and Victor Calo. 2010. A parallel direct solver for the self-adaptive hp finite element method. Journal of Parallel and Distributed Computing 70, 3 (2010), 270--281.

Digital Library

[59]

D. A. Patterson and J. L. Hennessy. 2009. Computer Organization and Design (4th ed.). Morgan Kaufmann, Burlington.

Digital Library

[60]

Juan C. Pichel, Francisco F. Rivera, Marcos Fernández, and Aurelio Rodríguez. 2012. Optimization of sparse matrix--Vector multiplication using reordering techniques on GPUs. Microprocessors and Microsystems 36, 2 (2012), 65--77.

Digital Library

[61]

J. Reinders. 2007. Intel Threading Building Blocks. O’Reilly.

Digital Library

[62]

Jean-François Remacle, Ottmar Klaas, Joseph E. Flaherty, and Mark S. Shephard. 2002. Parallel algorithm oriented mesh database. Engineering with Computers 18, 3 (2002), 274--284. 10.1007/s003660200024

[63]

Francis P. Russell and Paul H. J. Kelly. 2013. Optimized code generation for finite element local assembly using symbolic manipulation. ACM Transactions on Mathematical Software 39, 4 (2013), 26:1--26:29. http://dx.doi.org/10.1145/2491491.2491496

Digital Library

[64]

Y. Saad. 2003. Iterative Methods for Sparse Linear Systems (2nd ed.). SIAM, Philadelphia.

Digital Library

[65]

W. Schroeder, K. Martin, and B. Lorensen. 2006. The Visualization Toolkit: An Object-Oriented Approach to 3D Graphics (3rd ed.). Kitware, Inc.

[66]

D. Silvester and A. Wathen. 1994. Fast iterative solution of stabilised stokes systems. Part II: Using general block preconditioners. SIAM Journal of Numerical Analysis 31 (1994), 1352--1367.

Digital Library

[67]

T. Tezduyar, S. Aliabadi, M. Behr, A. Johnson, and S. Mittal. 1993. Parallel finite-element computations of 3d flows. Computer 26 (1993), 27--36.

Digital Library

[68]

R. Tuminaro and C. Tong. 2000. Parallel smoothed aggregation multigrid: Aggregation strategies on massively parallel machines. In Proceedings of the ACM/IEEE Conference on Supercomputing (SC’00), J. Donnelley (Ed.).

Digital Library

[69]

R. Vuduc, A. Chandramowlishwaran, J. Choi, M. Guney, and A. Shringarpure. 2010. On the limits of GPU acceleration. In Proceedings of the 2nd USENIX Conference on Hot Topics in Parallelism (HotPar’10). USENIX Association, Berkeley, CA, USA, 13--19. http://dl.acm.org/citation.cfm?id=1863086.1863099

Digital Library

[70]

E. Wadbro and M. Berggren. 2009. Megapixel topology optimization on a graphics processing unit. SIAM Review 51 (2009), 707--721. Issue 4.

Digital Library

[71]

J. White and R. I. Borja. 2011. Block-preconditioned Newton-Krylov solvers for fully coupled flow and geomechanics. Computers & Geosciences 15 (2011), 647--659.

[72]

S. Williams, N. Bell, J. Choi, M. Garland, L. Oliker, and R. Vuduc. 2010. Sparse matrix vector multiplication on multicore and accelerator systems. In Scientific Computing with Multicore Processors and Accelerators, J. Dongarra, D. A. Bader, and J. Kurzak (Eds.). CRC Press.

[73]

S. Williams, L. Oliker, R. Vuduc, J. Shalf, K. Yelick, and J. W. Demmel. 2007. Optimization of sparse matrix-vector multiplication on emerging multicore platforms. In Proceedings of the High Performance Computing, Networking, and Storage Conference (SC 2007). 10--16.

Digital Library

[74]

G. Yagawa and R. Shioya. 1993. Parallel finite elements on a massively parallel computer with domain decomposition. Computing Systems in Science and Engineering 4, 4--6 (1993), 495--503.

Cited By

Africa PArndt DBangerth WBlais BFehling MGassmöller RHeister THeltai LKinnewig SKronbichler MMaier MMunch PSchreter-Fleischhacker MThiele JTurcksin BWells DYushutin V(2024)The deal.II library, Version 9.6Journal of Numerical Mathematics10.1515/jnma-2024-013732:4(369-380)Online publication date: 26-Nov-2024
https://doi.org/10.1515/jnma-2024-0137
Krysl P(2024)Parallel assembly of finite element matrices on multicore computersComputer Methods in Applied Mechanics and Engineering10.1016/j.cma.2024.117076428(117076)Online publication date: Aug-2024
https://doi.org/10.1016/j.cma.2024.117076
Ogiermann DBalzani DPerotti L(2024)An explicit local space-time adaptive framework for monodomain models in cardiac electrophysiologyComputer Methods in Applied Mechanics and Engineering10.1016/j.cma.2024.116806422(116806)Online publication date: Mar-2024
https://doi.org/10.1016/j.cma.2024.116806
Show More Cited By

WorkStream -- A Design Pattern for Multicore-Enabled Finite Element Computations

Recommendations

A guide to the finite and virtual element methods for elasticity
Abstract
We present a systematic description and comparison of the Finite Element Method (FEM) with the relatively new Virtual Element Method (VEM) for solving boundary value problems in linear elasticity, including primal and mixed ...
Mortar finite volume element method with Crouzeix--Raviart element for parabolic problems

In this paper, we consider a semi-discrete mortar finite volume element method for two-dimensional parabolic problems. This method is based on the mortar Crouzeix-Raviart non-conforming finite element space. It is proved that the mortar finite volume ...
Superconvergence of Mixed Finite Element Approximations over Quadrilaterals

A superconvergence result is established in this article for approximate solutions of second-order elliptic equations by mixed finite element methods over quadrilaterals. The superconvergence indicates an accuracy of ${\cal O}(h^{k+2})$ for the mixed ...

Comments

Information & Contributors

Information

Published In

cover image ACM Transactions on Mathematical Software

ACM Transactions on Mathematical Software Volume 43, Issue 1

March 2017

202 pages

ISSN:0098-3500

EISSN:1557-7295

DOI:10.1145/2987591

Editor:
Michael A. Heroux
Sandia National Laboratories, USA

Issue’s Table of Contents

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 29 August 2016

Accepted: 01 November 2015

Revised: 01 November 2015

Received: 01 December 2013

Published in TOMS Volume 43, Issue 1

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article
Research
Refereed

Funding Sources

NSF

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

19
Total Citations
View Citations
527
Total Downloads

Downloads (Last 12 months)131
Downloads (Last 6 weeks)14

Reflects downloads up to 05 Mar 2025

Other Metrics

View Author Metrics

Citations

Cited By

Africa PArndt DBangerth WBlais BFehling MGassmöller RHeister THeltai LKinnewig SKronbichler MMaier MMunch PSchreter-Fleischhacker MThiele JTurcksin BWells DYushutin V(2024)The deal.II library, Version 9.6Journal of Numerical Mathematics10.1515/jnma-2024-013732:4(369-380)Online publication date: 26-Nov-2024
https://doi.org/10.1515/jnma-2024-0137
Krysl P(2024)Parallel assembly of finite element matrices on multicore computersComputer Methods in Applied Mechanics and Engineering10.1016/j.cma.2024.117076428(117076)Online publication date: Aug-2024
https://doi.org/10.1016/j.cma.2024.117076
Ogiermann DBalzani DPerotti L(2024)An explicit local space-time adaptive framework for monodomain models in cardiac electrophysiologyComputer Methods in Applied Mechanics and Engineering10.1016/j.cma.2024.116806422(116806)Online publication date: Mar-2024
https://doi.org/10.1016/j.cma.2024.116806
Mzid RSelvi SAbid M(2024)Research Landscape of Patterns in Software Engineering: Taxonomy, State-of-the-Art, and Future DirectionsSN Computer Science10.1007/s42979-024-02767-85:4Online publication date: 8-Apr-2024
https://dl.acm.org/doi/10.1007/s42979-024-02767-8
Balzani DKöhler MNeumeier TPeter MPeterseim D(2024)Multidimensional rank-one convexification of incremental damage models at finite strainsComputational Mechanics10.1007/s00466-023-02354-373:1(27-47)Online publication date: 1-Jan-2024
https://dl.acm.org/doi/10.1007/s00466-023-02354-3
Arndt DBangerth WBergbauer MFeder MFehling MHeinz JHeister THeltai LKronbichler MMaier MMunch PPelteret JTurcksin BWells DZampini S(2023)The deal.II Library, Version 9.5Journal of Numerical Mathematics10.1515/jnma-2023-008931:3(231-246)Online publication date: 22-Aug-2023
https://doi.org/10.1515/jnma-2023-0089
Wú QLin SUnger A(2023)Modeling Multi‐Material Structural Patterns in Tectonic Flow With a Discontinuous Galerkin Level Set MethodJournal of Geophysical Research: Solid Earth10.1029/2023JB026806128:11Online publication date: 20-Nov-2023
https://doi.org/10.1029/2023JB026806
Arndt DBangerth WFeder MFehling MGassmöller RHeister THeltai LKronbichler MMaier MMunch PPelteret JSticko STurcksin BWells D(2022)The deal.II library, Version 9.4Journal of Numerical Mathematics10.1515/jnma-2022-005430:3(231-246)Online publication date: 17-Jul-2022
https://doi.org/10.1515/jnma-2022-0054
Arndt DBangerth WBlais BFehling MGassmöller RHeister THeltai LKöcher UKronbichler MMaier MMunch PPelteret JProell SSimon KTurcksin BWells DZhang J(2021)The deal.II Library, Version 9.3Journal of Numerical Mathematics10.1515/jnma-2021-0081Online publication date: 1-Aug-2021
https://doi.org/10.1515/jnma-2021-0081
Kumar PSteinmann PMergheim J(2021)Enhanced computational homogenization techniques for modelling size effects in polymer compositesComputational Mechanics10.1007/s00466-021-02037-xOnline publication date: 18-Jun-2021
https://doi.org/10.1007/s00466-021-02037-x
Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Figures

Tables

Media

View Issue’s Table of Contents