skip to main content
research-article
Public Access

WorkStream -- A Design Pattern for Multicore-Enabled Finite Element Computations

Published: 29 August 2016 Publication History

Abstract

Many operations that need to be performed in modern finite element codes can be described as an operation that needs to be done independently on every cell, followed by a reduction of these local results into a global data structure. For example, matrix assembly, estimating discretization errors, or converting nodal values into data structures that can be output in visualization file formats all fall into this class of operations. Using this realization, we identify a software design pattern that we call WorkStream and that can be used to model such operations and enables the use of multicore shared memory parallel processing. We also describe in detail how this design pattern can be efficiently implemented, and we provide numerical scalability results from its use in the deal.II software library.

References

[1]
G. Amdahl. 1967. Validity of the single processor approach to achieving large scale computing capabilities. A FIPS Conference Proceedings 30 (1967), 483--485.
[2]
Krzysztof Banaś. 2004. A modular design for parallel adaptive finite element computational kernels. In Computational Science -- ICCS 2004, Marian Bubak, GeertDick van Albada, Peter M. A. Sloot, and Jack Dongarra (Eds.). Lecture Notes in Computer Science, Vol. 3037. Springer, 155--162.
[3]
W. Bangerth, R. Hartmann, and G. Kanschat. 2007. deal.II -- A general purpose object oriented finite element library. ACM Transactions on Mathematical Software 33, 4 (2007), 24/1--24/27.
[4]
Andrew C. Bauer and Abani K. Patra. 2004. Robust and efficient domain decomposition preconditioners for adaptive hp finite element approximations of linear elasticity with and without discontinuous coefficients. International Journal for Numerical Methods in Engineering 59 (2004), 337--364.
[5]
B. Bergen, F. Hülsemann, and U. Rüde. 2005. Is 1.7 × 1010 unknowns the largest finite element system that can be solved today? In Proceedings of the ACM/IEEE Conference on Supercomputing (SC’05). 5:1--5:14.
[6]
P. Berger, P. Brouaye, and J. C. Syre. 1982. A mesh coloring method for efficient MIMD processing in finite element problems. In Proceedings of the International Conference on Parallelism (ICPP’82) August 24--27. Bellaire, Michigan, 41--46.
[7]
Daniel Brélaz. 1979. New methods to color the vertices of a graph. Communication of the ACM 22, 4 (April 1979), 251--256.
[8]
G. F. Carey, E. Barragy, R. McLay, and M. Sharma. 1988. Element-by-element vector and parallel computations. Communications in Applied Numerical Methods 4 (1988), 299--307.
[9]
Cris Cecka, Adrian J. Lew, and Eric Darve. 2011. Assembly of finite element methods on graphics processors. International Journal for Numerical Methods in Engineering 85, 5 (2011), 640--669.
[10]
T. Y. P. Chang, J. A. Hulzen, and P. S. Wang. 1984. Code generation and optimization for finite element analysis. In EUROSAM 84, John Fitch (Ed.). Lecture Notes in Computer Science, Vol. 174. Springer, 237--247.
[11]
Cheng-Tao Chu, Sang Kyun Kim, Yi-An Lin, Yuan Yuan Yu, Gary Bradski, Andrew Y. Ng, and Kunle Olukotun. 2006. Map-reduce for machine learning on multicore. In Proceedings of Neural Information Processing Systems Conference (NIPS). Vancouver, Canada.
[12]
Andrew Corrigan, Fernando F. Camelli, Rainald Löhner, and John Wallin. 2011. Running unstructured grid-based CFD solvers on modern graphics hardware. International Journal for Numerical Methods in Fluids 66, 2 (2011), 221--229.
[13]
Jeffrey Dean and Sanjay Ghemawat. 2008. MapReduce: Simplified data processing on large clusters. Communications of the ACM 51, 1 (2008), 107--113.
[14]
Karen D. Devine and Joseph E. Flaherty. 1996. Parallel adaptive hp-refinement techniques for conservation laws. Applied Numerical Methods. 20 (1996), 367--386.
[15]
Charbel Farhat. 1988. A simple and efficient automatic fem domain decomposer. Computers & Structures 28, 5 (1988), 579--602.
[16]
C. Farhat and L. Crivelli. 1989. A general approach to nonlinear finite-element computations on shared-memory multiprocessors. Computer Methods in Applied Mechanics and Engineering 72, 2 (1989), 153--171.
[17]
Mike Folk, Albert Cheng, and Kim Yates. 1999. HDF5: A file format and I/O library for high performance computing applications. In Proceedings of the ACM/IEEE Conference on Supercomputing (SC’99).
[18]
J. Frohne, T. Heister, and W. Bangerth. 2015. Efficient numerical methods for the large-scale, parallel solution of elastoplastic contact problems. Submitted (2015).
[19]
E. Gamma, R. Helm, R. Johnson, and J. Vlissides. 1994. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley Professional.
[20]
M. W. Gee, C. M. Siefert, J. J. Hu, R. S. Tuminaro, and M. G. Sala. 2006. ML 5.0 Smoothed Aggregation User’s Guide. Technical Report SAND2006-2649. Sandia National Laboratories.
[21]
Alan George. 1971. Computer Implementation of the Finite Element Method. Technical Report STAN-CS-7J-208. Stanford University.
[22]
Alan George. 1973. Nested dissection of a regular finite element mesh. SIAM Journal on Numerical Analysis 10, 2 (1973), 345--363.
[23]
Konstantinos M. Giannoutakis and George A. Gravvanis. 2009. Design and implementation of parallel approximate inverse classes using OpenMP. Concurrency and Computation Practice E. 21, 2 (2009), 115--131.
[24]
D. Göddeke, R. Strzodka, J. Mohd-Yusof, P. McCormick, H. Wobker, C. Becker, and S. Turek. 2008. Using GPUs to improve multigrid solver performance on a cluster. International Journal of Computational Science and Engineering 4, 1 (2008), 36--55.
[25]
J. L. Henning. 2007. Performance counters and development of SPEC CPU2006. ACM SIGARCH Computer Architecture News 35 (2007), 118--121.
[26]
M. A. Heroux and others. 2014. Trilinos web page. Retrieved from http://trilinos.sandia.gov.
[27]
M. A. Heroux, R. A. Bartlett, V. E. Howle, R. J. Hoekstra, J. J. Hu, T. G. Kolda, R. B. Lehoucq, K. R. Long, R. P. Pawlowski, E. T. Phipps, A. G. Salinger, H. K. Thornquist, R. S. Tuminaro, J. M. Willenbring, A. Williams, and K. S. Stanley. 2005. An overview of the Trilinos project. ACM Transactions on Mathematical Software 31 (2005), 397--423.
[28]
Grand Roman Joldes, Adam Wittek, and Karol Miller. 2010. Real-time nonlinear finite element computations on GPU application to neurosurgical simulation. Computer Methods in Applied Mechanics and Engineering 199, 49--52 (2010), 3305--3314.
[29]
A. Klöckner, T. Warburton, J. Bridge, and J. S. Hesthaven. 2009. Nodal discontinuous Galerkin methods on graphics processors. Journal of Computational Physics 228, 21 (2009), 7863--7882.
[30]
Matthew G. Knepley and Andy R. Terrel. 2013. Finite element integration on GPUs. ACM Transactions on Mathematical Software 39, 2 (2013), 10:1--10:13.
[31]
D. Komatitsch, G. Erlebacher, D. Göddeke, and D. Michéa. 2010. High-order finite-element seismic wave propagation modeling with MPI on a large GPU cluster. Journal of Computational Physics 229 (2010), 7692--7714.
[32]
D. Komatitsch, D. Michéa, and G. Erlebacher. 2009. Porting a high-order finite-element earthquake modeling application to NVIDIA graphics cards using CUDA. Journal of Parallel and Distributed Computing 69, 5 (2009), 451--460.
[33]
K. Kormann and M. Kronbichler. 2011. Parallel finite element operator application: Graph partitioning and coloring. In Proceedings of the 7th IEEE International Conference on eScience. 332--339.
[34]
M. Kronbichler, W. Bangerth, and T. Heister. 2013. The deal.II Tutorial: Step-32. http://www.dealii.org/ developer/doxygen/deal.II/step_32.html.
[35]
M. Kronbichler, T. Heister, and W. Bangerth. 2012. High accuracy mantle convection simulation through modern numerical methods. Geophysics Journal International 191 (2012), 12--29.
[36]
M. Kronbichler and K. Kormann. 2012. A generic interface for parallel cell-based finite element operator application. Computers & Fluids 63 (2012), 135--147.
[37]
Marek Kubale. 2004. Graph Colorings. American Mathematical Society.
[38]
Andras Laszloffy, Jingping Long, and Abani K. Patra. 2000. Simple data management, scheduling and solution strategies for managing the irregularities in parallel adaptive hp finite element simulations. Parallel Computing 26 (2000), 1765--1788.
[39]
Kincho H. Law. 1986. A parallel finite element solution method. Computers & Structures 23, 6 (1986), 845--858.
[40]
Jimmy Lin and Chris Dyer. 2010. Data-Intensive Text Processsing with MapReduce. Morgan & Claypool.
[41]
Anders Logg, Kent-Andre Mardal, and Garth Wells. 2012. Automated Solution of Differential Equations by the Finite Element Method: The FEniCS Book. Lecture Notes in Computational Science and Engineering, Vol. 84. Springer.
[42]
Rainald Löhner and Joseph D. Baum. 2013. Handling tens of thousands of cores with industrial/legacy codes: Approaches, implementation and timings. Computers & Fluids 85 (2013), 53--62.
[43]
Rainald Löhner and Martin Galle. 2002. Minimization of indirect addressing for edge-based field solvers. Communications in Numerical Methods in Engineering 18, 5 (2002), 335--343.
[44]
G. Mahinthakumar and F. Saied. 2002. A hyrbid MPI-OpenMP implementation of an implicit finite-element code on parallel architectures. International Journal of High Performance Computing 16, 4 (2002), 371--393.
[45]
G. R. Markall, A. Slemmer, D. A. Ham, P. H. J. Kelly, C. D. Cantwell, and S. J. Sherwin. 2013. Finite element assembly strategies on multi-core and many-core architectures. International Journal for Numerical Methods in Fluids 71, 1 (2013), 80--97.
[46]
Timothy G. Mattson, Beverly A. Sanders, and Berna L. Massingill. 2004. Patterns for Parallel Programming. Addison-Wesley.
[47]
Francis Thomas McKenna. 1997. Object-oriented finite element programming: frameworks for analysis, algorithms and parallel computing. (1997). Ph. D. Thesis, University of California.
[48]
J. M. Melenk, K. Gerdes, and C. Schwab. 2001. Fully discrete hp-finite elements: Fast quadrature. Computer Methods in Applied Mechanics and Engineering 190 (2001), 4339--4364.
[49]
Message Passing Interface Forum. 2012. MPI: A Message-Passing Interface Standard (Version 3.0). Technical Report. http://www.mpi-forum.org/.
[50]
Kengo Nakajima. 2003. OpenMP/MPI hybrid vs. flat MPI on the earth simulator: Parallel iterative solvers for finite element method. In High Performance Computing. Lecture Notes in Computer Science, Vol. 2858. Springer, 486--499.
[51]
Kengo Nakajima. 2005. Parallel iterative solvers for finite-element methods using an OpenMP/MPI hybrid programming model on the earth simulator. Parallel Computing 31, 10--12 (2005), 1048--1065.
[52]
L. Oliker, X. Li, P. Husbands, and R. Biswas. 2002. Effects of ordering strategies and programming paradigms on sparse matrix computations. SIAM Review 44, 3 (2002), 373--393.
[53]
OpenACC. 2014. OpenACC web page. Retrieved from http://www.openacc.org.
[54]
OpenMP. 2014. OpenMP Architecture Review Board. Retrieved from http://www.openmp.org.
[55]
O. Pantalé. 2005. Parallelization of an object-oriented FEM dynamics code: Influence of the strategies on the Speedup. Advances in Engineering Software 36 (2005), 361--373.
[56]
M. Paszyński and L. Demkowicz. 2006. Parallel, fully automatic hp-adaptive 3D finite element package. Engineering with Computers 22, 3--4 (2006), 255--276.
[57]
M. Paszyński, J. Kurtz, and L. Demkowicz. 2006. Parallel, fully automatic hp-adaptive 2d finite element package. Computer Methods in Applied Mechanics and Engineering 195 (2006), 711--741.
[58]
Maciej Paszyński, David Pardo, Carlos Torres-Verdín, Leszek Demkowicz, and Victor Calo. 2010. A parallel direct solver for the self-adaptive hp finite element method. Journal of Parallel and Distributed Computing 70, 3 (2010), 270--281.
[59]
D. A. Patterson and J. L. Hennessy. 2009. Computer Organization and Design (4th ed.). Morgan Kaufmann, Burlington.
[60]
Juan C. Pichel, Francisco F. Rivera, Marcos Fernández, and Aurelio Rodríguez. 2012. Optimization of sparse matrix--Vector multiplication using reordering techniques on GPUs. Microprocessors and Microsystems 36, 2 (2012), 65--77.
[61]
J. Reinders. 2007. Intel Threading Building Blocks. O’Reilly.
[62]
Jean-François Remacle, Ottmar Klaas, Joseph E. Flaherty, and Mark S. Shephard. 2002. Parallel algorithm oriented mesh database. Engineering with Computers 18, 3 (2002), 274--284. 10.1007/s003660200024
[63]
Francis P. Russell and Paul H. J. Kelly. 2013. Optimized code generation for finite element local assembly using symbolic manipulation. ACM Transactions on Mathematical Software 39, 4 (2013), 26:1--26:29. http://dx.doi.org/10.1145/2491491.2491496
[64]
Y. Saad. 2003. Iterative Methods for Sparse Linear Systems (2nd ed.). SIAM, Philadelphia.
[65]
W. Schroeder, K. Martin, and B. Lorensen. 2006. The Visualization Toolkit: An Object-Oriented Approach to 3D Graphics (3rd ed.). Kitware, Inc.
[66]
D. Silvester and A. Wathen. 1994. Fast iterative solution of stabilised stokes systems. Part II: Using general block preconditioners. SIAM Journal of Numerical Analysis 31 (1994), 1352--1367.
[67]
T. Tezduyar, S. Aliabadi, M. Behr, A. Johnson, and S. Mittal. 1993. Parallel finite-element computations of 3d flows. Computer 26 (1993), 27--36.
[68]
R. Tuminaro and C. Tong. 2000. Parallel smoothed aggregation multigrid: Aggregation strategies on massively parallel machines. In Proceedings of the ACM/IEEE Conference on Supercomputing (SC’00), J. Donnelley (Ed.).
[69]
R. Vuduc, A. Chandramowlishwaran, J. Choi, M. Guney, and A. Shringarpure. 2010. On the limits of GPU acceleration. In Proceedings of the 2nd USENIX Conference on Hot Topics in Parallelism (HotPar’10). USENIX Association, Berkeley, CA, USA, 13--19. http://dl.acm.org/citation.cfm?id=1863086.1863099
[70]
E. Wadbro and M. Berggren. 2009. Megapixel topology optimization on a graphics processing unit. SIAM Review 51 (2009), 707--721. Issue 4.
[71]
J. White and R. I. Borja. 2011. Block-preconditioned Newton-Krylov solvers for fully coupled flow and geomechanics. Computers & Geosciences 15 (2011), 647--659.
[72]
S. Williams, N. Bell, J. Choi, M. Garland, L. Oliker, and R. Vuduc. 2010. Sparse matrix vector multiplication on multicore and accelerator systems. In Scientific Computing with Multicore Processors and Accelerators, J. Dongarra, D. A. Bader, and J. Kurzak (Eds.). CRC Press.
[73]
S. Williams, L. Oliker, R. Vuduc, J. Shalf, K. Yelick, and J. W. Demmel. 2007. Optimization of sparse matrix-vector multiplication on emerging multicore platforms. In Proceedings of the High Performance Computing, Networking, and Storage Conference (SC 2007). 10--16.
[74]
G. Yagawa and R. Shioya. 1993. Parallel finite elements on a massively parallel computer with domain decomposition. Computing Systems in Science and Engineering 4, 4--6 (1993), 495--503.

Cited By

View all
  • (2024)The deal.II library, Version 9.6Journal of Numerical Mathematics10.1515/jnma-2024-013732:4(369-380)Online publication date: 26-Nov-2024
  • (2024)Parallel assembly of finite element matrices on multicore computersComputer Methods in Applied Mechanics and Engineering10.1016/j.cma.2024.117076428(117076)Online publication date: Aug-2024
  • (2024)An explicit local space-time adaptive framework for monodomain models in cardiac electrophysiologyComputer Methods in Applied Mechanics and Engineering10.1016/j.cma.2024.116806422(116806)Online publication date: Mar-2024
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Transactions on Mathematical Software
ACM Transactions on Mathematical Software  Volume 43, Issue 1
March 2017
202 pages
ISSN:0098-3500
EISSN:1557-7295
DOI:10.1145/2987591
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 29 August 2016
Accepted: 01 November 2015
Revised: 01 November 2015
Received: 01 December 2013
Published in TOMS Volume 43, Issue 1

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Finite element algorithms
  2. assembly
  3. pipeline software pattern

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)131
  • Downloads (Last 6 weeks)14
Reflects downloads up to 05 Mar 2025

Other Metrics

Citations

Cited By

View all
  • (2024)The deal.II library, Version 9.6Journal of Numerical Mathematics10.1515/jnma-2024-013732:4(369-380)Online publication date: 26-Nov-2024
  • (2024)Parallel assembly of finite element matrices on multicore computersComputer Methods in Applied Mechanics and Engineering10.1016/j.cma.2024.117076428(117076)Online publication date: Aug-2024
  • (2024)An explicit local space-time adaptive framework for monodomain models in cardiac electrophysiologyComputer Methods in Applied Mechanics and Engineering10.1016/j.cma.2024.116806422(116806)Online publication date: Mar-2024
  • (2024)Research Landscape of Patterns in Software Engineering: Taxonomy, State-of-the-Art, and Future DirectionsSN Computer Science10.1007/s42979-024-02767-85:4Online publication date: 8-Apr-2024
  • (2024)Multidimensional rank-one convexification of incremental damage models at finite strainsComputational Mechanics10.1007/s00466-023-02354-373:1(27-47)Online publication date: 1-Jan-2024
  • (2023)The deal.II Library, Version 9.5Journal of Numerical Mathematics10.1515/jnma-2023-008931:3(231-246)Online publication date: 22-Aug-2023
  • (2023)Modeling Multi‐Material Structural Patterns in Tectonic Flow With a Discontinuous Galerkin Level Set MethodJournal of Geophysical Research: Solid Earth10.1029/2023JB026806128:11Online publication date: 20-Nov-2023
  • (2022)The deal.II library, Version 9.4Journal of Numerical Mathematics10.1515/jnma-2022-005430:3(231-246)Online publication date: 17-Jul-2022
  • (2021)The deal.II Library, Version 9.3Journal of Numerical Mathematics10.1515/jnma-2021-0081Online publication date: 1-Aug-2021
  • (2021)Enhanced computational homogenization techniques for modelling size effects in polymer compositesComputational Mechanics10.1007/s00466-021-02037-xOnline publication date: 18-Jun-2021
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Full Access

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media