ABSTRACT
High-resolution climate simulations are increasingly in demand and require tremendous computing resources. In the Community Earth System Model (CESM), the Parallel Ocean Program (POP) is computationally expensive for high-resolution grids (e.g., 0.1°) and is frequently the least scalable component of CESM for certain production simulations. In particular, the modified Preconditioned Conjugate Gradient (PCG) method, used to solve the elliptic system of equations in the barotropic mode, scales poorly at high core counts, which is problematic for high-resolution simulations. In this work, we demonstrate that the communication costs in the barotropic solver occupy an increasing portion of the total POP execution time as core counts increase. To mitigate this problem, we implement a preconditioned Chebyshev-type iterative method in POP (called P-CSI), which requires far fewer global reductions than PCG. We also develop an effective block preconditioner based on the Error Vector Propagation (EVP) method to give P-CSI a competitive convergence rate. We demonstrate that the improved scalability of P-CSI yields a 5.2x speedup of the barotropic mode in high-resolution POP on 16,875 cores, which translates into a 1.7x speedup of the overall POP simulation. Finally, using an ensemble-based statistical method, we verify that the new solver produces an ocean climate consistent with that of the original solver.
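To illustrate why a Chebyshev-type method needs far fewer global reductions than PCG, here is a minimal serial Python sketch under stated assumptions. It is not POP's solver: the 1-D tridiagonal operator, the diagonal (Jacobi) preconditioner used in place of the EVP block preconditioner, the eigenvalue bounds `nu` and `mu` (known analytically for this toy matrix), and the names `pcg`, `pcsi`, `ReductionCounter`, and `check_every` are all hypothetical. Each call to `ReductionCounter.dot` stands for one global all-reduce in a distributed run.

```python
import math


def matvec(d, x):
    """Return A @ x for the toy SPD operator A = tridiag(-1, d, -1).

    Hypothetical 1-D stand-in for the 2-D barotropic operator.
    """
    n = len(x)
    y = []
    for i in range(n):
        s = d * x[i]
        if i > 0:
            s -= x[i - 1]
        if i < n - 1:
            s -= x[i + 1]
        y.append(s)
    return y


class ReductionCounter:
    """Counts inner products; on a distributed grid each would be one global all-reduce."""

    def __init__(self):
        self.count = 0

    def dot(self, u, v):
        self.count += 1
        return sum(a * c for a, c in zip(u, v))


def pcg(d, b, tol=1e-10, max_iter=1000):
    """Textbook Jacobi-preconditioned CG (not POP's modified variant)."""
    red = ReductionCounter()
    minv = 1.0 / d                      # M = d*I, so M^{-1} is a scalar
    x = [0.0] * len(b)
    r = list(b)
    bnorm = math.sqrt(red.dot(b, b))
    z = [minv * ri for ri in r]
    p = list(z)
    rz = red.dot(r, z)
    for k in range(1, max_iter + 1):
        ap = matvec(d, p)
        step = rz / red.dot(p, ap)      # step length needs a reduction every iteration
        x = [xi + step * pk for xi, pk in zip(x, p)]
        r = [ri - step * apk for ri, apk in zip(r, ap)]
        if math.sqrt(red.dot(r, r)) <= tol * bnorm:
            return x, k, red.count
        z = [minv * ri for ri in r]
        rz_new = red.dot(r, z)          # so does the search-direction update
        p = [zk + (rz_new / rz) * pk for zk, pk in zip(z, p)]
        rz = rz_new
    return x, max_iter, red.count


def pcsi(d, b, nu, mu, tol=1e-10, check_every=10, max_iter=1000):
    """Jacobi-preconditioned Chebyshev iteration, assuming spec(M^{-1}A) lies in [nu, mu]."""
    red = ReductionCounter()
    minv = 1.0 / d
    alpha = 2.0 / (mu - nu)             # inverse half-width of the interval
    gamma = (mu + nu) / 2.0             # centre of the interval
    bnorm = math.sqrt(red.dot(b, b))
    # First step from x0 = 0 (so r0 = b): x1 = x0 + gamma^{-1} M^{-1} r0.
    dx = [minv * bi / gamma for bi in b]
    x = list(dx)
    r = [bi - ai for bi, ai in zip(b, matvec(d, dx))]
    omega = 2.0 / gamma
    for k in range(1, max_iter + 1):
        # Three-term Chebyshev recurrence: coefficients depend only on [nu, mu].
        omega = 1.0 / (gamma - omega / (4.0 * alpha ** 2))
        dx = [omega * minv * ri + (gamma * omega - 1.0) * dxi for ri, dxi in zip(r, dx)]
        x = [xi + dxi for xi, dxi in zip(x, dx)]
        r = [ri - ai for ri, ai in zip(r, matvec(d, dx))]
        # The only reduction inside the loop is the occasional convergence check.
        if k % check_every == 0 and math.sqrt(red.dot(r, r)) <= tol * bnorm:
            return x, k, red.count
    return x, max_iter, red.count


if __name__ == "__main__":
    n, d = 200, 2.5
    b = [1.0] * n
    # For this toy matrix the eigenvalues of M^{-1}A = A/d lie inside (1 - 2/d, 1 + 2/d).
    nu, mu = 1.0 - 2.0 / d, 1.0 + 2.0 / d
    for name, (x, iters, nred) in [("PCG", pcg(d, b)), ("Chebyshev", pcsi(d, b, nu, mu))]:
        res = max(abs(ai - bi) for ai, bi in zip(matvec(d, x), b))
        print(f"{name}: {iters} iterations, {nred} reductions, max residual {res:.1e}")
```

PCG needs inner products in every iteration to form its step length and search direction, so its reduction count grows with the iteration count. The Chebyshev recurrence takes its coefficients from the fixed interval `[nu, mu]`, so in this sketch its only reductions are the norm of `b` and the periodic convergence checks. The price is that the interval must enclose the spectrum of the preconditioned operator; how P-CSI obtains such bounds is outside the scope of this sketch.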
Index Terms
- Improving the scalability of the ocean barotropic solver in the community earth system model