DOI: 10.1145/2835857.2835859
research-article

Performance and productivity of parallel python programming: a study with a CFD test case

Published: 15 November 2015

ABSTRACT

The programming language Python is widely used to rapidly create compact software. However, its low performance compared to low-level programming languages such as C or Fortran prevents its use for HPC applications. Efficient parallel programming of multi-core systems and graphics cards is generally a complex task. Python with add-on libraries might provide a simple approach to programming those systems. This paper evaluates the performance of Python implementations with different libraries and compares it to implementations in C or Fortran. As a test case from the field of computational fluid dynamics (CFD), a part of a rotor simulation code was selected. Fortran versions of this code were available for single-core, multi-core, and graphics-card systems. For all these systems, multiple compact versions of the code were implemented in Python with different libraries. To analyze the performance of the rotor simulation kernel, a performance model was developed and then used to assess the performance reached by the different implementations. Performance tests showed that an implementation with Python syntax is six times slower than Fortran on single-core systems. On multi-core systems and graphics cards, the Python implementations reach about a tenth of the performance of the Fortran implementations. Higher performance was achieved by a hybrid implementation in C and Python using Cython, which reached about half of the performance of the Fortran implementation.
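The abstract does not say which performance model the authors developed. As an illustration only, the sketch below shows one common way to assess a kernel against hardware limits: a roofline-style bound, with measured performance expressed as a fraction of that bound. The function names and all machine and kernel numbers are hypothetical and are not taken from the paper.

```python
def roofline_bound(peak_gflops, mem_bandwidth_gbs, flops_per_item, bytes_per_item):
    """Upper bound on attainable GFLOP/s under a simple roofline model.

    The bound is the smaller of the compute peak and the memory
    bandwidth multiplied by the kernel's arithmetic intensity.
    """
    intensity = flops_per_item / bytes_per_item  # FLOP per byte moved
    return min(peak_gflops, mem_bandwidth_gbs * intensity)


def fraction_of_bound(measured_gflops, bound_gflops):
    """Share of the model bound that a measured run actually reaches."""
    return measured_gflops / bound_gflops


if __name__ == "__main__":
    # Hypothetical machine: 50 GFLOP/s peak, 20 GB/s memory bandwidth.
    # Hypothetical kernel: 30 FLOP and 24 bytes of traffic per work item.
    bound = roofline_bound(50.0, 20.0, 30.0, 24.0)
    # Hypothetical measurement of 5 GFLOP/s for one implementation.
    share = fraction_of_bound(5.0, bound)
    print(f"model bound: {bound} GFLOP/s, reached share: {share:.2f}")
```

Comparing every implementation against the same bound is what makes statements such as "about half of the performance of the Fortran implementation" comparable across languages and libraries, provided the measurements use the same hardware and problem size.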


Published in

PyHPC '15: Proceedings of the 5th Workshop on Python for High-Performance and Scientific Computing
November 2015
59 pages
ISBN: 9781450340106
DOI: 10.1145/2835857

Copyright © 2015 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

PyHPC '15 paper acceptance rate: 7 of 7 submissions, 100%. Overall acceptance rate: 7 of 7 submissions, 100%.
