ABSTRACT
The programming language Python is widely used to rapidly create compact software. However, its low performance compared to low-level programming languages like C or Fortran prevents its use for HPC applications. Efficient parallel programming of multi-core systems and graphics cards is generally a complex task. Python with add-ons might provide a simple approach to programming those systems. This paper evaluates the performance of Python implementations with different libraries and compares it to that of implementations in C or Fortran. As a test case from the field of computational fluid dynamics (CFD), a part of a rotor simulation code was selected. Fortran versions of this code were available for use on single-core, multi-core, and graphics-card systems. For all of these systems, multiple compact versions of the code were implemented in Python with different libraries. A performance model was developed for the rotor simulation kernel and then used to assess the performance reached by the different implementations. Performance tests showed that an implementation with Python syntax is six times slower than Fortran on single-core systems. On multi-core systems and graphics cards, the performance is about a tenth of that of the Fortran implementations. Higher performance was achieved by a hybrid implementation in C and Python using Cython, which reached about half of the performance of the Fortran implementation.
Index Terms
- Performance and productivity of parallel python programming: a study with a CFD test case