What You Should Know About NAMD and Charm++ But Were Hoping to Ignore

ABSTRACT
The biomolecular simulation program NAMD is used heavily at many HPC centers. Supporting NAMD users requires knowledge of the Charm++ parallel runtime system on which NAMD is built. Introduced in 1993, Charm++ supports message-driven, task-based, and other programming models, and it has demonstrated its portability across generations of architectures, interconnects, and operating systems. While Charm++ can use MPI as a portable communication layer, specialized high-performance layers are preferred for Cray, IBM, and InfiniBand networks, and a new OFI layer supports Omni-Path. NAMD binaries built on some specialized layers can be launched directly with mpiexec or its equivalent, or mpiexec can be called by the charmrun program to leverage system job-launch mechanisms. Charm++ supports multi-threaded parallelism within each process, with a single thread dedicated to communication and the rest to computation. The optimal balance between thread and process parallelism depends on the size of the simulation, the features used, memory limitations, the node count, and the core count and NUMA structure of each node. It is also important to enable the built-in Charm++ CPU affinity settings to bind worker and communication threads appropriately to processor cores. Appropriate execution configuration and CPU affinity settings are particularly non-intuitive on Intel KNL processors due to their high core counts and flat NUMA hierarchy. Rules and heuristics can provide good default performance in most cases and dramatically reduce the search space when optimizing for a specific simulation on a particular machine. Upcoming Charm++ and NAMD releases will simplify and automate launch configuration and affinity settings.
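To make the launch and affinity options concrete, the sketch below shows the configurations the abstract describes: charmrun delegating to mpiexec, an explicit per-node thread and affinity layout, and direct launch of an MPI-layer build. It is a minimal sketch, assuming an SMP (multi-threaded) NAMD build on dual-socket nodes with 28 cores per socket; the node geometry, process counts, and the input file name stmv.namd are illustrative assumptions, not recommendations for any particular machine.

```sh
# Minimal sketch, assuming an SMP (multi-threaded) NAMD build and
# dual-socket nodes with 28 cores per socket (56 cores per node).
# The input file stmv.namd and all counts are illustrative.

# (a) Let charmrun call mpiexec so the system job-launch mechanism
#     starts the processes; Charm++ chooses a default CPU affinity:
./charmrun +p 108 ++mpiexec ./namd2 +setcpuaffinity stmv.namd

# (b) Explicit layout on 2 nodes: 108 worker threads total (+p),
#     27 per process (++ppn), so 4 processes, one per socket.
#     On each node the communication threads are bound to cores 0
#     and 28 (+commap) and the worker threads to the remaining
#     cores (+pemap); the maps are interpreted per node:
./charmrun +p 108 ++ppn 27 ./namd2 \
    +pemap 1-27,29-55 +commap 0,28 stmv.namd

# (c) An MPI-layer build is launched directly with mpiexec; here
#     +ppn (single plus) gives worker threads per process:
mpiexec -n 4 ./namd2 +ppn 27 +pemap 1-27,29-55 +commap 0,28 stmv.namd
```

Reserving one core per process for the communication thread reflects the thread/process balance discussed above. On many-core processors such as KNL the same options apply, but the much larger core and hardware-thread counts make the choice of processes per node and the map syntax correspondingly harder to get right by hand.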