ABSTRACT
Discrete event simulation (DES) is central to the exploration of "what-if" scenarios in many domains, including networks, storage devices, and chip design. Accurately simulating the dynamically varying behavior of large components in these domains requires DES engines that are scalable and adaptive, so that simulations complete in a reasonable time. This paper takes a step toward such a simulation engine by redesigning ROSS, a parallel DES (PDES) engine written in MPI, in Charm++, a parallel programming framework based on message-driven migratable objects managed by an adaptive runtime system. We first show that the Charm++ programming model is well suited to implementing a PDES engine such as ROSS. Next, we describe the design and implementation of the Charm++ version of ROSS and discuss its benefits. Finally, we demonstrate the performance advantages of the Charm++ version of ROSS over its MPI counterpart on IBM Blue Gene/Q supercomputers: we obtain up to 40% higher event rates for the PHOLD benchmark on two million processes, and improve the strong scaling of the dragonfly network model to 524,288 processes, with up to 5x speedup at lower process counts.
Towards PDES in a Message-Driven Paradigm: A Preliminary Case Study Using Charm++