ABSTRACT
Discrete event simulation (DES) is central to the exploration of "what-if" scenarios in many domains, including networks, storage devices, and chip design. Accurately simulating the dynamically varying behavior of large components in these domains requires DES engines that are scalable and adaptive, so that simulations complete in a reasonable time. This paper takes a step toward such a simulation engine by redesigning ROSS, a parallel DES (PDES) engine written in MPI, in Charm++, a parallel programming framework based on message-driven migratable objects managed by an adaptive runtime system. We first show that the Charm++ programming model is well suited to implementing a PDES engine such as ROSS. Next, we describe the design and implementation of the Charm++ version of ROSS and discuss its benefits. Finally, we demonstrate the performance advantages of the Charm++ version of ROSS over its MPI counterpart on IBM Blue Gene/Q supercomputers: we obtain up to 40% higher event rates for the PHOLD benchmark on two million processes, and improve the strong scaling of the dragonfly network model to 524,288 processes, with up to 5x speedup at lower process counts.
Towards PDES in a Message-Driven Paradigm: A Preliminary Case Study Using Charm++