DOI: 10.1145/1878537.1878695
research-article

A replication structure for efficient and fault-tolerant parallel and distributed simulations

Published: 11 April 2010

Abstract

Large-scale parallel and distributed simulations (federations) are developed to study complex systems. Their executions are usually computationally intensive and involve a large number of simulation components (federates), which may be developed by different participants and executed at different locations. It is therefore attractive to provide mechanisms that accelerate executions and tolerate federate failures. We previously proposed a federate replication structure that improves simulation performance by replicating federates with alternative synchronization approaches and automatically choosing the fastest replica to represent the federate in the federation execution. In this paper, we extend the replication structure so that it retains its performance advantages in the presence of failures. Besides presenting the design and implementation details, we report experimental results demonstrating that the extended replication structure provides fault tolerance while maintaining its performance advantages for simulation executions.
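
The core idea in the abstract, running several replicas of one federate and letting the fastest surviving replica represent it, can be illustrated with a toy sketch. This is not the paper's implementation and uses no HLA or RTI API. The function `run_replicated` and the replica names are hypothetical, and each replica is modeled as a plain Python callable executed in a thread.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_replicated(replicas):
    """Run every replica concurrently and return (name, result) of the
    first one that finishes without raising. Failed replicas are skipped.

    `replicas` maps a replica name to a zero-argument callable (assumption:
    all replicas compute the same logical result by different strategies).
    """
    failures = {}
    with ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        futures = {pool.submit(fn): name for name, fn in replicas.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                # The first successful replica wins. Leaving the `with`
                # block still waits for the slower replicas to finish.
                return name, future.result()
            except Exception as exc:
                # A crashed replica is tolerated as long as another survives.
                failures[name] = exc
    raise RuntimeError(f"all replicas failed: {failures}")


# Hypothetical replicas standing in for alternative synchronization approaches.
def crashed_replica():
    raise RuntimeError("simulated federate failure")


def conservative_replica():
    time.sleep(0.05)
    return "state@t=100"


def optimistic_replica():
    time.sleep(0.3)
    return "state@t=100"


winner, state = run_replicated({
    "crashed": crashed_replica,
    "conservative": conservative_replica,
    "optimistic": optimistic_replica,
})
```

In this sketch a failure only costs performance when the failed replica would have been the fastest one. The simulation result is lost only if every replica fails.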


Published In

SpringSim '10: Proceedings of the 2010 Spring Simulation Multiconference
April 2010
1726 pages
ISBN: 9781450300698

Sponsors

  • SCS: Society for Modeling and Simulation International

Publisher

Society for Computer Simulation International

San Diego, CA, United States

Author Tags

  1. decoupled federate architecture
  2. fault tolerance
  3. federate replication
  4. parallel and distributed simulation
  5. performance enhancement

Qualifiers

  • Research-article

Conference

SpringSim '10
Sponsor:
  • SCS
SpringSim '10: 2010 Spring Simulation Conference
April 11 - 15, 2010
Orlando, Florida

Cited By

  • (2016) An Experimental Implementation of Software Rejuvenation in Time Warp Simulation. 2016 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW), pp. 104-110. DOI: 10.1109/ISSREW.2016.37. Online publication date: Oct-2016
  • (2016) Transparent three-phase Byzantine fault tolerance for parallel and distributed simulations. Simulation Modelling Practice and Theory 60, pp. 90-107. DOI: 10.1016/j.simpat.2015.09.012. Online publication date: Jan-2016
  • (2014) Un-identical federate replication structure for improving performance of HLA-based simulations. Simulation Modelling Practice and Theory 48, pp. 112-128. DOI: 10.1016/j.simpat.2014.06.016. Online publication date: Nov-2014
  • (2013) Layered simulation architecture: A practical approach. Simulation Modelling Practice and Theory 32, pp. 1-14. DOI: 10.1016/j.simpat.2012.11.001. Online publication date: Mar-2013
  • (2010) Federate Fault Tolerance in HLA-Based Simulation. Proceedings of the 2010 IEEE Workshop on Principles of Advanced and Distributed Simulation, pp. 3-12. DOI: 10.1109/PADS.2010.5471663. Online publication date: 17-May-2010
  • (2010) A Three-Phases Byzantine Fault Tolerance Mechanism for HLA-Based Simulation. Proceedings of the 2010 IEEE/ACM 14th International Symposium on Distributed Simulation and Real Time Applications, pp. 149-158. DOI: 10.1109/DS-RT.2010.24. Online publication date: 17-Oct-2010
