DOI: 10.1145/2901378.2901389
Research article (Public Access)

Modeling a Million-Node Slim Fly Network Using Parallel Discrete-Event Simulation

Published: 15 May 2016

ABSTRACT

As supercomputers close in on exascale performance, the increased number of processors and processing power translates to an increased demand on the underlying network interconnect. The Slim Fly network topology, a new low-diameter and low-latency interconnection network, is gaining interest as one possible solution for next-generation supercomputing interconnect systems. In this paper, we present a high-fidelity Slim Fly flit-level model leveraging the Rensselaer Optimistic Simulation System (ROSS) and Co-Design of Exascale Storage (CODES) frameworks. We validate our Slim Fly model against the results of the Kathareios et al. Slim Fly model at moderately sized network scales. We further scale the model up to an unprecedented 1 million compute nodes; through visualization of network simulation metrics such as link bandwidth, packet latency, and port occupancy, we gain insight into network behavior at the million-node scale. We also show linear strong scaling of the Slim Fly model on an Intel cluster, achieving a peak event rate of 36 million events per second using 128 MPI tasks to process 7 billion events. Detailed analysis of the underlying discrete-event simulation performance shows how the million-node Slim Fly model simulation executes in 198 seconds on the Intel cluster.
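As a quick consistency check, the abstract's own numbers (7 billion events at a peak rate of 36 million events per second) can be turned into a back-of-the-envelope runtime estimate; the small gap to the reported 198-second wall-clock time reflects ramp-up and synchronization overheads that peak-rate arithmetic ignores. A minimal sketch:

```python
# Back-of-the-envelope check of the reported Slim Fly simulation performance.
total_events = 7e9   # events processed, as reported in the abstract
peak_rate = 36e6     # peak events/second on 128 MPI tasks

ideal_runtime = total_events / peak_rate
print(f"Runtime at sustained peak rate: {ideal_runtime:.1f} s")

# The paper reports 198 s of wall-clock time, so the run sustains
# roughly this fraction of its peak event rate end to end:
sustained_fraction = ideal_runtime / 198.0
print(f"Sustained fraction of peak: {sustained_fraction:.2%}")
```

At a sustained peak rate the run would take about 194 seconds, so the reported 198 seconds corresponds to sustaining roughly 98% of peak throughput.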

References

  1. B. Acun, N. Jain, A. Bhatele, M. Mubarak, C. Carothers, and L. Kale. Preliminary evaluation of a parallel trace replay tool for HPC network simulations. In Euro-Par 2015: Parallel Processing Workshops, volume 9523 of Lecture Notes in Computer Science, pages 417--429. Springer International Publishing, 2015.
  2. A. Alexandrov, M. F. Ionescu, K. E. Schauser, and C. Scheiman. LogGP: Incorporating long messages into the LogP model -- one step closer towards a realistic model for parallel computation. In Proceedings of the Seventh Annual ACM Symposium on Parallel Algorithms and Architectures, SPAA '95, pages 95--105, New York, NY, USA, 1995. ACM.
  3. P. D. Barnes, Jr., C. D. Carothers, D. R. Jefferson, and J. M. LaPre. Warp speed: Executing time warp on 1,966,080 cores. In Proceedings of the 1st ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, SIGSIM PADS '13, pages 327--336, New York, NY, USA, 2013. ACM.
  4. M. Besta and T. Hoefler. Slim Fly: A cost effective low-diameter network topology. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC14), Nov. 2014.
  5. A. Bhatele. Task mapping on complex computer network topologies for improved performance. Technical Report LLNL-TR-678732, Lawrence Livermore National Laboratory, Oct. 2015.
  6. C. D. Carothers, D. Bauer, and S. Pearce. ROSS: A high-performance, low memory, modular time warp system. In Proceedings of the Fourteenth Workshop on Parallel and Distributed Simulation, PADS '00, pages 53--60, Washington, DC, USA, 2000. IEEE Computer Society.
  7. C. D. Carothers, K. S. Perumalla, and R. M. Fujimoto. Efficient optimistic parallel simulations using reverse computation. ACM Trans. Model. Comput. Simul., 9(3):224--253, July 1999.
  8. CCI. RSA cluster, Nov. 2014.
  9. J. Cope, N. Liu, S. Lang, P. Carns, C. D. Carothers, and R. Ross. CODES: Enabling co-design of multi-layer exascale storage architectures. In Proceedings of the Workshop on Emerging Supercomputing Technologies (WEST), Tucson, AZ, USA, 2011.
  10. W. Dally. Virtual-channel flow control. IEEE Transactions on Parallel and Distributed Systems, 3(2):194--205, Mar. 1992.
  11. W. Dally and B. Towles. Principles and Practices of Interconnection Networks. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2003.
  12. P. R. Hafner. Geometric realisation of the graphs of McKay-Miller-Širáň. Journal of Combinatorial Theory, Series B, 90(2):223--232, 2004.
  13. Intel. Ushering in a new era: Argonne National Laboratory's Aurora system. Technical report, Intel Corporation, Apr. 2015.
  14. G. Kathareios, C. Minkenberg, B. Prisacari, G. Rodriguez, and T. Hoefler. Cost-effective diameter-two topologies: Analysis and evaluation. In Proceedings of the IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SC15), Nov. 2015.
  15. N. Liu, A. Haider, X.-H. Sun, and D. Jin. FatTreeSim: Modeling large-scale fat-tree networks for HPC systems and data centers using parallel and discrete event simulation. In Proceedings of the 3rd ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, SIGSIM PADS '15, pages 199--210, New York, NY, USA, 2015. ACM.
  16. B. D. McKay, M. Miller, and J. Širáň. A note on large graphs of diameter two and given maximum degree. Journal of Combinatorial Theory, Series B, 74(1):110--118, 1998.
  17. P. A. Merolla, J. V. Arthur, R. Alvarez-Icaza, A. S. Cassidy, J. Sawada, F. Akopyan, B. L. Jackson, N. Imam, C. Guo, Y. Nakamura, B. Brezzo, I. Vo, S. K. Esser, R. Appuswamy, B. Taba, A. Amir, M. D. Flickner, W. P. Risk, R. Manohar, and D. S. Modha. A million spiking-neuron integrated circuit with a scalable communication network and interface. Science, 345(6197):668--673, 2014.
  18. M. Miller and J. Širáň. Moore graphs and beyond: A survey of the degree/diameter problem. The Electronic Journal of Combinatorics, Dynamic Survey DS14, 2005.
  19. M. Mubarak, C. D. Carothers, R. Ross, and P. Carns. Modeling a million-node dragonfly network using massively parallel discrete-event simulation. In Proceedings of the 2012 SC Companion: High Performance Computing, Networking Storage and Analysis, SCC '12, pages 366--376, Washington, DC, USA, 2012. IEEE Computer Society.
  20. M. Mubarak, C. D. Carothers, R. B. Ross, and P. Carns. A case study in using massively parallel simulation for extreme-scale torus network codesign. In Proceedings of the 2nd ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, SIGSIM PADS '14, pages 27--38, New York, NY, USA, 2014. ACM.
  21. D. M. Nicol. The cost of conservative synchronization in parallel discrete event simulations. J. ACM, 40(2):304--333, Apr. 1993.
  22. NVIDIA. Summit and Sierra supercomputers: An inside look at the U.S. Department of Energy's new pre-exascale systems. Technical report, NVIDIA, Nov. 2014.
  23. M. Papka, P. Messina, R. Coffey, and C. Drugan. Argonne Leadership Computing Facility 2014 annual report. Mar. 2015.
  24. S. Snyder, P. Carns, J. Jenkins, K. Harms, R. Ross, M. Mubarak, and C. Carothers. A case for epidemic fault detection and group membership in HPC storage systems. In High Performance Computing Systems: Performance Modeling, Benchmarking, and Simulation, volume 8966 of Lecture Notes in Computer Science, pages 237--248. Springer International Publishing, 2015.
  25. L. G. Valiant. A scheme for fast parallel communication. SIAM Journal on Computing, 11(2):350--361, 1982.
  26. S.-J. Wang. Load-balancing in multistage interconnection networks under multiple-pass routing. Journal of Parallel and Distributed Computing, 36(2):189--194, 1996.

Published in

SIGSIM-PADS '16: Proceedings of the 2016 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation
May 2016, 272 pages
ISBN: 9781450337427
DOI: 10.1145/2901378

Copyright © 2016 ACM. Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor, or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher: Association for Computing Machinery, New York, NY, United States

Overall Acceptance Rate: 398 of 779 submissions, 51%
