ABSTRACT
As supercomputers close in on exascale performance, the increased number of processors and processing power translates into an increased demand on the underlying network interconnect. The Slim Fly network topology, a new low-diameter and low-latency interconnection network, is gaining interest as one possible solution for next-generation supercomputing interconnect systems. In this paper, we present a high-fidelity Slim Fly flit-level model leveraging the Rensselaer Optimistic Simulation System (ROSS) and Co-Design of Exascale Storage (CODES) frameworks. We validate our Slim Fly model against the Slim Fly model results of Kathareios et al., which are available at moderately sized network scales. We further scale the model up to an unprecedented 1 million compute nodes, and through visualization of network simulation metrics such as link bandwidth, packet latency, and port occupancy, we gain insight into the network behavior at the million-node scale. We also show linear strong scaling of the Slim Fly model on an Intel cluster, achieving a peak event rate of 36 million events per second using 128 MPI tasks to process 7 billion events. Detailed analysis of the underlying discrete-event simulation performance shows how the million-node Slim Fly model simulation executes in 198 seconds on the Intel cluster.
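A minimal sketch of the router-level topology may help make the construction concrete. Slim Fly networks are built from the McKay-Miller-Širáň (MMS) graphs cited in the references below; the Python sketch that follows builds the 2q^2-router graph for a prime q = 4w + 1, following the description given by Besta and Hoefler. It is an illustration only, not the paper's model code: the function names (`primitive_root`, `slim_fly_edges`) are ours, and for brevity it handles prime q rather than general prime powers.

```python
from itertools import product

def primitive_root(q):
    """Find a generator of the multiplicative group of Z_q (q prime)."""
    for g in range(2, q):
        if len({pow(g, k, q) for k in range(1, q)}) == q - 1:
            return g
    raise ValueError("no primitive root found; is q prime?")

def slim_fly_edges(q):
    """Edges of the 2*q^2-router MMS-based Slim Fly graph, q = 4w + 1 prime."""
    xi = primitive_root(q)
    # Generator sets: even powers of xi (the quadratic residues) wire up
    # group 0; odd powers (the non-residues) wire up group 1. Because
    # q = 1 (mod 4), -1 is a residue, so both sets are symmetric.
    X  = {pow(xi, 2 * k, q) for k in range((q - 1) // 2)}
    Xp = {pow(xi, 2 * k + 1, q) for k in range((q - 1) // 2)}
    edges = []
    # Intra-group links: routers in the same column connect when the
    # difference of their row indices lies in the group's generator set.
    for x, y, y2 in product(range(q), repeat=3):
        if y < y2 and (y - y2) % q in X:
            edges.append(((0, x, y), (0, x, y2)))
        if y < y2 and (y - y2) % q in Xp:
            edges.append(((1, x, y), (1, x, y2)))
    # Inter-group links: (0, x, y) connects to (1, m, c) iff y = m*x + c.
    for x, y, m in product(range(q), repeat=3):
        edges.append(((0, x, y), (1, m, (y - m * x) % q)))
    return edges

edges = slim_fly_edges(5)   # 50 routers, diameter 2
print(len(edges))           # 175 links: every router has network radix 7
```

For q = 5 this gives 50 routers, each with (3q - 1)/2 = 7 network ports, and graph diameter 2. As a separate sanity check on the abstract's scaling numbers: committing 7 billion events at a sustained peak of 36 million events per second takes roughly 7x10^9 / 3.6x10^7, or about 194 seconds, consistent with the reported 198-second end-to-end runtime on 128 MPI tasks.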
REFERENCES
- B. Acun, N. Jain, A. Bhatele, M. Mubarak, C. Carothers, and L. Kale. Preliminary evaluation of a parallel trace replay tool for HPC network simulations. In S. Hunold, A. Costan, D. Giménez, A. Iosup, L. Ricci, M. E. Gómez Requena, V. Scarano, A. L. Varbanescu, S. L. Scott, S. Lankes, J. Weidendorfer, and M. Alexander, editors, Euro-Par 2015: Parallel Processing Workshops, volume 9523 of Lecture Notes in Computer Science, pages 417--429. Springer International Publishing, 2015.
- A. Alexandrov, M. F. Ionescu, K. E. Schauser, and C. Scheiman. LogGP: Incorporating long messages into the LogP model -- one step closer towards a realistic model for parallel computation. In Proceedings of the Seventh Annual ACM Symposium on Parallel Algorithms and Architectures, SPAA '95, pages 95--105, New York, NY, USA, 1995. ACM.
- P. D. Barnes, Jr., C. D. Carothers, D. R. Jefferson, and J. M. LaPre. Warp speed: Executing Time Warp on 1,966,080 cores. In Proceedings of the 1st ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, SIGSIM PADS '13, pages 327--336, New York, NY, USA, 2013. ACM.
- M. Besta and T. Hoefler. Slim Fly: A cost effective low-diameter network topology. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC14), Nov. 2014.
- A. Bhatele. Task mapping on complex computer network topologies for improved performance. Technical report LLNL-TR-678732, LDRD Final Report, Lawrence Livermore National Laboratory, Oct. 2015.
- C. D. Carothers, D. Bauer, and S. Pearce. ROSS: A high-performance, low memory, modular Time Warp system. In Proceedings of the Fourteenth Workshop on Parallel and Distributed Simulation, PADS '00, pages 53--60, Washington, DC, USA, 2000. IEEE Computer Society.
- C. D. Carothers, K. S. Perumalla, and R. M. Fujimoto. Efficient optimistic parallel simulations using reverse computation. ACM Trans. Model. Comput. Simul., 9(3):224--253, July 1999.
- CCI. RSA cluster, Nov. 2014.
- J. Cope, N. Liu, S. Lang, P. Carns, C. D. Carothers, and R. Ross. CODES: Enabling co-design of multilayer exascale storage architectures. In Proceedings of the Workshop on Emerging Supercomputing Technologies (WEST), Tucson, AZ, USA, 2011.
- W. Dally. Virtual-channel flow control. IEEE Transactions on Parallel and Distributed Systems, 3(2):194--205, Mar. 1992.
- W. Dally and B. Towles. Principles and Practices of Interconnection Networks. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2003.
- P. R. Hafner. Geometric realisation of the graphs of McKay-Miller-Širáň. Journal of Combinatorial Theory, Series B, 90(2):223--232, 2004.
- Intel. Ushering in a new era: Argonne National Laboratory's Aurora system. Technical report, Intel Corporation, April 2015.
- G. Kathareios, C. Minkenberg, B. Prisacari, G. Rodriguez, and T. Hoefler. Cost-effective diameter-two topologies: Analysis and evaluation. In Proceedings of the IEEE/ACM International Conference on High Performance Computing, Networking, Storage and Analysis (SC15), Nov. 2015.
- N. Liu, A. Haider, X.-H. Sun, and D. Jin. FatTreeSim: Modeling large-scale fat-tree networks for HPC systems and data centers using parallel and discrete event simulation. In Proceedings of the 3rd ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, SIGSIM PADS '15, pages 199--210, New York, NY, USA, 2015. ACM.
- B. D. McKay, M. Miller, and J. Širáň. A note on large graphs of diameter two and given maximum degree. Journal of Combinatorial Theory, Series B, 74(1):110--118, 1998.
- P. A. Merolla, J. V. Arthur, R. Alvarez-Icaza, A. S. Cassidy, J. Sawada, F. Akopyan, B. L. Jackson, N. Imam, C. Guo, Y. Nakamura, B. Brezzo, I. Vo, S. K. Esser, R. Appuswamy, B. Taba, A. Amir, M. D. Flickner, W. P. Risk, R. Manohar, and D. S. Modha. A million spiking-neuron integrated circuit with a scalable communication network and interface. Science, 345(6197):668--673, 2014.
- M. Miller and J. Širáň. Moore graphs and beyond: A survey of the degree/diameter problem. The Electronic Journal of Combinatorics, Dynamic Survey DS14, 2005.
- M. Mubarak, C. D. Carothers, R. Ross, and P. Carns. Modeling a million-node dragonfly network using massively parallel discrete-event simulation. In Proceedings of the 2012 SC Companion: High Performance Computing, Networking Storage and Analysis, SCC '12, pages 366--376, Washington, DC, USA, 2012. IEEE Computer Society.
- M. Mubarak, C. D. Carothers, R. B. Ross, and P. Carns. A case study in using massively parallel simulation for extreme-scale torus network codesign. In Proceedings of the 2nd ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, SIGSIM PADS '14, pages 27--38, New York, NY, USA, 2014. ACM.
- D. M. Nicol. The cost of conservative synchronization in parallel discrete event simulations. J. ACM, 40(2):304--333, Apr. 1993.
- NVIDIA. Summit and Sierra supercomputers: An inside look at the U.S. Department of Energy's new pre-exascale systems. Technical report, NVIDIA, November 2014.
- M. Papka, P. Messina, R. Coffey, and C. Drugan. Argonne Leadership Computing Facility 2014 annual report. Mar. 2015.
- S. Snyder, P. Carns, J. Jenkins, K. Harms, R. Ross, M. Mubarak, and C. Carothers. A case for epidemic fault detection and group membership in HPC storage systems. In S. A. Jarvis, S. A. Wright, and S. D. Hammond, editors, High Performance Computing Systems. Performance Modeling, Benchmarking, and Simulation, volume 8966 of Lecture Notes in Computer Science, pages 237--248. Springer International Publishing, 2015.
- L. G. Valiant. A scheme for fast parallel communication. SIAM Journal on Computing, 11(2):350--361, 1982.
- S.-J. Wang. Load-balancing in multistage interconnection networks under multiple-pass routing. Journal of Parallel and Distributed Computing, 36(2):189--194, 1996.