ABSTRACT
OpenFlow-style Software Defined Networking (SDN) has shown promising performance in data centers and campus networks, and the HPC community has shown significant interest in adopting it. However, while OpenFlow-style SDN allows dynamic per-flow resource management using a global network view, it does not support adaptive routing, which is widely used in HPC systems. This raises the question of whether SDN can achieve the performance that HPC systems expect from adaptive routing. In this work, we investigate possible methods of applying SDN to current-generation HPC interconnects with the Dragonfly topology, and compare the performance of SDN with that of adaptive routing. Our results indicate that adaptive routing achieves higher performance than SDN when both have similar resource allocation for a given traffic condition. However, SDN can use the global network view to compete with adaptive routing by allocating network resources more effectively.