article

Technology-Driven, Highly-Scalable Dragonfly Topology

ACM SIGARCH Computer Architecture News Volume 36 Issue 3June 2008pp 77–88https://doi.org/10.1145/1394608.1382129

Published:01 June 2008Publication History

ACM SIGARCH Computer Architecture News

Abstract

Evolving technology and increasing pin-bandwidth motivate the use of high-radix routers to reduce the diameter, latency, and cost of interconnection networks. High-radix networks, however, require longer cables than their low-radix counterparts. Because cables dominate network cost, the number of cables, and particularly the number of long, global cables should be minimized to realize an efficient network. In this paper, we introduce the dragonfly topology which uses a group of high-radix routers as a virtual router to increase the effective radix of the network. With this organization, each minimally routed packet traverses at most one global channel. By reducing global channels, a dragonfly reduces cost by 20% compared to a flattened butterfly and by 52% compared to a folded Clos network in configurations with ≥ 16K nodes.We also introduce two new variants of global adaptive routing that enable load-balanced routing in the dragonfly. Each router in a dragonfly must make an adaptive routing decision based on the state of a global channel connected to a different router. Because of the indirect nature of this routing decision, conventional adaptive routing algorithms give degraded performance. We introduce the use of selective virtual-channel discrimination and the use of credit round-trip latency to both sense and signal channel congestion. The combination of these two methods gives throughput and latency that approaches that of an ideal adaptive routing algorithm.

References

D. Abts, A. Bataineh, S. Scott, G. Faanes, J. Schwarzmeier, E. Lundberg, T. Johnson, M. Bye, and G. Schwoerer. The Cray BlackWidow: A Highly Scalable Vector Multiprocessor. In Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'07), Reno, NV, Nov. 2007. Google ScholarDigital Library
R. D. Chamberlain, M. A. Franklin, and C. S. Baw. Gemini: An Optical Interconnection Network for Parallel Processing. IEEE Transactions on Parallel and Distributed Systems, 13(10):1038-1055, 2002. Google ScholarDigital Library
D. Chiou, L. R. Dennison, and W. J. Dally. Adaptive source routing and packet processing. United States Patent 20050100035, May 2005.Google Scholar
C. Clos. A Study of Non-Blocking Switching Networks. The Bell System technical Journal, 32(2):406-424, March 1953.Google Scholar
W. J. Dally. Virtual-channel Flow Control. IEEE Transactions on Parallel and Distributed Systems, 3(2):194-205, 1992. Google ScholarDigital Library
W. J. Dally and J. W. Poulton. Digital systems engineering. Cambridge University Press, New York, NY, 1998. Google ScholarDigital Library
W. J. Dally and C. L. Seitz. Deadlock-free message routing in multiprocessor interconnection networks. IEEE Transactions on Computers, 36(5):547-553, 1987. Google ScholarDigital Library
W. J. Dally and B. Towles. Principles and Practices of Interconnection Networks. Morgan Kaufmann, San Francisco, CA, 2004. Google ScholarDigital Library
S. Dandamudi and D. Eager. Hierarchical Interconnection Networks for Multicomputer Systems. IEEE Transactions on Computers, 39(6):786- 797, 1990. Google ScholarDigital Library
A. K. Gupta and W. J. Dally. Topology optimization of interconnection networks. IEEE Computer Architecture Letters, 5(1), 2006. Google ScholarDigital Library
A. K. Gupta, W. J. Dally, A. Singh, and B. Towles. Scalable Opto-Electronic Network (SOENet). In Proc. of Hot Interconnects, pages 71-75, Stanford, CA, Aug. 2002. Google ScholarDigital Library
Intel Connects Cables. http://www.intel.com/design/network/ products/optical/cables/index.htm/.Google Scholar
J. Kim, W. J. Dally, and D. Abts. Adaptive Routing in High-radix Clos Network. In International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'06), Tampa, FL, Nov. 2006. Google ScholarDigital Library
J. Kim, W. J. Dally, and D. Abts. Flattened Butterfly : A Cost-Efficient Topology for High-Radix Networks. In Proc. of the International Symposium on Computer Architecture (ISCA), pages 126-137, San Diego, CA, June 2007. Google ScholarDigital Library
J. Kim, W. J. Dally, B. Towles, and A. K. Gupta. Microarchitecture of a High-Radix Router. In Proc. of the International Symposium on Computer Architecture (ISCA), pages 420-431, Madison, WI, June 2005. Google ScholarDigital Library
A. K. Kodi and A. Louri. Design of a High-Speed Optical Interconnect for Scalable Shared-Memory Multiprocessors. IEEE Micro, 25(1):41- 49, 2005. Google ScholarDigital Library
P. Kongetira, K. Aingaran, and K. Olukotun. Niagara: A 32-Way Multithreaded Sparc Processor. IEEE Micro, 25(2):21-29, 2005. Google ScholarDigital Library
J. M. Kumar and L. M. Patnaik. Extended hypercube: A hierarchical interconnection network of hypercubes. IEEE Trans. Parallel Distrib. Syst., 3(1):45-57, 1992. Google ScholarDigital Library
J. Laudon and D. Lenoski. The SGI Origin: A ccNUMA Highly Scalable Server. In Proc. of the 24th Annual Int'l Symp. on Computer Architecture , pages 241-251, 1997. Google ScholarDigital Library
C. Leiserson. Fat-trees: Universal networks for hardware efficient supercomputing. IEEE Transactions on Computer, C-34(10):892-901, October 1985. Google ScholarDigital Library
Luxtera Blazar LUX5010. http://www.luxtera.com/ prod- ucts_blazar.htm.Google Scholar
Luxtera Inc. White Paper: Fiber will displace copper sooner than you think, Nov. 2005.Google Scholar
R. Palmer, J. Poulton, W. J. Dally, J. Eyles, A. M. Fuller, T. Greer, M. Horowitz, M. Kellam, F. Quan, and F. Zarkeshvari. A 14mW 6.25Gb/s Transceiver in 90nm CMOS for Serial Chip-to-Chip Communications. In IEEE Int'l Solid-State Circuits Conf., Digest of Tech. Papers (ISSCC), pages 440-441, 2007.Google Scholar
T. Pinkston. Design considerations for optical interconnects in parallel computers. In Massively Parallel Processing Using Optical Interconnections , pages 306-322, Cancun, Mexico, 1994.Google ScholarCross Ref
F. P. Preparata and J. Vuillemin. The cube-connected cycles: a versatile network for parallel computation. Commun. ACM, 24(5):300-309, 1981. Google ScholarDigital Library
S. Scott, D. Abts, J. Kim, and W. J. Dally. The BlackWidow High-radix Clos Network. In Proc. of the International Symposium on Computer Architecture (ISCA), pages 16-28, Boston, MA, June 2006. Google ScholarDigital Library
S. Scott and G. Thorson. The Cray T3E Network: Adaptive Routing in a High Performance 3D Torus. In Hot Chips 4, Stanford, CA, Aug. 1996.Google Scholar
A. Shacham and K. Bergman. Building Ultralow-Latency Interconnection Networks Using Photonic Integration. IEEE Micro, 27(4):6-20, 2007. Google ScholarDigital Library
A. Singh. Load-Balanced Routing in Interconnection Networks. PhD thesis, Stanford University, 2005.Google Scholar
A. Singh, W. J. Dally, A. K. Gupta, and B. Towles. GOAL: A load-balanced adaptive routing algorithm for torus networks. In Proc. of the International Symposium on Computer Architecture (ISCA), pages 194-205, San Diego, CA, June 2003. Google ScholarDigital Library
A. Singh, W. J. Dally, A. K. Gupta, and B. Towles. Adaptive channel queue routing on k-ary n-cubes. In SPAA '04: Proceedings of the sixteenth annual ACM symposium on Parallelism in algorithms and architectures , pages 11-19, New York, NY, USA, 2004. ACM Press. Google ScholarDigital Library
L. G. Valiant. A scheme for fast parallel communication. SIAM Journal on Computing, 11(2):350-361, 1982.Google ScholarDigital Library
D. Wentzlaff, P. Griffin, H. Hoffmann, L. Bao, B. Edwards, C. Ramey, M. Mattina, C.-C. Miao, J. F. B. III, and A. Agarwal. On-Chip Interconnection Architecture of the Tile Processor. IEEE Micro, 27(5):15-31, 2007. Google ScholarDigital Library

Index Terms

Technology-Driven, Highly-Scalable Dragonfly Topology
1. Hardware
  1. Hardware validation
  2. Integrated circuits
    1. Interconnect
      1. Input / output circuits

Recommendations

Flattened butterfly: a cost-efficient topology for high-radix networks
ISCA '07: Proceedings of the 34th annual international symposium on Computer architecture

Increasing integrated-circuit pin bandwidth has motivateda corresponding increase in the degree or radix of interconnection networksand their routers. This paper introduces the flattened butterfly, a cost-efficient topology for high-radix networks. On ...
Read More
Technology-Driven, Highly-Scalable Dragonfly Topology
ISCA '08: Proceedings of the 35th Annual International Symposium on Computer Architecture

Evolving technology and increasing pin-bandwidth motivate the use of high-radix routers to reduce the diameter, latency, and cost of interconnection networks. High-radix networks, however, require longer cables than their low-radix counterparts. Because ...
Read More
Flattened butterfly: a cost-efficient topology for high-radix networks

Increasing integrated-circuit pin bandwidth has motivateda corresponding increase in the degree or radix of interconnection networksand their routers. This paper introduces the flattened butterfly, a cost-efficient topology for high-radix networks. On ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
ACM SIGARCH Computer Architecture News Volume 36, Issue 3
June 2008
449 pages
ISSN:0163-5964
DOI:10.1145/1394608
Issue’s Table of Contents
ISCA '08: Proceedings of the 35th Annual International Symposium on Computer Architecture
June 2008
449 pages
ISBN:9780769531748
Copyright © 2008 Authors
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 June 2008
Check for updates
Author Tags
dragonfly
interconnection networks
topology
Qualifiers
- article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 298
  Total Citations
  View Citations
- 2,250
  Total Downloads
- Downloads (Last 12 months)296
- Downloads (Last 6 weeks)32
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Technology-Driven, Highly-Scalable Dragonfly Topology

ACM SIGARCH Computer Architecture News

Abstract

References

Cited By

Index Terms

Recommendations

Flattened butterfly: a cost-efficient topology for high-radix networks

Technology-Driven, Highly-Scalable Dragonfly Topology

Flattened butterfly: a cost-efficient topology for high-radix networks