Article

Flattened butterfly: a cost-efficient topology for high-radix networks

Authors:
John Kim

Stanford University, Stanford, CA

Stanford University, Stanford, CA
View Profile

,
William J. Dally

Stanford University, Stanford, CA

Stanford University, Stanford, CA
View Profile

,
Dennis Abts

Cray Inc., Chippewa Falls, WI

Cray Inc., Chippewa Falls, WI
View Profile

ISCA '07: Proceedings of the 34th annual international symposium on Computer architectureJune 2007Pages 126–137https://doi.org/10.1145/1250662.1250679

Published:09 June 2007Publication History

ISCA '07: Proceedings of the 34th annual international symposium on Computer architecture

Pages 126–137

ABSTRACT

Increasing integrated-circuit pin bandwidth has motivateda corresponding increase in the degree or radix of interconnection networksand their routers. This paper introduces the flattened butterfly, a cost-efficient topology for high-radix networks. On benign (load-balanced) traffic, the flattened butterfly approaches the cost/performance of a butterfly network and has roughly half the cost of a comparable performance Clos network.The advantage over the Clos is achieved by eliminating redundant hopswhen they are not needed for load balance. On adversarial traffic, the flattened butterfly matches the cost/performance of a folded-Clos network and provides an order of magnitude better performance than a conventional butterfly.In this case, global adaptive routing is used to switchthe flattened butterfly from minimal to non-minimal routing - usingredundant hops only when they are needed. Minimal and non-minimal, oblivious and adaptive routing algorithms are evaluated on the flattened butterfly.We show that load-balancing adversarial traffic requires non-minimalglobally-adaptive routing and show that sequential allocators are required to avoid transient load imbalance when using adaptive routing algorithms.We also compare the cost of the flattened butterfly to folded-Clos, hypercube,and butterfly networks with identical capacityand show that the flattened butterfly is more cost-efficient thanfolded-Clos and hypercube topologies.

References

Amphenol. http://www.amphenol.com/.Google Scholar
L. N. Bhuyan and D. P. Agrawal. Generalized hypercube and hyperbus structures for a computer network. IEEE Trans. Computers, 33(4):323--333, 1984.Google ScholarDigital Library
K.-Y. K. Chang et al. A 0.4-4-Gb/s CMOS Quad Transceiver Cell Using On-Chip Regulated Dual-Loop PLLs. IEEE Journal of Solid--State Circuits, 38(5):747--754, 2003.Google Scholar
C. Clos. A Study of Non-Blocking Switching Networks. The Bell System technical Journal, 32(2):406--424, March 1953.Google Scholar
Cray XT3. http://www.cray.com/products/systems/xt3/.Google Scholar
W. J. Dally. Performance Analysis of k-ary n-cube Interconnection Networks. IEEE Transactions on Computers, 39(6):775--785, 1990. Google ScholarDigital Library
W. J. Dally. Virtual-channel Flow Control. IEEE Transactions on Parallel and Distributed Systems, 3(2):194--205, 1992. Google ScholarDigital Library
W. J. Dally, P. P. Carvey, and L. R. Dennison. The Avici Terabit Switch/Router. In Proc. of Hot Interconnects, pages 41--50, August 1998.Google Scholar
W. J. Dally and B. Towles. Principles and Practices of Interconnection Networks. Morgan Kaufmann, San Francisco, CA, 2004. Google ScholarDigital Library
H. G. Dietz and T.I.Mattox. Compiler techniques for flat neighborhood networks. In 13th International Workshop on Languages and Compilers for Parallel Computing, pages 420--431, Yorktown Heights, New York, 2000. Google ScholarDigital Library
Gore. http://www.gore.com/electronics.Google Scholar
E. J. Kim et al. Energy optimization techniques in cluster interconnects. In Proc. of the 2003 international symposium on Low power electronics and design, pages 459--464, 2003. Google ScholarDigital Library
J. Kim, W. J. Dally, and D. Abts. Adaptive Routing in High-radix Clos Network. In International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'06), Tampa, FL, 2006. Google ScholarDigital Library
J. Kim, W. J. Dally, B. Towles, and A. K. Gupta. Microarchitecture of a high-radix router. In Proc. of the International Symposium on Computer Architecture (ISCA), pages 420--431, Madison, WI, 2005. Google ScholarDigital Library
C. P. Kruskal and M. Snir. The performance of multistage interconnection networks for multiprocessors. IEEE Trans. Computers, 32(12):1091--1098, 1983.Google ScholarDigital Library
R. E. Ladner and M. J. Fischer. Parallel prefix computation. J. ACM, 27(4):831--838, 1980. Google ScholarDigital Library
J. Laudon and D. Lenoski. The SGI Origin: A ccNUMA Highly Scalable Server. In Proc. of the 24th Annual Int'l Symp. on Computer Architecture, pages 241--251, 1997. Google ScholarDigital Library
C. Leiserson. Fat-trees: Universal networks for hardware efficient supercomputing. IEEE Transactions on Computer, C34(10):892--901, October 1985. Google ScholarDigital Library
Microprocessor Report. http://www.mdronline.com/.Google Scholar
C. S. Patel, S. M. Chai, S. Yalamanchili, and D. E. Schimmel. Power constrained design of multiprocessor interconnection networks. In International Conference on Computer Design (ICCD), pages 408--416, 1997. Google ScholarDigital Library
G. Pautsch. Thermal Challenges in the Next Generation of Supercomputers. CoolCon, 2005.Google Scholar
G. Pfister. An Introduction to the InfiniBand Arechitecture (http://www.infinibandta.org). IEEE Press, 2001.Google Scholar
S. Scott, D. Abts, J. Kim, and W. J. Dally. The BlackWidow High-radix Clos Network. In Proc. of the International Symposium on Computer Architecture (ISCA), Boston, MA, June 2006. Google ScholarDigital Library
S. Scott and G. Thorson. The Cray T3E Network: Adaptive Routing in a High Performance 3D Torus. In Hot Chips 4, Stanford, CA, Aug. 1996.Google Scholar
L. Shang, L.-S. Peh, and N. K. Jha. Dynamic voltage scaling with links for power optimization of interconnection networks. In International Symposium on High-Performance Computer Architecture (HPCA), page 91, 2003. Google ScholarDigital Library
H. J. Siegel. A model of simd machines and a comparison of various interconnection networks. IEEE Trans. Computers, 28(12):907--917, 1979.Google ScholarDigital Library
A. Singh. Load-Balanced Routing in Interconnection Networks. PhD thesis, Stanford University, 2005.Google Scholar
A. Singh, W. J. Dally, A. K. Gupta, and B. Towles. GOAL: A load-balanced adaptive routing algorithm for torus networks. In Proc. of the International Symposium on Computer Architecture (ISCA), pages 194--205, San Diego, CA, June 2003. Google ScholarDigital Library
V. Soteriou and L.-S. Peh. Design-space exploration of power-aware on/off interconnection networks. In International Conference on Computer Design (ICCD), pages 510--517, 2004. Google ScholarDigital Library
L. G. Valiant. A scheme for fast parallel communication. SIAM Journal on Computing, 11(2):350--361, 1982.Google ScholarDigital Library
H. Wang, L. S. Peh, and S. Malik. Power-driven Design of Router Microarchitectures in On-chip Networks. In Proc. of the 36th Annual IEEE/ACM Int'l Symposium on Microarchitecture, pages 105--116, 2003. Google ScholarDigital Library
K.-L. J. Wong, H. Hatamkhani, M. Mansuri, and C.-K. K. Yang. A 27-mW 3.6-Gb/s I/O Transceiver. IEEE Journal ofGoogle Scholar
S. Young and S. Yalamanchili. Adaptive routing in generalized hypercube architectures. In Proc. of the IEEE Symposium on Parallel and Distributed Processing, pages 564--571, Dallas, TX, Dec. 1991.Google ScholarDigital Library

Index Terms

Flattened butterfly: a cost-efficient topology for high-radix networks
1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Interconnection architectures
2. Hardware
  1. Communication hardware, interfaces and storage
    1. Buses and high-speed links

Recommendations

Technology-Driven, Highly-Scalable Dragonfly Topology

Evolving technology and increasing pin-bandwidth motivate the use of high-radix routers to reduce the diameter, latency, and cost of interconnection networks. High-radix networks, however, require longer cables than their low-radix counterparts. Because ...
Read More
Flattened butterfly: a cost-efficient topology for high-radix networks

Increasing integrated-circuit pin bandwidth has motivateda corresponding increase in the degree or radix of interconnection networksand their routers. This paper introduces the flattened butterfly, a cost-efficient topology for high-radix networks. On ...
Read More
Flattened Butterfly Topology for On-Chip Networks

With the trend towards increasing number of cores in a multicore processors, the on-chip network that connects the cores needs to scale efficiently. In this work, we propose the use of high-radix networks in on-chip networks and describe how the ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
ISCA '07: Proceedings of the 34th annual international symposium on Computer architecture
June 2007
542 pages
ISBN:9781595937063
DOI:10.1145/1250662
General Chair:
Dean Tullsen
University of California, San Diego
,
Program Chair:
Brad Calder
Microsoft & University of California, San Diego
ACM SIGARCH Computer Architecture News Volume 35, Issue 2
May 2007
527 pages
ISSN:0163-5964
DOI:10.1145/1273440
Issue’s Table of Contents
Copyright © 2007 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 9 June 2007
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
cost model
flattened butterfly
global adaptive routing
high-radix routers
interconnection networks
topology
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate543of3,203submissions,17%
Upcoming Conference
ISCA '24

Sponsor:

sigarch

ISCA '24: The 51st Annual International Symposium on Computer Architecture

June 29 - July 3, 2024

Buenos Aires , Argentina
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 357
  Total Citations
  View Citations
- 2,264
  Total Downloads
- Downloads (Last 12 months)148
- Downloads (Last 6 weeks)22
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Flattened butterfly: a cost-efficient topology for high-radix networks

ISCA '07: Proceedings of the 34th annual international symposium on Computer architecture

ABSTRACT

References

Cited By

Index Terms

Recommendations

Technology-Driven, Highly-Scalable Dragonfly Topology

Flattened butterfly: a cost-efficient topology for high-radix networks

Flattened Butterfly Topology for On-Chip Networks