ABSTRACT
Increasing integrated-circuit pin bandwidth has motivateda corresponding increase in the degree or radix of interconnection networksand their routers. This paper introduces the flattened butterfly, a cost-efficient topology for high-radix networks. On benign (load-balanced) traffic, the flattened butterfly approaches the cost/performance of a butterfly network and has roughly half the cost of a comparable performance Clos network.The advantage over the Clos is achieved by eliminating redundant hopswhen they are not needed for load balance. On adversarial traffic, the flattened butterfly matches the cost/performance of a folded-Clos network and provides an order of magnitude better performance than a conventional butterfly.In this case, global adaptive routing is used to switchthe flattened butterfly from minimal to non-minimal routing - usingredundant hops only when they are needed. Minimal and non-minimal, oblivious and adaptive routing algorithms are evaluated on the flattened butterfly.We show that load-balancing adversarial traffic requires non-minimalglobally-adaptive routing and show that sequential allocators are required to avoid transient load imbalance when using adaptive routing algorithms.We also compare the cost of the flattened butterfly to folded-Clos, hypercube,and butterfly networks with identical capacityand show that the flattened butterfly is more cost-efficient thanfolded-Clos and hypercube topologies.
- Amphenol. http://www.amphenol.com/.Google Scholar
- L. N. Bhuyan and D. P. Agrawal. Generalized hypercube and hyperbus structures for a computer network. IEEE Trans. Computers, 33(4):323--333, 1984.Google ScholarDigital Library
- K.-Y. K. Chang et al. A 0.4-4-Gb/s CMOS Quad Transceiver Cell Using On-Chip Regulated Dual-Loop PLLs. IEEE Journal of Solid--State Circuits, 38(5):747--754, 2003.Google Scholar
- C. Clos. A Study of Non-Blocking Switching Networks. The Bell System technical Journal, 32(2):406--424, March 1953.Google Scholar
- Cray XT3. http://www.cray.com/products/systems/xt3/.Google Scholar
- W. J. Dally. Performance Analysis of k-ary n-cube Interconnection Networks. IEEE Transactions on Computers, 39(6):775--785, 1990. Google ScholarDigital Library
- W. J. Dally. Virtual-channel Flow Control. IEEE Transactions on Parallel and Distributed Systems, 3(2):194--205, 1992. Google ScholarDigital Library
- W. J. Dally, P. P. Carvey, and L. R. Dennison. The Avici Terabit Switch/Router. In Proc. of Hot Interconnects, pages 41--50, August 1998.Google Scholar
- W. J. Dally and B. Towles. Principles and Practices of Interconnection Networks. Morgan Kaufmann, San Francisco, CA, 2004. Google ScholarDigital Library
- H. G. Dietz and T.I.Mattox. Compiler techniques for flat neighborhood networks. In 13th International Workshop on Languages and Compilers for Parallel Computing, pages 420--431, Yorktown Heights, New York, 2000. Google ScholarDigital Library
- Gore. http://www.gore.com/electronics.Google Scholar
- E. J. Kim et al. Energy optimization techniques in cluster interconnects. In Proc. of the 2003 international symposium on Low power electronics and design, pages 459--464, 2003. Google ScholarDigital Library
- J. Kim, W. J. Dally, and D. Abts. Adaptive Routing in High-radix Clos Network. In International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'06), Tampa, FL, 2006. Google ScholarDigital Library
- J. Kim, W. J. Dally, B. Towles, and A. K. Gupta. Microarchitecture of a high-radix router. In Proc. of the International Symposium on Computer Architecture (ISCA), pages 420--431, Madison, WI, 2005. Google ScholarDigital Library
- C. P. Kruskal and M. Snir. The performance of multistage interconnection networks for multiprocessors. IEEE Trans. Computers, 32(12):1091--1098, 1983.Google ScholarDigital Library
- R. E. Ladner and M. J. Fischer. Parallel prefix computation. J. ACM, 27(4):831--838, 1980. Google ScholarDigital Library
- J. Laudon and D. Lenoski. The SGI Origin: A ccNUMA Highly Scalable Server. In Proc. of the 24th Annual Int'l Symp. on Computer Architecture, pages 241--251, 1997. Google ScholarDigital Library
- C. Leiserson. Fat-trees: Universal networks for hardware efficient supercomputing. IEEE Transactions on Computer, C34(10):892--901, October 1985. Google ScholarDigital Library
- Microprocessor Report. http://www.mdronline.com/.Google Scholar
- C. S. Patel, S. M. Chai, S. Yalamanchili, and D. E. Schimmel. Power constrained design of multiprocessor interconnection networks. In International Conference on Computer Design (ICCD), pages 408--416, 1997. Google ScholarDigital Library
- G. Pautsch. Thermal Challenges in the Next Generation of Supercomputers. CoolCon, 2005.Google Scholar
- G. Pfister. An Introduction to the InfiniBand Arechitecture (http://www.infinibandta.org). IEEE Press, 2001.Google Scholar
- S. Scott, D. Abts, J. Kim, and W. J. Dally. The BlackWidow High-radix Clos Network. In Proc. of the International Symposium on Computer Architecture (ISCA), Boston, MA, June 2006. Google ScholarDigital Library
- S. Scott and G. Thorson. The Cray T3E Network: Adaptive Routing in a High Performance 3D Torus. In Hot Chips 4, Stanford, CA, Aug. 1996.Google Scholar
- L. Shang, L.-S. Peh, and N. K. Jha. Dynamic voltage scaling with links for power optimization of interconnection networks. In International Symposium on High-Performance Computer Architecture (HPCA), page 91, 2003. Google ScholarDigital Library
- H. J. Siegel. A model of simd machines and a comparison of various interconnection networks. IEEE Trans. Computers, 28(12):907--917, 1979.Google ScholarDigital Library
- A. Singh. Load-Balanced Routing in Interconnection Networks. PhD thesis, Stanford University, 2005.Google Scholar
- A. Singh, W. J. Dally, A. K. Gupta, and B. Towles. GOAL: A load-balanced adaptive routing algorithm for torus networks. In Proc. of the International Symposium on Computer Architecture (ISCA), pages 194--205, San Diego, CA, June 2003. Google ScholarDigital Library
- V. Soteriou and L.-S. Peh. Design-space exploration of power-aware on/off interconnection networks. In International Conference on Computer Design (ICCD), pages 510--517, 2004. Google ScholarDigital Library
- L. G. Valiant. A scheme for fast parallel communication. SIAM Journal on Computing, 11(2):350--361, 1982.Google ScholarDigital Library
- H. Wang, L. S. Peh, and S. Malik. Power-driven Design of Router Microarchitectures in On-chip Networks. In Proc. of the 36th Annual IEEE/ACM Int'l Symposium on Microarchitecture, pages 105--116, 2003. Google ScholarDigital Library
- K.-L. J. Wong, H. Hatamkhani, M. Mansuri, and C.-K. K. Yang. A 27-mW 3.6-Gb/s I/O Transceiver. IEEE Journal ofGoogle Scholar
- S. Young and S. Yalamanchili. Adaptive routing in generalized hypercube architectures. In Proc. of the IEEE Symposium on Parallel and Distributed Processing, pages 564--571, Dallas, TX, Dec. 1991.Google ScholarDigital Library
Index Terms
- Flattened butterfly: a cost-efficient topology for high-radix networks
Recommendations
Technology-Driven, Highly-Scalable Dragonfly Topology
Evolving technology and increasing pin-bandwidth motivate the use of high-radix routers to reduce the diameter, latency, and cost of interconnection networks. High-radix networks, however, require longer cables than their low-radix counterparts. Because ...
Flattened butterfly: a cost-efficient topology for high-radix networks
Increasing integrated-circuit pin bandwidth has motivateda corresponding increase in the degree or radix of interconnection networksand their routers. This paper introduces the flattened butterfly, a cost-efficient topology for high-radix networks. On ...
Flattened Butterfly Topology for On-Chip Networks
With the trend towards increasing number of cores in a multicore processors, the on-chip network that connects the cores needs to scale efficiently. In this work, we propose the use of high-radix networks in on-chip networks and describe how the ...
Comments