DOI: 10.1145/1185347.1185350

Towards an efficient switch architecture for high-radix switches

Published: 03 December 2006

Abstract

The interconnection network plays a key role in the overall performance of high-performance computing systems, and it accounts for an increasing fraction of their cost and power consumption. Current trends in interconnection network technology suggest that high-radix switches will be preferred, because networks built from them are smaller (in terms of switch count), with the associated savings in packet latency, cost, and power consumption. Unfortunately, current switch architectures have scalability problems that prevent them from being effective when implemented with a high number of ports.

In this paper, an efficient and cost-effective architecture for high-radix switches is proposed. The architecture, referred to as Partitioned Crossbar Input Queued (PCIQ), relies on three key components: a partitioned crossbar organization that allows the use of simple arbiters and crossbars, a packet-based arbiter, and a mechanism to eliminate switch-level head-of-line (HOL) blocking.

Under uniform traffic, maximum switch efficiency is achieved. Furthermore, switch-level HOL blocking is completely eliminated under hot-spot traffic, again delivering maximum throughput. Additionally, PCIQ inherently implements an efficient congestion management technique that eliminates all network-wide HOL blocking. In contrast, previously proposed architectures either perform poorly or cost significantly more than PCIQ (in both components and complexity).
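As a toy illustration of the switch-level HOL blocking the abstract refers to, the Python sketch below counts how many cycles an input-queued switch needs to drain its queues. It is not the PCIQ mechanism, which the abstract does not describe in detail. The function name `drain_cycles`, the `hol_free` flag, the fixed-priority arbitration, and the one-packet-per-output-per-cycle model are all assumptions made for this example.

```python
def drain_cycles(queues, hol_free):
    """Count the cycles needed to empty all input queues (toy model).

    queues:   one list per input port; each entry is the destination
              output port of a queued packet, oldest first.
    hol_free: False -> strict FIFO, only the head packet may be sent;
              True  -> an input may send its oldest packet whose
                       output is still free in this cycle.
    """
    pending = [list(q) for q in queues]  # work on a copy
    cycles = 0
    while any(pending):
        busy_outputs = set()  # each output accepts one packet per cycle
        for q in pending:     # assumed fixed priority: lowest input first
            candidates = q if hol_free else q[:1]
            for i, dest in enumerate(candidates):
                if dest not in busy_outputs:
                    busy_outputs.add(dest)
                    del q[i]  # each input sends at most one packet
                    break
        cycles += 1
    return cycles


# Both inputs hold a head packet for output 0; input 1 also holds a
# packet for output 2, which is idle in the first cycle.
traffic = [[0, 1], [0, 2]]
print(drain_cycles(traffic, hol_free=False))  # strict FIFO queues
print(drain_cycles(traffic, hol_free=True))   # HOL blocking removed
```

With this traffic the FIFO case needs three cycles, because the packet for idle output 2 waits behind a blocked head packet, while the HOL-free case needs two.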



      Published In

      ANCS '06: Proceedings of the 2006 ACM/IEEE symposium on Architecture for networking and communications systems
      December 2006
      202 pages
      ISBN:1595935800
      DOI:10.1145/1185347
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. arbiter efficiency
      2. partitioned crossbar
      3. switch organization

      Qualifiers

      • Article

      Conference

      ANCS06

      Acceptance Rates

      Overall Acceptance Rate 88 of 314 submissions, 28%

      Cited By

      • (2023) RETRACTED CHAPTER: Overview of Router Architecture in High Performance Computing. Proceedings of the 8th International Conference on Financial Innovation and Economic Development (ICFIED 2023), pages 493-506. DOI: 10.2991/978-94-6463-142-5_57. Online publication date: 14-May-2023.
      • (2016) Scalable High-Radix Modular Crossbar Switches. 2016 IEEE 24th Annual Symposium on High-Performance Interconnects (HOTI), pages 37-44. DOI: 10.1109/HOTI.2016.019. Online publication date: Aug-2016.
      • (2015) SCOC: High-radix switches made of bufferless clos networks. 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA), pages 402-414. DOI: 10.1109/HPCA.2015.7056050. Online publication date: Feb-2015.
      • (2015) Optimizing the configuration of combined high-radix switches. The Journal of Supercomputing, 71:7, pages 2614-2643. DOI: 10.1007/s11227-015-1408-x. Online publication date: 1-Jul-2015.
      • (2015) A Highly-Efficient Crossbar Allocator Architecture for High-Radix Switch. Computer Engineering and Technology, pages 48-58. DOI: 10.1007/978-3-662-45815-0_5. Online publication date: 2015.
      • (2014) Formalization and configuration methodology for high-radix combined switches. The Journal of Supercomputing, 69:3, pages 1410-1444. DOI: 10.1007/s11227-014-1223-9. Online publication date: 1-Sep-2014.
      • (2013) Scalable high-radix router microarchitecture using a network switch organization. ACM Transactions on Architecture and Code Optimization, 10:3, pages 1-25. DOI: 10.1145/2512433. Online publication date: 16-Sep-2013.
      • (2013) Extending the Energy Efficiency and Performance With Channel Buffers, Crossbars, and Topology Analysis for Network-on-Chips. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 21:11, pages 2141-2154. DOI: 10.1109/TVLSI.2012.2227283. Online publication date: 1-Nov-2013.
      • (2013) Obtaining the optimal configuration of high-radix Combined switches. Journal of Parallel and Distributed Computing, 73:9, pages 1239-1250. DOI: 10.1016/j.jpdc.2013.04.009. Online publication date: 1-Sep-2013.
      • (2012) Crossbar NoCs Are Scalable Beyond 100 Nodes. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 31:4, pages 573-585. DOI: 10.1109/TCAD.2011.2176730. Online publication date: 1-Apr-2012.
