ABSTRACT
The rise of the programmable switching ASIC has allowed switches to handle the complexity and diversity of modern networking programs while meeting the performance demands of today's networks. Exploiting this flexibility, however, has caused routing program sizes to explode: recently proposed programs contain more than 100 [11] or even 1000 [10] tables. Realizing these programs in a programmable switch requires finding layouts with minimal depth: if a layout needs more match-action stages than the switch's pipeline provides, the switch must recirculate packets, cutting throughput. Even when a layout fits within the pipeline, a non-compact layout can leave stages underloaded and memory significantly underutilized, because most commercial pipelines cannot freely allocate memory across stages. Inter-table control and data dependencies critically limit the ability of compilers to lay out tables compactly, yet no switch architecture that can fully resolve these dependencies has been proposed. To address this problem, we introduce Precedence, an extension of the RMT switching ASIC that enables tables linked by dependencies to execute in parallel or even out of order. Precedence can resolve nearly 70% of the dependencies in switch.p4 [11], a real-world routing program, and reduce its pipeline depth by 48%, while only modestly increasing silicon area.
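To make the depth argument concrete, here is a minimal Python sketch under a deliberately simplified model; it is not the paper's compiler or hardware model. It assumes that every unresolved dependency forces the dependent table into a strictly later stage, that per-stage memory is unlimited, and that a dependency marked as resolved (a hypothetical stand-in for a pair the hardware can execute in parallel) imposes no stage ordering. Under these assumptions the minimum pipeline depth is the length of the longest chain of unresolved dependencies. The function `min_stages` and the table names are illustrative only.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def min_stages(tables, deps, resolved=frozenset()):
    """Minimum pipeline depth under a simplified, hypothetical model.

    tables:   iterable of table names
    deps:     set of (src, dst) pairs meaning "dst depends on src"
    resolved: subset of deps assumed to impose no stage ordering
              (a stand-in for dependencies the hardware can resolve)

    Assumptions: unlimited memory per stage; each unresolved dependency
    forces dst into a strictly later stage than src. Raises
    graphlib.CycleError if the dependencies contain a cycle.
    """
    # Map every table to the set of tables it depends on.
    preds = {t: set() for t in tables}
    for src, dst in deps:
        preds.setdefault(src, set())
        preds.setdefault(dst, set()).add(src)

    # Visit tables so that every predecessor is placed before its dependents.
    stage = {}
    for t in TopologicalSorter(preds).static_order():
        # Earliest stage allowed by all predecessors: the same stage for a
        # resolved dependency, at least one stage later otherwise.
        stage[t] = max(
            (stage[p] + (0 if (p, t) in resolved else 1) for p in preds[t]),
            default=0,
        )
    # Stages are numbered from 0, so depth is the highest index plus one.
    return max(stage.values()) + 1 if stage else 0


if __name__ == "__main__":
    tables = ["A", "B", "C", "D", "E"]
    deps = {("A", "B"), ("B", "C"), ("C", "D")}  # E is independent
    print(min_stages(tables, deps))  # 4
    print(min_stages(tables, deps, resolved={("A", "B"), ("C", "D")}))  # 2
```

In this toy chain, resolving two of the three dependencies halves the depth from four stages to two. The numbers are illustrative and unrelated to the switch.p4 results reported in the abstract; a realistic model would also distinguish dependency types and enforce per-stage memory limits.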
REFERENCES
- Broadcom. n.d. Trident 3 (BCM56870 Series). https://www.broadcom.com/products/ethernet-connectivity/switching/strataxgs/bcm56870-series/. Accessed: 2018-11-15.
- Pat Bosshart, Dan Daly, Glen Gibb, Martin Izzard, Nick McKeown, Jennifer Rexford, Cole Schlesinger, Dan Talayco, Amin Vahdat, George Varghese, and David Walker. 2014. P4: Programming Protocol-independent Packet Processors. SIGCOMM Comput. Commun. Rev. 44, 3 (July 2014), 87–95.
- Pat Bosshart, Glen Gibb, Hun-Seok Kim, George Varghese, Nick McKeown, Martin Izzard, Fernando Mujica, and Mark Horowitz. 2013. Forwarding metamorphosis: Fast programmable match-action processing in hardware for SDN. In ACM SIGCOMM Computer Communication Review, Vol. 43. ACM, 99–110.
- Pat Bosshart, Glen Gibb, Hun-Seok Kim, George Varghese, Nick McKeown, Martin Izzard, Fernando Mujica, and Mark Horowitz. 2013. Forwarding Metamorphosis: Fast Programmable Match-action Processing in Hardware for SDN. In Proceedings of the ACM SIGCOMM 2013 Conference on SIGCOMM (SIGCOMM '13). ACM, New York, NY, USA, 99–110.
- Cavium. n.d. XPliant Ethernet Switch Product Family. https://www.cavium.com/xpliant-ethernet-switch-product-family.html. Accessed: 2018-11-15.
- Sharad Chole, Andy Fingerhut, Sha Ma, Anirudh Sivaraman, Shay Vargaftik, Alon Berger, Gal Mendelson, Mohammad Alizadeh, Shang-Tse Chuang, Isaac Keslassy, et al. 2017. dRMT: Disaggregated programmable switching. In Proceedings of the Conference of the ACM Special Interest Group on Data Communication. ACM, 1–14.
- Jinquan Dai, Bo Huang, Long Li, and Luddy Harrison. 2005. Automatically Partitioning Packet Processing Applications for Pipelined Architectures. SIGPLAN Not. 40, 6 (June 2005), 237–248.
- G. Diamos and S. Yalamanchili. 2010. Speculative execution on multi-GPU systems. In 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS). 1–12.
- Lance Hammond, Mark Willey, and Kunle Olukotun. 1998. Data speculation support for a chip multiprocessor. ACM SIGOPS Operating Systems Review 32, 5 (1998), 58–69.
- David Hancock and Jacobus van der Merwe. 2016. HyPer4: Using P4 to Virtualize the Programmable Data Plane. In Proceedings of the 12th International on Conference on Emerging Networking EXperiments and Technologies (CoNEXT '16). ACM, New York, NY, USA, 35–49.
- Barefoot Inc. 2019. switch.p4. https://github.com/p4lang/switch/blob/master/p4src/switch.p4
- Intel. n.d. Intel Ethernet Switch Silicon. https://www.intel.com/content/www/us/en/products/network-io/ethernet/switches.html. Accessed: 2018-11-15.
- Lavanya Jose, Lisa Yan, George Varghese, and Nick McKeown. 2015. Compiling Packet Programs to Reconfigurable Switches. In Proceedings of the 12th USENIX Conference on Networked Systems Design and Implementation (NSDI'15). USENIX Association, Berkeley, CA, USA, 103–115. http://dl.acm.org/citation.cfm?id=2789770.2789778
- Andrew B Kahng, Bill Lin, and Siddhartha Nath. 2012. Explicit modeling of control and data for improved NoC router estimation. In Design Automation Conference (DAC), 2012 49th ACM/EDAC/IEEE. IEEE, 392–397.
- Shaoshan Liu, Christine Eisenbeis, and Jean-Luc Gaudiot. 2011. Value Prediction and Speculative Execution on GPU. International Journal of Parallel Programming 39, 5 (Oct. 2011), 533–552.
- Scott A. Mahlke, David C. Lin, William Y. Chen, Richard E. Hank, and Roger A. Bringmann. 1992. Effective Compiler Support for Predicated Execution Using the Hyperblock. In Proceedings of the 25th Annual International Symposium on Microarchitecture (MICRO 25). IEEE Computer Society Press, Los Alamitos, CA, USA, 45–54. http://dl.acm.org/citation.cfm?id=144953.144998
- J. Menon, M. de Kruijf, and K. Sankaralingam. 2012. iGPU: Exception support and speculative execution on GPUs. In 2012 39th Annual International Symposium on Computer Architecture (ISCA). 72–83.
- Rishiyur Nikhil. 2004. Bluespec System Verilog: efficient, correct RTL from high level specifications. In Formal Methods and Models for Co-Design, 2004. MEMOCODE'04. Proceedings. Second ACM and IEEE International Conference on. IEEE, 69–70.
- Jeffrey T Oplinger, David L Heine, and Monica S Lam. 1999. In search of speculative thread-level parallelism. In Parallel Architectures and Compilation Techniques, 1999. Proceedings. 1999 International Conference on. IEEE, 303–313.
- David A. Patterson and John L. Hennessy. 1990. Computer Architecture: A Quantitative Approach. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.
- B Ramakrishna Rau and Joseph A Fisher. 1993. Instruction-level parallel processing: history, overview, and perspective. In Instruction-Level Parallelism. Springer, 9–50.
- Anirudh Sivaraman, Alvin Cheung, Mihai Budiu, Changhoon Kim, Mohammad Alizadeh, Hari Balakrishnan, George Varghese, Nick McKeown, and Steve Licking. 2016. Packet transactions: High-level programming for line-rate switches. In Proceedings of the 2016 ACM SIGCOMM Conference. ACM, 15–28.
- Anirudh Sivaraman, Changhoon Kim, Ramkumar Krishnamoorthy, Advait Dixit, and Mihai Budiu. 2015. DC.p4: Programming the forwarding plane of a datacenter switch. In Proceedings of the 1st ACM SIGCOMM Symposium on Software Defined Networking Research. ACM, 2.
- Donald Thomas and Philip Moorby. 2008. The Verilog® Hardware Description Language. Springer Science & Business Media.
Index Terms
- Precedence: Enabling Compact Program Layout By Table Dependency Resolution