research-article

Precedence: Enabling Compact Program Layout By Table Dependency Resolution

Authors:
Christopher Leet, Yale University
Shenshen Chen, Yale University and Tongji University
Kai Gao, Sichuan University
Yang Richard Yang, Yale University and Tongji University

SOSR '19: Proceedings of the 2019 ACM Symposium on SDN Research, April 2019, Pages 1–7
https://doi.org/10.1145/3314148.3314348

Published: 3 April 2019

ABSTRACT

The rise of the programmable switching ASIC has allowed switches to handle the complexity and diversity of modern networking programs while meeting the performance demands of modern networks. Exploiting the flexibility of these switches, however, has caused routing programs to grow dramatically: recently proposed programs contain more than 100 [11] or even 1000 [10] tables. Realizing these programs in a programmable switch requires finding layouts with minimal depth: if a layout has more match-action stages than the switch's pipeline provides, the switch must recirculate packets, cutting throughput. Even if a layout fits within the pipeline, most commercial pipelines cannot allocate memory freely across stages, so a non-compact layout can leave stages underloaded and significantly underutilize memory. Inter-table control and data dependencies critically limit a compiler's ability to lay out tables compactly, yet no switch architecture that can fully resolve these dependencies has been proposed. To address this problem, we introduce Precedence, an extension of the RMT switching ASIC that enables tables linked by dependencies to be executed in parallel or even out of order. Precedence can resolve nearly 70% of the dependencies in switch.p4 [11], a real-world routing program, and reduce its pipeline depth by 48%, while only modestly increasing silicon area.
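To make the link between dependencies and pipeline depth concrete, here is a minimal Python (3.9+, for `graphlib`) sketch. It is not the paper's compiler or hardware model. It assumes a toy program whose table names and dependency labels (`match`, `action`, `control`) are hypothetical, that an unresolved dependency forces the later table into a strictly later stage while a resolved one lets both tables share a stage, and that stages have unlimited capacity (per-stage memory limits are ignored). Which dependency kinds Precedence actually resolves is not modeled; resolving `control` and `action` below is purely illustrative.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from math import ceil

# Hypothetical toy program (NOT switch.p4): names and dependency labels are
# illustrative only. Each edge is (earlier_table, later_table, kind).
TABLES = ["port_map", "vlan", "l2_fwd", "l3_fwd", "acl", "qos"]
EDGES = [
    ("port_map", "vlan", "match"),   # vlan matches on a field port_map writes
    ("vlan", "l2_fwd", "control"),   # l2_fwd applied only on one branch after vlan
    ("vlan", "l3_fwd", "control"),   # l3_fwd applied only on the other branch
    ("l2_fwd", "acl", "action"),     # acl's action uses a field l2_fwd's action writes
    ("l3_fwd", "acl", "action"),
    ("acl", "qos", "match"),         # qos matches on a field acl writes
]


def pipeline_depth(tables, edges, resolved=frozenset()):
    """Return the minimum number of match-action stages for a dependency DAG.

    Simplifying assumptions (not a hardware model):
      * an edge whose kind is NOT in `resolved` puts the later table in a
        strictly later stage than the earlier table;
      * an edge whose kind IS in `resolved` lets both tables share a stage;
      * stages have unlimited capacity.
    """
    incoming = {t: [] for t in tables}
    for src, dst, kind in edges:
        incoming[dst].append((src, kind))

    # TopologicalSorter takes a mapping node -> predecessors and raises
    # graphlib.CycleError if the dependencies are cyclic.
    graph = {t: {src for src, _ in incoming[t]} for t in tables}

    stage = {}
    for t in TopologicalSorter(graph).static_order():
        # Earliest stage permitted by every predecessor of t.
        stage[t] = max(
            (stage[src] + (0 if kind in resolved else 1) for src, kind in incoming[t]),
            default=0,
        )
    return max(stage.values()) + 1 if stage else 0


def passes_needed(depth, pipeline_stages):
    """Pipeline traversals needed when a layout of `depth` stages must
    recirculate through a pipeline with `pipeline_stages` stages."""
    return ceil(depth / pipeline_stages)


if __name__ == "__main__":
    STAGES = 4  # hypothetical pipeline length
    for label, resolved in [
        ("baseline", frozenset()),
        ("control+action resolved", frozenset({"control", "action"})),
    ]:
        depth = pipeline_depth(TABLES, EDGES, resolved)
        print(f"{label}: depth={depth}, passes={passes_needed(depth, STAGES)}")
```

Under these assumptions the toy program needs 5 stages (2 passes through a 4-stage pipeline, i.e. one recirculation) when every dependency serializes, and 3 stages (a single pass) once the control and action edges no longer force separate stages.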

References

[1] Broadcom. n.d. Trident 3 (BCM56870 Series). https://www.broadcom.com/products/ethernet-connectivity/switching/strataxgs/bcm56870-series/. Accessed: 2018-11-15.
[2] Pat Bosshart, Dan Daly, Glen Gibb, Martin Izzard, Nick McKeown, Jennifer Rexford, Cole Schlesinger, Dan Talayco, Amin Vahdat, George Varghese, and David Walker. 2014. P4: Programming Protocol-independent Packet Processors. SIGCOMM Comput. Commun. Rev. 44, 3 (July 2014), 87–95.
[3] Pat Bosshart, Glen Gibb, Hun-Seok Kim, George Varghese, Nick McKeown, Martin Izzard, Fernando Mujica, and Mark Horowitz. 2013. Forwarding Metamorphosis: Fast Programmable Match-Action Processing in Hardware for SDN. In ACM SIGCOMM Computer Communication Review, Vol. 43. ACM, 99–110.
[4] Pat Bosshart, Glen Gibb, Hun-Seok Kim, George Varghese, Nick McKeown, Martin Izzard, Fernando Mujica, and Mark Horowitz. 2013. Forwarding Metamorphosis: Fast Programmable Match-Action Processing in Hardware for SDN. In Proceedings of the ACM SIGCOMM 2013 Conference on SIGCOMM (SIGCOMM '13). ACM, New York, NY, USA, 99–110.
[5] Cavium. n.d. XPliant Ethernet Switch Product Family. https://www.cavium.com/xpliant-ethernet-switch-product-family.html. Accessed: 2018-11-15.
[6] Sharad Chole, Andy Fingerhut, Sha Ma, Anirudh Sivaraman, Shay Vargaftik, Alon Berger, Gal Mendelson, Mohammad Alizadeh, Shang-Tse Chuang, Isaac Keslassy, et al. 2017. dRMT: Disaggregated Programmable Switching. In Proceedings of the Conference of the ACM Special Interest Group on Data Communication. ACM, 1–14.
[7] Jinquan Dai, Bo Huang, Long Li, and Luddy Harrison. 2005. Automatically Partitioning Packet Processing Applications for Pipelined Architectures. SIGPLAN Not. 40, 6 (June 2005), 237–248.
[8] G. Diamos and S. Yalamanchili. 2010. Speculative Execution on Multi-GPU Systems. In 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS). 1–12.
[9] Lance Hammond, Mark Willey, and Kunle Olukotun. 1998. Data Speculation Support for a Chip Multiprocessor. ACM SIGOPS Operating Systems Review 32, 5 (1998), 58–69.
[10] David Hancock and Jacobus van der Merwe. 2016. HyPer4: Using P4 to Virtualize the Programmable Data Plane. In Proceedings of the 12th International Conference on Emerging Networking EXperiments and Technologies (CoNEXT '16). ACM, New York, NY, USA, 35–49.
[11] Barefoot Inc. 2019. switch.p4. https://github.com/p4lang/switch/blob/master/p4src/switch.p4
[12] Intel. n.d. Intel Ethernet Switch Silicon. https://www.intel.com/content/www/us/en/products/network-io/ethernet/switches.html. Accessed: 2018-11-15.
[13] Lavanya Jose, Lisa Yan, George Varghese, and Nick McKeown. 2015. Compiling Packet Programs to Reconfigurable Switches. In Proceedings of the 12th USENIX Conference on Networked Systems Design and Implementation (NSDI '15). USENIX Association, Berkeley, CA, USA, 103–115. http://dl.acm.org/citation.cfm?id=2789770.2789778
[14] Andrew B. Kahng, Bill Lin, and Siddhartha Nath. 2012. Explicit Modeling of Control and Data for Improved NoC Router Estimation. In Design Automation Conference (DAC), 2012 49th ACM/EDAC/IEEE. IEEE, 392–397.
[15] Shaoshan Liu, Christine Eisenbeis, and Jean-Luc Gaudiot. 2011. Value Prediction and Speculative Execution on GPU. International Journal of Parallel Programming 39, 5 (Oct. 2011), 533–552.
[16] Scott A. Mahlke, David C. Lin, William Y. Chen, Richard E. Hank, and Roger A. Bringmann. 1992. Effective Compiler Support for Predicated Execution Using the Hyperblock. In Proceedings of the 25th Annual International Symposium on Microarchitecture (MICRO 25). IEEE Computer Society Press, Los Alamitos, CA, USA, 45–54. http://dl.acm.org/citation.cfm?id=144953.144998
[17] J. Menon, M. de Kruijf, and K. Sankaralingam. 2012. iGPU: Exception Support and Speculative Execution on GPUs. In 2012 39th Annual International Symposium on Computer Architecture (ISCA). 72–83.
[18] Rishiyur Nikhil. 2004. Bluespec System Verilog: Efficient, Correct RTL from High Level Specifications. In Formal Methods and Models for Co-Design, 2004 (MEMOCODE '04). Proceedings. Second ACM and IEEE International Conference on. IEEE, 69–70.
[19] Jeffrey T. Oplinger, David L. Heine, and Monica S. Lam. 1999. In Search of Speculative Thread-Level Parallelism. In Parallel Architectures and Compilation Techniques, 1999. Proceedings. 1999 International Conference on. IEEE, 303–313.
[20] David A. Patterson and John L. Hennessy. 1990. Computer Architecture: A Quantitative Approach. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.
[21] B. Ramakrishna Rau and Joseph A. Fisher. 1993. Instruction-Level Parallel Processing: History, Overview, and Perspective. In Instruction-Level Parallelism. Springer, 9–50.
[22] Anirudh Sivaraman, Alvin Cheung, Mihai Budiu, Changhoon Kim, Mohammad Alizadeh, Hari Balakrishnan, George Varghese, Nick McKeown, and Steve Licking. 2016. Packet Transactions: High-Level Programming for Line-Rate Switches. In Proceedings of the 2016 ACM SIGCOMM Conference. ACM, 15–28.
[23] Anirudh Sivaraman, Changhoon Kim, Ramkumar Krishnamoorthy, Advait Dixit, and Mihai Budiu. 2015. DC.p4: Programming the Forwarding Plane of a Datacenter Switch. In Proceedings of the 1st ACM SIGCOMM Symposium on Software Defined Networking Research. ACM, 2.
[24] Donald Thomas and Philip Moorby. 2008. The Verilog® Hardware Description Language. Springer Science & Business Media.

Index Terms

Networks → Network components → Intermediate nodes → Routers


Published in

SOSR '19: Proceedings of the 2019 ACM Symposium on SDN Research
April 2019
166 pages
ISBN: 9781450367103
DOI: 10.1145/3314148

Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Qualifiers
- research-article
- Research
- Refereed limited

Acceptance Rates
Overall Acceptance Rate: 7 of 43 submissions, 16%

Article Metrics
- Total Citations: 1
- Total Downloads: 233
- Downloads (Last 12 months): 27
- Downloads (Last 6 weeks): 2
