DOI: 10.1145/3314148.3314348
research-article

Precedence: Enabling Compact Program Layout By Table Dependency Resolution

Published: 03 April 2019

ABSTRACT

The rise of the programmable switching ASIC has allowed switches to handle the complexity and diversity of modern networking programs while meeting the performance demands of modern networks. Exploiting this flexibility, however, has caused routing program size to explode: recently proposed programs contain more than 100 [11] or even 1000 [10] tables. Realizing these programs in a programmable switch requires finding layouts of minimal depth: if a layout needs more match-action stages than the switch's pipeline provides, the switch must recirculate packets, which cuts throughput. Even when a layout fits within the pipeline, most commercial pipelines cannot freely allocate memory across stages, so a non-compact layout can leave stages underloaded and memory significantly underutilized. Although inter-table control and data dependencies critically limit a compiler's ability to lay out tables compactly, no switch architecture that can fully resolve these dependencies has been proposed. To address this problem, we introduce Precedence, an extension of the RMT switching ASIC that enables tables linked by dependencies to execute in parallel or even out of order. Precedence can resolve nearly 70% of the dependencies in switch.p4 [11], a real-world routing program, and reduce its pipeline depth by 48%, while only modestly increasing silicon area.
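The abstract's central argument is that chains of table dependencies, rather than the raw table count, set a lower bound on pipeline depth, and that any depth beyond the hardware's stage count forces recirculation. The Python sketch below (Python 3.9+, for `graphlib`) illustrates that reasoning under strong simplifying assumptions: every dependency forces the dependent table into a strictly later stage, per-stage memory limits are ignored, a "resolved" dependency is modeled as imposing no stage ordering at all, and throughput is approximated as line rate divided by the number of pipeline passes. The table names, dependencies, and 4-stage pipeline are hypothetical; this is neither the layout algorithm nor the Precedence mechanism described in the paper.

```python
import math
from graphlib import TopologicalSorter


def min_stages(tables, deps, resolved=frozenset()):
    """Lower bound on pipeline depth when each unresolved dependency (a, b)
    forces table b into a strictly later stage than table a.

    Simplifying assumptions: per-stage memory limits are ignored, and a
    "resolved" dependency imposes no stage ordering at all.
    """
    preds = {t: set() for t in tables}
    for a, b in deps:
        if (a, b) not in resolved:
            preds[b].add(a)
    stage = {}
    # static_order() yields every table after all of its predecessors
    # (and raises graphlib.CycleError if the dependencies form a cycle).
    for t in TopologicalSorter(preds).static_order():
        # Place each table one stage after its latest predecessor (stage 1 if none).
        stage[t] = 1 + max((stage[p] for p in preds[t]), default=0)
    return max(stage.values(), default=0)


def passes_needed(depth, pipeline_stages):
    """Passes through a pipeline of `pipeline_stages` stages needed to run a
    layout of `depth` stages; every pass beyond the first is a recirculation."""
    return max(1, math.ceil(depth / pipeline_stages))


# Hypothetical tables and dependencies (not taken from switch.p4).
TABLES = ["port_mapping", "validate_packet", "smac", "dmac",
          "ipv4_fib", "nexthop", "rewrite", "egress_acl"]
DEPS = [
    ("port_mapping", "validate_packet"),
    ("validate_packet", "smac"),
    ("validate_packet", "dmac"),
    ("validate_packet", "ipv4_fib"),
    ("ipv4_fib", "nexthop"),
    ("nexthop", "rewrite"),
    ("dmac", "rewrite"),
    ("rewrite", "egress_acl"),
]
PIPELINE_STAGES = 4  # hypothetical switch pipeline length

if __name__ == "__main__":
    base = min_stages(TABLES, DEPS)
    resolved = {("ipv4_fib", "nexthop"), ("rewrite", "egress_acl")}
    compact = min_stages(TABLES, DEPS, resolved)
    for label, depth in [("all dependencies", base), ("two resolved", compact)]:
        passes = passes_needed(depth, PIPELINE_STAGES)
        # First-order model: each packet occupies the pipeline `passes` times.
        print(f"{label}: depth={depth}, passes={passes}, "
              f"throughput~{1 / passes:.0%} of line rate")
```

With every dependency enforced, the longest chain (port_mapping, validate_packet, ipv4_fib, nexthop, rewrite, egress_acl) gives a depth of 6, so the hypothetical 4-stage pipeline needs two passes; resolving the two chosen dependencies brings the bound down to 4 and a single pass. Real compilers for RMT-style switches, such as the one in [13], also weigh memory and different dependency kinds, so an actual layout can be deeper than this bound.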

References

  1. Broadcom Trident 3. n.d. XPliant Ethernet Switch Product Family. https://www.broadcom.com/products/ethernet-connectivity/switching/strataxgs/bcm56870-series/. Accessed: 2018-11-15.
  2. Pat Bosshart, Dan Daly, Glen Gibb, Martin Izzard, Nick McKeown, Jennifer Rexford, Cole Schlesinger, Dan Talayco, Amin Vahdat, George Varghese, and David Walker. 2014. P4: Programming Protocol-independent Packet Processors. SIGCOMM Comput. Commun. Rev. 44, 3 (July 2014), 87–95.
  3. Pat Bosshart, Glen Gibb, Hun-Seok Kim, George Varghese, Nick McKeown, Martin Izzard, Fernando Mujica, and Mark Horowitz. 2013. Forwarding metamorphosis: Fast programmable match-action processing in hardware for SDN. In ACM SIGCOMM Computer Communication Review, Vol. 43. ACM, 99–110.
  4. Pat Bosshart, Glen Gibb, Hun-Seok Kim, George Varghese, Nick McKeown, Martin Izzard, Fernando Mujica, and Mark Horowitz. 2013. Forwarding Metamorphosis: Fast Programmable Match-action Processing in Hardware for SDN. In Proceedings of the ACM SIGCOMM 2013 Conference on SIGCOMM (SIGCOMM '13). ACM, New York, NY, USA, 99–110.
  5. Cavium. n.d. XPliant Ethernet Switch Product Family. https://www.cavium.com/xpliant-ethernet-switch-product-family.html. Accessed: 2018-11-15.
  6. Sharad Chole, Andy Fingerhut, Sha Ma, Anirudh Sivaraman, Shay Vargaftik, Alon Berger, Gal Mendelson, Mohammad Alizadeh, Shang-Tse Chuang, Isaac Keslassy, et al. 2017. dRMT: Disaggregated programmable switching. In Proceedings of the Conference of the ACM Special Interest Group on Data Communication. ACM, 1–14.
  7. Jinquan Dai, Bo Huang, Long Li, and Luddy Harrison. 2005. Automatically Partitioning Packet Processing Applications for Pipelined Architectures. SIGPLAN Not. 40, 6 (June 2005), 237–248.
  8. G. Diamos and S. Yalamanchili. 2010. Speculative execution on multi-GPU systems. In 2010 IEEE International Symposium on Parallel Distributed Processing (IPDPS). 1–12.
  9. Lance Hammond, Mark Willey, and Kunle Olukotun. 1998. Data speculation support for a chip multiprocessor. ACM SIGOPS Operating Systems Review 32, 5 (1998), 58–69.
  10. David Hancock and Jacobus van der Merwe. 2016. HyPer4: Using P4 to Virtualize the Programmable Data Plane. In Proceedings of the 12th International on Conference on Emerging Networking EXperiments and Technologies (CoNEXT '16). ACM, New York, NY, USA, 35–49.
  11. Barefoot Inc. 2019. switch.p4. https://github.com/p4lang/switch/blob/master/p4src/switch.p4
  12. Intel. n.d. Intel Ethernet Switch Silicon. https://www.intel.com/content/www/us/en/products/network-io/ethernet/switches.html. Accessed: 2018-11-15.
  13. Lavanya Jose, Lisa Yan, George Varghese, and Nick McKeown. 2015. Compiling Packet Programs to Reconfigurable Switches. In Proceedings of the 12th USENIX Conference on Networked Systems Design and Implementation (NSDI'15). USENIX Association, Berkeley, CA, USA, 103–115. http://dl.acm.org/citation.cfm?id=2789770.2789778
  14. Andrew B Kahng, Bill Lin, and Siddhartha Nath. 2012. Explicit modeling of control and data for improved NoC router estimation. In Design Automation Conference (DAC), 2012 49th ACM/EDAC/IEEE. IEEE, 392–397.
  15. Shaoshan Liu, Christine Eisenbeis, and Jean-Luc Gaudiot. 2011. Value Prediction and Speculative Execution on GPU. International Journal of Parallel Programming 39, 5 (Oct. 2011), 533–552.
  16. Scott A. Mahlke, David C. Lin, William Y. Chen, Richard E. Hank, and Roger A. Bringmann. 1992. Effective Compiler Support for Predicated Execution Using the Hyperblock. In Proceedings of the 25th Annual International Symposium on Microarchitecture (MICRO 25). IEEE Computer Society Press, Los Alamitos, CA, USA, 45–54. http://dl.acm.org/citation.cfm?id=144953.144998
  17. J. Menon, M. de Kruijf, and K. Sankaralingam. 2012. iGPU: Exception support and speculative execution on GPUs. In 2012 39th Annual International Symposium on Computer Architecture (ISCA). 72–83.
  18. Rishiyur Nikhil. 2004. Bluespec System Verilog: efficient, correct RTL from high level specifications. In Formal Methods and Models for Co-Design, 2004. MEMOCODE'04. Proceedings. Second ACM and IEEE International Conference on. IEEE, 69–70.
  19. Jeffrey T Oplinger, David L Heine, and Monica S Lam. 1999. In search of speculative thread-level parallelism. In Parallel Architectures and Compilation Techniques, 1999. Proceedings. 1999 International Conference on. IEEE, 303–313.
  20. David A. Patterson and John L. Hennessy. 1990. Computer Architecture: A Quantitative Approach. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.
  21. B Ramakrishna Rau and Joseph A Fisher. 1993. Instruction-level parallel processing: history, overview, and perspective. In Instruction-Level Parallelism. Springer, 9–50.
  22. Anirudh Sivaraman, Alvin Cheung, Mihai Budiu, Changhoon Kim, Mohammad Alizadeh, Hari Balakrishnan, George Varghese, Nick McKeown, and Steve Licking. 2016. Packet transactions: High-level programming for line-rate switches. In Proceedings of the 2016 ACM SIGCOMM Conference. ACM, 15–28.
  23. Anirudh Sivaraman, Changhoon Kim, Ramkumar Krishnamoorthy, Advait Dixit, and Mihai Budiu. 2015. dc.p4: Programming the forwarding plane of a datacenter switch. In Proceedings of the 1st ACM SIGCOMM Symposium on Software Defined Networking Research. ACM, 2.
  24. Donald Thomas and Philip Moorby. 2008. The Verilog® Hardware Description Language. Springer Science & Business Media.

    Published in

      ACM Conferences
      SOSR '19: Proceedings of the 2019 ACM Symposium on SDN Research
      April 2019
      166 pages
      ISBN: 9781450367103
      DOI: 10.1145/3314148

      Copyright © 2019 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States



      Qualifiers

      • Research
      • Refereed limited

      Acceptance Rates

      Overall Acceptance Rate: 7 of 43 submissions, 16%
