DOI: 10.1145/2486001.2486031
research-article

pFabric: minimal near-optimal datacenter transport

Published: 27 August 2013

Abstract

In this paper we present pFabric, a minimalistic datacenter transport design that provides near theoretically optimal flow completion times even at the 99th percentile for short flows, while still minimizing average flow completion time for long flows. Moreover, pFabric delivers this performance with a very simple design that is based on a key conceptual insight: datacenter transport should decouple flow scheduling from rate control. For flow scheduling, packets carry a single priority number set independently by each flow; switches have very small buffers and implement a very simple priority-based scheduling/dropping mechanism. Rate control is also correspondingly simpler; flows start at line rate and throttle back only under high and persistent packet loss. We provide theoretical intuition and show via extensive simulations that the combination of these two simple mechanisms is sufficient to provide near-optimal performance.
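The priority-based scheduling/dropping mechanism described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class and field names are hypothetical, and it assumes that a smaller priority number means a more urgent packet (for example, a flow's remaining size), which the abstract does not specify. It also leaves out rate control and any per-flow packet-ordering logic.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Packet:
    flow_id: int
    priority: int  # assumption: smaller number = more urgent


class PriorityPort:
    """Hypothetical switch output port with a very small buffer."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.buffer: List[Packet] = []

    def enqueue(self, pkt: Packet) -> Optional[Packet]:
        """Admit pkt and return the packet that was dropped, if any."""
        if len(self.buffer) < self.capacity:
            self.buffer.append(pkt)
            return None
        # Buffer full: locate the least urgent buffered packet.
        worst = max(self.buffer, key=lambda p: p.priority)
        if pkt.priority >= worst.priority:
            return pkt  # the arrival is the least urgent, so drop it
        self.buffer.remove(worst)
        self.buffer.append(pkt)
        return worst

    def dequeue(self) -> Optional[Packet]:
        """Transmit the most urgent buffered packet, or None if empty."""
        if not self.buffer:
            return None
        best = min(self.buffer, key=lambda p: p.priority)
        self.buffer.remove(best)
        return best


if __name__ == "__main__":
    port = PriorityPort(capacity=2)
    port.enqueue(Packet(flow_id=1, priority=100))
    port.enqueue(Packet(flow_id=2, priority=5))
    dropped = port.enqueue(Packet(flow_id=3, priority=50))
    print("dropped:", dropped)
    print("sent first:", port.dequeue())
```

A linear scan over the buffer is used here because the abstract stresses that switch buffers are very small; with only a handful of packets per port, a heap or sorted structure would add little.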




    Published In

    SIGCOMM '13: Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM
    August 2013
    580 pages
    ISBN:9781450320566
    DOI:10.1145/2486001
    • ACM SIGCOMM Computer Communication Review, Volume 43, Issue 4
      October 2013
      595 pages
      ISSN:0146-4833
      DOI:10.1145/2534169
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. datacenter network
    2. flow scheduling
    3. packet transport

    Qualifiers

    • Research-article

    Conference

    SIGCOMM '13: ACM SIGCOMM 2013 Conference
    August 12 - 16, 2013
    Hong Kong, China

    Acceptance Rates

    SIGCOMM '13 Paper Acceptance Rate 38 of 246 submissions, 15%;
    Overall Acceptance Rate 462 of 3,389 submissions, 14%


    Article Metrics

    • Downloads (Last 12 months): 447
    • Downloads (Last 6 weeks): 46
    Reflects downloads up to 20 Feb 2025


    Cited By

    • (2025) EDM: An Ultra-Low Latency Ethernet Fabric for Memory Disaggregation. Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1, pp. 377-394. DOI: 10.1145/3669940.3707221. Online publication date: 3-Feb-2025
    • (2025) Enabling Rank-Based P4 Programmable Schedulers: Requirements, Implementation, and Evaluation on BMv2 Switches. IEEE Transactions on Networking 33:1, pp. 299-310. DOI: 10.1109/TNET.2024.3481152. Online publication date: Feb-2025
    • (2025) GraphCC: A practical graph learning-based approach to Congestion Control in datacenters. Computer Networks 257 (110981). DOI: 10.1016/j.comnet.2024.110981. Online publication date: Feb-2025
    • (2024) Reverie. Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, pp. 651-668. DOI: 10.5555/3691825.3691861. Online publication date: 16-Apr-2024
    • (2024) Credence. Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, pp. 613-634. DOI: 10.5555/3691825.3691859. Online publication date: 16-Apr-2024
    • (2024) BBQ. Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, pp. 455-475. DOI: 10.5555/3691825.3691851. Online publication date: 16-Apr-2024
    • (2024) Sifter. Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, pp. 75-94. DOI: 10.5555/3691825.3691830. Online publication date: 16-Apr-2024
    • (2024) MLTCP: A Distributed Technique to Approximate Centralized Flow Scheduling For Machine Learning. Proceedings of the 23rd ACM Workshop on Hot Topics in Networks, pp. 167-176. DOI: 10.1145/3696348.3696878. Online publication date: 18-Nov-2024
    • (2024) vPIFO: Virtualized Packet Scheduler for Programmable Hierarchical Scheduling in High-Speed Networks. Proceedings of the ACM SIGCOMM 2024 Conference, pp. 983-999. DOI: 10.1145/3651890.3672270. Online publication date: 4-Aug-2024
    • (2024) PPT: A Pragmatic Transport for Datacenters. Proceedings of the ACM SIGCOMM 2024 Conference, pp. 954-969. DOI: 10.1145/3651890.3672235. Online publication date: 4-Aug-2024
