ABSTRACT
Modern datacenter networks provide very high capacity via redundant Clos topologies and low switch latency, but transport protocols rarely deliver matching performance. We present NDP, a novel datacenter transport architecture that achieves near-optimal completion times for short transfers and high flow throughput in a wide range of scenarios, including incast. NDP switch buffers are very shallow; when they fill, the switches trim packets to their headers and forward those headers with priority. This gives receivers a full view of instantaneous demand from all senders, and is the basis for our novel, high-performance, multipath-aware transport protocol that can deal gracefully with massive incast events and prioritize traffic from different senders on RTT timescales. We implemented NDP in Linux hosts with DPDK, in a software switch, in a NetFPGA-based hardware switch, and in P4. We evaluate NDP's performance in our implementations and in large-scale simulations, demonstrating support for very low latency and high throughput at the same time.
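The trimming behaviour described in the abstract can be illustrated with a toy model of a single switch output port. This is a minimal sketch, not the authors' implementation: the class and function names (`Packet`, `TrimmingQueue`, `incast_demo`), the queue capacity, and the use of strict priority for trimmed headers are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Packet:
    sender: str
    seq: int
    payload_len: int       # bytes of payload still attached
    trimmed: bool = False  # True once the payload has been cut off


class TrimmingQueue:
    """Toy output port: a shallow data queue plus a header queue (hypothetical model)."""

    def __init__(self, data_capacity: int = 8):
        # Capacity is in packets; the default value is an assumption.
        self.data_capacity = data_capacity
        self.data = deque()
        self.headers = deque()

    def enqueue(self, pkt: Packet) -> None:
        if len(self.data) < self.data_capacity:
            self.data.append(pkt)
        else:
            # Buffer full: discard the payload but keep the header,
            # so the receiver still learns that this sender has data.
            self.headers.append(Packet(pkt.sender, pkt.seq, 0, trimmed=True))

    def dequeue(self):
        # Headers leave ahead of full packets (strict priority is assumed here).
        if self.headers:
            return self.headers.popleft()
        if self.data:
            return self.data.popleft()
        return None


def incast_demo(num_senders: int = 5, capacity: int = 2):
    """Send one packet from each sender into one port and drain it."""
    q = TrimmingQueue(capacity)
    for i in range(num_senders):
        q.enqueue(Packet(f"s{i}", 0, 1500))
    out = []
    while (p := q.dequeue()) is not None:
        out.append((p.sender, p.trimmed))
    return out
```

In this model, calling `incast_demo()` with five senders and room for two full packets delivers something from every sender: three trimmed headers first, then the two full packets. That is the sense in which a receiver gets a view of demand from all senders even when the buffer overflows.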
Index Terms
- Re-architecting datacenter networks and stacks for low latency and high performance