ABSTRACT
As datacenter network speeds rise, an increasing fraction of server CPU cycles is consumed by TCP packet processing, in particular for remote procedure calls (RPCs). Existing approaches attempt to reduce this overhead by bypassing the OS kernel, customizing the TCP stack for an application, or offloading packet processing to dedicated hardware. In doing so, they trade security, agility, or generality for efficiency. None of these trade-offs is fully desirable in the fast-evolving commodity cloud.
We present TAS, TCP acceleration as a service. TAS splits the common case of TCP processing for datacenter RPCs out of the OS kernel and executes it as a fastpath OS service on dedicated CPUs. Doing so allows us to streamline the common case while still supporting all of the features of a stock TCP stack, including security, agility, and generality. In particular, we remove code and data of less common cases from the fastpath, improving performance on the wide, deeply pipelined CPU architectures common in today's servers. To be workload proportional, TAS dynamically allocates the number of CPUs the fastpath needs, depending on the traffic load. TAS provides up to 90% higher throughput and 57% lower tail latency than the IX kernel bypass OS for common cloud applications, such as a key-value store and a real-time analytics framework. TAS also scales to more TCP connections, providing 2.2x higher throughput than IX with 64K connections.
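The two mechanisms named above, a common-case fastpath and workload-proportional CPU allocation, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the TAS implementation: the `Segment` and `Flow` types, the rule that only in-order segments on established connections without SYN/FIN/RST count as the common case, and the per-core capacity model used to size the fastpath.

```python
import math
from dataclasses import dataclass
from typing import Optional

# Standard TCP header flag bits.
FIN, SYN, RST, ACK = 0x01, 0x02, 0x04, 0x10


@dataclass
class Segment:          # hypothetical, simplified view of a received TCP segment
    flags: int
    seq: int
    payload_len: int


@dataclass
class Flow:             # hypothetical per-connection state
    established: bool
    next_seq: int       # next in-order sequence number expected


def is_common_case(flow: Optional[Flow], seg: Segment) -> bool:
    """Return True if the segment could be handled by a streamlined fastpath.

    Assumed rule: established connection, no connection-control flags,
    and the segment arrives in order. Anything else goes to a slow path.
    """
    if flow is None or not flow.established:
        return False
    if seg.flags & (SYN | FIN | RST):
        return False
    return seg.seq == flow.next_seq


def fastpath_cores(load_pps: float, per_core_pps: float, max_cores: int) -> int:
    """Size the fastpath in proportion to load (assumed capacity model).

    Always keeps at least one core and never exceeds max_cores.
    """
    needed = math.ceil(load_pps / per_core_pps)
    return max(1, min(max_cores, needed))
```

For example, under these assumptions an in-order data segment on an established flow is classified as common case, while a SYN is not, and a load of 2.5 million packets per second with a hypothetical capacity of 1 million packets per second per core would be served by three fastpath cores.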
- TAS: TCP Acceleration as an OS Service