DOI: 10.1145/3293883.3295705

QTLS: high-performance TLS asynchronous offload framework with Intel® QuickAssist technology

Published: 16 February 2019

ABSTRACT

Hardware accelerators are a promising solution to optimize the Total Cost of Ownership (TCO) of cloud datacenters. This paper targets the costly Transport Layer Security (TLS) protocol and investigates TLS acceleration for widely deployed event-driven TLS servers and terminators. Our study reveals an important fact: straightforward offloading of TLS crypto operations suffers from frequent, long-lasting blocking in the offload I/O, leading to underutilization of both CPU and accelerator resources.

To achieve efficient TLS acceleration for the event-driven web architecture, we propose QTLS, a high-performance TLS asynchronous offload framework based on Intel® QuickAssist Technology (QAT). QTLS re-engineers the TLS software stack and divides TLS offloading into four phases to eliminate blocking. Multiple crypto operations from different TLS connections can then be offloaded concurrently in one process/thread, bringing a performance boost. Moreover, QTLS is built with a heuristic polling scheme to retrieve accelerator responses efficiently and in a timely manner, and a kernel-bypass notification scheme to avoid expensive switches between user mode and kernel mode while delivering async events. The comprehensive evaluation shows that QTLS can provide up to 9x the connections per second (CPS) with TLS-RSA (2048-bit), 2x the secure data transfer throughput, and an 85% reduction in average response time compared to the software baseline.
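The contrast between blocking and asynchronous offload can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the QTLS implementation: `MockAccelerator`, `handshake`, `run_blocking`, and `run_async` are hypothetical names, the accelerator is modeled as a queue whose requests complete after a fixed number of polls, and the "crypto" result is just the reversed payload. The abstract does not name the four offload phases, so the sketch models only a generic submit / poll / resume cycle, and it polls on every loop iteration rather than reproducing the heuristic polling or kernel-bypass notification schemes.

```python
from collections import deque


class MockAccelerator:
    """Hypothetical stand-in for a crypto accelerator (not the QAT API).

    Every request completes after `latency_polls` calls to poll().
    """

    def __init__(self, latency_polls=3):
        self.latency_polls = latency_polls
        self.inflight = deque()  # entries: [remaining_polls, conn_id, payload]

    def submit(self, conn_id, payload):
        # Non-blocking: enqueue the request and return immediately.
        self.inflight.append([self.latency_polls, conn_id, payload])

    def poll(self):
        # Advance simulated time by one tick, then collect finished requests.
        for entry in self.inflight:
            entry[0] -= 1
        done = []
        while self.inflight and self.inflight[0][0] <= 0:
            _, conn_id, payload = self.inflight.popleft()
            done.append((conn_id, payload[::-1]))  # fake "crypto" result
        return done


def handshake(conn_id, n_ops):
    """One connection's handshake: yields crypto requests, receives results."""
    results = []
    for i in range(n_ops):
        result = yield f"{conn_id}-op{i}".encode()
        results.append(result)
    return results


def run_blocking(conn_ids, n_ops, accel):
    """Straight offload: wait for each response before doing anything else."""
    finished, polls = {}, 0
    for cid in conn_ids:
        gen = handshake(cid, n_ops)
        request = next(gen)
        try:
            while True:
                accel.submit(cid, request)
                done = []
                while not done:  # the thread is stuck here
                    polls += 1
                    done = accel.poll()
                request = gen.send(done[0][1])
        except StopIteration as stop:
            finished[cid] = stop.value
    return finished, polls


def run_async(conn_ids, n_ops, accel):
    """Async offload: keep requests from many connections in flight at once."""
    conns, finished, polls = {}, {}, 0
    for cid in conn_ids:
        gen = handshake(cid, n_ops)
        conns[cid] = gen
        accel.submit(cid, next(gen))  # submit and move on
    while conns:
        polls += 1
        for cid, result in accel.poll():
            try:
                # Resume the paused handshake; it may issue its next request.
                accel.submit(cid, conns[cid].send(result))
            except StopIteration as stop:
                finished[cid] = stop.value
                del conns[cid]
    return finished, polls
```

With four connections, two operations each, and a simulated latency of three polls, `run_blocking` needs 24 polls while `run_async` needs 6, and both return identical results. The numbers come from this toy model only and say nothing about the speedups measured in the paper.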


Published in

PPoPP '19: Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming
February 2019
472 pages
ISBN: 9781450362252
DOI: 10.1145/3293883

Copyright © 2019 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Qualifiers: research-article

Acceptance Rates

PPoPP '19 paper acceptance rate: 29 of 152 submissions, 19%. Overall acceptance rate: 230 of 1,014 submissions, 23%.
