ABSTRACT
Hardware accelerators are a promising way to reduce the Total Cost of Ownership (TCO) of cloud datacenters. This paper targets the costly Transport Layer Security (TLS) protocol and investigates TLS acceleration for widely deployed event-driven TLS servers and terminators. Our study reveals an important fact: straightforward offloading of TLS crypto operations suffers from frequent, long-lasting blocking in the offload I/O, which leaves both CPU and accelerator resources underutilized.
To achieve efficient TLS acceleration for the event-driven web architecture, we propose QTLS, a high-performance asynchronous TLS offload framework based on Intel® QuickAssist Technology (QAT). QTLS re-engineers the TLS software stack and divides TLS offloading into four phases to eliminate blocking. As a result, multiple crypto operations from different TLS connections can be offloaded concurrently within one process or thread, which boosts performance. Moreover, QTLS includes a heuristic polling scheme that retrieves accelerator responses efficiently and in a timely manner, and a kernel-bypass notification scheme that avoids expensive switches between user mode and kernel mode when delivering async events. A comprehensive evaluation shows that, compared to the software baseline, QTLS provides up to 9x the connections per second (CPS) with TLS-RSA (2048-bit), 2x the secure data transfer throughput, and an 85% reduction in average response time.
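The non-blocking offload idea can be illustrated with a toy model. The sketch below is not the QTLS implementation and does not use the QAT or OpenSSL APIs; `FakeAccelerator`, `handshake`, and `run_event_loop` are hypothetical names, and the fixed `latency` and the string-reversal "crypto" are assumptions made only for illustration. It shows one thread submitting crypto requests for several connections, pausing each connection at its offload point, and resuming it when a poll returns its response.

```python
from collections import deque


class FakeAccelerator:
    """Hypothetical accelerator: a request completes after `latency` polls."""

    def __init__(self, latency=3):
        self.latency = latency
        self.inflight = deque()  # entries: [polls_remaining, conn_id, result]

    def submit(self, conn_id, payload):
        # Non-blocking submit: queue the request and return immediately.
        # Reversing the payload stands in for a real crypto operation.
        self.inflight.append([self.latency, conn_id, payload[::-1]])

    def poll(self):
        # Advance every in-flight request, then collect the completed ones.
        for entry in self.inflight:
            entry[0] -= 1
        done = []
        while self.inflight and self.inflight[0][0] <= 0:
            _, conn_id, result = self.inflight.popleft()
            done.append((conn_id, result))
        return done


def handshake(conn_id, accel):
    """One connection's handshake; it pauses where the offload is pending."""
    accel.submit(conn_id, f"key-{conn_id}")
    result = yield  # paused here until the event loop sends the response
    return f"conn {conn_id} done with {result}"


def run_event_loop(num_conns, accel):
    """Drive all connections in a single thread without blocking on any one."""
    paused, finished, polls = {}, [], 0
    for conn_id in range(num_conns):
        gen = handshake(conn_id, accel)
        next(gen)  # run until the request is submitted and the handler pauses
        paused[conn_id] = gen
    while paused:
        polls += 1
        for conn_id, result in accel.poll():
            try:
                paused.pop(conn_id).send(result)  # resume the paused handler
            except StopIteration as stop:
                finished.append(stop.value)
    return finished, polls


if __name__ == "__main__":
    finished, polls = run_event_loop(4, FakeAccelerator(latency=3))
    print(polls, finished)
```

In this toy model the four requests are in flight at the same time, so all four connections finish after three polls. A blocking design would instead wait out the full latency for each connection in turn.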
Index Terms
- QTLS: high-performance TLS asynchronous offload framework with Intel® QuickAssist technology