Abstract
Advances in quantum computing have spurred a significant amount of research into public-key cryptographic algorithms that are resistant against postquantum cryptanalysis. Lattice-based cryptography is one of the important candidates because of its reasonable complexity combined with reasonable signature sizes. However, in a postquantum world, not only the cryptography will change but also the computing platforms. Large amounts of resource-constrained embedded systems will connect to a cloud of powerful server computers. We present an optimization technique for lattice-based signature generation on such embedded systems; our goal is to optimize latency rather than throughput. Indeed, on an embedded system, the latency of a single signature for user identification or message authentication is more important than the aggregate signature generation rate. We build a high-performance implementation using hardware/software codesign techniques. The key idea is to partition the signature generation scheme into offline and online phases. The signature scheme allows this separation because a large portion of the computation does not depend on the message to be signed and can be handled before the message is given. Then, we can map complex precomputation operations in software on a low-cost processor and utilize hardware resources to accelerate simpler online operations. To find the optimum hardware architecture for the target platform, we define and explore the design space and implement two design configurations. We realize our solutions on the Altera Cyclone-IV CGX150 FPGA. The implementation consists of a NIOS soft-core processor and a low-latency hash and polynomial multiplication engine. On average, the proposed low-latency architecture can generate a signature with a latency of 96 clock cycles at 40MHz, resulting in a response time of 2.4μs for a signing request. On equivalent platforms, this corresponds to a performance improvement of 33 and 105 times compared to previous hardware and software implementations, respectively.
- Aydin Aysu, Cameron Patterson, and Patrick Schaumont. 2013. Low-cost and area-efficient FPGA implementations of lattice-based cryptography. In Proceedings of the 2013 IEEE International Symposium on Hardware-Oriented Security and Trust (HOST’13). 81--86. DOI:http://dx.doi.org/10.1109/HST. 2013.6581570Google ScholarCross Ref
- Boeing. 2013. Statistical Summary of Commercial Jet Airplane Accidents. Retrieved from http://www.boeing.com/news/techissues/pdf/statsum.pdf.Google Scholar
- Andrey Bogdanov, Thomas Eisenbarth, Andy Rupp, and Christopher Wolf. 2008. Time-area optimized public-key engines: MQ-cryptosystems as replacement for elliptic curves? In Cryptographic Hardware and Embedded Systems (CHES’08), Elisabeth Oswald and Pankaj Rohatgi (Eds.). Lecture Notes in Computer Science, Vol. 5154. Springer, Berlin, 45--61. DOI:http://dx.doi.org/10.1007/978-3-540-85053-3_4 Google ScholarDigital Library
- Ahmad Boorghany and Rasool Jalili. 2014. Implementation and comparison of lattice-based identification protocols on smart cards and microcontrollers. Cryptology ePrint Archive, Report 2014/078 (2014). http://eprint.iacr.org/.Google Scholar
- Julia Borghoff, Anne Canteaut, Tim Güneysu, Elif Bilge Kavun, Miroslav Knezević, Lars R. Knudsen, Gregor Leander, Ventzislav Nikov, Christof Paar, Christian Rechberger, Peter Rombouts, Soren S. Thomsen, and Tolga Yalçin. 2012. PRINCE — A low-latency block cipher for pervasive computing applications. In Advances in Cryptology (ASIACRYPT’12), Xiaoyun Wang and Kazue Sako (Eds.). Lecture Notes in Computer Science, Vol. 7658. Springer, Berlin, 208--225. DOI:http://dx.doi.org/10.1007/978-3-642-34961-4_14 Google ScholarDigital Library
- William N. Chelton and Mohammed Benaissa. 2008. Fast elliptic curve cryptography on FPGA. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 16, 2 (February 2008), 198--205. DOI:http://dx.doi.org/10.1109/TVLSI.2007.912228 Google ScholarDigital Library
- Léo Ducas. 2014. Accelerating Bliss: The geometry of ternary polynomials. Cryptology ePrint Archive, Report 2014/874.Google Scholar
- Léo Ducas, Alain Durmus, Tancréde Lepoint, and Vadim Lyubashevsky. 2013. Lattice signatures and bimodal Gaussians. In Advances in Cryptology (CRYPTO’13), Ran Canetti and Juan A. Garay (Eds.). Lecture Notes in Computer Science, Vol. 8042. Springer, Berlin, 40--56. DOI:http://dx.doi.org/10.1007/978-3-642-40041-4_3Google Scholar
- Pavel Emeliyanenko. 2009. Efficient multiplication of polynomials on graphics hardware. In Advanced Parallel Processing Technologies, Yong Dou, Ralf Gruber, and Josef M. Joller (Eds.). Lecture Notes in Computer Science, Vol. 5737. Springer, Berlin, 134--149. DOI:http://dx.doi.org/10.1007/978-3-642-03644-6_11 Google ScholarDigital Library
- Shimon Even, Oded Goldreich, and Silvio Micali. 1996. On-line/off-line digital signatures. Journal of Cryptology 9, 1 (1996), 35--67. DOI:http://dx.doi.org/10.1007/BF02254791 Google ScholarDigital Library
- Norman Göttert, Thomas Feller, Michael Schneider, Johannes Buchmann, and Sorin Huss. 2012. On the design of hardware building blocks for modern lattice-based encryption schemes. In Cryptographic Hardware and Embedded Systems (CHES’12), Emmanuel Prouff and Patrick Schaumont (Eds.). Lecture Notes in Computer Science, Vol. 7428. Springer, Berlin, 512--529. DOI:http://dx.doi.org/10.1007/978-3-642-33027-8_30 Google ScholarDigital Library
- Tim Güneysu, Vadim Lyubashevsky, and Thomas Pöppelmann. 2012. Practical lattice-based cryptography: A signature scheme for embedded systems. In Cryptographic Hardware and Embedded Systems (CHES’12), Emmanuel Prouff and Patrick Schaumont (Eds.). Lecture Notes in Computer Science, Vol. 7428. Springer, Berlin, 530--547. DOI:http://dx.doi.org/10.1007/978-3-642-33027-8_31 Google ScholarDigital Library
- Tim Güneysu, Tobias Oder, Thomas Pöppelmann, and Peter Schwabe. 2013. Software speed records for lattice-based signatures. In Post-Quantum Cryptography, Philippe Gaborit (Ed.). Lecture Notes in Computer Science, Vol. 7932. Springer, Berlin, 67--82. DOI:http://dx.doi.org/10.1007/978-3-642-38616-9_5Google Scholar
- Tàmas Györfi, Octavian Cret, and Zalán Borsos. 2013. Implementing modular FFTs in FPGAs—A basic block for lattice-based cryptography. In Proceedings of the 2013 Euromicro Conference on Digital System Design (DSD). 305--308. DOI:http://dx.doi.org/10.1109/DSD.2013.136 Google ScholarDigital Library
- Tàmas Györfi, Octavian Cret, Guillaume Hanrot, and Nicolas Brisebarre. 2012. High-Throughput Hardware Architecture for the SWIFFT/SWIFFTX Hash Functions. Cryptology ePrint Archive, Report 2012/343.Google Scholar
- Kashmir Hill. 2013. E-ZPasses Get Read All Over New York (Not Just at Toll Booths). Retrieved from http://www.forbes.com/sites/kashmirhill/2013/09/12/e-zpasses-get-read-all-over-new-york-not-just-at- toll-booths/.Google Scholar
- Abdel Alim Kamal and Amr M. Youssef. 2009. An FPGA implementation of the NTRUEncrypt cryptosystem. In Proceedings of the 2009 International Conference on Microelectronics (ICM’09). IEEE, 209--212.Google Scholar
- Miroslav Knezević, Ventzislav Nikov, and Peter Rombouts. 2012. Low-latency encryption “Is Lightweight = Light + Wait?”. In Cryptographic Hardware and Embedded Systems (CHES’12), Emmanuel Prouff and Patrick Schaumont (Eds.). Lecture Notes in Computer Science, Vol. 7428. Springer, Berlin, 426--446. DOI:http://dx.doi.org/10.1007/978-3-642-33027-8_25 Google ScholarDigital Library
- Yinan Kong and Yufeng Lai. 2013. Low latency modular multiplication for public-key cryptosystems using a scalable array of parallel processing elements. In, Proceedings of the 2013 IEEE 56th International Midwest Symposium on Circuits and Systems (MWSCAS). 1039--1042. DOI:http://dx.doi.org/10.1109/MWSCAS.2013.6674830Google ScholarCross Ref
- Nate Lawson. 2008. Highway to hell: Hacking toll systems. Presentation at Blackhat (2008).Google Scholar
- Yong Ki Lee, Herwin Chan, and Ingrid Verbauwhede. 2006. Throughput optimized SHA-1 architecture using unfolding transformation. In Proceedings of the International Conference on Application-specific Systems, Architectures and Processors (ASAP’06). 354--359. DOI:http://dx.doi.org/10.1109/ASAP.2006.68 Google ScholarDigital Library
- Vadim Lyubashevsky. 2009. Fiat-Shamir with aborts: Applications to lattice and factoring-based signatures. In Advances in Cryptology (ASIACRYPT’09), Mitsuru Matsui (Ed.). Lecture Notes in Computer Science, Vol. 5912. Springer, Berlin, 598--616. DOI:http://dx.doi.org/10.1007/978-3-642-10366-7_35 Google ScholarDigital Library
- Vadim Lyubashevsky. 2012. Lattice signatures without trapdoors. In Advances in Cryptology (EUROCRYPT’12), David Pointcheval and Thomas Johansson (Eds.). Lecture Notes in Computer Science, Vol. 7237. Springer, Berlin, 738--755. DOI:http://dx.doi.org/10.1007/978-3-642-29011-4_43 Google ScholarDigital Library
- Vadim Lyubashevsky, Chris Peikert, and Oded Regev. 2013. On ideal lattices and learning with errors over rings. Journal of the ACM 60, 6, Article 43 (November 2013), 35 pages. DOI:http://dx.doi.org/10.1145/2535925 Google ScholarDigital Library
- Jon Masters. 2013. Out with the old in with the new. ITS International 19, 2 (2013).Google Scholar
- Tobias Oder, Thomas Pöppelmann, and Tim Güneysu. 2014. Beyond ECDSA and RSA: Lattice-based digital signatures on constrained devices. In Proceedings of the the 51st Annual Design Automation Conference on Design Automation Conference (DAC’14). ACM, New York, NY, Article 110, 6 pages. DOI:http://dx.doi. org/10.1145/2593069.2593098 Google ScholarDigital Library
- John M. Pollard. 1971. The fast Fourier transform in a finite field. Mathematics of Computation 25, 114 (1971), 365--374.Google ScholarCross Ref
- Thomas Pöppelmann, Léo Ducas, and Tim Güneysu. 2014. Enhanced lattice-based signatures on reconfigurable hardware. In Cryptographic Hardware and Embedded Systems (CHES’14), Lejla Batina and Matthew Robshaw (Eds.). Lecture Notes in Computer Science, Vol. 8731. Springer, Berlin, 353--370. DOI:http://dx.doi.org/10.1007/978-3-662-44709-3_20 Google ScholarDigital Library
- Thomas Pöppelmann and Tim Güneysu. 2012. Towards efficient arithmetic for lattice-based cryptography on reconfigurable hardware. In Progress in Cryptology (LATINCRYPT’12), Alejandro Hevia and Gregory Neven (Eds.). Lecture Notes in Computer Science, Vol. 7533. Springer, Berlin, 139--158. DOI:http://dx.doi.org/10.1007/978-3-642-33481-8_8 Google ScholarDigital Library
- Thomas Poppelmann and Tim Guneysu. 2014. Area optimization of lightweight lattice-based encryption on reconfigurable hardware. In Proceedings of the 2014 IEEE International Symposium on Circuits and Systems (ISCAS’14). 2796--2799. DOI:http://dx.doi.org/10.1109/ISCAS.2014.6865754Google ScholarCross Ref
- Thomas Pöppelmann and Tim Güneysu. 2014. Towards practical lattice-based public-key encryption on reconfigurable hardware. In Selected Areas in Cryptography (SAC’13), Tanja Lange, Kristin Lauter, and Petr Lisonk (Eds.). Springer, Berlin, 68--85. DOI:http://dx.doi.org/10.1007/978-3-662-43414-7_4Google Scholar
- Chester Rebeiro, Sujoy Sinha Roy, and Debdeep Mukhopadhyay. 2012. Pushing the limits of high-speed GF(2m) elliptic curve scalar multiplication on FPGAs. In Cryptographic Hardware and Embedded Systems (CHES’12), Emmanuel Prouff and Patrick Schaumont (Eds.). Lecture Notes in Computer Science, Vol. 7428. Springer, Berlin, 494--511. DOI:http://dx.doi.org/10.1007/978-3-642-33027-8_29 Google ScholarDigital Library
- SujoySinha Roy, Frederik Vercauteren, Nele Mentens, Donald Donglong Chen, and Ingrid Verbauwhede. 2014b. Compact ring-LWE cryptoprocessor. In Cryptographic Hardware and Embedded Systems (CHES’14), Lejla Batina and Matthew Robshaw (Eds.). Lecture Notes in Computer Science, Vol. 8731. Springer, Berlin, 371--391. DOI:http://dx.doi.org/10.1007/978-3-662-44709-3_21 Google ScholarDigital Library
- Sujoy Sinha Roy, Frederik Vercauteren, and Ingrid Verbauwhede. 2014a. High precision discrete Gaussian sampling on FPGAs. In Selected Areas in Cryptography (SAC’13). Springer, 383--401.Google Scholar
- Peter W. Shor. 1994. Algorithms for quantum computation: Discrete logarithms and factoring. In Proceedings of the 35th Annual Symposium on Foundations of Computer Science. 124--134. DOI:http://dx.doi.org/10.1109/SFCS.1994.365700 Google ScholarDigital Library
- Chang Shu, Kris Gaj, and Tarek El-Ghazawi. 2005. Low latency elliptic curve cryptography accelerators for NIST curves over binary fields. In Proceedings of the 2005 IEEE International Conference on Field-Programmable Technology. 309--310. DOI:http://dx.doi.org/10.1109/FPT.2005.1568575Google Scholar
- Ralf Zimmermann, Tim Güneysu, and Christof Paar. 2010. High-performance integer factoring with reconfigurable devices. In Proceedings of the 2010 International Conference on Field Programmable Logic and Applications (FPL’10), 83--88. DOI:http://dx.doi.org/10.1109/FPL.2010.26 Google ScholarDigital Library
Index Terms
- The Future of Real-Time Security: Latency-Optimized Lattice-Based Digital Signatures
Recommendations
Implementing CRYSTALS-Dilithium Signature Scheme on FPGAs
ARES '21: Proceedings of the 16th International Conference on Availability, Reliability and SecurityIn July 2020, the lattice-based CRYSTALS-Dilithium digital signature scheme has been chosen as one of the three third-round finalists in the post-quantum cryptography standardization process by the National Institute of Standards and Technology (NIST). ...
Security Arguments for Digital Signatures and Blind Signatures
Since the appearance of public-key cryptography in the seminal Diffie--Hellman paper, many new schemes have been proposed and many have been broken. Thus, the simple fact that a cryptographic algorithm withstands cryptanalytic attacks for several years ...
A Constant Time Full Hardware Implementation of Streamlined NTRU Prime
Smart Card Research and Advanced ApplicationsAbstractThis paper presents a constant time hardware implementation of the NIST round 2 post-quantum cryptographic algorithm Streamlined NTRU Prime. We implement the entire KEM algorithm, including all steps for key generation, encapsulation and ...
Comments