
Algorithm-Hardware Co-Design of Split-Radix Discrete Galois Transformation for KyberKEM

Bibliographic Details
Published in: IEEE Transactions on Emerging Topics in Computing, 2023-10, Vol. 11 (4), p. 1-15
Main Authors: Li, Guangyan; Chen, Donglong; Mao, Gaoyu; Dai, Wangchen; Sanka, Abdurrashid Ibrahim; Cheung, Ray C.C.
Format: Article
Language: English
Description
Summary: KyberKEM is one of the final-round key encapsulation mechanisms in the NIST post-quantum cryptography competition. The number theoretic transform (NTT), the computing bottleneck of KyberKEM, has been widely studied. The Discrete Galois Transformation (DGT) is a variant of the NTT that halves the transform length but, in theoretical analysis, requires more multiplication operations than the latest NTT algorithm. This paper proposes the split-radix DGT, a novel DGT variant that uses the split-radix method to reduce computing complexity without compromising the transform length. Specifically, for a length-128 polynomial, the split-radix DGT algorithm saves at least 10% of the multiplication operations compared with the latest NTT algorithm in theoretical analysis. Furthermore, we propose a unified split-radix DGT processor with a dedicated stream permutation network for KyberKEM and implement it on a Xilinx Artix-7 FPGA. The processor achieves at least 49.4% faster transformation and 65.3% faster component-wise multiplication (CWM), with at most 87% of the LUT-NTT area-time product and 32% of the LUT-CWM area-time product, compared with state-of-the-art polynomial multipliers for KyberKEM with the same BFU setting on similar platforms. Lastly, we design a highly efficient KyberKEM architecture using the proposed split-radix DGT processor. Implementation results on the Artix-7 FPGA show significant performance improvements over state-of-the-art KyberKEM designs.
ISSN: 2168-6750
DOI: 10.1109/TETC.2023.3270971
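
For readers unfamiliar with the operation the summary refers to: the NTT and DGT both accelerate polynomial multiplication in the ring Z_q[x]/(x^n + 1), which Kyber instantiates with q = 3329 and n = 256. The sketch below is a plain schoolbook reference for that ring product, not the split-radix DGT proposed in the paper. The function name `negacyclic_mul` is illustrative only. A quadratic-time routine like this is typically useful as a correctness oracle when testing a faster transform-based multiplier.

```python
Q = 3329  # Kyber modulus
N = 256   # number of coefficients per Kyber polynomial


def negacyclic_mul(a, b, q=Q):
    """Schoolbook product of a and b in Z_q[x]/(x^n + 1), where n = len(a).

    Reference sketch only: O(n^2) multiplications, no NTT or DGT involved.
    """
    n = len(a)
    if len(b) != n:
        raise ValueError("operands must have the same length")
    c = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < n:
                c[k] = (c[k] + ai * bj) % q
            else:
                # x^n is congruent to -1, so wrapped terms are subtracted.
                c[k - n] = (c[k - n] - ai * bj) % q
    return c


# Usage: x^(N-1) * x = x^N, which reduces to -1, i.e. Q - 1 modulo Q.
a = [0] * N
a[N - 1] = 1
b = [0] * N
b[1] = 1
product = negacyclic_mul(a, b)
```

In the usage lines, `product[0]` equals `Q - 1` and every other coefficient is zero, which reflects the negacyclic wrap-around that distinguishes this ring from ordinary cyclic convolution.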