
Faster multiplication over F2[X] using AVX512 instruction set and VPCLMULQDQ instruction

Code-based cryptography is one of the main proposals for post-quantum cryptography, and several protocols of this kind have been submitted to NIST. Among them, BIKE and HQC are two of the five alternate candidates selected in the third round of the NIST standardization...

Bibliographic Details
Published in: Journal of cryptographic engineering, 2023, Vol. 13 (1), p. 37-55
Main Authors: Robert, Jean-Marc; Véron, Pascal
Format: Article
Language: English
Subjects:
cited_by
cites
container_end_page 55
container_issue 1
container_start_page 37
container_title Journal of cryptographic engineering
container_volume 13
creator Robert, Jean-Marc
Véron, Pascal
description Code-based cryptography is one of the main proposals for post-quantum cryptography, and several protocols of this kind have been submitted to NIST. Among them, BIKE and HQC are two of the five alternate candidates selected in the third round of the NIST standardization process in the KEM category. Both schemes rely on multiplication of large polynomials over binary rings, and because of the polynomial size (from 10,000 to 60,000 bits), this operation is one of the costliest during key generation, encapsulation, or decapsulation. BIKE-2 also requires a time-consuming polynomial inversion, a problem addressed in Drucker (Fast polynomial inversion for post quantum QC-MDPC cryptography, 2020). In this work, we revisit the existing constant-time algorithms for arbitrary polynomial multiplication. We explore the different Karatsuba and Toom–Cook constructions in order to determine the best combination for each polynomial degree range with the AVX2 and AVX512 instruction sets. This leads to different kernels and constructions in each case. In particular, with AVX512 we use the VPCLMULQDQ instruction, a vectorized binary polynomial multiplication instruction. It performs up to four multiplications of polynomials of degree up to 63: four pairs of 64-bit operands, each producing a 128-bit result, with the four results stored in a single 512-bit word. This reduces the number of retired instructions for the operation by a factor of roughly 3 compared with the AVX2 implementations, while the speedup is up to 39% in processor clock cycles. These results differ from the estimates in Drucker (Fast multiplication of binary polynomials with the forthcoming vectorized vpclmulqdq instruction, 2018). To illustrate the benefit of the new VPCLMULQDQ instruction, we used the HQC code to evaluate our approaches. When implemented in the HQC protocol, for the security levels 128, 192, and 256, our approaches provide up to 12% speedup for key pair generation.
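The operation described in the abstract (four independent 64-bit by 64-bit carry-less products, each giving a 128-bit result) and the Karatsuba recursion built on top of it can be modeled in plain software. The following is only an illustrative sketch, not the authors' kernels: the function names `clmul`, `clmul64`, `clmul_4lanes`, and `karatsuba_f2` are hypothetical, the lane model ignores the immediate operand that the real instruction uses to select which 64-bit half of each 128-bit lane is multiplied, and the code is neither constant-time nor vectorized.

```python
# Software model of multiplication in F2[X]; bit i of an int is the
# coefficient of X^i. All names here are hypothetical, for illustration.

MASK64 = (1 << 64) - 1


def clmul(a: int, b: int) -> int:
    """Schoolbook carry-less product: XOR shifted copies of a for each set bit of b."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def clmul64(a: int, b: int) -> int:
    """Product of two polynomials of degree up to 63 (64-bit words).

    The result has degree at most 126, so it fits in a 128-bit word.
    """
    return clmul(a & MASK64, b & MASK64)


def clmul_4lanes(xs, ys):
    """Simplified model of the vectorized instruction: four independent
    64x64 -> 128-bit carry-less products, one per 128-bit lane.
    (Assumption: the real instruction's operand selector is omitted.)"""
    assert len(xs) == 4 and len(ys) == 4
    return [clmul64(x, y) for x, y in zip(xs, ys)]


def karatsuba_f2(a: int, b: int, nbits: int) -> int:
    """Recursive Karatsuba over F2[X] for operands below 2**nbits.

    Assumes nbits is 64 times a power of two, so each split is even.
    Over F2 addition and subtraction are both XOR, so the middle term is
    (a0 + a1)(b0 + b1) + a0*b0 + a1*b1.
    """
    if nbits <= 64:
        return clmul64(a, b)
    half = nbits // 2
    mask = (1 << half) - 1
    a0, a1 = a & mask, a >> half
    b0, b1 = b & mask, b >> half
    lo = karatsuba_f2(a0, b0, half)
    hi = karatsuba_f2(a1, b1, half)
    mid = karatsuba_f2(a0 ^ a1, b0 ^ b1, half) ^ lo ^ hi
    return lo ^ (mid << half) ^ (hi << (2 * half))
```

For example, `clmul64(0b11, 0b11)` computes (X + 1)(X + 1) = X^2 + 1 over F2 and returns `0b101`. Each Karatsuba level replaces four half-size products with three, which is why the choice of recursion depth and base kernel matters for each degree range.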
doi_str_mv 10.1007/s13389-021-00278-3
format article
publisher Berlin/Heidelberg: Springer Berlin Heidelberg
rights The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2021
stitle J Cryptogr Eng
toplevel peer_reviewed
online_resources
tpages 19
fulltext fulltext
identifier ISSN: 2190-8508
ispartof Journal of cryptographic engineering, 2023, Vol.13 (1), p.37-55
issn 2190-8508
2190-8516
language eng
recordid cdi_proquest_journals_2792548841
source Springer Nature
subjects Algorithms
Circuits and Systems
Communications Engineering
Computer Communication Networks
Computer Science
Context
Cryptography
Cryptology
Data Structures and Information Theory
Instruction sets (computers)
Microprocessors
Networks
Operating Systems
Polynomials
Regular Paper
Rings (mathematics)
title Faster multiplication over F2[X] using AVX512 instruction set and VPCLMULQDQ instruction
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-30T13%3A12%3A09IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_sprin&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Faster%20multiplication%20over%20F2%5BX%5D%20using%20AVX512%20instruction%20set%20and%20VPCLMULQDQ%20instruction&rft.jtitle=Journal%20of%20cryptographic%20engineering&rft.au=Robert,%20Jean-Marc&rft.date=2023&rft.volume=13&rft.issue=1&rft.spage=37&rft.epage=55&rft.pages=37-55&rft.issn=2190-8508&rft.eissn=2190-8516&rft_id=info:doi/10.1007/s13389-021-00278-3&rft_dat=%3Cproquest_sprin%3E2792548841%3C/proquest_sprin%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-p1423-53b58167309bdb3f595aa26be9864ca2a0e397fca7639d415ab02d47e0e104653%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2792548841&rft_id=info:pmid/&rfr_iscdi=true