https://github.com/mupq/pqm4
Revision 992f0f226503d43b6d33278ecb60a9168ed8d787 authored by Michiel Van Beirendonck on 18 February 2021, 05:57:25 UTC, committed by GitHub on 18 February 2021, 05:57:25 UTC
* This is a large commit, grouping two types of changes on top of the NTT-based Saber.

Firstly, this commit merges improvements between different Saber implementations.

1) For round 3, the Saber reference code was thoroughly refactored and the codebase reduced [https://github.com/KULeuven-COSIC/SABER]. These changes are now integrated into the m4 code.

2) All unnecessary modular reductions have been removed. The only modular reductions are now in the packing functions.

3) Packing/unpacking functions are simplified [PQClean, commit f8503cb].

4) The secret-key is stored in compressed format [ia.cr/2020/268, Section 4.1]. This reduces the secret-key size, and the packing/unpacking functions are faster. (This requires a fix in pqm4’s testvectors.c, as the secret-key is checked against the one produced by PQclean).

5) During re-encryption, the verification of the ciphertext is performed in place [ia.cr/2020/268, Section 4.2].

6) Use symlinks for Light/FireSaber to make (minimal) differences with Saber more clear.

Secondly, this commit implements some optimizations and reduces the memory footprint of the NTT-based multiplication.

1) Saber does not require any modular reduction apart from bitstream packing. Elements can be kept in int16_t (central-reduced) format.

1.a) The secret-key is sign-extended from 4-bit to 16-bit when unpacked.
1.b) The vectors b and b' are sign-extended from 10-bit to 16-bit when unpacked.
1.c) 1.a and 1.b allow to remove NTT_pk (with central reduction) and use NTT (without central reduction) uniformly.
1.d) NTT_inv and NTT_inv_inner include a final step that converts from int16_t back to mod_p or mod_q. This is not necessary and removed.

2) During encryption, the NTT of s' is only computed once and reused between A*s' and b*s'.

3) Some just-in-time memory optimizations of [ia.cr/2018/682, Section 2.2] are implemented for the NTT-based multiplication. Polynomial vectors are generated from their seed just-in-time, converted to NTT domain, and pointwise multiplied. The next polynomial vectors can reuse all the buffers.

The idea is to extend this from polynomial vectors to individual polynomials. This still requires a new my_mul function.

For {Fire,Light}Saber (keygen/encaps/decaps) the resulting implementation is approximately (2.3-2.6%/4.7-5.5%/7.4-9.5%) faster and uses (27-36%/47-61%/49-62%) less dynamic memory than the current version in pqm4.

* Add central reduction for matrix A

* Add benchmarks

* WIP : more memory-efficient NTT implementation

* Make secret key compression optional
and comment out non-stack-optimized (very slightly faster) functions

* Reclaim ~1kB more stack space

shake_out was SABER_POLYVECBYTES instead of only SABER_POLYBYTES.

Introduced a few unions to overlap memory.

* rm redundant files

* clean ups; add soft links

* Reclaim ~1kB more stack space

shake_out was SABER_POLYVECBYTES instead of only SABER_POLYBYTES.

Introduced a few unions to overlap memory.

* typo

* Noinline no longer needed without fast funcs

* add benchmarks

Co-authored-by: vincentvbh <b05902122@ntu.edu.tw>
Co-authored-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
1 parent 5fb6938
History
Tip revision: 992f0f226503d43b6d33278ecb60a9168ed8d787 authored by Michiel Van Beirendonck on 18 February 2021, 05:57:25 UTC
Stack optimizations and refactoring of NTT-based Saber (#181)
Tip revision: 992f0f2
File Mode Size
common
crypto_kem
crypto_sign
hostside
ldscripts
libopencm3 @ b1d8a4c
mupq @ decc52b
.gitignore -rw-r--r-- 77 bytes
.gitmodules -rw-r--r-- 168 bytes
Makefile -rw-r--r-- 5.5 KB
README.md -rw-r--r-- 21.6 KB
benchmarks.csv -rw-r--r-- 25.8 KB
benchmarks.md -rw-r--r-- 40.3 KB
benchmarks.py -rwxr-xr-x 696 bytes
build_everything.py -rwxr-xr-x 232 bytes
convert_benchmarks.py -rwxr-xr-x 417 bytes
interface.py -rw-r--r-- 4.2 KB
requirements.txt -rw-r--r-- 14 bytes
test.py -rwxr-xr-x 228 bytes
testvectors.py -rwxr-xr-x 228 bytes

README.md

back to top