https://github.com/mupq/pqm4
Revision 992f0f226503d43b6d33278ecb60a9168ed8d787 authored by Michiel Van Beirendonck on 18 February 2021, 05:57:25 UTC, committed by GitHub on 18 February 2021, 05:57:25 UTC
* This is a large commit, grouping two types of changes on top of the NTT-based Saber.

Firstly, this commit merges improvements between different Saber implementations.

1) For round 3, the Saber reference code was thoroughly refactored and the codebase reduced [https://github.com/KULeuven-COSIC/SABER]. These changes are now integrated into the m4 code.

2) All unnecessary modular reductions have been removed. The only modular reductions are now in the packing functions.

3) Packing/unpacking functions are simplified [PQClean, commit f8503cb].

4) The secret-key is stored in compressed format [ia.cr/2020/268, Section 4.1]. This reduces the secret-key size, and the packing/unpacking functions are faster. (This requires a fix in pqm4’s testvectors.c, as the secret-key is checked against the one produced by PQclean).

5) During re-encryption, the verification of the ciphertext is performed in place [ia.cr/2020/268, Section 4.2].

6) Use symlinks for Light/FireSaber to make (minimal) differences with Saber more clear.

Secondly, this commit implements some optimizations and reduces the memory footprint of the NTT-based multiplication.

1) Saber does not require any modular reduction apart from bitstream packing. Elements can be kept in int16_t (central-reduced) format.

1.a) The secret-key is sign-extended from 4-bit to 16-bit when unpacked.
1.b) The vectors b and b' are sign-extended from 10-bit to 16-bit when unpacked.
1.c) 1.a and 1.b allow to remove NTT_pk (with central reduction) and use NTT (without central reduction) uniformly.
1.d) NTT_inv and NTT_inv_inner include a final step that converts from int16_t back to mod_p or mod_q. This is not necessary and removed.

2) During encryption, the NTT of s' is only computed once and reused between A*s' and b*s'.

3) Some just-in-time memory optimizations of [ia.cr/2018/682, Section 2.2] are implemented for the NTT-based multiplication. Polynomial vectors are generated from their seed just-in-time, converted to NTT domain, and pointwise multiplied. The next polynomial vectors can reuse all the buffers.

The idea is to extend this from polynomial vectors to individual polynomials. This still requires a new my_mul function.

For {Fire,Light}Saber (keygen/encaps/decaps) the resulting implementation is approximately (2.3-2.6%/4.7-5.5%/7.4-9.5%) faster and uses (27-36%/47-61%/49-62%) less dynamic memory than the current version in pqm4.

* Add central reduction for matrix A

* Add benchmarks

* WIP : more memory-efficient NTT implementation

* Make secret key compression optional
and comment out non-stack-optimized (very slightly faster) functions

* Reclaim ~1kB more stack space

shake_out was SABER_POLYVECBYTES instead of only SABER_POLYBYTES.

Introduced a few unions to overlap memory.

* rm redundant files

* clean ups; add soft links

* Reclaim ~1kB more stack space

shake_out was SABER_POLYVECBYTES instead of only SABER_POLYBYTES.

Introduced a few unions to overlap memory.

* typo

* Noinline no longer needed without fast funcs

* add benchmarks

Co-authored-by: vincentvbh <b05902122@ntu.edu.tw>
Co-authored-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
1 parent 5fb6938
Raw File
Tip revision: 992f0f226503d43b6d33278ecb60a9168ed8d787 authored by Michiel Van Beirendonck on 18 February 2021, 05:57:25 UTC
Stack optimizations and refactoring of NTT-based Saber (#181)
Tip revision: 992f0f2
Makefile
OPENCM3DIR  = ./libopencm3
OPENCM3NAME = opencm3_stm32f4
OPENCM3FILE = $(OPENCM3DIR)/lib/lib$(OPENCM3NAME).a
LDSCRIPT    = ldscripts/stm32f405x6.ld

PREFIX     ?= arm-none-eabi
CC          = $(PREFIX)-gcc
LD          = $(PREFIX)-gcc
OBJCOPY     = $(PREFIX)-objcopy

ARCH_FLAGS  = -mthumb -mcpu=cortex-m4 -mfloat-abi=hard -mfpu=fpv4-sp-d16
DEFINES     = -DSTM32F4

CFLAGS     += -O3 -std=gnu99 \
              -Wall -Wextra -Wimplicit-function-declaration \
              -Wredundant-decls -Wmissing-prototypes -Wstrict-prototypes \
              -Wundef -Wshadow \
              -I$(OPENCM3DIR)/include \
              -fno-common $(ARCH_FLAGS) -MD $(DEFINES)

CC_HOST    = gcc
LD_HOST    = gcc

CFLAGS_HOST = -O3 -Wall -Wextra -Wpedantic
LDFLAGS_HOST = -lm

# override as desired
TYPE=kem

COMMONSOURCES=mupq/common/fips202.c mupq/common/sp800-185.c mupq/common/nistseedexpander.c
COMMONSOURCES_HOST=$(COMMONSOURCES) mupq/common/keccakf1600.c mupq/pqclean/common/aes.c mupq/pqclean/common/sha2.c
COMMONSOURCES_M4=$(COMMONSOURCES) common/keccakf1600.S common/aes.c common/aes-encrypt.S common/aes-keyschedule.S common/aes-publicinputs.c common/aes-publicinputs.S mupq/common/sha2.c common/crypto_hashblocks_sha512.c common/crypto_hashblocks_sha512_inner32.s

COMMONINCLUDES=-I"mupq/common"
COMMONINCLUDES_M4=-I"common" $(COMMONINCLUDES)
COMMONINCLUDES_HOST=$(COMMONINCLUDES) -I"mupq/pqclean/common"

RANDOMBYTES_M4=common/randombytes.c

DEST_HOST=bin-host
DEST=bin

TARGET_NAME = $(shell echo $(IMPLEMENTATION_PATH) | sed 's@/@_@g')
TYPE = $(shell echo $(IMPLEMENTATION_PATH) | sed 's@^\([^/]*/\)*crypto_\([^/]*\)/.*$$@\2@')
IMPLEMENTATION_SOURCES = $(wildcard $(IMPLEMENTATION_PATH)/*.c) $(wildcard $(IMPLEMENTATION_PATH)/*.s) $(wildcard $(IMPLEMENTATION_PATH)/*.S)
IMPLEMENTATION_HEADERS = $(IMPLEMENTATION_PATH)/*.h

# allow schemes to use implementation-specific linker scripts
ifneq ("$(wildcard ldscripts/$(TARGET_NAME).ld)","")
    LDSCRIPT = ldscripts/$(TARGET_NAME).ld
endif

LDFLAGS    += --static -Wl,--start-group -lc -lgcc -lnosys -Wl,--end-group \
              -T$(LDSCRIPT) -nostartfiles -Wl,--gc-sections \
               $(ARCH_FLAGS) -L$(OPENCM3DIR)/lib -lm -l$(OPENCM3NAME)

.PHONY: all
all:
	@echo "Please use the scripts in this directory instead of using the Makefile"
	@echo
	@echo "If you really want to use it, please specify IMPLEMENTATION_PATH=path/to/impl"
	@echo "and a target binary, e.g.,"
	@echo "make IMPLEMENTATION_PATH=crypto_kem/kyber768/m4 bin/crypto_kem_kyber768_m4_test.bin"
	@echo "make clean also works"

$(DEST_HOST)/%_testvectors: $(COMMONSOURCES_HOST) $(IMPLEMENTATION_SOURCES) $(IMPLEMENTATION_HEADERS)
	mkdir -p $(DEST_HOST)
	$(CC_HOST) -o $@ \
		$(CFLAGS_HOST) -DMUPQ_NAMESPACE=$(MUPQ_NAMESPACE)\
		mupq/crypto_$(TYPE)/testvectors-host.c \
		$(COMMONSOURCES_HOST) \
		$(IMPLEMENTATION_SOURCES) \
		-I$(IMPLEMENTATION_PATH) \
		$(COMMONINCLUDES_HOST) \
		$(LDFLAGS_HOST)

$(DEST)/%.bin: elf/%.elf
	mkdir -p $(DEST)
	$(OBJCOPY) -Obinary $^ $@


# pattern rules, intended to match % to the type of test (i.e. test, speed, stack)
# note that this excludes testvectors, as that is a special case that provides its own randombytes
# TODO use notrandombytes more generically rather than included in testvectors.c
elf/$(TARGET_NAME)_%.elf: mupq/crypto_$(TYPE)/%.c $(COMMONSOURCES_M4) $(RANDOMBYTES_M4) $(IMPLEMENTATION_SOURCES) $(IMPLEMENTATION_HEADERS) $(OPENCM3FILE) common/hal-stm32f4.c
	mkdir -p elf
	$(CC) -o $@ $(CFLAGS) -DMUPQ_NAMESPACE=$(MUPQ_NAMESPACE) \
		$< $(COMMONSOURCES_M4) $(RANDOMBYTES_M4) $(IMPLEMENTATION_SOURCES) common/hal-stm32f4.c \
		-I$(IMPLEMENTATION_PATH) $(COMMONINCLUDES_M4) $(LDFLAGS)

elf/$(TARGET_NAME)_testvectors.elf: mupq/crypto_$(TYPE)/testvectors.c $(COMMONSOURCES_M4) $(IMPLEMENTATION_SOURCES) $(IMPLEMENTATION_HEADERS) $(OPENCM3FILE) common/hal-stm32f4.c
	mkdir -p elf
	$(CC) -o $@ $(CFLAGS) -DMUPQ_NAMESPACE=$(MUPQ_NAMESPACE)\
		$< $(COMMONSOURCES_M4) $(IMPLEMENTATION_SOURCES) common/hal-stm32f4.c \
		-I$(IMPLEMENTATION_PATH) $(COMMONINCLUDES_M4) $(LDFLAGS)

elf/$(TARGET_NAME)_hashing.elf: mupq/crypto_$(TYPE)/hashing.c $(COMMONSOURCES_M4) $(IMPLEMENTATION_SOURCES) $(IMPLEMENTATION_HEADERS) $(OPENCM3FILE) common/hal-stm32f4.c
	mkdir -p elf
	$(CC) -o $@ $(CFLAGS) -DPROFILE_HASHING -DMUPQ_NAMESPACE=$(MUPQ_NAMESPACE) \
		$< $(COMMONSOURCES_M4) $(RANDOMBYTES_M4) $(IMPLEMENTATION_SOURCES) common/hal-stm32f4.c \
		-I$(IMPLEMENTATION_PATH) $(COMMONINCLUDES_M4) $(LDFLAGS)

obj/$(TARGET_NAME)_%.o: $(IMPLEMENTATION_PATH)/%.c $(IMPLEMENTATION_HEADERS)
	mkdir -p obj
	$(CC) -o $@ -c $(CFLAGS) -DMUPQ_NAMESPACE=$(MUPQ_NAMESPACE) \
		-I$(IMPLEMENTATION_PATH) $(COMMONINCLUDES_M4) $<

obj/$(TARGET_NAME)_%.o: $(IMPLEMENTATION_PATH)/%.s $(IMPLEMENTATION_HEADERS)
	mkdir -p obj
	$(CC) -o $@ -c $(CFLAGS) -DMUPQ_NAMESPACE=$(MUPQ_NAMESPACE) \
		-I$(IMPLEMENTATION_PATH) $(COMMONINCLUDES_M4) $<

obj/$(TARGET_NAME)_%.o: $(IMPLEMENTATION_PATH)/%.S $(IMPLEMENTATION_HEADERS)
	mkdir -p obj
	$(CC) -o $@ -c $(CFLAGS) -DMUPQ_NAMESPACE=$(MUPQ_NAMESPACE) \
		-I$(IMPLEMENTATION_PATH) $(COMMONINCLUDES_M4) $<

$(OPENCM3FILE):
	@if [ ! "`ls -A $(OPENCM3_DIR)`" ] ; then \
		printf "######## ERROR ########\n"; \
		printf "\tlibopencm3 is not initialized.\n"; \
		printf "\tPlease run (in the root directory):\n"; \
		printf "\t$$ git submodule init\n"; \
		printf "\t$$ git submodule update\n"; \
		printf "\tbefore running make.\n"; \
		printf "######## ERROR ########\n"; \
		exit 1; \
		fi
	make -C $(OPENCM3DIR)

.PHONY: clean libclean

clean:
	rm -rf elf/
	rm -rf bin/
	rm -rf bin-host/
	rm -rf obj/
	rm -rf testvectors/
	rm -rf benchmarks/

libclean:
	make -C $(OPENCM3DIR) clean
back to top