Revision - ed529e0 - Align the base when doing strided loads from constant addresses

Revision ed529e04bb181185dd68abc8681929c1cb72959c authored by Andrew Adams on 29 November 2020, 22:07:28 UTC, committed by Andrew Adams on 29 November 2020, 22:07:28 UTC

Align the base when doing strided loads from constant addresses

When we codegen something like f[ramp(x + 1, 2, 16)], where f is an
internal allocation, we subtract the 1, do the dense load f[ramp(x, 1,
32)] and then take the odd lanes of the result. The reason for this is
that it's likely that there's an f[ramp(x, 2, 16)] nearby, and aligning
down the x+1 to x means we can share the dense loads and just
deinterleave.
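
To make the transformation above concrete, here is a minimal Python model of it. This is only a sketch of the index arithmetic, not Halide's actual codegen: `ramp`, `load`, and `strided_load_via_dense` are hypothetical helpers, and a plain Python list stands in for the internal allocation f.

```python
def ramp(base, stride, lanes):
    """Indices of a vector ramp: base, base + stride, ... (`lanes` entries)."""
    return [base + stride * i for i in range(lanes)]


def load(buf, indices):
    """Gather-style reference load: one element per index."""
    return [buf[i] for i in indices]


def strided_load_via_dense(buf, x, offset, lanes):
    """Model of f[ramp(x + offset, 2, lanes)] for offset in {0, 1}.

    Instead of loading with stride 2 directly, do one dense load of
    2 * lanes elements starting at x, then keep every second lane
    beginning at `offset` (even lanes for 0, odd lanes for 1).
    """
    dense = load(buf, ramp(x, 1, 2 * lanes))
    return dense[offset::2]


f = list(range(100, 200))  # stand-in for an internal allocation
x = 10

odd_lanes = strided_load_via_dense(f, x, 1, 16)   # f[ramp(x + 1, 2, 16)]
even_lanes = strided_load_via_dense(f, x, 0, 16)  # f[ramp(x, 2, 16)]
```

Both calls read the same 32-element dense window starting at x, which is what lets a real backend share the load and only deinterleave.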

This PR does the same when there's no x, just an odd constant. This
means that cases like f[ramp(64, 2, 16)] + f[ramp(65, 2, 16)] now
generate much better assembly. In one case I have, it speeds up an
entire pipeline by 8%, because aligning the loads in this way causes
them all to be promoted off the stack into registers.
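
The constant-base case can be sketched the same way. The snippet below is an illustrative model only, assuming that "aligning" means rounding the constant base down to a multiple of the stride; `align_constant_base` is a hypothetical helper and is not a function from the Halide source.

```python
def align_constant_base(base, stride=2):
    """Split a constant base into (aligned_base, lane_offset).

    aligned_base is `base` rounded down to a multiple of `stride`;
    lane_offset says which lanes of the dense load to keep.
    """
    offset = base % stride
    return base - offset, offset


f = list(range(1000, 1128))  # stand-in for an internal allocation
lanes = 16

aligned_a, off_a = align_constant_base(64)  # f[ramp(64, 2, 16)]
aligned_b, off_b = align_constant_base(65)  # f[ramp(65, 2, 16)]

# Both bases align down to 64, so a single dense load serves both terms.
dense = f[aligned_a:aligned_a + 2 * lanes]  # models f[ramp(64, 1, 32)]
evens = dense[off_a::2]
odds = dense[off_b::2]

result = [a + b for a, b in zip(evens, odds)]
```

Without the alignment step, the load based at 65 would use a different dense window than the one based at 64, and the two terms could not share it.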

1 parent bfbfacd
