https://github.com/halide/Halide
Revision 8b68f85ee07814c1896bdbcde57f2927f11cc732 authored by Andrew Adams on 23 November 2021, 21:13:48 UTC, committed by GitHub on 23 November 2021, 21:13:48 UTC
* Avoid needless gather in fast_integer_divide lowering

fast_integer_divide did two lookups: one for a multiplier and one for a
shift. It turns out that count leading zeros yields a workable shift, so
the shift lookup is unnecessary. This PR speeds up fast_integer_divide by
roughly 70% in cases where the denominator varies across vector lanes,
by avoiding one of the two expensive gathers.

* Fix slash direction

* Pacify clang-tidy

* Use portable bit-counting methods

* Cleaner initialization of tables
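
The idea in the commit message above (derive the shift from count leading zeros, so only the multiplier needs a lookup) can be illustrated with a scalar sketch. This is not Halide's actual lowering: it assumes the classic Granlund-Montgomery scheme for unsigned 32-bit division by an invariant, and every name in it (`count_leading_zeros32`, `ceil_log2`, `magic_multiplier`, `fast_divide`) is hypothetical. The multiplier is computed directly here; in a table-driven version it would be the one remaining lookup.

```cpp
#include <cassert>
#include <cstdint>

// Portable count-leading-zeros for 32-bit values; returns 32 for x == 0.
inline int count_leading_zeros32(uint32_t x) {
    int n = 0;
    for (uint32_t bit = 0x80000000u; bit != 0 && (x & bit) == 0; bit >>= 1) {
        ++n;
    }
    return n;
}

// ceil(log2(d)) for d >= 1, derived from count leading zeros of d - 1.
inline int ceil_log2(uint32_t d) {
    return 32 - count_leading_zeros32(d - 1);
}

// Multiplier for divisor d (hypothetical helper). In a table-driven
// scheme this value would be fetched from a lookup table instead.
inline uint32_t magic_multiplier(uint32_t d) {
    assert(d != 0);
    const int l = ceil_log2(d);
    const uint64_t pow2 = uint64_t(1) << l;  // l <= 32, so this fits in 64 bits
    return uint32_t(((pow2 - d) << 32) / d + 1);
}

// Computes n / d given the precomputed multiplier m for d.
// Both shifts come from count leading zeros, not from a table.
inline uint32_t fast_divide(uint32_t n, uint32_t d, uint32_t m) {
    const int l = ceil_log2(d);
    const int sh1 = l < 1 ? l : 1;      // min(l, 1)
    const int sh2 = l > 1 ? l - 1 : 0;  // max(l - 1, 0)
    const uint32_t t = uint32_t((uint64_t(m) * n) >> 32);  // high half of m * n
    return (t + ((n - t) >> sh1)) >> sh2;
}
```

A caller would compute `magic_multiplier(d)` once per divisor and reuse it, for example `fast_divide(100, 7, magic_multiplier(7))`. When the divisor differs per vector lane, a table-driven version of this sketch would need one gather for the multiplier and none for the shifts.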
1 parent d12fbd1
Tip revision: 8b68f85ee07814c1896bdbcde57f2927f11cc732 authored by Andrew Adams on 23 November 2021, 21:13:48 UTC
Avoid needless gather in fast_integer_divide lowering (#6441)
File Mode Size
.github
apps
cmake
dependencies
doc
packaging
python_bindings
src
test
tools
tutorial
util
.clang-format -rw-r--r-- 1.4 KB
.clang-format-ignore -rw-r--r-- 265 bytes
.clang-tidy -rw-r--r-- 1.8 KB
.gitattributes -rw-r--r-- 342 bytes
.gitignore -rw-r--r-- 1.1 KB
.gitmodules -rw-r--r-- 0 bytes
CMakeLists.txt -rw-r--r-- 5.5 KB
CMakePresets.json -rw-r--r-- 5.2 KB
CODE_OF_CONDUCT.md -rw-r--r-- 3.5 KB
LICENSE.txt -rw-r--r-- 3.2 KB
Makefile -rw-r--r-- 101.0 KB
README.md -rw-r--r-- 16.4 KB
README_cmake.md -rw-r--r-- 69.2 KB
README_rungen.md -rw-r--r-- 12.1 KB
README_webassembly.md -rw-r--r-- 8.4 KB
run-clang-format.sh -rwxr-xr-x 1.4 KB
run-clang-tidy.sh -rwxr-xr-x 3.2 KB

