https://github.com/halide/Halide
Revision 8b68f85ee07814c1896bdbcde57f2927f11cc732 authored by Andrew Adams on 23 November 2021, 21:13:48 UTC, committed by GitHub on 23 November 2021, 21:13:48 UTC
* Avoid needless gather in fast_integer_divide lowering

fast_integer_divide did two lookups: one for a multiplier and one for a
shift. It turns out you can use count leading zeros to compute a
workable shift instead of doing a lookup. When the denominator varies
across vector lanes, this PR speeds up fast_integer_divide by roughly
70% by avoiding one of the two expensive gathers.
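
The idea of deriving the shift from a leading-zero count instead of a table can be illustrated with a minimal sketch. This is not Halide's actual lowering: it uses the textbook round-up multiply-and-shift scheme for unsigned 32-bit division, and the names `Divisor`, `make_divisor`, `fast_divide`, and `count_leading_zeros32` are hypothetical. The multiplier is the value a lookup table would hold; the shift comes only from counting leading zeros.

```cpp
#include <cstdint>

// Portable count-leading-zeros for a 32-bit value (returns 32 for x == 0).
inline int count_leading_zeros32(uint32_t x) {
    int n = 0;
    for (uint32_t bit = 0x80000000u; bit != 0 && (x & bit) == 0; bit >>= 1) {
        ++n;
    }
    return n;
}

// Hypothetical per-denominator data. In a table-driven scheme, only the
// multiplier would need to be looked up; the shift is recomputed on the fly.
struct Divisor {
    uint32_t multiplier;
    int log2_ceil;  // ceil(log2(d)), derived from a leading-zero count
};

// Assumes d >= 1.
inline Divisor make_divisor(uint32_t d) {
    // ceil(log2(d)) without a lookup: 32 - clz(d - 1). For d == 1 this is 0.
    const int l = 32 - count_leading_zeros32(d - 1);
    // multiplier = floor(2^32 * (2^l - d) / d) + 1, which fits in 32 bits.
    const uint64_t numerator = ((uint64_t(1) << l) - d) << 32;
    return Divisor{uint32_t(numerator / d + 1), l};
}

// Computes n / d using one widening multiply and shifts.
inline uint32_t fast_divide(uint32_t n, const Divisor &div) {
    // High 32 bits of the 64-bit product multiplier * n.
    const uint32_t t = uint32_t((uint64_t(div.multiplier) * n) >> 32);
    const int sh1 = div.log2_ceil < 1 ? div.log2_ceil : 1;   // min(l, 1)
    const int sh2 = div.log2_ceil > 1 ? div.log2_ceil - 1 : 0;  // max(l - 1, 0)
    // n >= t always holds, so neither the subtraction nor the sum wraps.
    return (t + ((n - t) >> sh1)) >> sh2;
}
```

In a vectorized setting where each lane has a different denominator, the multiplier would still be gathered per lane, while the shift amounts would be computed with ordinary per-lane arithmetic.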

* Fix slash direction

* Pacify clang-tidy

* Use portable bit-counting methods

* Cleaner initialization of tables
1 parent d12fbd1
Tip revision: 8b68f85ee07814c1896bdbcde57f2927f11cc732 authored by Andrew Adams on 23 November 2021, 21:13:48 UTC
Avoid needless gather in fast_integer_divide lowering (#6441)
File                           Mode        Size
AddCudaToTarget.cmake          -rw-r--r--  1.6 KB
BundleStatic.cmake             -rw-r--r--  7.7 KB
CheckFilesExist.cmake          -rw-r--r--  455 bytes
HalideGeneratorHelpers.cmake   -rw-r--r--  19.1 KB
HalideTargetHelpers.cmake      -rw-r--r--  2.4 KB
HalideTestHelpers.cmake        -rw-r--r--  3.3 KB
MakeShellPath.cmake            -rw-r--r--  350 bytes
TargetExportScript.cmake       -rw-r--r--  2.2 KB
WipeStandardFlags.cmake        -rw-r--r--  702 bytes
toolchain.linux-aarch64.cmake  -rw-r--r--  689 bytes
toolchain.linux-arm32.cmake    -rw-r--r--  1.3 KB
toolchain.linux-i386.cmake     -rw-r--r--  612 bytes