https://github.com/halide/Halide
Revision 96698170814ff8aa4aaecacb5d23bbe2e3a1e17e authored by Andrew Adams on 16 August 2020, 20:54:08 UTC, committed by Andrew Adams on 16 August 2020, 20:54:08 UTC
BGU on CUDA had regressed from its stated performance because the atomic
floating-point adds were being compiled to CAS loops: the complex indexing
expressions diverged between the LHS and RHS of the +=. Inlining less stuff
into the += operations lets them compile to atomic adds again, and the
schedule was improved with a few other tweaks.

Longer-term we need a first-class way to represent += so that we're not
sensitive to this sort of divergence.
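
For readers unfamiliar with the term, a CAS loop emulates an atomic add by repeatedly reading the value, computing the sum, and retrying with compare-and-swap until no other writer has interfered. The sketch below illustrates that pattern on the CPU with standard C++ `std::atomic<float>`. It is not Halide's generated code, and the names `cas_loop_add` and `sum_with_cas` are hypothetical, introduced only for illustration.

```cpp
#include <atomic>
#include <cassert>

// Emulate an atomic float add with a compare-and-swap (CAS) loop.
// Illustrative only: this is not the code Halide emits for CUDA.
inline void cas_loop_add(std::atomic<float> &slot, float delta) {
    float observed = slot.load(std::memory_order_relaxed);
    // On failure, compare_exchange_weak stores the current value into
    // `observed`, so each retry recomputes the sum from fresh data.
    while (!slot.compare_exchange_weak(observed, observed + delta,
                                       std::memory_order_relaxed)) {
        // Another writer got in first (or a spurious failure); retry.
    }
}

// Hypothetical helper: add `delta` to a zero-initialized slot `count` times.
inline float sum_with_cas(int count, float delta) {
    assert(count >= 0);
    std::atomic<float> slot{0.0f};
    for (int i = 0; i < count; i++) {
        cas_loop_add(slot, delta);
    }
    return slot.load();
}
```

Under contention every failed compare-and-swap costs another round trip, which is presumably why falling back to this pattern is slower than a hardware atomic add when many GPU threads update the same location.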
1 parent 9f55e10
Reschedule BGU to fix performance regression
File Mode Size
.github
apps
cmake
dependencies
doc
packaging
python_bindings
src
test
tools
tutorial
util
.clang-format -rw-r--r-- 1.5 KB
.clang-format-ignore -rw-r--r-- 265 bytes
.clang-tidy -rw-r--r-- 469 bytes
.gitattributes -rw-r--r-- 342 bytes
.gitignore -rw-r--r-- 1.1 KB
.gitmodules -rw-r--r-- 0 bytes
CMakeLists.txt -rw-r--r-- 4.6 KB
CODE_OF_CONDUCT.md -rw-r--r-- 3.5 KB
LICENSE.txt -rw-r--r-- 3.2 KB
Makefile -rw-r--r-- 91.9 KB
README.md -rw-r--r-- 19.1 KB
README_cmake.md -rw-r--r-- 12.3 KB
README_rungen.md -rw-r--r-- 12.1 KB
README_webassembly.md -rw-r--r-- 7.5 KB
