Revision history - refs/heads/vksnk/fix_halide_xtensa_narrow_with_rounding_shift_i16 - origin: https://github.com/halide/Halide

Revision	Author	Date	Message	Commit Date
48d52c4	Volodymyr Kysenko	06 January 2022, 18:42:41 UTC	Alternative implementation of halide_xtensa_narrow_with_rounding_shift_i16 Change-Id: I2c3f8a40c5279ec09bcc18b077b9badd7ee253fe	06 January 2022, 18:42:41 UTC
f694314	Volodymyr Kysenko	06 January 2022, 17:45:18 UTC	Merge branch 'xtensa-codegen' of https://github.com/halide/Halide into xtensa-codegen Change-Id: I54c3f9ff3ca3eec9b4f294c6833645a1ca3e14b3	06 January 2022, 17:45:18 UTC
8688394	Volodymyr Kysenko	06 January 2022, 17:44:37 UTC	Disable halide_xtensa_narrow_with_rounding_shift_i16 due to (likely) a compiler bug Change-Id: Ib88961d8d08332e34ee69cd136eff9647387a965	06 January 2022, 17:44:37 UTC
7484f22	Steven Johnson	06 January 2022, 01:32:14 UTC	Update CodeGen_Xtensa.cpp	06 January 2022, 01:32:14 UTC
1212efb	Steven Johnson	06 January 2022, 00:46:45 UTC	Avoid unused-var warning/error	06 January 2022, 00:46:45 UTC
b244a83	Steven Johnson	06 January 2022, 00:13:02 UTC	Merge branch 'master' into xtensa-codegen	06 January 2022, 00:13:02 UTC
6f7d5ce	Steven Johnson	06 January 2022, 00:12:51 UTC	Revert "Make it possible to interpret a wide type as multiple smaller elements (#6506)" (#6541) This reverts commit 1b180a8e93339aac2d19db57d2ef99b67253a0bc.	06 January 2022, 00:12:51 UTC
3f4feb6	Steven Johnson	05 January 2022, 21:46:37 UTC	Merge branch 'master' into xtensa-codegen	05 January 2022, 21:46:37 UTC
935c05e	Steven Johnson	05 January 2022, 21:46:06 UTC	Fix GeneratorOutput_Buffer::set_estimates() (#6540) The existing wrapper wouldn't work for Outputs that have Tuple-valued elements.	05 January 2022, 21:46:06 UTC
95737be	Alex Reinking	05 January 2022, 21:45:18 UTC	Update CMake documentation (#6535) * Allow third parties to externally override SOVERSION Debian and other third-party packagers might need or want to patch our sources for a variety of reasons. In those cases, they might also need to override the SOVERSION. See here for a practical example: https://salsa.debian.org/pkg-llvm-team/halide/-/blob/f881de70cd83095053e13047b63f61faf6bc7a36/debian/patches/0006-Fixup-libhalide-version-soversion-for-debian-package.patch * Update CMake documentation. * Fix typo * Add link to ToC	05 January 2022, 21:45:18 UTC
7bb8198	Alex Reinking	05 January 2022, 21:44:36 UTC	Allow third parties to externally override SOVERSION (#6534) Debian and other third-party packagers might need or want to patch our sources for a variety of reasons. In those cases, they might also need to override the SOVERSION. See here for a practical example: https://salsa.debian.org/pkg-llvm-team/halide/-/blob/f881de70cd83095053e13047b63f61faf6bc7a36/debian/patches/0006-Fixup-libhalide-version-soversion-for-debian-package.patch	05 January 2022, 21:44:36 UTC
4960a00	Volodymyr Kysenko	05 January 2022, 21:38:29 UTC	make format Change-Id: I3aa5d16074b5cf64c6bad76a3168848e92d1ee53	05 January 2022, 21:38:29 UTC
fee1abb	Volodymyr Kysenko	05 January 2022, 21:33:02 UTC	Add reinterpret to the list of ops which don't need slicing Change-Id: I315fc4f9af3e6e1398fdd0b1bec3be6f82123679	05 January 2022, 21:33:02 UTC
f8459da	Steven Johnson	05 January 2022, 19:10:34 UTC	Remove unnecessary `std::move` calls (#6537) Compilers with `-Werror` will fail with `error: moving a temporary object prevents copy elision`	05 January 2022, 19:10:34 UTC
50edb64	Steven Johnson	05 January 2022, 19:10:03 UTC	Revert "Make random faster by putting the innermost var last (#6504)" (#6538) This reverts commit 00211656fd208c5e6eb28f943dbbe8c65b45622f.	05 January 2022, 19:10:03 UTC
8e4b09f	Volodymyr Kysenko	05 January 2022, 05:33:30 UTC	Optimizations: * better type conversions * narrowing rounding right shift * remove debug print from DMA initializer * 2x and 4x vector reduce patterns. Change-Id: I460233c765a95aebcd906da4c1b16db751d91bfc	05 January 2022, 05:33:30 UTC
7d2713a	Steven Johnson	05 January 2022, 00:21:45 UTC	Add forwarding method & python wrapper for Func::dma()	05 January 2022, 00:21:45 UTC
60f9c32	Steven Johnson	05 January 2022, 00:10:13 UTC	Merge branch 'master' into xtensa-codegen	05 January 2022, 00:10:13 UTC
3a4e4c7	Infinoid	04 January 2022, 23:35:51 UTC	If cmake built a python module, teach cmake to install the python module. (#6523)	04 January 2022, 23:35:51 UTC
b8eb22d	Roman Lebedev	04 January 2022, 21:56:22 UTC	Fix Python GIL lock handling (Fixes #6524, Fixes #5631) (#6525) * Fix Python GIL lock handling (Fixes #6524, Fixes #5631) As discussed in https://github.com/halide/Halide/pull/6523#issuecomment-1003545664 and later in https://github.com/halide/Halide/issues/6524, pybind11 v2.8.1 added some defensive checks that fail for halide, namely in `python_tutorial_lesson_04_debugging_2` and `python_tutorial_lesson_05_scheduling_1`. https://github.com/halide/Halide/issues/6524#issuecomment-1003569810 notes: > * Python calls a Halide-JIT-generated function, which runs with the GIL held. > * Halide runtime spawns worker threads. > * The worker threads try to call pybind11's py::print function to emit traces. > * Pybind11 complains, correctly, that the worker thread doesn't hold the GIL. > > Trying to acquire the GIL hangs, because the main thread is still holding it. I tried teaching the main thread to release the GIL (as suggested in #5631), but I still saw hangs when I tried this. I have tried both approaches in isolation: just dropping the lock before calling into halide, or just acquiring it in `halide_python_print`, doesn't work; we need to do both. I have verified that the two tests fail without this fix, and pass with it.	04 January 2022, 21:56:22 UTC
bce2ef4	Roman Lebedev	04 January 2022, 21:06:29 UTC	Install Python tutorials (#6530) * Install Python tutorials I know we have previously discussed that `TYPE DOC` should be used, but unfortunately i'm not sure that will work here, because doc/tutorial directory is already occupied by C++ tutorials, and i don't think they should be mixed. I'm open to alternative suggestions.	04 January 2022, 21:06:29 UTC
0021165	Andrew Adams	04 January 2022, 16:40:23 UTC	Make random faster by putting the innermost var last (#6504) * Make random 2x faster by putting the innermost var last * Improve period of low bits of random noise * Add new rewrite rules for quadratics By pulling constant additions outside of quadratics, we can shave off a few add instructions in the inner loop for random number generation, which uses a quadratic modulo 2^32 I also removed the !overflows predicates, because rules already fail to match if a fold overflows. New rules formally verified. * Make expensive_zero actually always zero	04 January 2022, 16:40:23 UTC
f11d820	Roman Lebedev	04 January 2022, 16:34:58 UTC	Implement SanitizerCoverage support (Refs. #6513) (#6517) * Implement SanitizerCoverage support (Refs. #6513) Please refer to https://clang.llvm.org/docs/SanitizerCoverage.html TLDR: `ModuleSanitizerCoveragePass` instruments the IR by inserting calls to callbacks at certain constructs. What the callbacks should do is up to the implementation. They are effectively required for fuzzing to be effective, and are provided by e.g. libfuzzer. One huge caveat is `SanitizerCoverageOptions`, which controls which callbacks should actually be inserted. I just don't know what to do about it. Right now i have hardcoded the set that would have been enabled by `-fsanitize=fuzzer-no-link`, because the alternative, due to halide inflexibility, would be to introduce ~16 suboptions to control each one. * Simplify test * sancov test: avoid potential signedness warnings. * Rename all instances of sancov to sanitizecoverage * Adjust spelling of "SanitizerCoverage" in some places * Actually adjust the feature name in build system for the test * Hopefully fix Makefile build Co-authored-by: Steven Johnson <srj@google.com>	04 January 2022, 16:34:58 UTC
7eb9949	Roman Lebedev	04 January 2022, 16:32:52 UTC	[NFC-ish] Finish MSAN handling (#6516) Somehow, initially i missed that there was MSan support, so it might be good to actually mention that we don't need to run any MSan passes here, and that we didn't forget to run them. Secondly, it seems inconsistent not to annotate the functions with `Attribute::SanitizeMemory`, like we do for others. I suppose it isn't strictly required, since they are used to actually drive the instrumentation passes, and we don't run MSan pass, but they are also used to disable some LLVM optimizations, and that //might// be important. Or not, but then i suppose there should be a comment about it? Co-authored-by: Steven Johnson <srj@google.com>	04 January 2022, 16:32:52 UTC
5c33902	Andrew Adams	04 January 2022, 16:08:43 UTC	free shape storage last (#6511) Some decref-triggered runtime methods need the shape Fixes #6509 Co-authored-by: Steven Johnson <srj@google.com>	04 January 2022, 16:08:43 UTC
0089de9	Andrew Adams	04 January 2022, 16:08:28 UTC	Handle mixed-width args to mul-shift-right (#6526) and codegen it to pmulhuw on x86 Co-authored-by: Steven Johnson <srj@google.com>	04 January 2022, 16:08:28 UTC
1b180a8	Andrew Adams	03 January 2022, 23:04:31 UTC	Make it possible to interpret a wide type as multiple smaller elements (#6506) * Make it possible to interpret a wide type as multiple smaller elements This is helpful for things like reinterpreting 32-bit packed rgba values as individual components for free. * clang-format	03 January 2022, 23:04:31 UTC
f9ea2d4	Steven Johnson	03 January 2022, 22:12:35 UTC	Fix use-after-free bug in SlidingWindow.cpp (#6527)	03 January 2022, 22:12:35 UTC
2651402	Steven Johnson	03 January 2022, 20:45:32 UTC	Fix simd-op-check for top-of-tree LLVM (#6529) * Fix simd-op-check for top-of-tree LLVM * Update simd_op_check.cpp	03 January 2022, 20:45:32 UTC
9a530b1	Roman Lebedev	29 December 2021, 22:58:10 UTC	Fix weird CMake issue with custom LLVM (#6519) Without this, cmake fails with: ``` CMake Error in dependencies/llvm/CMakeLists.txt: Target "Halide_LLVM" INTERFACE_INCLUDE_DIRECTORIES property contains path: "/repositories/halide/dependencies/llvm/" which is prefixed in the source directory. ``` `LLVM_INCLUDE_DIRS` there is `/repositories/llvm-project/llvm/include;/builddirs/llvm-project/build-Clang13/include`, and `INTERFACE_INCLUDE_DIRECTORIES`'s property beforehand is `` (empty), but after this line it suddenly becomes `/repositories/halide/dependencies/llvm/$<BUILD_INTERFACE:/repositories/llvm-project/llvm/include;/builddirs/llvm-project/build-Clang13/include>`. This is quite obscure. I don't really understand what is going on, but with the patch it builds fine.	29 December 2021, 22:58:10 UTC
6ed65ba	Roman Lebedev	29 December 2021, 22:57:41 UTC	Mullapudi2016: don't hardcode the list of supported targets (#6520) As discussed in https://github.com/halide/Halide/issues/6518, this is a bit dubious, and e.g. prevents building on RISC-V, because there is no way to not build autoschedulers currently.	29 December 2021, 22:57:41 UTC
1d1f06a	Jin Yue	23 December 2021, 15:02:14 UTC	Support new warp shuffle intrinsics after CUDA Volta architecture (#6505) * warp shuffle for volta. * Add a warp shuffle test. * Remove TODO because we have HoistWarpShuffles. * Fix test case position. * Pass target to lower_warp_shuffles. * format Co-authored-by: jinyue.jy <jinyue.jy@alibaba-inc.com>	23 December 2021, 15:02:14 UTC
e7f655b	Andrew Adams	22 December 2021, 02:41:03 UTC	Fix a missing case in clamp_unsafe_accesses (#6508) * Fix a missing case in clamp_unsafe_accesses * Don't check func_value_bounds of images	22 December 2021, 02:41:03 UTC
b0f4681	Roman Lebedev	19 December 2021, 00:59:16 UTC	Try to fix riscv64 build (#6503) https://buildd.debian.org/status/fetch.php?pkg=halide&arch=riscv64&ver=13.0.2-1&stamp=1639833165&raw=0 ``` [1283/3260] /usr/bin/clang++-13 -DHALIDE_ENABLE_RTTI -DHALIDE_WITH_EXCEPTIONS -DHalide_EXPORTS -DLLVM_VERSION=130 -DWITH_AARCH64 -DWITH_AMDGPU -DWITH_ARM -DWITH_D3D12 -DWITH_HEXAGON -DWITH_INTROSPECTION -DWITH_METAL -DWITH_MIPS -DWITH_NVPTX -DWITH_OPENCL -DWITH_OPENGLCOMPUTE -DWITH_POWERPC -DWITH_RISCV -DWITH_WEBASSEMBLY -DWITH_X86 -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/usr/lib/llvm-13/include -g -O3 -DNDEBUG -fPIC -Wall -Wcast-qual -Wignored-qualifiers -Woverloaded-virtual -Winconsistent-missing-destructor-override -Winconsistent-missing-override -Wno-deprecated-declarations -Wno-double-promotion -Wno-float-conversion -Wno-float-equal -Wno-missing-field-initializers -Wno-old-style-cast -Wno-shadow -Wno-sign-conversion -Wno-switch-enum -Wno-undef -Wno-unused-function -Wno-unused-macros -Wno-unused-parameter -Wno-c++98-compat-pedantic -Wno-c++98-compat -Wno-cast-align -Wno-comma -Wno-covered-switch-default -Wno-documentation-unknown-command -Wno-documentation -Wno-exit-time-destructors -Wno-global-constructors -Wno-implicit-float-conversion -Wno-implicit-int-conversion -Wno-implicit-int-float-conversion -Wno-missing-prototypes -Wno-nonportable-system-include-path -Wno-reserved-id-macro -Wno-return-std-move-in-c++11 -Wno-shadow-field-in-constructor -Wno-shadow-field -Wno-shorten-64-to-32 -Wno-undefined-func-template -Wno-unused-member-function -Wno-unused-template -pthread -std=c++17 -MD -MT src/CMakeFiles/Halide.dir/Target.cpp.o -MF src/CMakeFiles/Halide.dir/Target.cpp.o.d -o src/CMakeFiles/Halide.dir/Target.cpp.o -c /<<PKGBUILDDIR>>/src/Target.cpp FAILED: src/CMakeFiles/Halide.dir/Target.cpp.o /usr/bin/clang++-13 -DHALIDE_ENABLE_RTTI -DHALIDE_WITH_EXCEPTIONS -DHalide_EXPORTS -DLLVM_VERSION=130 -DWITH_AARCH64 -DWITH_AMDGPU 
-DWITH_ARM -DWITH_D3D12 -DWITH_HEXAGON -DWITH_INTROSPECTION -DWITH_METAL -DWITH_MIPS -DWITH_NVPTX -DWITH_OPENCL -DWITH_OPENGLCOMPUTE -DWITH_POWERPC -DWITH_RISCV -DWITH_WEBASSEMBLY -DWITH_X86 -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/usr/lib/llvm-13/include -g -O3 -DNDEBUG -fPIC -Wall -Wcast-qual -Wignored-qualifiers -Woverloaded-virtual -Winconsistent-missing-destructor-override -Winconsistent-missing-override -Wno-deprecated-declarations -Wno-double-promotion -Wno-float-conversion -Wno-float-equal -Wno-missing-field-initializers -Wno-old-style-cast -Wno-shadow -Wno-sign-conversion -Wno-switch-enum -Wno-undef -Wno-unused-function -Wno-unused-macros -Wno-unused-parameter -Wno-c++98-compat-pedantic -Wno-c++98-compat -Wno-cast-align -Wno-comma -Wno-covered-switch-default -Wno-documentation-unknown-command -Wno-documentation -Wno-exit-time-destructors -Wno-global-constructors -Wno-implicit-float-conversion -Wno-implicit-int-conversion -Wno-implicit-int-float-conversion -Wno-missing-prototypes -Wno-nonportable-system-include-path -Wno-reserved-id-macro -Wno-return-std-move-in-c++11 -Wno-shadow-field-in-constructor -Wno-shadow-field -Wno-shorten-64-to-32 -Wno-undefined-func-template -Wno-unused-member-function -Wno-unused-template -pthread -std=c++17 -MD -MT src/CMakeFiles/Halide.dir/Target.cpp.o -MF src/CMakeFiles/Halide.dir/Target.cpp.o.d -o src/CMakeFiles/Halide.dir/Target.cpp.o -c /<<PKGBUILDDIR>>/src/Target.cpp warning: unknown warning option '-Wno-return-std-move-in-c++11' [-Wunknown-warning-option] /<<PKGBUILDDIR>>/src/Target.cpp:114:5: error: use of undeclared identifier 'cpuid' cpuid(info, 1, 0); ^ /<<PKGBUILDDIR>>/src/Target.cpp:148:9: error: use of undeclared identifier 'cpuid' cpuid(info2, 7, 0); ^ /<<PKGBUILDDIR>>/src/Target.cpp:181:17: error: use of undeclared identifier 'cpuid' cpuid(info3, 7, 1); ^ 1 warning and 3 errors generated. ``` ... 
which doesn't make sense because that code is supposed to only compile for X86. But that is because RISCV header guard is wrong, https://github.com/riscv-non-isa/riscv-toolchain-conventions says: ``` C/C++ preprocessor definitions * __riscv: defined for any RISC-V target. Older versions of the GCC toolchain defined __riscv__. ```	19 December 2021, 00:59:16 UTC
1d86751	Steven Johnson	16 December 2021, 19:30:06 UTC	Grab Bag of minor cleanups to LowerParallelTasks (#6498) * Grab Bag of minor cleanups to LowerParallelTasks Basically OCD code stuff I noted down when debugging the issues, this restructures the inner loop to avoid calling a local function that has non-obvious side effects (setting just the right slot in the closure args), as well as consolidating via helper functions, hoisting common stuff used in both paths, using std::move where seemingly appropriate, adding some (hopefully correct) comments about arg expectations, and other things that aren't likely to really move the needle in terms of Halide compile speed, but (hopefully) make the code a little bit more understandable after some time away. (There was a todo about "find a better place for generate_closure_ir()"; this PR eliminates it entirely, just inlining it into the caller, which I think is reasonable given the number of assumptions the caller has to make in the first place...) * Update LowerParallelTasks.cpp	16 December 2021, 19:30:06 UTC
dffae98	Steven Johnson	16 December 2021, 01:24:59 UTC	Update simd_op_check for arm64 upz1 code generation (#6499) (#6500)	16 December 2021, 01:24:59 UTC
084236c	Steven Johnson	16 December 2021, 01:24:33 UTC	Fix size_t -> int conversion warning (#6501)	16 December 2021, 01:24:33 UTC
45e1809	Steven Johnson	15 December 2021, 20:41:39 UTC	Update WABT to 1.0.25 (#6497) * Update WABT to 1.0.25 (cannot land until https://github.com/WebAssembly/wabt/pull/1788 lands) * tickle buildbots	15 December 2021, 20:41:39 UTC
7e233f1	Steven Johnson	15 December 2021, 02:16:34 UTC	Update Codegen_Xtensa::print_assignment() from #6195 Handles need to be `auto *` for the previous PR to work properly. (This should be refactored more intelligently to reduce code reuse; this is just a quick-fix to unbreak.)	15 December 2021, 02:16:34 UTC
d518030	Steven Johnson	14 December 2021, 21:39:30 UTC	Update XtensaOptimize.cpp	14 December 2021, 21:39:30 UTC