https://github.com/halide/Halide

Revision Author Date Message Commit Date
a63a31a Merge branch 'master' into srj/printer-size 05 January 2022, 01:55:04 UTC
3087f92 Update printer.cpp 05 January 2022, 01:54:59 UTC
3a4e4c7 If cmake built a python module, teach cmake to install the python module. (#6523) 04 January 2022, 23:35:51 UTC
b8eb22d Fix Python GIL lock handling (Fixes #6524, Fixes #5631) (#6525) * Fix Python GIL lock handling (Fixes #6524, Fixes #5631) As discussed in https://github.com/halide/Halide/pull/6523#issuecomment-1003545664 and later in https://github.com/halide/Halide/issues/6524, pybind11 v2.8.1 added some defensive checks that fail for Halide, namely in `python_tutorial_lesson_04_debugging_2` and `python_tutorial_lesson_05_scheduling_1`. https://github.com/halide/Halide/issues/6524#issuecomment-1003569810 notes: > * Python calls a Halide-JIT-generated function, which runs with the GIL held. > * Halide runtime spawns worker threads. > * The worker threads try to call pybind11's py::print function to emit traces. > * Pybind11 complains, correctly, that the worker thread doesn't hold the GIL. > > Trying to acquire the GIL hangs, because the main thread is still holding it. I tried teaching the main thread to release the GIL (as suggested in #5631), but I still saw hangs when I tried this. I have tried it: just dropping the lock before calling into Halide, or just acquiring it in `halide_python_print`, doesn't work; we need to do both. I have verified that the two tests fail without this fix, and pass with it. 04 January 2022, 21:56:22 UTC
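The "we need to do both" point in the commit above can be illustrated without pybind11. The sketch below is an analogy under stated assumptions, not Halide's binding code: a plain `threading.Lock` named `gil` stands in for the real GIL, and `traced_print`, `run_pipeline`, and `call_pipeline_from_python` are hypothetical names invented for this illustration.

```python
import threading

# Stand-in for the Python GIL (assumption: a plain mutex is a good enough model here).
gil = threading.Lock()


def traced_print(log, message):
    """Hypothetical print hook: take the lock on the worker thread before touching shared state."""
    with gil:
        log.append(message)


def run_pipeline(log, n_workers):
    """Hypothetical stand-in for a JIT-compiled pipeline that emits traces from worker threads."""
    workers = [
        threading.Thread(target=traced_print, args=(log, f"trace {i}"))
        for i in range(n_workers)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()


def call_pipeline_from_python(n_workers):
    """Model of the binding layer: release the lock around the call, then take it back."""
    log = []
    gil.acquire()          # Python code normally runs with the lock held
    try:
        gil.release()      # side 1 of the fix: drop the lock before calling in
        try:
            run_pipeline(log, n_workers)  # side 2 lives in traced_print
        finally:
            gil.acquire()  # re-acquire before returning to Python code
    finally:
        gil.release()
    return sorted(log)
```

In this model, skipping the release leaves the workers blocked on `gil` while the caller waits in `join()`, which is the hang described in the quoted issue comment.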
bce2ef4 Install Python tutorials (#6530) * Install Python tutorials I know we have previously discussed that `TYPE DOC` should be used, but unfortunately I'm not sure that will work here, because the doc/tutorial directory is already occupied by C++ tutorials, and I don't think they should be mixed. I'm open to alternative suggestions. 04 January 2022, 21:06:29 UTC
9c6e7a2 Merge branch 'master' into srj/printer-size 04 January 2022, 18:39:41 UTC
0021165 Make random faster by putting the innermost var last (#6504) * Make random 2x faster by putting the innermost var last * Improve period of low bits of random noise * Add new rewrite rules for quadratics By pulling constant additions outside of quadratics, we can shave off a few add instructions in the inner loop for random number generation, which uses a quadratic modulo 2^32 I also removed the !overflows predicates, because rules already fail to match if a fold overflows. New rules formally verified. * Make expensive_zero actually always zero 04 January 2022, 16:40:23 UTC
f11d820 Implement SanitizerCoverage support (Refs. #6513) (#6517) * Implement SanitizerCoverage support (Refs. #6513) Please refer to https://clang.llvm.org/docs/SanitizerCoverage.html TLDR: `ModuleSanitizerCoveragePass` instruments the IR by inserting calls to callbacks at certain constructs. What the callbacks should do is up to the implementation. They are effectively required for fuzzing to be effective, and are provided by e.g. libfuzzer. One huge caveat is `SanitizerCoverageOptions`, which controls which callbacks should actually be inserted. I just don't know what to do about it. Right now I have hardcoded the set that would have been enabled by `-fsanitize=fuzzer-no-link`, because the alternative, due to Halide's inflexibility, would be to introduce ~16 suboptions to control each one. * Simplify test * sancov test: avoid potential signedness warnings. * Rename all instances of sancov to sanitizecoverage * Adjust spelling of "SanitizerCoverage" in some places * Actually adjust the feature name in build system for the test * Hopefully fix Makefile build Co-authored-by: Steven Johnson <srj@google.com> 04 January 2022, 16:34:58 UTC
7eb9949 [NFC-ish] Finish MSAN handling (#6516) Somehow, initially I missed that there was MSan support, so it might be good to actually mention that we don't need to run any MSan passes here, and that we didn't forget to run them. Secondly, it seems inconsistent not to annotate the functions with `Attribute::SanitizeMemory`, like we do for others. I suppose it isn't strictly required, since they are used to actually drive the instrumentation passes, and we don't run the MSan pass, but they are also used to disable some LLVM optimizations, and that //might// be important. Or not, but then I suppose there should be a comment about it? Co-authored-by: Steven Johnson <srj@google.com> 04 January 2022, 16:32:52 UTC
5c33902 free shape storage last (#6511) Some decref-triggered runtime methods need the shape Fixes #6509 Co-authored-by: Steven Johnson <srj@google.com> 04 January 2022, 16:08:43 UTC
0089de9 Handle mixed-width args to mul-shift-right (#6526) and codegen it to pmulhuw on x86 Co-authored-by: Steven Johnson <srj@google.com> 04 January 2022, 16:08:28 UTC
1b180a8 Make it possible to interpret a wide type as multiple smaller elements (#6506) * Make it possible to interpret a wide type as multiple smaller elements This is helpful for things like reinterpreting 32-bit packed rgba values as individual components for free. * clang-format 03 January 2022, 23:04:31 UTC
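The packed-RGBA case mentioned in the commit above amounts to viewing the same four bytes either as one 32-bit value or as four 8-bit lanes. A minimal sketch of that idea in plain Python follows; it does not use Halide's API, the helper names are hypothetical, and it assumes little-endian lane order with red in the lowest byte.

```python
import struct


def unpack_rgba(packed):
    """View one 32-bit value as four 8-bit lanes, lowest byte first (little-endian assumption)."""
    return struct.unpack("<4B", struct.pack("<I", packed))


def pack_rgba(r, g, b, a):
    """Inverse view: the same four bytes read back as a single 32-bit value."""
    return struct.unpack("<I", struct.pack("<4B", r, g, b, a))[0]
```

No arithmetic is performed in either direction; only the interpretation of the bytes changes, which is why the commit describes the operation as free.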
f9ea2d4 Fix use-after-free bug in SlidingWindow.cpp (#6527) 03 January 2022, 22:12:35 UTC
2651402 Fix simd-op-check for top-of-tree LLVM (#6529) * Fix simd-op-check for top-of-tree LLVM * Update simd_op_check.cpp 03 January 2022, 20:45:32 UTC
9a530b1 Fix weird CMake issue with custom LLVM (#6519) Without this, cmake fails with: ``` CMake Error in dependencies/llvm/CMakeLists.txt: Target "Halide_LLVM" INTERFACE_INCLUDE_DIRECTORIES property contains path: "/repositories/halide/dependencies/llvm/" which is prefixed in the source directory. ``` `LLVM_INCLUDE_DIRS` there is `/repositories/llvm-project/llvm/include;/builddirs/llvm-project/build-Clang13/include`, and `INTERFACE_INCLUDE_DIRECTORIES`'s property beforehand is `` (empty), but after this line it suddenly becomes `/repositories/halide/dependencies/llvm/$<BUILD_INTERFACE:/repositories/llvm-project/llvm/include;/builddirs/llvm-project/build-Clang13/include>`. This is quite obscure. I don't really understand what is going on, but with the patch it builds fine. 29 December 2021, 22:58:10 UTC
6ed65ba Mullapudi2016: don't hardcode the list of supported targets (#6520) As discussed in https://github.com/halide/Halide/issues/6518, this is a bit dubious, and e.g. prevents building on RISC-V, because there is no way to not build autoschedulers currently. 29 December 2021, 22:57:41 UTC
1d1f06a Support new warp shuffle intrinsics after CUDA Volta architecture (#6505) * warp shuffle for volta. * Add a warp shuffle test. * Remove TODO because we have HoistWarpShuffles. * Fix test case position. * Pass target to lower_warp_shuffles. * format Co-authored-by: jinyue.jy <jinyue.jy@alibaba-inc.com> 23 December 2021, 15:02:14 UTC
e7f655b Fix a missing case in clamp_unsafe_accesses (#6508) * Fix a missing case in clamp_unsafe_accesses * Don't check func_value_bounds of images 22 December 2021, 02:41:03 UTC
b0f4681 Try to fix riscv64 build (#6503) https://buildd.debian.org/status/fetch.php?pkg=halide&arch=riscv64&ver=13.0.2-1&stamp=1639833165&raw=0 ``` [1283/3260] /usr/bin/clang++-13 -DHALIDE_ENABLE_RTTI -DHALIDE_WITH_EXCEPTIONS -DHalide_EXPORTS -DLLVM_VERSION=130 -DWITH_AARCH64 -DWITH_AMDGPU -DWITH_ARM -DWITH_D3D12 -DWITH_HEXAGON -DWITH_INTROSPECTION -DWITH_METAL -DWITH_MIPS -DWITH_NVPTX -DWITH_OPENCL -DWITH_OPENGLCOMPUTE -DWITH_POWERPC -DWITH_RISCV -DWITH_WEBASSEMBLY -DWITH_X86 -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/usr/lib/llvm-13/include -g -O3 -DNDEBUG -fPIC -Wall -Wcast-qual -Wignored-qualifiers -Woverloaded-virtual -Winconsistent-missing-destructor-override -Winconsistent-missing-override -Wno-deprecated-declarations -Wno-double-promotion -Wno-float-conversion -Wno-float-equal -Wno-missing-field-initializers -Wno-old-style-cast -Wno-shadow -Wno-sign-conversion -Wno-switch-enum -Wno-undef -Wno-unused-function -Wno-unused-macros -Wno-unused-parameter -Wno-c++98-compat-pedantic -Wno-c++98-compat -Wno-cast-align -Wno-comma -Wno-covered-switch-default -Wno-documentation-unknown-command -Wno-documentation -Wno-exit-time-destructors -Wno-global-constructors -Wno-implicit-float-conversion -Wno-implicit-int-conversion -Wno-implicit-int-float-conversion -Wno-missing-prototypes -Wno-nonportable-system-include-path -Wno-reserved-id-macro -Wno-return-std-move-in-c++11 -Wno-shadow-field-in-constructor -Wno-shadow-field -Wno-shorten-64-to-32 -Wno-undefined-func-template -Wno-unused-member-function -Wno-unused-template -pthread -std=c++17 -MD -MT src/CMakeFiles/Halide.dir/Target.cpp.o -MF src/CMakeFiles/Halide.dir/Target.cpp.o.d -o src/CMakeFiles/Halide.dir/Target.cpp.o -c /<<PKGBUILDDIR>>/src/Target.cpp FAILED: src/CMakeFiles/Halide.dir/Target.cpp.o /usr/bin/clang++-13 -DHALIDE_ENABLE_RTTI -DHALIDE_WITH_EXCEPTIONS -DHalide_EXPORTS -DLLVM_VERSION=130 -DWITH_AARCH64 -DWITH_AMDGPU -DWITH_ARM -DWITH_D3D12 -DWITH_HEXAGON 
-DWITH_INTROSPECTION -DWITH_METAL -DWITH_MIPS -DWITH_NVPTX -DWITH_OPENCL -DWITH_OPENGLCOMPUTE -DWITH_POWERPC -DWITH_RISCV -DWITH_WEBASSEMBLY -DWITH_X86 -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/usr/lib/llvm-13/include -g -O3 -DNDEBUG -fPIC -Wall -Wcast-qual -Wignored-qualifiers -Woverloaded-virtual -Winconsistent-missing-destructor-override -Winconsistent-missing-override -Wno-deprecated-declarations -Wno-double-promotion -Wno-float-conversion -Wno-float-equal -Wno-missing-field-initializers -Wno-old-style-cast -Wno-shadow -Wno-sign-conversion -Wno-switch-enum -Wno-undef -Wno-unused-function -Wno-unused-macros -Wno-unused-parameter -Wno-c++98-compat-pedantic -Wno-c++98-compat -Wno-cast-align -Wno-comma -Wno-covered-switch-default -Wno-documentation-unknown-command -Wno-documentation -Wno-exit-time-destructors -Wno-global-constructors -Wno-implicit-float-conversion -Wno-implicit-int-conversion -Wno-implicit-int-float-conversion -Wno-missing-prototypes -Wno-nonportable-system-include-path -Wno-reserved-id-macro -Wno-return-std-move-in-c++11 -Wno-shadow-field-in-constructor -Wno-shadow-field -Wno-shorten-64-to-32 -Wno-undefined-func-template -Wno-unused-member-function -Wno-unused-template -pthread -std=c++17 -MD -MT src/CMakeFiles/Halide.dir/Target.cpp.o -MF src/CMakeFiles/Halide.dir/Target.cpp.o.d -o src/CMakeFiles/Halide.dir/Target.cpp.o -c /<<PKGBUILDDIR>>/src/Target.cpp warning: unknown warning option '-Wno-return-std-move-in-c++11' [-Wunknown-warning-option] /<<PKGBUILDDIR>>/src/Target.cpp:114:5: error: use of undeclared identifier 'cpuid' cpuid(info, 1, 0); ^ /<<PKGBUILDDIR>>/src/Target.cpp:148:9: error: use of undeclared identifier 'cpuid' cpuid(info2, 7, 0); ^ /<<PKGBUILDDIR>>/src/Target.cpp:181:17: error: use of undeclared identifier 'cpuid' cpuid(info3, 7, 1); ^ 1 warning and 3 errors generated. ``` ... which doesn't make sense because that code is supposed to only compile for X86. 
But that is because RISCV header guard is wrong, https://github.com/riscv-non-isa/riscv-toolchain-conventions says: ``` C/C++ preprocessor definitions * __riscv: defined for any RISC-V target. Older versions of the GCC toolchain defined __riscv__. ``` 19 December 2021, 00:59:16 UTC
1d86751 Grab Bag of minor cleanups to LowerParallelTasks (#6498) * Grab Bag of minor cleanups to LowerParallelTasks Basically OCD code stuff I noted down when debugging the issues, this restructures the inner loop to avoid calling a local function that has non-obvious side effects (setting just the right slot in the closure args), as well as consolidating via helper functions, hoisting common stuff used in both paths, using std::move where seemingly appropriate, adding some (hopefully correct) comments about arg expectations, and other things that aren't likely to really move the needle in terms of Halide compile speed, but (hopefully) make the code a little bit more understandable after some time away. (There was a todo about "find a better place for generate_closure_ir()"; this PR eliminates it entirely, just inlining it into the caller, which I think is reasonable given the number of assumptions the caller has to make in the first place...) * Update LowerParallelTasks.cpp 16 December 2021, 19:30:06 UTC
21d2a05 Merge branch 'master' into srj/printer-size 16 December 2021, 01:26:28 UTC
dffae98 Update simd_op_check for arm64 upz1 code generation (#6499) (#6500) 16 December 2021, 01:24:59 UTC
084236c Fix size_t -> int conversion warning (#6501) 16 December 2021, 01:24:33 UTC
12fc4e9 Merge branch 'master' into srj/printer-size 15 December 2021, 20:42:29 UTC
45e1809 Update WABT to 1.0.25 (#6497) * Update WABT to 1.0.25 (cannot land until https://github.com/WebAssembly/wabt/pull/1788 lands) * tickle buildbots 15 December 2021, 20:41:39 UTC
0717cda Trim code size for Printer This builds on top of https://github.com/halide/Halide/pull/6472 to de-inline as much of Printer as possible, moving things into a new 'printer' runtime module using a PrinterBase class. This is, admittedly, a pretty small improvement: comparing before-and-after on OSX for target=host shows only a ~4k reduction in object size (115k -> 111k for `runtime.o`) but adding targets with more verbose error reporting and such increases the benefit (eg host-opencl gives a 179k -> 164k reduction for runtime.o). Since ~all of the usages of Printer in runtime are for error handling, debugging, profiling, or tracing, any possible reduction in performance seems unlikely to be significant. 15 December 2021, 00:05:49 UTC
c3ff4d2 Restore support for using V8 as the Wasm JIT interpreter (#6478) * Support using V8 as the Wasm JIT interpreter This is a partial revert of https://github.com/halide/Halide/pull/5097. It brings back a bunch of the code in WasmExecutor to set up and use V8 to run Wasm code. All of the code is copy-pasted. There are some small cleanups to move common code (like BDMalloc, structs, asserts) to a common area guarded by `if WITH_WABT || WITH_V8`. Enabling V8 requires setting 2 CMake options: - V8_INCLUDE_PATH - V8_LIB_PATH The first is a path to v8 include folder, to find headers, the second is the monolithic v8 library. This is because it's pretty difficult to build v8, and there are various flags you can set. Comments around those options provide some instructions for building v8. By default, we still use the wabt for running Wasm code, but we can use V8 by setting WITH_WABT=OFF WITH_V8=ON. Maybe in the future, with more testing, we can flip this. Right now this requires a locally patched build of V8 due to https://crbug.com/v8/10461, but once that is resolved, the version of V8 that includes the fix will be fine. Also enable a single test, block_transpose, to run on V8, with these results: $ HL_JIT_TARGET=wasm-32-wasmrt-wasm_simd128 \ ./test/performance/performance_block_transpose Dummy Func version: Scalar transpose bandwidth 3.45061e+08 byte/s. Wrapper version: Scalar transpose bandwidth 3.38931e+08 byte/s. Dummy Func version: Transpose vectorized in y bandwidth 6.74143e+08 byte/s. Wrapper version: Transpose vectorized in y bandwidth 3.54331e+08 byte/s. Dummy Func version: Transpose vectorized in x bandwidth 3.50053e+08 byte/s. Wrapper version: Transpose vectorized in x bandwidth 6.73421e+08 byte/s. Success! For comparison, when targeting host: $ ./test/performance/performance_block_transpose Dummy Func version: Scalar transpose bandwidth 1.33689e+09 byte/s. Wrapper version: Scalar transpose bandwidth 1.33583e+09 byte/s. 
Dummy Func version: Transpose vectorized in y bandwidth 2.20278e+09 byte/s. Wrapper version: Transpose vectorized in y bandwidth 1.45959e+09 byte/s. Dummy Func version: Transpose vectorized in x bandwidth 1.45921e+09 byte/s. Wrapper version: Transpose vectorized in x bandwidth 2.21746e+09 byte/s. Success! For comparison, running with wabt: Dummy Func version: Scalar transpose bandwidth 828715 byte/s. Wrapper version: Scalar transpose bandwidth 826204 byte/s. Dummy Func version: Transpose vectorized in y bandwidth 1.12008e+06 byte/s. Wrapper version: Transpose vectorized in y bandwidth 874958 byte/s. Dummy Func version: Transpose vectorized in x bandwidth 879031 byte/s. Wrapper version: Transpose vectorized in x bandwidth 1.10525e+06 byte/s. Success! * Add instructions to build V8 * Formatting * More documentation * Update README_webassembly.md * Update README_webassembly.md * Update WasmExecutor.cpp * Update WasmExecutor.cpp * Skip tests * Update WasmExecutor.cpp * Skip performance tests * Update WasmExecutor.cpp * Address review comments * 9.8.147 -> 9.8.177 Co-authored-by: Ng Zhi An <zhin@google.com> 14 December 2021, 00:32:31 UTC
46d8ca8 Move parallel/async lowering from LLVM codegen to a standard Halide IR lowering pass. (#6195) * First cut at factoring parallel task compilation, including closure generating and calling, into a normal IR to IR lowering pass. Includes adding struct handling intrinsics to LLVM and C++ backends. Still a work in progress. * Fix formatting that got munged by emacs somehow. * Checkpoint progress. * Small fixes. * Checkpoint progress. * Checkpoint progress. * Checkpoint progress. * Checkpoint progress. Debugging code will be removed. * Try a fix for make_typed_struct in C++ codegen. * Another attempt to fix C++ codegen. * Another C codegen fix. * Checkpoint progress. * Use make_typed_struct rather than make_struct to construct closure. Ensure all types are carried through exactly the same to both make_struct_type and make_typed_struct. * Checkpoint. * Uniqueify closure names because LLVM was doing that to function names. * Small formatting cleanups. Fixes to call graph checker. Disable tests related to this while Andrew and I figure out how to get it to work across closures. * Get generated C++ to compile via a combination of fixing types and bludgeoning types into submission via subterfuge. * Typo fix to a typo fix. * Restore inadvertently deleted code. * Rename make_struct_type to declare_struct_type. * Add new file to CMake. * Add fixes for Hexagon offload and any passes that might add additional LoweredFunctions in the future. * Add comment with a bit of info for the future.. * Typo fix. * Don't duplicate the closure call to test the error return. Don't declare halide_buffer_t type if there are no buffers in closure. * Use _ucon in C++ code to get rid of constness casting ugliness. * Change resolve_function_name intrinsic to use a Call node to designate the function. This makes generating the C++ declaration in the C++ backend trivial. Few more changes to type handling and naming. * Small C++ backend output formatting change. 
Don't generate For loops with no variable. Update internal test for C++ output. * Add halide_semaphore_acquire_t as a well known type for use inside compiler. * Add handling for halide_semaphore_t allocation. Formatting fixes. * Fix type for halide_semaphore_t. * Reapply C++ backend formatting fix. * Add support for calling legacy halide_do_par_for runtime routine in cases where it is valid. * Formatting fixes. * Format and tidy fixes. * Attempt to pass formatting check. * Fix last set of test failures. * Formatting whitespace fixes. * Update comments. * Attempt to fix pointer cast error with some versions of LLVM. * Another attempt at fixing bool compatibility casting. * Another iteration. * Remove likely useless extern argument check logic. * Add hacky fix for losing global variables. * Comment typo fixes. * Remove no-longer-used Closure code from Codegen_Internal * Remove unused MayBlock visitor class * clang-tidy * Attempt to fix parallel offloads for HVX * Update parallel_nested_1.cpp * Augment Closure debugging * Add some std::move usage * Fix hvx lock/unlock semantics for PR #6457 (#6462) Fix qurt_hvx_lock issues * Sort IntrinsicOp and corresponding names * Remove unused `is_const_pointer()` function * Minor hygiene in LowerParallelTasks - normalize local functions to snake_case - put all local functions & classes in anon namespace - move MinThreads visitor to file scope to reduce nestedness of code * use Closure::include * Switch to PureIntrinsics per review feedback. * Minor cleanup of parallel refactor intrinsics (#6465) * Minor cleanup of parallel refactor intrinsics - Renamed `load_struct_member` to `load_typed_struct_member` to make it more clear that it is intended for use only with the results of `make_typed_struct`. 
- Split `declare_struct_type` into two intrinsics, `define_typed_struct` and `forward_declare_typed_struct`, removing the need for the underdocumented `mode` argument and hopefully making usage clearer - Added / clarified comments for the intrinsics modified above * Update comments * Fix comments * Update CodeGen_C.cpp * Remove 'foo.buffer' from Closure entirely This is a direct adaptation of what #6481 does for the old LLVM-based code, and allows elimination of one use of `get_pointer_or_null()`. PR is meant to be merged into factor_parallel_codegen, not master. * Update LowerParallelTasks.cpp * Keep track of task_parent inside LowerParallelTasks; remove no-longer-needed get_pointer_or_symbol() intrinsic (#6486) * Fix potential issue with additional LoweredFuncs (#6490) I think this is the right fix for this case; that said, I can't find a single instance in any of our test cases that actually triggers this. * factor parallel codegen with fewer intrinsics (#6487) * Rework some of parallel closure lowering to avoid some intrinsics This version relies more heavily on the existing make_struct, and puts function names in the Variable namespace as globals. Co-authored-by: Steven Johnson <srj@google.com> Co-authored-by: dsharletg <dsharlet@google.com> Co-authored-by: Steven Johnson <srj@google.com> Co-authored-by: Andrew Adams <andrew.b.adams@gmail.com> 14 December 2021, 00:31:45 UTC
e23b6f0 rounding shift rights should use rounding halving add (#6494) * rounding shift rights should use rounding halving add On x86 currently we lower cast<uint8_t>((cast<uint16_t>(x) + 8) / 16) to: cast<uint8_t>(shift_right(widening_add(x, 8), 4)) This compiles to 8 instructions on x86: Widen each half of the input vector, add 8 to each half-vector, shift each half-vector, then narrow each half-vector. First, this should have been a rounding_shift_right. Some patterns were missing in FindIntrinsics. Second, rounding_shift_right had suboptimal codegen in the case where the second arg is a positive const. On archs without a rounding shift right instruction you can further rewrite this to: shift_right(rounding_halving_add(x, 7), 3) which is just two instructions on x86. 13 December 2021, 20:00:52 UTC
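The rewrite described in the commit above can be checked exhaustively for 8-bit inputs. The sketch below models the two expressions with plain Python integers; it assumes `rounding_halving_add(a, b)` means `(a + b + 1) >> 1`, and the function names `original` and `rewritten` are hypothetical labels for the two forms.

```python
def rounding_halving_add(a, b):
    """(a + b + 1) >> 1; the result fits in 8 bits when both inputs do."""
    return (a + b + 1) >> 1


def original(x):
    """cast<uint8_t>((cast<uint16_t>(x) + 8) / 16) for a uint8 input x."""
    return ((x + 8) // 16) & 0xFF


def rewritten(x):
    """shift_right(rounding_halving_add(x, 7), 3), with no widening needed."""
    return rounding_halving_add(x, 7) >> 3


# The two forms agree because ((x + 8) >> 1) >> 3 == (x + 8) >> 4.
mismatches = [x for x in range(256) if original(x) != rewritten(x)]
```

The halving add absorbs one bit of the shift, so the intermediate value never exceeds 8 bits and the widen and narrow steps drop out.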
11448b2 Document the usage of llvm::legacy::PassManager (#6491) * Document the usage of llvm::legacy::PassManager There is some confusion about whether this usage is acceptable. TL;DR: it's not just acceptable, it's required for the foreseeable future. Add comments to capture this to avoid future such questions. (With great thanks to Alina for pointing me at the relevant LLVM discussion links!) * Add date 10 December 2021, 19:31:01 UTC
7fe1e2c Let lerp lowering incorporate a final cast. (#6480) * Let lerp lowering incorporate a final cast This lets it save a few instructions on x86 and arm. cast(UInt(16), lerp(some_u8s)) produces the following, before and after this PR Before: x86: vmovdqu (%r15,%r13), %xmm4 vpmovzxbw -2(%r15,%r13), %ymm5 vpxor %xmm0, %xmm4, %xmm6 vpmovzxbw %xmm6, %ymm6 vpmovzxbw -1(%r15,%r13), %ymm7 vpmullw %ymm6, %ymm5, %ymm5 vpmovzxbw %xmm4, %ymm4 vpmullw %ymm4, %ymm7, %ymm4 vpaddw %ymm4, %ymm5, %ymm4 vpaddw %ymm1, %ymm4, %ymm4 vpmulhuw %ymm2, %ymm4, %ymm4 vpsrlw $7, %ymm4, %ymm4 vpand %ymm3, %ymm4, %ymm4 vmovdqu %ymm4, (%rbx,%r13,2) addq $16, %r13 decq %r10 jne .LBB0_10 arm: ldr q0, [x17] ldur q2, [x17, #-1] ldur q1, [x17, #-2] subs x0, x0, #1 // =1 mvn v3.16b, v0.16b umull v4.8h, v2.8b, v0.8b umull2 v0.8h, v2.16b, v0.16b umlal v4.8h, v1.8b, v3.8b umlal2 v0.8h, v1.16b, v3.16b urshr v1.8h, v4.8h, #8 urshr v2.8h, v0.8h, #8 raddhn v1.8b, v1.8h, v4.8h raddhn v0.8b, v2.8h, v0.8h ushll v0.8h, v0.8b, #0 ushll v1.8h, v1.8b, #0 add x17, x17, #16 // =16 stp q1, q0, [x18, #-16] add x18, x18, #32 // =32 b.ne .LBB0_10 After: x86: vpmovzxbw -2(%r15,%r13), %ymm3 vmovdqu (%r15,%r13), %xmm4 vpxor %xmm0, %xmm4, %xmm5 vpmovzxbw %xmm5, %ymm5 vpmullw %ymm5, %ymm3, %ymm3 vpmovzxbw -1(%r15,%r13), %ymm5 vpmovzxbw %xmm4, %ymm4 vpmullw %ymm4, %ymm5, %ymm4 vpaddw %ymm4, %ymm3, %ymm3 vpaddw %ymm1, %ymm3, %ymm3 vpmulhuw %ymm2, %ymm3, %ymm3 vpsrlw $7, %ymm3, %ymm3 vmovdqu %ymm3, (%rbp,%r13,2) addq $16, %r13 decq %r10 jne .LBB0_10 arm: ldr q0, [x17] ldur q2, [x17, #-1] ldur q1, [x17, #-2] subs x0, x0, #1 // =1 mvn v3.16b, v0.16b umull v4.8h, v2.8b, v0.8b umull2 v0.8h, v2.16b, v0.16b umlal v4.8h, v1.8b, v3.8b umlal2 v0.8h, v1.16b, v3.16b ursra v4.8h, v4.8h, #8 ursra v0.8h, v0.8h, #8 urshr v1.8h, v4.8h, #8 urshr v0.8h, v0.8h, #8 add x17, x17, #16 // =16 stp q1, q0, [x18, #-16] add x18, x18, #32 // =32 b.ne .LBB0_10 So on X86 we skip a pointless and instruction, and on ARM we get a rounding add and shift 
right instead of a rounding narrowing add shift right followed by a widen. * Add test * Fix bug in test * Don't produce out-of-range lerp values 10 December 2021, 15:06:30 UTC
bcfd6af Fail if no_bounds_query specified for HL_JIT_TARGET (#6489) * Fail if no_bounds_query specified for HL_JIT_TARGET JIT requires the use of bounds_query; disabling it will almost certainly fail in JIT mode, either with a confusing assert message, or a crash (if you also specify no_asserts). This adds a more useful failure message. * Update Target.cpp 09 December 2021, 23:06:05 UTC
59118de Deal with Printer::scratch (#6469) (#6472) Instead of trying to optimize every Printer instance to use stack (and failing), move the StackPrinter concept into printer.h directly and require opt-in at the point of compilation to use stack instead of malloc. This PR also does a few other drive-by cleanups: - Ensures that all Printer ctors are explicit - Makes some template aliases to make using (e.g.) ErrorPrinter with a custom buffer size slightly cleaner syntax - Have tracing use the `.str()` method, which already deals with MSAN internally - Make all the Printer data members private - Fix some evil code in opencl.cpp that previously used the now-private data members 08 December 2021, 22:12:53 UTC
d089588 Move null check from Printer to halide_string_to_string() The Printer is (currently) usually inlined into every module, so this check is repeated in multiple chunks of code. Since the goal is to avoid crashing when debugging, let's move it to halide_string_to_string() (which will catch all these, and possibly more) and save some code size. (Further improvements in Printer code size on the way; this change seems worthy of considering separately.) 08 December 2021, 19:11:28 UTC
7199e7d Try removing optional buffer added to closure 08 December 2021, 18:53:35 UTC
7992369 Add a fast integer divide that rounds to zero (#6455) * Add a version of fast_integer_divide that rounds towards zero * clang-format * Fix test condition * Clean up debugging code * Add explanatory comment to performance test * Pacify clang tidy 07 December 2021, 16:16:50 UTC
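The difference between the two rounding modes only shows up for negative numerators. The following is a small sketch of the semantics only, not of Halide's implementation; the helper names are hypothetical and the denominator is assumed to be positive.

```python
def divide_round_to_negative_infinity(numerator, denominator):
    """Floor division: the quotient is rounded down, so -7 / 2 gives -4."""
    return numerator // denominator


def divide_round_to_zero(numerator, denominator):
    """Truncating division (C-style): the quotient is rounded toward zero, so -7 / 2 gives -3.

    Assumes denominator > 0.
    """
    quotient = abs(numerator) // denominator
    return -quotient if numerator < 0 else quotient
```

For non-negative numerators the two functions return the same value, so the choice matters only for signed inputs.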
fb305fd `apps/linear_algebra/benchmarks/macros.h`: don't forget SSE guard (#6471) This is breaking i386 build: https://buildd.debian.org/status/fetch.php?pkg=halide&arch=i386&ver=13.0.1-3&stamp=1638786518&raw=0 07 December 2021, 02:15:18 UTC
e0df687 decommissioning StackPrinter (#6470) 06 December 2021, 20:34:44 UTC
392430d Fix Closure API (#6464) The current API requires calling a Visitor from the Closure ctor, which means we implicitly call virtual methods from the class ctor, which is a no-no for a non-final class (see comments on https://github.com/halide/Halide/pull/6443). 02 December 2021, 21:23:25 UTC
0ed461b Add operator<< for Closure (#6443) * Add operator<< for Closure Moves the ad-hoc implementation out of HostClosure::arguments() for easier debugging usage. Also, drive-by elimination of the body of HostClosure ctor, which was identical to the one inherited from Closure. * Update DeviceArgument.cpp * Add explanatory comment 02 December 2021, 18:38:50 UTC
5cf9ae5 Reduce overhead of sampling profiler by having only one thread do it (#6433) * Reduce overhead of sampling profiler by having only one thread do it * Use const ref * One line per member 02 December 2021, 15:04:43 UTC
479d839 Add LinkageType::ExternalPlusArgv (#6452) (#6463) Allows us to skip generating metadata for offloaded hexagon funcs, which will never use it. 02 December 2021, 03:42:04 UTC
4877d26 Tweak Hexagon codegen output to match the pattern in Lower.cpp more accurately (for level 1 vs 2); also prefix the outputs so they are easier to read as Hexagon-specific when debugging (#6461) 01 December 2021, 19:22:00 UTC
c0192ff Re-enable performance_async_gpu for D3D12Compute (#6450) * Re-enable performance_async_gpu for D3D12Compute It's been disabled for ~2 years because of flaky failures (#3586); we should see if the many changes since then have improved things or not. * tickle buildbots 30 November 2021, 06:13:44 UTC
5aeea6a Fixes for c++20 (#6446) Fixes #6445 26 November 2021, 22:32:24 UTC
76c0946 Syntax highlighting for embedded PTX code. (#6447) * Include GPU source kernels in Stmt and StmtHtml file. * Syntax highlighting for embedded PTX code. 26 November 2021, 20:03:24 UTC
3bde22a Include GPU source kernels in Stmt and StmtHtml file. (#6444) 24 November 2021, 20:59:37 UTC
8b68f85 Avoid needless gather in fast_integer_divide lowering (#6441) * Avoid needless gather in fast_integer_divide lowering fast_integer_divide did two lookups, one for a multiplier, and one for a shift. It turns out you can just use count leading zeros to compute a workable shift instead of having to do a lookup. This PR speeds up use of fast_integer_divide in cases where the denominator varies across vector lanes by ~70% or so by avoiding one of the two expensive gathers. * Fix slash direction * Pacify clang-tidy * Use portable bit-counting methods * Cleaner initialization of tables 23 November 2021, 21:13:48 UTC
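The idea of deriving the shift from a bit count, so that only the multiplier needs a table lookup, can be sketched with a textbook multiply-and-shift division. This is not Halide's actual table layout or multiplier width: `BITS`, `shift_for`, `multiplier_for`, and `fast_divide` are hypothetical names, and the multiplier here may need `BITS + 1` bits, which a real implementation would have to handle separately.

```python
BITS = 8  # numerator width assumed for this sketch


def shift_for(denominator):
    """ceil(log2(d)) from a bit count; on hardware this would come from count-leading-zeros."""
    return (denominator - 1).bit_length()


def multiplier_for(denominator):
    """The only table entry needed per denominator: ceil(2**(BITS + shift) / d)."""
    total_shift = BITS + shift_for(denominator)
    return -(-(1 << total_shift) // denominator)  # ceiling division


# One lookup table (multipliers); the shift is recomputed instead of looked up.
MULTIPLIERS = {d: multiplier_for(d) for d in range(1, 1 << BITS)}


def fast_divide(numerator, denominator):
    """Unsigned division of a BITS-wide numerator by multiply and shift."""
    return (numerator * MULTIPLIERS[denominator]) >> (BITS + shift_for(denominator))
```

With per-lane denominators, each table lookup becomes a gather, so replacing the shift table with a bit count removes one of the two gathers mentioned in the commit.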
d12fbd1 Codegen_C: buffer compilation needs to special-case scalar buffers (#6442) The existing code will emit something like `halide_dimension_t foo_buffer_shape[] = {};` for these, which is a zero-length array, which some compilers will (justifiably) say has no effect. We should be able to just use nullptr for the shape in these cases. 23 November 2021, 17:33:38 UTC
59d6da7 Skip custom cuda context test on older GPUs (#6437) 23 November 2021, 17:25:47 UTC
a89041b Ensure that halide_start_clock() is called before halide_current_time_ns() in hexagon_host.cpp (#6438) This oversight was causing an assert with the -debug feature flag enabled (with presumably-misleading timing results as well) 22 November 2021, 21:29:11 UTC
57d1e05 Set up SANITIZER_FLAGS and OPTIMIZE for apps/Makefile.inc (#6435) Minor hygiene to make it easy to build AOT apps with TSAN or ASAN. 22 November 2021, 19:46:52 UTC
2239443 Do target-specific lowering of lerp (#6432) * Do target-specific lowering of lerp Saves instructions on x86. Before #6426 vpaddw %ymm0, %ymm1, %ymm1 vpsrlw $8, %ymm1, %ymm2 vpaddw %ymm2, %ymm1, %ymm1 vpsrlw $8, %ymm1, %ymm1 After #6426 vpsrlw $7, %ymm2, %ymm3 vpand %ymm0, %ymm3, %ymm3 vpsrlw $8, %ymm2, %ymm4 vpaddw %ymm2, %ymm4, %ymm2 vpaddw %ymm3, %ymm2, %ymm2 vpsrlw $7, %ymm2, %ymm3 vpand %ymm0, %ymm3, %ymm3 vpsrlw $8, %ymm2, %ymm2 vpaddw %ymm2, %ymm3, %ymm2 vpand %ymm1, %ymm2, %ymm2 This PR: vpaddw %ymm0, %ymm3, %ymm3 vpmulhuw %ymm1, %ymm3, %ymm3 vpsrlw $7, %ymm3, %ymm3 * Target is a struct 19 November 2021, 22:56:12 UTC
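The instruction sequences in the commit above look like different ways of computing a rounded division by 255, the normalization step of an 8-bit lerp. The sketch below compares a shift-and-add form with a multiply-high form against a reference. The constants 128, 127, and 0x8081 are assumptions based on well-known identities; the commit message shows only the instruction shapes and the shift amounts 8 and 7, and the function names are hypothetical.

```python
def round_div_255_reference(x):
    """round(x / 255) for a non-negative integer x (no ties occur because 255 is odd)."""
    return (x + 127) // 255


def round_div_255_shift_add(x):
    """Add, shift by 8, add, shift by 8: matches the four-instruction shape."""
    t = x + 128
    return (t + (t >> 8)) >> 8


def round_div_255_mulhi(x):
    """Add, take the high 16 bits of a 16x16 multiply, shift by 7: the three-instruction shape."""
    high = ((x + 127) * 0x8081) >> 16  # models an unsigned multiply-high
    return high >> 7


# x ranges over every product of two 8-bit values.
inputs = range(255 * 255 + 1)
shift_add_ok = all(round_div_255_shift_add(x) == round_div_255_reference(x) for x in inputs)
mulhi_ok = all(round_div_255_mulhi(x) == round_div_255_reference(x) for x in inputs)
```

Both forms stay within 16-bit intermediates over this input range, apart from the 32-bit product inside the multiply-high, which the hardware instruction discards down to its top half.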
cfd03c9 Don't remap the function name or the target in the metadata (#6430) The remapping is only intended to be used for output argument(s), not the function name; if you have an output with the same name as the function, you can get the metadata emitted with incorrect information. (And remapping the target string is just silly.) This is almost impossible to do currently, but if you construct a Generator just right, you can make it happen. 19 November 2021, 17:41:05 UTC
c3040cb Rewrite integer lerp using intrinsics (#6426) * Rewrite integer lerp using intrinsics * Comment 19 November 2021, 17:10:15 UTC
0e40edc Include LICENSE.txt in package (#6428) Co-authored-by: Ashish Uthama <you@example.com> 18 November 2021, 21:27:53 UTC
36dd10f Fix Introspection issues (#6424) - DWARF v5 has a slightly different header; this recognizes it so we don't fail immediately - Add support for the line_strp form - Allow for a graceful failure if a debug abbreviation is missing; I've only seen this when compiling for TSAN, and I'm honestly not entirely sure if this is a bug in the DWARF generation for those tools vs a subtle flaw in our parsing, but bailing out early and skipping introspection seems kinder than assert-fail. 17 November 2021, 23:14:21 UTC
16fa3ce [hannk] Pacify clang-tidy (#6412) * [hannk] Pacify clang-tidy * One more ASAN fix We must use use_global_gc = false to work properly with the JIT * Revert "One more ASAN fix" This reverts commit 9ed07a70b4a656790236a5ff6966155df823a319. * Rework Op::mutate() to avoid UB 12 November 2021, 23:17:53 UTC
b63f6af [hannk] Fix lower_tflite_fullyconnected (#6414) Fixed the bounds calculation in lower_tflite_fullyconnected() to preserve the invariants expected, and added a testcase that previously failed. 12 November 2021, 20:56:57 UTC
8c2dd5f One more ASAN fix (#6413) We must use use_global_gc = false to work properly with the JIT 12 November 2021, 20:34:14 UTC
0153c6b Revamp Hannk IR (#6379) Refactor Hannk IR and transforms to use a Mutator-based approach 12 November 2021, 16:35:37 UTC
79da2a0 Fix broken ASAN code (#6408) * Fix broken ASAN code Various changes and merges ended up with us using multiple ASAN passes, which was pretty crashy (we just didn't notice because it isn't tested well enough on our buildbots, but is elsewhere). I think we really only want to use the ModuleAddressSanitizerPass (not the non-Module version), which is what Clang does. * set UseAfterScope = true 12 November 2021, 16:34:30 UTC
02a394d x86_cpuid_halide must preserve all 64 bits of rbx/rsi (#6409) The existing code attempts to preserve ebx (since the cpuid instruction can trash it), but it only preserves the lower 32 bits; on 64-bit systems, this (amazingly) usually works OK unless you are compiling in (e.g.) ASAN mode, which can subtly change codegen such that the full 64 bits of rbx must be preserved. I'm genuinely astonished this hasn't bitten us before now! 12 November 2021, 03:25:52 UTC
d763406 Change implementation of round_f* in CodeGen_C to use nearbyint to match CodeGen_LLVM (#6406) 12 November 2021, 01:30:05 UTC
9ff87ce _halide_buffer_crop() needs to check for runtime failures (v2) (#6403) * _halide_buffer_crop() needs to check for runtime failures (v2) (Alternate to #6402) We currently assume that _halide_buffer_crop() will never fail. This is a bad assumption, as it can call device_crop(), which can fail due to unexpected runtime errors, or from a backend simply leaving the device_crop field at the default (unimplemented) case (as is currently the case for the OGLC backend). When this happens, the dst buffer was left in an inconsistent, invalid state (which was what led to the crashes fixed by #6401). This change modifies _halide_buffer_crop() to return nullptr in the event of an error, and ensure that all cropped buffers are checked for null at the right point. (This is not optimal, of course, since the specific error returned by device_crop is getting dropped on the floor, but the existence of an error is no longer ignored.) This addresses at least some of the failure issues we are seeing in performance_async_gpu with the OpenGLCompute backend. (Also: drive-by whitespace fix in CodegenC) * Oops 11 November 2021, 18:04:09 UTC
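The error-propagation pattern described in #6403 can be sketched roughly as follows. This is a minimal illustration, not the Halide runtime code: `SketchBuffer`, `DeviceCropFn`, `sketch_buffer_crop`, and the two `device_crop_*` hooks are hypothetical stand-ins, and the only behavior taken from the commit message is "return nullptr when the device crop fails, and have the caller check for it".

```cpp
#include <cassert>

// Hypothetical, heavily simplified stand-in for a one-dimensional buffer.
struct SketchBuffer {
    int min;
    int extent;
    bool has_device;  // true if a device-side allocation is attached
};

// Hypothetical backend hook; returns 0 on success, nonzero on failure.
using DeviceCropFn = int (*)(const SketchBuffer *src, SketchBuffer *dst);

// Models a backend that left the hook at its default (unimplemented) case.
int device_crop_unimplemented(const SketchBuffer *, SketchBuffer *) {
    return -1;
}

// Models a backend whose device crop succeeds.
int device_crop_ok(const SketchBuffer *, SketchBuffer *dst) {
    dst->has_device = true;
    return 0;
}

// Returns dst on success and nullptr if the device-side crop failed, so the
// caller can no longer silently continue with an inconsistent dst.
SketchBuffer *sketch_buffer_crop(SketchBuffer *dst, const SketchBuffer *src,
                                 int min, int extent, DeviceCropFn device_crop) {
    assert(dst != nullptr && src != nullptr && device_crop != nullptr);
    *dst = *src;
    dst->min = min;
    dst->extent = extent;
    dst->has_device = false;
    if (src->has_device && device_crop(src, dst) != 0) {
        // As the commit message notes, the specific error code is dropped
        // here; only the existence of an error is reported.
        return nullptr;
    }
    return dst;
}
```

A caller in this sketch would test the returned pointer before touching the cropped buffer, and return its own error code if the pointer is null.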
d343e76 Fix obscure bug in widening let substitution (#6405) Fix obscure bug in widening let substitution 11 November 2021, 17:06:00 UTC
8e34a35 Remove halide_abort_if_false() usage in runtime/metal (#6398) * Remove halide_abort_if_false() usage in runtime/metal This converts all the usage of `halide_abort_if_false()` in runtime/metal into either an explicit runtime check-and-return-error-code (if the check looks plausible), or `halide_debug_assert()` (if the check seems to be stating an invariant that shouldn't be possible in well-structured code). These changes are admittedly subjective, so feedback is especially welcome. Also, driveby change to sync-common.h to use `halide_debug_assert()` rather than a local equivalent. * nits 09 November 2021, 23:10:40 UTC
4f70271 Add defensive checks to halide_buffer_copy_already_locked (#6401) Found while debugging crashes with performance_async_gpu for OpenGLCompute: the 'if' tree wasn't robust enough for malformed buffers being passed, and could attempt to deref and use a null src->device_interface or dst->device_interface in some cases. This patch just improves this function to return an error in these cases (rather than crashing); the fact that we are getting malformed buffers passed to us is likely a separate bug. 09 November 2021, 22:51:05 UTC
b189722 [hannk] Upgrade hannk to use TFLite 2.7.0 by default (#6393) * [hannk] Upgrade hannk to use TFLite 2.7.0 by default * Fix unused-vars warnings 09 November 2021, 21:35:24 UTC
b021f87 Move PyTorch test into standalone tests (#6397) * Move PyTorch test into standalone tests It doesn't need to be internal. Also simplified to use only public API, updated the expected correctness, and avoided the need to have cuda present on the system to test for cuda output (since we can cross-compile to generate the C++ output anywhere). * fixes * Fix Windows text file endings * Update pytorch.cpp * Update pytorch.cpp 09 November 2021, 21:25:26 UTC
4286c78 Drop support for LLVM11 (#6396) * Drop support for LLVM11 With Halide 13 released, we should drop support for LLVM11 in Halide trunk, since we only promise to support LLVM trunk + two releases. * Update packaging.yml * Update config.cmake * Update CMakeLists.txt 09 November 2021, 17:13:09 UTC
d3ea755 Fix OGLC debug builds (#6399) If you try to build and run something with `openglcompute` and `debug`, you may crash with a div-by-zero, because the openglcompute runtime never calls `halide_start_clock()`, and all implementations of `halide_current_time_ns()` assume that it has been called. On (e.g.) OSX, this results in div by zero. This fixes it by inserting the correct call into openglruntime.cpp, and also adding debug-only asserts to all the `halide_current_time_ns()` implementations. (I was tempted to fix this by removing `halide_start_clock()` entirely and just lazily initing the initial value in `halide_current_time_ns()`, but I figured that would likely get pushback...) 09 November 2021, 17:03:17 UTC
d6f1345 Rename halide_assert -> halide_abort_if_false (#6382) * Rename halide_assert -> HALIDE_CHECK A crashing bug got mistakenly inserted because a new contributor (reasonably) assumed that the `halide_assert()` macro in our runtime code was like a C `assert()` (i.e., something that would vanish in optimized builds). This is not the case; it is a check that happens in all build modes and always triggers an `abort()` if it fires. We should remove any ambiguity about it, so this proposes to rename it to something more like the Google/Abseil-style CHECK() macro, to make it stand out more. (We may want to do a followup to verify that all of the uses really are unrecoverable errors that aren't better handled by returning an error.) * clang-format * Fix for top-of-tree LLVM * Fix for older versions * HALIDE_CHECK -> halide_abort_if_false * Update runtime_internal.h 08 November 2021, 23:13:13 UTC
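The distinction drawn in #6382 (a check that fires in every build mode versus an assertion that vanishes in optimized builds) can be illustrated with a small sketch. The macro names `SKETCH_ABORT_IF_FALSE` and `SKETCH_DEBUG_ASSERT` and the `SKETCH_DEBUG_RUNTIME` switch are hypothetical and chosen for illustration; the actual definitions in Halide's runtime headers may differ.

```cpp
#include <cstdio>
#include <cstdlib>

// Always-on check: evaluated in every build mode, aborts on failure.
#define SKETCH_ABORT_IF_FALSE(cond)                                   \
    do {                                                              \
        if (!(cond)) {                                                \
            std::fprintf(stderr, "%s:%d check failed: %s\n",          \
                         __FILE__, __LINE__, #cond);                  \
            std::abort();                                             \
        }                                                             \
    } while (0)

// Debug-only assertion: compiled out (and its argument never evaluated)
// unless the hypothetical SKETCH_DEBUG_RUNTIME switch is defined.
#ifdef SKETCH_DEBUG_RUNTIME
#define SKETCH_DEBUG_ASSERT(cond) SKETCH_ABORT_IF_FALSE(cond)
#else
#define SKETCH_DEBUG_ASSERT(cond) \
    do {                          \
    } while (0)
#endif
```

Under these assumptions, an expression with side effects passed to `SKETCH_DEBUG_ASSERT` is not evaluated in a non-debug build, whereas the same expression passed to `SKETCH_ABORT_IF_FALSE` always is. That difference is the source of the confusion the commit describes.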
1312817 Clean up CodeGen_LLVM names to match ASAN nomenclature changes (#6395) 08 November 2021, 22:01:29 UTC
6071cf6 Check results of all runtime function calls (#6389) * Check results of all runtime function calls This cherry-picks just the changes to callsites internal to Halide (and tests) from #6388. (It doesn't attempt to annotate runtime functions to enforce checking the results.) * Update write_debug_image.cpp * Add checks + comment to buffer_copy_aottest * Add comment to gpu_object_lifetime_aottest * Update memory_profiler_mandelbrot_aottest.cpp * Update user_context_insanity_aottest.cpp * Update process.cpp 08 November 2021, 20:13:44 UTC
a798909 Add halide_debug_assert() macro (#6390) * Add halide_debug_assert() macro Also convert usage of halide_assert()/HALIDE_CHECK() in hashmap.h and gpu_context_common.h to halide_debug_assert(), as all the usages looked to be appropriate for debug-mode only. (Rebased version of #6385, which this replaces) * appease clang-format 08 November 2021, 20:13:28 UTC
656c6b5 [hannk] Have CMake emit .s, .stmt, .ll files (#6392) 04 November 2021, 23:00:28 UTC
26ccb54 Support vectorized Select in OpenGLCompute backend (#6371) The ternary operator in GLSL does not work with vector types. While the mix function has overloads for boolean vectors, they are only supported in version 4.5, so they are not exactly portable. To work around this, we use the ternary operator on all elements of the vector type. Necessary for #6348. 04 November 2021, 00:35:13 UTC
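The workaround in #6371 amounts to applying a scalar ternary to each lane of the vector. The following is a C++ analogue of that idea, not the GLSL the backend emits; `select_per_lane` is a hypothetical name used only for this sketch.

```cpp
#include <array>
#include <cstddef>

// Applies the scalar ternary operator independently to every lane, which is
// the same shape as selecting element by element on a vector type.
template<typename T, std::size_t N>
std::array<T, N> select_per_lane(const std::array<bool, N> &cond,
                                 const std::array<T, N> &if_true,
                                 const std::array<T, N> &if_false) {
    std::array<T, N> result{};
    for (std::size_t i = 0; i < N; i++) {
        result[i] = cond[i] ? if_true[i] : if_false[i];
    }
    return result;
}
```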
c005b9f Support vectorization in OpenGLCompute backend (#6348) * Support vectorization in OpenGLCompute backend This patch adds support for vector load and store operations. First, a pass identifies the buffers whose loads and stores are all dense, aligned, and have the same number of lanes. Such buffers are declared with a vector base type and accessed accordingly. Loads and stores that do not satisfy those criteria are implemented as gathers and scatters from buffers whose base type is scalar. Resolves #4976. Partially resolves #1687. * Move buffer name instead of copy (clang-tidy) 04 November 2021, 00:32:20 UTC
657bb03 Fix for top-of-tree LLVM (#6386) * Fix for top-of-tree LLVM * Fix for older versions 03 November 2021, 23:29:38 UTC
76315a2 Vectorize Ramp in OpenGLCompute backend (#6372) Currently, ramps are generated as a number of independent scalar expressions that are finally gathered into a vector. For instance, indexing in vectorized code is filled with ramps like the following:
```
int _11 = int(1) * int(1);
int _12 = _10 + _11;
int _13 = int(2) * int(1);
int _14 = _10 + _13;
int _15 = int(3) * int(1);
int _16 = _10 + _15;
ivec4 _17 = ivec4(_10, _12, _14, _16);
```
This patch simplifies the generated code using a multiply add expression on a vector containing an arithmetic expression, such that the code is as follows:
```
ivec4 _11 = ivec4(0, 1, 2, 3) * int(1) + _10;
```
This is more performant due to vectorization, more compact, and more readable because the base and the stride are easily identifiable. 03 November 2021, 22:54:44 UTC
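The identity behind #6372 is that lane i of a ramp equals base + i * stride, so the whole ramp is one multiply-add on the constant lane-index vector (0, 1, 2, 3). A minimal C++ sketch of that computation, with the hypothetical helper name `ramp4` standing in for the generated GLSL expression:

```cpp
#include <array>
#include <cstddef>

// Builds a 4-lane ramp as lane_index * stride + base, mirroring the shape of
// ivec4(0, 1, 2, 3) * stride + base in the simplified generated code.
std::array<int, 4> ramp4(int base, int stride) {
    const std::array<int, 4> lane_index = {0, 1, 2, 3};
    std::array<int, 4> result{};
    for (std::size_t i = 0; i < result.size(); i++) {
        result[i] = lane_index[i] * stride + base;
    }
    return result;
}
```

With a base of 10 and a stride of 1, this yields the lanes 10, 11, 12, 13, which matches the values the scalar sequence `_10, _12, _14, _16` in the commit message would compute for `_10 = 10`.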
2cf3afb [hannk] Fix MeanOp (#6336) * [hannk] Fix MeanOp The `reducing()` method didn't handle negative values for indices, and didn't reverse the value of the axis as we do elsewhere, so results were incorrect. Also, we now parse and save the value of `keep_dims`, though I can't find evidence that it does much of anything: test cases pass different values for it but none of them fail (even though we ignore it), and at least one reference implementation I see doesn't seem to do anything with it. * Remove keep_dims handling for MeanOp 03 November 2021, 22:47:03 UTC
7ec8d70 Convert various halide_assert -> static_assert (#6383) The type-size checks in d3d12compute.cpp don't need to be runtime checks. 03 November 2021, 22:19:15 UTC
a227440 Remove halide_assert() from halide_default_device_wrap_native (#6381) This was inserted in https://github.com/halide/Halide/pull/6310, probably mistakenly (since `halide_assert()` in the Halide runtime is *not* a debug-only assertion). Instead of a controlled runtime failure, we just abort, which is not OK. 03 November 2021, 20:55:28 UTC
415ce0c Fix empty INSTALL_COMMAND in hannk super-build (#6387) * Fix empty INSTALL_COMMAND in hannk super-build * Fix 3.16 missing command * Fix the fix... 03 November 2021, 20:27:46 UTC
0d6b0f5 Fix for top-of-tree LLVM (#6380) 03 November 2021, 16:18:45 UTC
ac2673b Add super-build for cross-compiling HANNK (#6374) * Add super-build for cross-compiling HANNK * Relax CMake version 03 November 2021, 00:57:12 UTC
6070821 Update README for Halide 13. (#6378) 02 November 2021, 19:42:02 UTC
5b8f473 Fix for the crash from #6367 (#6375) * Skip empty boxes * Address the comments 02 November 2021, 15:36:19 UTC
4225eba Add helper for cross-compiling Halide generators. (#6366) * Add helper for cross-compiling Halide generators. Created a new function, `add_halide_generator`, that helps users write correct cross-compiling builds by establishing the following convention for creating a generator named `TARGET`:
1. Define Halide generators and libraries in the same project.
2. Assume two builds: a host build and a cross build.
3. When creating a generator, check to see if we can load a pre-built version of the target.
4. If so, just use it.
5. If not, make sure the full Halide package is loaded and create a target for the generator.
   a. If `CMAKE_CROSSCOMPILING` is set, then _warn_ the user (the variable is unreliable on macOS) that something seems fishy.
   b. Create export rules for the generator. It creates a package `PACKAGE_NAME` and appends to its `EXPORT_FILE`.
   c. Create a custom target also named `PACKAGE_NAME` for building the generators.
   d. Create an alias `${PACKAGE_NAMESPACE}${TARGET}`.
6. Users are expected to use the alias in conjunction with `add_halide_library`. Users can test the existence of `TARGET` to determine whether a pre-built one was loaded (and set additional properties if not).
7. Setting `${PACKAGE_NAME}_ROOT` is enough to load pre-built generators.
`PACKAGE_NAME` is `${PROJECT_NAME}-halide_generators` by default. `PACKAGE_NAMESPACE` is `${PROJECT_NAME}::halide_generators::` by default. `EXPORT_FILE` is `${PROJECT_BINARY_DIR}/cmake/${PACKAGE_NAME}-config.cmake` by default. Users are free to avoid the helper if it would not fit their workflow. * Make HANNK use the new add_halide_generator helper 01 November 2021, 23:03:09 UTC
f5ce5f3 [hannk] Clean up aliasing (v2) (#6364) * wip * [hannk] Clean up aliasing (v2) The code for aliasing tensors was janky. This cleans it up and makes a clear distinction between aliasing done to overlay buffers with crop-and-translate, vs the aliasing done when we reshape tensors. We no longer allow a given tensor to do both of these, and we give preference to Reshape aliasing first. (Cherry-picked from #6321) * Move alias_type into shared ptr 01 November 2021, 20:40:36 UTC
1a1c97f [hannk] Add support for building/running for wasm (#6361) * [hannk] Allow disabling TFLite+Delegate build in CMake Preparatory work for allowing building of hannk with Emscripten; TFLite (and its dependees) problematic to build in that environment, but this will allow us to build a tflite-parser-only environment. (Note that more work is needed to get this working for wasm, as crosscompiling in CMake is still pretty painful; this work was split out to make subsequent reviews simpler) * [hannk] Add support for building/running for wasm * HANNK_BUILD_TFLITE_DELEGATE -> HANNK_BUILD_TFLITE * Use explicit host build strategy for cross compiling HANNK (#6365) * Ignore local emsdk clone * Fix usage of CMAKE_BUILD_TYPE * Only print the Halide target info once per CMake run * Fix Halide "cmake" target detection for Emscripten * Prefer target_link_options to _link_libraries when applicable * Validate, rather than find, NODE_JS_EXECUTABLE (set by emsdk) * Emscripten already wraps tests with node. * Add dependency on Android logging library. * For cross-compiling, find host tools instead of recursive call. Rather than shelling out via execute_process and potentially guessing the toolchain options wrong, expect to find our host tools (i.e. generators) in a package called "hannk_tools". The package is created by the host build via the CMake export() command. Importing this package in the cross build creates IMPORTED targets with the same names as our generators. We then use these generators rather than creating generators for the target build. * Rework cross-compiling script. * Respond to (easy) reviewer comments. * Add HANNK_AOT_HOST_ONLY option. Use in script. * [hannk] tests should only process .tflite files (#6368) currently, random dotfiles (e.g. .DS_Store on OSX) can creep in, causing bogus failures * Add comment about node wrapping. * Rename hannk_tools to hannk-halide_generators * Add comment about exporting targets. 
* Bump version to Halide 14.0.0 (#6369) Co-authored-by: Steven Johnson <srj@google.com> Co-authored-by: Alex Reinking <alex_reinking@berkeley.edu> 01 November 2021, 20:28:50 UTC
69d8ef0 Bump version to Halide 14.0.0 (#6369) 30 October 2021, 01:33:50 UTC
3c52df1 [hannk] tests should only process .tflite files (#6368) currently, random dotfiles (e.g. .DS_Store on OSX) can creep in, causing bogus failures 29 October 2021, 23:46:50 UTC
541bc37 [hannk] Allow disabling TFLite+Delegate build in CMake (#6360) * [hannk] Allow disabling TFLite+Delegate build in CMake Preparatory work for allowing building of hannk with Emscripten; TFLite (and its dependees) problematic to build in that environment, but this will allow us to build a tflite-parser-only environment. (Note that more work is needed to get this working for wasm, as crosscompiling in CMake is still pretty painful; this work was split out to make subsequent reviews simpler) * Update hannk_delegate.h * HANNK_BUILD_TFLITE_DELEGATE -> HANNK_BUILD_TFLITE 28 October 2021, 21:14:42 UTC
e10f104 Update Emscripten settings (#6362) The settings we use to build C++ in wasm were slightly out of date now that we've updated our runtime to Node instead of d8. Also drive-by gitignore fix. 28 October 2021, 17:34:27 UTC
1c7388a Allow users to use their own cuda contexts and streams in JIT mode (#6345) * Deprecate JIT runtime override methods that take void * * Make it possible to use custom cuda contexts and streams in JIT mode * Clean up comments * Tolerate null handlers in the JITUserContext These can come up if a JITUserContext is passed to something like copy_to_device before getting fully populated by passing it to a call to realize. * Remove reliance on dlsym in test and reuse the runtime's name resolution mechanism instead * Handle case where cuda and cuda-debug runtime modules both exist This change means we'll only ever create one built-in cuda context in this circumstance. * Slight simplification * Improve comments 28 October 2021, 17:25:58 UTC
4f573bf Add missing widening_absd patterns (#6359) * Add missing widening_absd patterns * Add a comment 28 October 2021, 02:05:29 UTC
8f1ae2a Use Node instead of d8 for Wasm AOT testing (#6356) * Use Node instead of d8 for Wasm AOT testing This requires the right version of Node is installed on your system. Since EMSDK often puts a too-old version of Node in the path, allow overriding via an env var. * wip 27 October 2021, 20:37:00 UTC
34534f5 [hannk] Add missing call to Interpreter::prepare in benchmark app (#6358) 27 October 2021, 20:35:39 UTC