https://github.com/halide/Halide

Revision Author Date Message Commit Date
bb689ee Redo CPU schedule for bilateral grid 17 March 2023, 22:24:03 UTC
badf486 Disable performance_boundary_conditions under WebGPU pending #7420 (#7435) 17 March 2023, 17:41:48 UTC
643b2f1 Modify runtime calls to always return a valid halide_error_code_t value (#7404) * Modify runtime calls to always return a valid halide_error_code_t value Currently, the return values from our runtime code are a mishmash -- there's lots of code that returns any random nonzero value to indicate an error. This isn't wrong per se, but it's not clean, and it's desirable that the return values are predictable. This PR doesn't change the call signature of any (public) Halide Runtime functions, but modifies the internal logic so that all return values are valid values of `enum halide_error_code_t`. Generally, there should be minimal change to the code otherwise, although I did leave in a few drive-by changes that I couldn't resist (e.g., better error-checking when dynamically loading symbols). My long-term goal here is to eventually propose changing the signature of runtime functions that return errors to actually return `enum halide_error_code_t`; as you might imagine, making that transition might be controversial for a number of reasons. This PR is intended to be a way to make such a future transition easier to reason about, while arguably improving the code quality of the runtime slightly. * tidy * Update opencl.cpp * trigger buildbots * trigger buildbots * Fix merge mistake * Update gpu_context_common.h * Update cuda.cpp * Werror * status_ * if-with-initializer format * Update cuda.cpp * Update opencl.cpp * Update cuda.cpp * Update device_interface.h * Update hexagon_cache_allocator.cpp * Update printer.h * remove prefixes * "device field is already non-zero" * Update opencl.cpp * Update cuda.cpp * Fix error spacing * trigger buildbots 16 March 2023, 21:42:34 UTC
88b3ef8 Split WebGPU runtime into two variants (#7248 workaround) (#7419) * Split WebGPU runtime into two variants (#7248 workaround) Halide promises that you can crosscompile to *any* supported target from a 'stock' build of libHalide. Unfortunately, the initial landing of WebGPU support breaks that promise: we compile the webgpu runtime support (webgpu.cpp) with code that is predicated on `WITH_DAWN_NATIVE` (for Dawn vs Emscripten, respectively). This means that if you build Halide with `WITH_DAWN_NATIVE` defined, you can *only* target Dawn with that build of Halide; similarly, if you build with `WITH_DAWN_NATIVE` not-defined, you can only target Emscripten. (Trying to use the 'wrong' version will produce link-time errors.) For people who build everything from source, this isn't a big deal, but for people who just pull binary builds, this is a big problem. This PR proposes a temporary workaround until the API discrepancies are resolved: - Compile the existing webgpu.cpp runtime *both* ways - in LLVM_Runtime_Linker.cpp, select the correct variant based on whether the Target is targeting wasm or not - Profit! This is a rather ugly hack, but it should hopefully be (relatively) temporary. * A few more fixes * Update HalideGeneratorHelpers.cmake * Update interpreter.cpp * Update interpreter.cpp 16 March 2023, 21:35:52 UTC
cc74ee8 Move long boilerplate C/C++ code into template files (#7426) * Move long boilerplate C/C++ code into template files Codegen_C has a couple of long strings with boilerplate code that is conditionally emitted; at least one of these is too long for a single string literal under MSVC. Let's try moving these into standalone files instead; this may make it easier to use conventional tooling on the C++ code, and make Codegen_C easier to read and think about. (Note that the not-yet-landed Xtensa branch should also use this approach, if we decide this approach is good, since it has even more such code.) TODO: probably would be good to augment `binary2cpp` to allow option comments in the source file that are stripped in the output file (e.g. "This file is used in CodeGen_C.cpp for blah blah blah, look out for blah") * Remove detritus 16 March 2023, 18:22:23 UTC
50f8c85 Disable correctness_atomics on Windows with Cuda, alas (#7423) (#7424) 15 March 2023, 17:16:01 UTC
ae59b91 Ignore assertions inside WebGPU kernels (#7418) This is the approach that (e.g.) the OpenCL backend takes to assertions inside kernel code. 14 March 2023, 20:56:33 UTC
e966163 Log name of failing function for !function_takes_user_context (#7417) 14 March 2023, 20:33:34 UTC
05fa61a Fix for top-of-tree LLVM (#7416) 14 March 2023, 18:40:40 UTC
9199849 A few minor cleanups in WebGPU backend (#7413) * A few minor cleanups in WebGPU backend Mostly just using halide_error_code_t values everywhere. * trigger buildbots * Update webgpu.cpp 14 March 2023, 18:01:46 UTC
b63139e Update mini_webgpu.h with latest changes from Dawn (#7415) 14 March 2023, 17:01:25 UTC
d383cb9 Fix null device crash during WebGPU initialization (#7414) If RequestDevice fails, make sure we exit initialization early instead of trying to create a staging buffer with a nullptr device. 14 March 2023, 16:58:12 UTC
6d19cec printer.h should include HalideRuntime.h (#7412) printer.h uses uint64_t, so it needs to include something that ensures that type is defined. HalideRuntime.h is probably the right choice (since it always transitively includes runtime_internal.h when compiling the runtime). 13 March 2023, 22:36:38 UTC
078465c Add initial support for WebGPU (#6492) * [WebGPU] Add runtime stubs and codegen skeleton All runtime functions are currently unimplemented, and the shader codegen just emits an empty compute shader. * [WebGPU] Implement lazy device initialization Synchronize device access using a WebGpuContext object, as is done in the other GPU runtimes. Device initialization is asynchronous, so we rely on Emscripten's Asyncify mechanism to wait for it to complete. * [WebGPU] Implement device release Release the device and adapter. * [WebGPU] Add scoped error handling mechanism WebGPU uses asynchronous callbacks for error handling, so we need to spin-lock while waiting for them to fire in order to determine success/failure status for any WebGPU APIs that we call. * [WebGPU] Implement device malloc/free Create a WGPUBuffer with Storage|CopyDst|CopySrc usages. We'll need a staging buffer to perform host<->device transfers, as a buffer used as a storage buffer cannot be mapped. * [WebGPU] Implement basic host<->device copies Use a staging buffer to copy data from the device, since we cannot map a buffer that is used as a storage buffer. This logic will need an overhaul in order to support buffers that represent sub-regions of larger buffers. * [WebGPU] Implement halide_webgpu_device_sync Just wait for all submitted commands to complete. * [WebGPU] Implement shader compilation * [WebGPU] Implement core of WGSL codegen This implements enough of the WGSL codegen required to generate code for a 32-bit version of the apps/blur example. Buffer arguments are emitted as read_write storage buffers. Non-buffer arguments are collected into a structure and generated as a single uniform buffer. Workgroup sizes are currently required to be constant. This can be relaxed in the future by using pipeline-overridable constants. * [WebGPU] Implement Cast node WGSL uses different syntax for type casts.
* [WebGPU] Implement the float_from_bits() intrinsic * [WebGPU] Implement run function Create a compute pipeline, create a uniform buffer for non-buffer arguments, and dispatch a compute command to the queue. Does not handle workgroup storage yet. * [WebGPU] Move queue into WgpuContext class The queue handle received from wgpuDeviceGetQueue needs to be released, so it makes more sense to automatically get/release this handle in the WgpuContext constructor/destructor. * [WebGPU] Add support for JIT This requires a native implementation of WebGPU (e.g. Dawn or wgpu). * [WebGPU] Enable the gpu_only AOT generator test Link generator tests against the native WebGPU library if specified. * [WebGPU] Add support for targeting dawn-native We have to make this a compile-time switch for now, as the supported APIs currently differ between Dawn and Emscripten. We should be able to remove all of these conditionals when the native API stabilizes. * [WebGPU] Add support for AOT tests when using WASM Use Dawn's node bindings to run these tests, by pre-pending some initialization code to the script that nodejs will invoke. * [WebGPU] Print explicit types for let declarations This makes the generated code a little more human-readable. * [WebGPU] Address first round of review comments * [WebGPU] Add copy of webgpu.h This is a verbatim copy of the Emscripten version, just without the include directives. * [WebGPU] Add comment about ASYNCIFY requirement * [WebGPU] Remove -Wno-atomic-alignment Tweak atomic primitive usage to avoid the need for this. * pacify clang-tidy * Fix more clang-tidy errors * Only use ASYNCIFY for tests when targeting WebGPU * Fix even more clang-tidy errors * [WebGPU] Add basic support to Makefile * [WebGPU] Don't wrap buffers inside structures This requirement has been removed from the WGSL specification, and the corresponding implementation changes have now landed in Dawn.
* [WebGPU] Fix debug message tag * [WebGPU] Update WGPUErrorFilter enum in header * [WebGPU] Update WGSL attribute syntax The WGSL specification recently changed attribute syntax from [[attribute]] to @attribute. * [WebGPU] Add README_webgpu.md Explains how to configure Halide to target WebGPU for both Emscripten and Dawn native. Also lists several known limitations. * [WebGPU] Move native WebGPU library CMake logic This is the correct place, otherwise the link line order is wrong. * [WebGPU] Implement WGSL codegen for serial loops * [WebGPU] Implement WGSL codegen for Allocate Use array types for stack allocations. Leave GPU shared memory unimplemented for now. * [WebGPU] Implement WGSL codegen for Select Use the WGSL select builtin function, which supports bool vector conditions for component-wise selection too. * [WebGPU] Mark 64-bit types as unsupported * [WebGPU] Implement device_and_host_{malloc,free} Just use the default implementations. * [WebGPU] Fixed WGSL codegen for boolean vectors * [WebGPU] Implement f32 math intrinsics * [WebGPU] Implement inverse and inverse sqrt * [WebGPU] Fixup some errors in WGSL codegen * [WebGPU] Implement logical and/or for bool vectors * [WebGPU] Implement WGSL codegen for Broadcast node * [WebGPU] Implement WGSL codegen for Ramp node * [WebGPU] Emulate 8- and 16-bit integers Use atomics to emulate storage, and widen the values to 32-bits when operating on them. * [WebGPU] Avoid buffer name collisions Buffers are declared as global variables in WGSL, so prefix them with the kernel name to avoid collisions. * [WebGPU] Fix divide-by-power-of-two WGSL requires the RHS of a shift to be unsigned. * [WebGPU] Implement codegen for gpu_thread_barrier * [WebGPU] Implement WGSL codegen for Evaluate This fixes an issue with the halide_unused() intrinsic. 
* [WebGPU] Add support for shared memory This currently only supports statically-sized shared memory allocations; dynamically-sized allocations require using pipeline-overridable constants which are not yet implemented in Dawn. * [WebGPU] Fix 8/16-bit load/store emulation Loads and stores that implicitly casted to/from 32-bits were casting when they should not have been, and not casting when they should. * Use generic 64-bit support query in gpu_mixed_shared_mem_types This is more robust than checking for specific APIs. * [WebGPU] Fix object cleanup during teardown * [WebGPU] Do not re-allocate device buffers * [WebGPU] Mark maximum vector width as 4 in tests * [WebGPU] Add functions to object lifetime tracker * [WebGPU] Override FloatImm handling We need to explicitly suffix the integer literal with a `u` to make it unsigned. * [WebGPU] Scalarize predicated loads * [WebGPU] Implement if_then_else intrinsic This is generated when scalarizing predicated loads. * [WebGPU] Enable gpu_free_sync test * [WebGPU] Implement print_reinterpret Use WGSL's bitcast<> operator. * [WebGPU] Implement print_extern_call This is just a regular function call in WGSL. * Add missing include and namespace for isnan/isinf * [WebGPU] Avoid short-circuiting operators These can cause issues with WGSL's uniformity analysis. * [WebGPU] Use commas for struct member separators * [WebGPU] Update API headers and usages Two API functions were renamed. * [WebGPU] Use CodeGen_GPU_C base class for codegen Introduce a new enum value for WGSL's vector syntax. * [WebGPU] Add warning for emulated narrow integers * [WebGPU] Update README with latest status * [WebGPU] Add support for non-contiguous copies Also adds support for buffer cropping/slicing. * [WebGPU] Fix clang-tidy error * [WebGPU] Use atomicCmpXchg for 8/16-bit emulation Halide assumes that write-write data-races are benign when both threads are writing the same value, but this is not true when those writes are implemented using atomicXor.
We need to use atomicCompareExchangeWeak in a loop to perform this emulation instead. Unfortunately this makes things even slower :-( * [WebGPU] Support non-32-bit parameter types Expand them to 32-bits in the shader and the runtime. * [WebGPU] Fix mixed types in buffers The atomic emulation of narrow types shouldn't use the allocation type, since the same buffer may be re-used for multiple types. This means we also sometimes need to perform 32-bit accesses using atomics as well. Instead of using the buffer allocation type, we now pre-traverse the IR to check for accesses that will require emulation, and mark the corresponding buffer as such. * [WebGPU] Show validation errors for failed maps * [WebGPU] Round up buffer offsets and sizes The WebGPU API requires that these are multiples of 4. * [WebGPU] Update implementation status in README * [WebGPU] Replace @stage(compute) with @compute * [WebGPU] Polyfill the pow_f32 intrinsic The pow builtin in WGSL only has the correct semantics for x>0, so we need to emulate the behavior for the other cases. * [WebGPU] Skip the gpu_allocation_cache test Allocation cache is not yet implemented for the WebGPU backend, and so this test takes forever. * [WebGPU] Use builtins for inverse hyperbolics * [WebGPU] Map rint() to round() WGSL's round() builtin has round-to-nearest-even semantics. * [WebGPU] Set device lost callback This provides more information about events that cause the device to become inaccessible. * [WebGPU] Use i32 for bool parameters The bool type cannot be used in the uniform storage class in WGSL. * [WebGPU] Raise limits for buffer size and workgroup storage * [WebGPU] Update mini_webgpu.h Disable clang-format for this file. * [WebGPU] Add support for dynamic workgroups This is implemented using pipeline-overridable constants. * [WebGPU] Avoid using 'new' as an identifier This is a reserved keyword in WGSL. 
* [WebGPU] Do not merge workgroup allocations Since we promote 8- and 16-bit integers to 32-bit in workgroup memory, merging multiple different GPUShared allocations into a single one can cause different regions to overlap and data races ensue. * [WebGPU] Fix Emscripten support * [WebGPU] Use const for integer immediates This avoids some MSL compiler ICEs with the code generated by Dawn, and also makes it easier for the WGSL compiler to constant-fold these values. * [WebGPU] Squelch clang-tidy error * [WebGPU] Note Dawn's dependency on go in README * [WebGPU] Add links to Emscripten vs Dawn issue * [WebGPU] Show error finding WebGPU library fails * [WebGPU] Add link to issue about Windows support * [WebGPU] Rename roundUpToMultipleOf4 ...and use it in one place where we weren't. * [WebGPU] Add links to wrap_native issue * [WebGPU] Use debug_assert for some runtime errors * [WebGPU] Stop using designated initializers Add -Wc++20-designator flag to runtime build to prevent this from happening again. * [WebGPU] Update mini_webgpu.h * [WebGPU] Fix a validation issue with ToT Dawn A bitcast that produces a NaN is invalid in WGSL, so implement the `nan()` intrinsic via `float_from_bits()`. * Update README_webgpu.md * Add is_finite_f32, is_inf_f32, is_nan_f32 * Update isinf, isnan, isfinite; add inf_f32, neg_inf_f32 * correctness_isnan should be skipped for WebGPU * Update isnan.cpp * Update atomics_gpu_8_bit.cpp * Fix python_tutorial_lesson_10_aot_compilation_run * Partial fix for generator_aot_acquire_release() This adds the necessary (horrible hackery) to bring the WebGPU case in line with the other backends... 
but the test still fails, apparently due to the same copy-to-host bug we suspect for generator_aot_gpu_only. Pushing this anyway because it was annoying to write :-) * [WebGPU] Fix AOT test build for non-WASM * [WebGPU] Move README to root * [WebGPU] Address review comments * wip * Update CMakeLists.txt * Work-in-progress for generator_aot_gpu_multi_context_threaded * [WebGPU] Use a per-context staging buffer This fixes the generator_aot_gpu_multi_context_threaded tests. * [WebGPU] Fix clang-format issue * [WebGPU] Move staging buffer creation We shouldn't be doing this inside a callback as we use async error checking routines. * Revert "Merge branch 'webgpu' of https://github.com/jrprice/Halide into pr/6492" This reverts commit 9d79ac60153e12ed51ea9f1a91a544f3fc7ccc78, reversing changes made to 0673e6754e53e62cf6029efdb94a57447db2f03a. * Skip correctness_multi_way_select on x86 Macs (for now) * clang-format * [WebGPU] Recommit 742db3feb888394ff6529c357da3f32ae9286ea4 13 March 2023, 20:30:38 UTC
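The 8/16-bit store emulation described in commit 078465c (a compare-exchange loop rather than atomicXor) can be illustrated on the host side with `std::atomic`. This is only a sketch under stated assumptions: `store_u8` and `load_u8` are hypothetical helpers, not Halide's generated WGSL, and relaxed memory ordering is assumed purely for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not Halide code): store one byte into a 32-bit word
// that also holds three neighbouring bytes, using a compare-exchange loop.
inline void store_u8(std::atomic<uint32_t> &word, unsigned byte_index, uint8_t value) {
    assert(byte_index < 4);
    const unsigned shift = byte_index * 8u;
    const uint32_t mask = 0xFFu << shift;
    uint32_t old_word = word.load(std::memory_order_relaxed);
    uint32_t new_word;
    do {
        // Rebuild the word from the latest observed value, replacing only our byte.
        new_word = (old_word & ~mask) | (static_cast<uint32_t>(value) << shift);
        // On failure, compare_exchange_weak refreshes old_word and we retry.
    } while (!word.compare_exchange_weak(old_word, new_word, std::memory_order_relaxed));
}

// Hypothetical helper: read one byte back out of the shared 32-bit word.
inline uint8_t load_u8(const std::atomic<uint32_t> &word, unsigned byte_index) {
    assert(byte_index < 4);
    return static_cast<uint8_t>(word.load(std::memory_order_relaxed) >> (byte_index * 8u));
}
```

Because each attempt recomputes the full word from the value it last observed, two threads storing the same byte converge on the same result, and neighbouring bytes are preserved. The retry loop is the cost the commit message alludes to when it says the emulation became slower.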
78097a7 Dont use negative values for exit() (#7405) * Dont use negative values for exit() A program that terminates via `exit(-1)` leaves `$? = 255` (per the exit(3) manpage), as the value is chopped with `& 0377`. This makes it hard to use `git bisect` to track down bugs, as it considers any exit code > 127 (or = 15, oddly enough) equivalent to an `abort()` and terminates the bisect. IMHO this is perverse behavior on the part of `git bisect`, but it is what it is, so let's revamp our tests to avoid calling `exit(-1);`. * trigger buildbots 13 March 2023, 17:01:35 UTC
822f5da Don't return negative values from main() (#7406) * Don't return negative values from main() A program that returns `-1` from `main()` leaves $? = 255 (per the exit(3) manpage), as the value is chopped with & 0377. This makes it hard to use git bisect to track down bugs, as it considers any exit code > 127 (or = 15, oddly enough) equivalent to an abort() and terminates the bisect. IMHO this is perverse behavior on the part of git bisect, but it is what it is, so let's revamp our tests to avoid returning `-1` from main to indicate failure, and return `1` instead. * trigger buildbots * validate_gpu_object_lifetime() handling 13 March 2023, 17:00:50 UTC
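The truncation that commits 78097a7 and 822f5da describe can be checked with a couple of lines of arithmetic. The sketch below models only the `& 0377` masking from exit(3); `status_seen_by_parent` and `exit_status_for` are hypothetical names, not functions from the Halide test suite.

```cpp
#include <cassert>

// Models the truncation described in exit(3): only the low 8 bits of the
// status reach the parent process.
inline int status_seen_by_parent(int exit_code) {
    const int status = exit_code & 0377;
    assert(status >= 0 && status <= 255);
    return status;
}

// Hypothetical test-harness helper: report failure as 1 rather than -1, so
// the status the parent sees stays small and positive.
inline int exit_status_for(bool passed) {
    return passed ? 0 : 1;
}
```

With this model, `status_seen_by_parent(-1)` is 255, whereas `status_seen_by_parent(exit_status_for(false))` is 1, which stays below the threshold the commit messages mention.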
f76bcc8 Cleanup of src/runtime/internal and test/runtime (#7399) * Cleanup of src/runtime/internal and test/runtime - Don't include .cpp files. - Don't use header-only "libraries" that rely on include order or being included only once (to wit: test/runtime/common.h -> common.cpp) - All files should explicitly #include what they need, even if they think it's already included (ie, order of include files should not matter) - In src/runtime/internal, change all `halide_abort_if_false` -> `halide_debug_assert` - in test/runtime, add HALIDE_CHECK to common.h and use it for tests instead of `halide_abort_if_false` * 32-bit fixes 12 March 2023, 17:49:46 UTC
0beb081 docs: Use halide.imageio instead of imageio (#7409) * docs: Use halide.imageio instead of imageio * trigger buildbots --------- Co-authored-by: Steven Johnson <srj@google.com> 11 March 2023, 20:45:17 UTC
4bf660e Fix overflow in x86 absd lowering (#7407) * Fix overflow in x86 absd lowering * Fix default lowering too 11 March 2023, 00:53:44 UTC
ead83e6 Docs/update readme python (#7402) * docs: Update simple usage in README_python.md * docs: Add imports 10 March 2023, 18:23:21 UTC
b02de4b Update all use of actions/checkout to v3 (#7400) 09 March 2023, 17:46:43 UTC
6a6c842 Backport changes to CodegenC from the xtensa branch (#7396) Backported with some trivial style changes (virtual method instead of flag, since this should never change for a given instance) 07 March 2023, 23:56:03 UTC
a2df5bd Attempt to give proper labels to pip dev builds (v2) (#7395) A much-simpler alternative to #7394 07 March 2023, 19:29:08 UTC
4a80251 Destringify CanonicalizeGPUVars (#7386) * Destringify CanonicalizeGPUVars This new implementation takes the high-water marks of each type of GPU loop, instead of filtering using the prefix of the loop name. * Better comments * fix typo 03 March 2023, 20:57:47 UTC
aa8fcad hannk: Provide weak symbol functions to use op profiling (#7388) * hannk: Provide weak symbol functions to use op profiling You can add your own profiler with strong symbols. * hannk: Guard profiler feature with HANNK_PROFILER 03 March 2023, 20:56:52 UTC
387a19c Make README consistent on supported LLVM versions (#7390) Make README consistent with https://github.com/halide/Halide/pull/7093 03 March 2023, 01:10:50 UTC
122b5b6 hannk: Add device_sync method hannk::Tensor class (#7387) hannk: Add device_sync method to hannk::Tensor class This method is useful to debug performance with synchronous execution. 01 March 2023, 22:22:20 UTC
303a90c Minor cleanup of GPUCompilationCache (#7376) * Minor cleanup of GPUCompilationCache While tracking down an apparently-unrelated threading bug in the webgpu backend, I made some tweaks to this code that I think are worth keeping. The main one of importance is that `release_hold()` and `release_context()` really should acquire the mutex -- they weren't before -- so now all public methods are properly mutexed. The other changes are mostly cosmetic: - Moved helper methods to be private rather than public - Changed the id value size to be `uintptr_t` rather than `uint32_t`; the space allocated for them is sizeof(void*). (Not sure this moves the needle but it felt right.) - Removed unused ctor for CachedCompilation Also, drive-by change in printer.h to capture some logging improvements. * Fix deadlock release_all() and release_context() were contending for the mutex 01 March 2023, 17:21:48 UTC
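One common way to keep every public method mutexed while avoiding the self-deadlock mentioned in commit 303a90c is to route shared work through a private helper that assumes the lock is already held. The sketch below is an assumption-laden illustration: `SketchCache` and its members are hypothetical and do not reproduce the real GPUCompilationCache.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>

// Hypothetical stand-in for a compilation cache; all names are illustrative.
class SketchCache {
public:
    void insert(uintptr_t context, int module) {
        std::lock_guard<std::mutex> lock(mutex_);
        entries_[context] = module;
    }

    void release_context(uintptr_t context) {
        std::lock_guard<std::mutex> lock(mutex_);
        release_context_locked(context);
    }

    void release_all() {
        std::lock_guard<std::mutex> lock(mutex_);
        // Call the unlocked helper, not release_context(): std::mutex is not
        // recursive, so re-locking it from the same thread would deadlock.
        while (!entries_.empty()) {
            release_context_locked(entries_.begin()->first);
        }
    }

    std::size_t size() {
        std::lock_guard<std::mutex> lock(mutex_);
        return entries_.size();
    }

private:
    // Precondition: the caller already holds mutex_.
    void release_context_locked(uintptr_t context) {
        entries_.erase(context);
    }

    std::mutex mutex_;
    std::map<uintptr_t, int> entries_;
};
```

The key type is `uintptr_t` here only to echo the id-size change noted in the commit; whether the real fix uses a locked/unlocked helper split is not stated in the log.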
fba892a Specify a full type for llvm::IRBuilder (#7384) Specify full type for llvm::IRBuilder 01 March 2023, 00:29:42 UTC
5c02ae2 Bounds visitor for div was missing single_point mutated case (#7379) * Bounds visitor for div was missing single_point mutated case Signed-off-by: Adrian Lebioda <adrian.lebioda@hexagon.com> * Add test Signed-off-by: Adrian Lebioda <adrian.lebioda@hexagon.com> --------- Signed-off-by: Adrian Lebioda <adrian.lebioda@hexagon.com> Co-authored-by: Adrian Lebioda <adrian.lebioda@hexagon.com> 28 February 2023, 02:14:41 UTC
bdba694 Add Callable default ctor + `defined()` method (#7380) * Add Callable default ctor + `defined()` method This allows it to behave like * Add user_assert + test 28 February 2023, 02:14:24 UTC
c42a5b2 Remove a gross hack from gpu_only_aottest (#7378) * Remove a gross hack from gpu_only_aottest Also add metal support * Add missing include 27 February 2023, 16:46:41 UTC
09400f6 Bounds visitors for min/max were missing single_point mutated case (#7377) * Bounds visitors for min/max were missing single_point mutated case Partially fixes #7374 * Add test 25 February 2023, 17:16:32 UTC
b6a18b8 Update WABT to 1.0.32; Increase stack size for WASM AOT apps (#7373) 23 February 2023, 19:36:32 UTC
144c1a4 correctness_round should use Target::supports_type() (#7372) This gives it proper support for new GPU backends 23 February 2023, 17:59:39 UTC
b17806d Use HalideFreeHelper for the register_destructor (#7371) Slightly cleaner code. Also, drive-by change of NULL -> nullptr 23 February 2023, 17:59:13 UTC
629da52 Use single-char form of `unique_name` for semaphores (#7370) The multi-char form of `unique_name` will append a `$` to the identifier, e.g. `sema$4`. This isn't really legal for a C/C++ identifier. 23 February 2023, 17:58:37 UTC
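The concern in commit 629da52 is that a name such as `sema$4` falls outside the basic identifier grammar of C and C++. A minimal checker for that grammar is sketched below; `is_basic_c_identifier` is a hypothetical helper, and it deliberately ignores keywords and extended characters.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: true if `name` matches [A-Za-z_][A-Za-z0-9_]*.
// Keywords and universal character names are not considered.
inline bool is_basic_c_identifier(const std::string &name) {
    if (name.empty()) {
        return false;
    }
    auto is_letter = [](char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_';
    };
    auto is_digit = [](char c) { return c >= '0' && c <= '9'; };
    if (!is_letter(name[0])) {
        return false;
    }
    for (char c : name) {
        if (!is_letter(c) && !is_digit(c)) {
            return false;
        }
    }
    return true;
}
```

Under this check `sema$4` is rejected and `sema4` is accepted. Some compilers accept `$` in identifiers as an extension, which may be why the original name was not caught earlier.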
386a2d1 Clean Up HalideFreeHelper code (main) (#7369) * Clean Up HalideFreeHelper code (main) - Revise HalideFreeHelper to be a templated struct, to save the unnecessary stack storage for the function - Add emit_halide_free_helper() method to consolidate usage * Update CodeGen_C.cpp 22 February 2023, 23:39:25 UTC
3246844 Use a std::unique_ptr for the IR Builder (#7356) * Use a std::unique_ptr for the IR Builder instead of a raw owning pointer * Use make_unique 21 February 2023, 21:18:30 UTC
e19a036 Overflow on casts is fine for ints < 32 bits (#7366) 21 February 2023, 17:28:43 UTC
ccc085a Update CMakePresets.json to use VS2022 instead of VS2019 (main) (#7359) 16 February 2023, 19:54:27 UTC
b65ea62 Remove unused code in VectorizeLoops (#7354) 15 February 2023, 18:25:04 UTC
18eb7d8 Permit vectorization of non-recursive atomic operations (#7346) * Vectorization of non-recursive atomic operations * Remove dead Vars 15 February 2023, 17:18:56 UTC
e5ed226 Fix Python error handling (#7352) * Fix Python error handling Error handling in the Python bindings wasn't quite right for JIT: We previously replaced halide_error() to throw a C++ exception. Sounds good, but unfortunately, doesn't work reliably: if called from jitted code (which doesn't know about C++ exceptions), the throw statement may be unable to find the enclosing try block (which is outside jitted code), meaning it will call std::terminate. Now, instead, we just leave the JIT error handler unset, and call with an explicit JITUserContext with a custom print handler; in theory, this meant that the code in JITFuncCallContext::finalize() would check for an error after the call into jitted code, and call `halide_runtime_error` if so (which would then trigger an all-in-C++-exception). Unfortunately... (2) JITFuncCallContext is broken by design; it mutates the input JITUserContext, so that trying to use the same JITUserContext for two calls in a row leaves you with a JITUserContext with (at least) the error_handler set. Since at least one of the realize() calls does this twice (once for bounds query, once for execution), this means that an error in the second call would never be seen, since finalize() only reported errors if there wasn't a custom error handler on input. Per @abadams suggestion, we work around this by treating 'JITErrorBuffer::handler' as 'no custom error handler', which is mostly true. (But really, JITFuncCallContext and JITUserContext are a hard-to-reason-about mess and arguably need to be rethought entirely.) (3) Removed entirely-unnecessary overrides of runtime print and error handlers from PyStubImpl; despite the comments, this code is unnecessary. * format 15 February 2023, 17:03:41 UTC
7963cd4 Change early-bound default args in Python bindings to late-bound (#7347) In PyBind11, if you specify a default argument for a method, it is evaluated when the Python module is initialized, *not* when the method is called (as you might expect in C++). For defaults that are just constants/literals, this is no big deal, but when calling get_*_target_from_environment, this means it is called at module init time -- also normally not a big deal (since the values ~never change at runtime anyway), with one big exception (no pun intended): if the function throws an exception (e.g. via calling user_assert() or similar), that exception is thrown at Module-initialization time, which is a much more inscrutable crash, and one that is very hard to recover from. This may seem unlikely, but can happen pretty easily if you set (say) HL_JIT_TARGET=host-cuda (or other gpu) and the given GPU runtime isn't present on the given system; the current behavior is basically "make it impossible for the libHalidePython bindings to run", whereas what we want is "runtime exception thrown when you call the method". This changes the relevant methods to use `Target()` as the default value, and inside the method wrapper, if the value passed equals `Target()`, it replaces the value with the right `get_*_target_from_environment()` call. (This turned up while doing some testing of https://github.com/halide/Halide/pull/6924 on a system without Vulkan available) 14 February 2023, 17:14:05 UTC
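The sentinel-default pattern that commit 7963cd4 describes can be sketched without PyBind11. Everything below is a stand-in: `SketchTarget`, `sketch_target_from_environment`, and `resolve_target` are hypothetical names, and the real bindings use Halide's `Target` and the `get_*_target_from_environment()` functions instead.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a target description; an empty name plays the
// role of the default-constructed sentinel.
struct SketchTarget {
    std::string name;
    bool operator==(const SketchTarget &other) const { return name == other.name; }
};

// Counts how often the (possibly throwing) environment lookup actually runs.
inline int &lookup_count() {
    static int count = 0;
    return count;
}

// Stand-in for an environment query that could fail on some systems.
inline SketchTarget sketch_target_from_environment() {
    ++lookup_count();
    return SketchTarget{"host"};
}

// The wrapper takes a cheap sentinel as its default and defers the
// environment lookup until the call itself.
inline SketchTarget resolve_target(const SketchTarget &requested = SketchTarget()) {
    if (requested == SketchTarget()) {
        return sketch_target_from_environment();
    }
    return requested;
}
```

The point of the design is that the only value computed when the default is registered is the trivially constructible sentinel, so a failing environment lookup surfaces as an error at the call site rather than at module initialization.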
8bd07fb Fix tuple output bounds checks (#7345) Fix #7343 Tuple outputs weren't getting appropriate bounds checks due to overzealous culling of uninteresting code in the add_image_checks pass. 14 February 2023, 01:52:53 UTC
22aed20 Devirtualize the protected compile() methods in Codegen_C (#7341) With the addition of `preprocess_function_body()`, neither of these need to be virtual, and devirtualizing them avoids `hidden overloaded virtual function` warnings in subclasses that don't override them 11 February 2023, 02:42:05 UTC
6c5ca8e Tiny improvements in codegen in C backend (#7337) * Tiny improvements in codegen in C backend (1) Emit `true` or `false` instead of `(bool)(0ull)` etc for bool literals (2) Avoid redundant temporaries in print_cast_expr(), which occur in a small but nonzero number of cases Basically this means that code currently like ``` bool _523 = (bool)(0ull); bool _524 = (bool)(_523); ... foo(_524); ``` becomes ``` foo(false); ``` ...I'm sure this has no output on final object code, but it makes the generated C code less weird to read. * Also avoid extra intermediates for typed nullptr * Also use std::isnan() and std::isinf() * Update CodeGen_C.cpp 11 February 2023, 00:34:41 UTC
a6c5be7 Add a hook to Codegen_C::compile() (#7335) At least one subclass of Codegen_C currently has to replicate ~all of the compile(LoweredFunc) method, with the result that it has often gone stale (and still is stale) wrt changes in the base; this adds an optional method to allow some modifications to the function body just before it is printed, to avoid redundant code. 10 February 2023, 21:28:51 UTC
88d40c2 Fix issue in find_package in cross-compilation for no OS (#7282) When using a toolchain where Threads libs are not available, which is the case in baremetal target cross-compilation, we were not able to load even the HalideHelpers package. Co-authored-by: Alex Reinking <alex.reinking@gmail.com> 10 February 2023, 18:07:56 UTC
35322c3 Fix a subtle uninitialized-memory-read in Buffer::for_each_value() (#7330) * Fix a subtle uninitialized-memory-read in Buffer::for_each_value() When we flattened dimensions in for_each_value_prep(), we would copy from one past the end, meaning the last element contained uninitialized garbage. (This wasn't noticed as an out-of-bounds read because we overallocated in structure in for_each_value_impl()). This garbage stride was later used to advance ptrs in for_each_value_helper()... but only on the final iteration, so even if the ptr was wrong, it didn't matter, as the ptr was never used again. Under certain MSAN configurations, though, the read would be (correctly) flagged as uninitialized. This fixes the MSAN bug, and also (slightly) improves the efficiency by returning the post-flattened number of dimensions, potentially reducing the number of iterations of for_each_value_helper() needed. * Oopsie * Update HalideBuffer.h * Update HalideBuffer.h 10 February 2023, 00:22:10 UTC
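The class of bug described in commit 35322c3 (shifting elements down after a merge and reading one past the end) can be shown with a simplified dimension-flattening routine. This is a sketch under assumptions, not the code in HalideBuffer.h: `SketchDim` and `flatten_dims` are hypothetical, and only a single stride per dimension is modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified dimension record.
struct SketchDim {
    long extent;
    long stride;
};

// Merges each dimension into its predecessor when the two are contiguous
// (stride[i] == stride[i-1] * extent[i-1]) and returns the number of
// dimensions that remain in use after flattening.
inline std::size_t flatten_dims(std::vector<SketchDim> &dims) {
    std::size_t d = dims.size();
    for (std::size_t i = 1; i < d;) {
        if (dims[i - 1].stride * dims[i - 1].extent == dims[i].stride) {
            dims[i - 1].extent *= dims[i].extent;
            // Shift the tail down by one. The bound j + 1 < d keeps the last
            // read at dims[d - 1]; a bound of j < d would read dims[d].
            for (std::size_t j = i; j + 1 < d; j++) {
                dims[j] = dims[j + 1];
            }
            d--;
        } else {
            i++;
        }
    }
    assert(d <= dims.size());
    return d;
}
```

Returning the reduced count lets a caller iterate over only the dimensions that survive flattening, which corresponds to the efficiency note in the commit message.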
ae3f401 Explicitly remove -D_GLIBCXX_ASSERTIONS from LLVM definitions (#7332) Explicitly remove -D_GLIBCXX_ASSERTIONS from LLVM definitions as a workaround for https://reviews.llvm.org/D142279 09 February 2023, 23:08:37 UTC
4156c5a Allow _Float16 as alias for float16_t in halide_type_of<>() (#7325) (#7326) 09 February 2023, 21:51:22 UTC
734e34a Remove deprecated `HVX_shared_object` feature (#7331) This has been marked 'deprecated' for quite a while, and has no effect on codegen or, well, anything else. Let's remove it. 09 February 2023, 17:59:12 UTC
0f6003e Float16: Remove unused header dependency (#7324) IRMutator.h is not needed for the Float16.h. 08 February 2023, 20:26:31 UTC
c3f3318 Fixes for top-of-tree LLVM (#7329) * Fixes for top-of-tree LLVM * fix * times ten * Update LLVM_Output.cpp 08 February 2023, 20:25:37 UTC
ddb515a Improve support for Arm baremetal compilation and runtime (#7286) * Improve support for Arm baremetal compilation and runtime - Add Target feature "semihosting" mode for baremetal runtime - Fix error of aligned_alloc() when compiled by Arm GNU toolchain * Modify comments for Target feature semihosting * Add an example app to guide cross-compilation for baremetal target * Update build steps in HelloBaremetal * Fix line-ending * Set CMake variable BAREMETAL in toolchain file 07 February 2023, 18:41:04 UTC
34d256f Make auto scheduler libs available in HalideHelpers package (#7285) * Make auto scheduler libs available in HalideHelpers package find_package(HalideHelpers) allows us to use add_halide_library(). But auto scheduler libs are not available unless they are in Halide-Interfaces.cmake. Note: Those libraries are not actually linked to the target application, but need to be available for add_custom_command call. 07 February 2023, 18:40:22 UTC
0c7722f Add buffer sync methods to hannk::Tensor class (#7323) Add a few methods for GPU memory interaction. 07 February 2023, 17:37:09 UTC
0b7379f Warn emulated float16 equivalent is generated (#7307) * Warn emulated float16 equivalent is generated 07 February 2023, 17:08:32 UTC
a55a09a Fix Halide cross-compilation (#7073) Use CMAKE_CROSSCOMPILING_EMULATOR for llvm-as and clang imported targets 07 February 2023, 14:17:58 UTC
1ad328a Fix LLVM 17+ build integration on 32-bit systems (#7322) * Fix LLVM 17+ build integration on 32-bit systems Fixes #7319 * add detail and precision to comment 07 February 2023, 01:18:21 UTC
91f3ac0 Fix segfault by nonconstant bound in Adams2019 (#7321) Fix segmentation fault in Adams2019 in case the estimate or bound of Func is set to nonconstant Expr. 06 February 2023, 22:23:47 UTC
01f9e2d Replace some push_backs with emplace_back (#7317) 06 February 2023, 19:05:01 UTC
e9aecee Make visit_leaf() public in hannk/ops.h (#7318) * Make visit_leaf() public in hannk/ops.h This makes it easier for downstream code to experiment with adding ops * Update ops.h 03 February 2023, 21:17:47 UTC
0782d80 Make Callable::call_argv_fast public (#7315) * Make Callable::call_argv_fast public * Add rough specification of the calling convention * Fix a typo 01 February 2023, 18:24:01 UTC
beba53a halide_popcount<uint64_t> is broken (#7313) Would not compile for Win32 or any other compiler without __builtin_popcountll available. (How did this get checked in without being tested on MSVC?) 31 January 2023, 21:08:15 UTC
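A minimal sketch of the portability issue behind beba53a: `__builtin_popcountll` is a GCC/Clang builtin, so code that uses it unconditionally will not compile on toolchains that lack it. The helper names `popcount64_fallback` and `portable_popcount64` below are hypothetical and are not taken from the Halide sources; the fallback is the standard bit-parallel (SWAR) population count, which may or may not match what Halide actually uses.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not Halide's code): portable bit-parallel popcount.
inline int popcount64_fallback(uint64_t x) {
    // Sum bits in pairs, then nibbles, then bytes.
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    x = (x + (x >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
    // Multiplying by 0x0101... accumulates all byte sums into the top byte.
    return static_cast<int>((x * 0x0101010101010101ULL) >> 56);
}

// Hypothetical wrapper: use the builtin only where it is known to exist.
inline int portable_popcount64(uint64_t x) {
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_popcountll(x);
#else
    return popcount64_fallback(x);
#endif
}

// Small self-check showing that both paths agree on a sample value.
inline void check_popcount() {
    assert(portable_popcount64(0xF0F0ULL) == popcount64_fallback(0xF0F0ULL));
}
```

Guarding the builtin behind a compiler check keeps the fast path where it is available while still compiling elsewhere.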
fe76ab2 Minimal updates to allow Halide building with LLVM17 (#7309) * Minimal updates to allow Halide building with LLVM17 (Opening as draft initially until Buildbots build the new LLVM versions) * trigger buildbots 30 January 2023, 22:01:42 UTC
dd973f4 Improved halide_popcount (#7225) * Improved halide_popcount * reused popcount64 from Utils.cpp in CodeGen_C * Fixed comment for popcount 25 January 2023, 21:40:54 UTC
810bd0b Hoist vector slices using rewrite rules (#7243) * Hoist slices using rewrite rules This lets us add associative variants more easily, which are helpful in the work on staging strided loads. * Don't hoist extract_element shuffles The Shuffle visitor wants to sink them * Add some static asserts * Add explanatory comment on shuffle hoisting * Fix comment * add lanes predicate to slice hoisting * add vector slice hoisting test cases Co-authored-by: Steven Johnson <srj@google.com> Co-authored-by: Alexander <ajroot@stanford.edu> 21 January 2023, 22:08:30 UTC
bafd60f [x86 & wasm] Split up double saturating-narrows from i32 (#7280) * better x86 double sat-cast + add test * fix wasm too + test Co-authored-by: Steven Johnson <srj@google.com> 20 January 2023, 18:03:25 UTC
c601e4e Add workaround for the const-or-not user_context issue (#635) (#7291) Add a workaround for the const-or-not user_context issue (https://github.com/halide/Halide/issues/635) 20 January 2023, 17:43:56 UTC
2cc0468 Fix issue in add_halide_runtime in cross-compilation (#7284) * Fix issue in add_halide_runtime in cross-compilation add_halide_runtime() tries to build generator executable, but it fails if we are working with cross-compiler toolchain. By using existing generator set as "FROM", we can work around this. 20 January 2023, 17:39:41 UTC
d44e99d Fix error of add_halide_generator in cross-compilation (#7283) In case the project name is CamelCase, add_halide_generator() was not able to find the generator package, because CMake searches <name>Config.cmake or <lower-case-name>-config.cmake 20 January 2023, 13:12:30 UTC
147ff48 Remove dependency on platform threads library (#7297) * Refactor internal ThreadPool.h into halide_thread_pool.h tool * Drop dependency of libHalide on threads library * Remove other redundant uses of Threads::Threads * Update CMake documentation. 20 January 2023, 12:54:34 UTC
314b2fd [HVX] Fix EliminateInterleaves (#7279) * fix EliminateInterleaves Co-authored-by: Steven Johnson <srj@google.com> 20 January 2023, 00:35:14 UTC
c9f3602 Remove the watchdog timer from generator_main(). It was intended to k… (#7295) Remove the watchdog timer from generator_main(). It was intended to kill pathologically slow builds, but in the environment it was added for (Google build servers), it ended up being redundant to existing mechanisms, and removing it allows us to remove a dependency on threading libraries in libHalide. 19 January 2023, 23:48:26 UTC
51a4f6c Emit prototypes for destructor functions in C Backend (#7296) We gathered up the destructors, but only emitted the prototypes if there was at least one non-C++ function declaration needed -- so if you built with cpp_name_mangling enabled, you might omit the right prototype. Fixed and added the right flag to a Generator test to tickle this behavior. 19 January 2023, 23:36:47 UTC
e8e1481 Drop support for MIPS (#7287) (#7289) * Drop support for MIPS (#7287) * Update Target.cpp 18 January 2023, 21:56:14 UTC
888c41c Add CMake support for C++ backend in test/generator (#7274) * Add support for C++ backend in test/generator When the CMake rules were rewritten a while back, the support for building/testing generators with the C++ backend (instead of the standard LLVM, etc) got lost. This adds it back in. Also made some drive-by fixes to the Makefile to enable some tests there that work correctly now. Also made a drive-by fix in Codegen_C to fix allocation nodes that were just wrappers around buffer_get_host -- this prevented the cleanup_on_error test from building with the C++ backend. 18 January 2023, 00:47:52 UTC
0d43318 Optimize Module::compile() for some edge cases (#7269) * Optimize Module::compile() for some edge cases Avoid redundant `compile_to_buffer()` calls for output requests that can't possibly ever need them. * Avoid mutation 10 January 2023, 19:23:58 UTC
a8d88bb Use ::aligned_alloc() instead of std::aligned_alloc() in HalideBuffer.h (#7268) 10 January 2023, 17:51:38 UTC
eea7696 Update README_python.md (#7266) 09 January 2023, 17:58:04 UTC
c070bb8 Update change following LLVM WASM change f841ad30d77eeb4c51663e68efefdb734c7a3d07 (#7264) * Update change following LLVM WASM change https://github.com/llvm/llvm-project/commit/f841ad30d77eeb4c51663e68efefdb734c7a3d07 * Update checks conditional on LLVM version. 06 January 2023, 00:00:34 UTC
4b74049 Inline into extern function args during bounds inference (#7261) * Inline into extern function args during bounds inference Fixes #7260 * Run CSE once at the end * Actually recursively inline * clang-tidy * trigger buildbots * Make test invariant to the number of times the warning is printed as long as it's at least once Co-authored-by: Steven Johnson <srj@google.com> 05 January 2023, 21:12:50 UTC
04bb986 Conditional allocations shouldn't fail for size=0 in C++ backend (#7255) (#7256) * Conditional allocations shouldn't fail for size=0 in C++ backend (#7255) Allocations can be conditional; if the condition evaluates to false, we end up calling `halide_malloc(0)` (or `halide_tcm_malloc(0)` in the xtensa branch). Since it's legal via spec for `malloc(0)` to return nullptr, we need to be cautious here: if we are compiling with assertions enabled, *and* have a malloc() (etc) implementation that returns nullptr for alloc(0), we need to skip the assertion check, since we know the result won't be used. Note: a similar check will be inserted in the xtensa branch separately. Note 2: LLVM backend already has this check via Codegen_Posix.cpp * Update CodeGen_C.cpp 28 December 2022, 17:31:01 UTC
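The check described in 04bb986 can be sketched as follows. This is an illustration only: `allocation_ok` and `conditional_alloc` are hypothetical names, and plain `std::malloc` stands in for `halide_malloc`. The point is that a null result is only an error when a non-zero size was requested, since `malloc(0)` is allowed to return a null pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical predicate: a null pointer counts as a failure only if
// a non-zero number of bytes was actually requested.
inline bool allocation_ok(const void *ptr, size_t size) {
    return ptr != nullptr || size == 0;
}

// Hypothetical stand-in for a conditional allocation: when the condition
// is false the requested size collapses to 0, mirroring the halide_malloc(0)
// case from the commit message. std::malloc is used here as an assumption.
inline void *conditional_alloc(bool condition, size_t size_if_true, bool *ok) {
    const size_t size = condition ? size_if_true : 0;
    void *ptr = std::malloc(size);
    *ok = allocation_ok(ptr, size);
    return ptr;
}

// Small self-check: a zero-sized request never reports failure.
inline void check_conditional_alloc() {
    bool ok = false;
    void *p = conditional_alloc(false, 64, &ok);
    assert(ok);
    std::free(p);  // free(nullptr) is a no-op, so this is safe either way
}
```

Whether `malloc(0)` returns null or a unique pointer is implementation-defined, which is why the size has to be part of the success test.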
ade8b56 Remove deprecated halide_target_feature_disable_llvm_loop_opt (#7247) * Remove deprecated halide_target_feature_disable_llvm_loop_opt Was deprecated in Halide 15; let's remove in Halide 16 * trigger buildbots * trigger buildbots * Update CodeGen_LLVM.cpp 20 December 2022, 20:05:19 UTC
10345d4 Explicitly stage strided loads (#7230) * Add a pass to do explicit densification of strided loads * densify more types of strided load * Reorder downsample in local laplacian for slightly better performance * Move allocation padding into the IR. Still WIP. * Simplify concat_bits handling * Use evidence from parent scopes to densify * Disallow padding allocations with custom new expressions * Add test for parent scopes * Remove debugging prints. Avoid nested ramps. * Avoid parent scope loops * Update cmakefiles * Fix for large_buffers * Pad stack allocations too * Restore vld2/3/4 generation on non-Apple ARM chips * Appease clang-format and clang-tidy * Silence clang-tidy * Better comments * Comment improvements * Nuke code that reads out of bounds * Fix stage_strided_loads test * Change strategy for loads from external buffers Some backends don't like non-power-of-two vectors. Do two overlapping half-sized loads and shuffle instead of one funny-sized load. * Add explanatory comment to ARM backend * Fix cpp backend shuffling * Fix missing msan annotations * Magnify heap cost effect in stack_vs_heap performance test * Address review comments * clang-tidy * Fix for when same load node occurs in two different allocate nodes 16 December 2022, 17:56:08 UTC
382f813 Fix "may be used uninitialized" warnings in Codegen_C::print_scalarized_expr() (#7244) 16 December 2022, 17:54:21 UTC
da6746e correctness/exception.cpp needs to check HALIDE_WITH_EXCEPTIONS (fixes #7240) (#7241) correctness/exception.cpp needs to check HALIDE_WITH_EXCEPTIONS 14 December 2022, 05:29:45 UTC
1a4a469 Fix some sources of signed integer overflow in the compiler (#7231) * Fix some sources of signed integer overflow in the compiler Also, use compiler intrinsics when possible to handle overflow, as it generates faster code. * Fix msvc macro * Must use result * Actually perform the requested operation 13 December 2022, 16:11:54 UTC
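As a rough illustration of the approach named in 1a4a469 (use a compiler intrinsic for overflow detection where one exists), the sketch below wraps `__builtin_add_overflow`, which GCC and Clang provide, and falls back to an explicit range check elsewhere. The function name `add_would_overflow` is hypothetical, and the fallback branch is an assumption about how a non-GCC/Clang path could look, not a copy of Halide's code.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: returns true if a + b overflows int64_t.
// *result is only meaningful when the function returns false.
inline bool add_would_overflow(int64_t a, int64_t b, int64_t *result) {
#if defined(__GNUC__) || defined(__clang__)
    // The builtin performs the addition and reports overflow in one step.
    return __builtin_add_overflow(a, b, result);
#else
    // Portable check that never evaluates an overflowing expression.
    const int64_t max = std::numeric_limits<int64_t>::max();
    const int64_t min = std::numeric_limits<int64_t>::min();
    if ((b > 0 && a > max - b) || (b < 0 && a < min - b)) {
        return true;
    }
    *result = a + b;
    return false;
#endif
}

// Small self-check of the non-overflowing path.
inline void check_add() {
    int64_t r = 0;
    assert(!add_would_overflow(2, 3, &r));
    assert(r == 5);
}
```

Callers must check the return value before using the result, which matches the "Must use result" follow-up in the commit message.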
533e6e5 Remove rogue string suffix in simd_op_check_arm.cpp (#7227) * Remove rogue string suffix in simd_op_check_arm.cpp Interestingly, it compiles here, but in some compilers it will fail with "unexpected token". * Update simd_op_check_arm.cpp 12 December 2022, 20:54:46 UTC
6ecdcbd Tighten alignment promises for halide_malloc() (#7222) This makes a couple of changes to the behavior/implementation of `halide_malloc()`: * Currently, halide_malloc must return a pointer aligned to the maximum meaningful alignment for the platform for the purpose of vector loads and stores. This PR also adds the requirement that the memory returned must be legal to access in an integral multiple of alignment >= the requested size (in other words: you should be able to do vector load/stores "off the end" without causing any faults). * Currently, the `halide_malloc_alignment()` function is used to determine the default alignment; this cannot be overridden by user code (well, it can be, but the override will have no useful effect). It is intended to be "internal only" but is used in at least one place outside the runtime (apps/hannk). This change removes the call entirely, in favor of a call that is harder to access from outside the runtime and much less likely for end users to attempt to call. (It also changes apps/hannk to stop using it.) 11 December 2022, 18:05:55 UTC
16421a7 Revise simd_op_check tests to ignore HL_TARGET (#7207) (#7216) * Revise simd_op_check tests to ignore HL_TARGET (#7207) The simd_op_check tests have historically only run using the value of HL_TARGET, which meant that the coverage they had was low (since HL_TARGET is only set to values that are runnable on at least one buildbot). This change completely disconnects these tests from HL_TARGET; instead, each test now tests for a range of targets appropriate to the architecture being tested. On all platforms, they still compile to assembly and verify that the correct instructions are generated; additionally, if the host platform can JIT for the given target, it verifies that the results are as expected. * Update simd_op_check_riscv.cpp * Update simd_op_check_x86.cpp * Update simd_op_check_x86.cpp * Update simd_op_check_arm.cpp * Add more features that must match; re-enable the bfloat instructions * Update simd_op_check_x86.cpp * Update simd_op_check_riscv.cpp * trigger buildbots * Fix simd_op_check_wasm 09 December 2022, 17:21:30 UTC
ba31688 Increase __clang_major__ check in Float16.h to 16 (#7224) 09 December 2022, 01:22:54 UTC
066559b Remove check_jit_user_context() from V8 bindings (#7220) Obsolete code from early V8 work, it can trigger inappropriately in some corner-case scenarios. Remove it entirely to avoid false errors. 08 December 2022, 23:35:01 UTC
8fa8221 Fix bonehead version-checking test in HalideBuffer.h for Apple (#7218) 08 December 2022, 04:34:13 UTC
e8615bb clang-tidy: add [[maybe-unused]] to the DECLARE_NO_INITMOD stubs. (#7215) 08 December 2022, 01:22:38 UTC
a7fa32e Use aligned_alloc() as default allocator for HalideBuffer.h on most platforms (#7190) Use aligned_alloc() as default allocator for HalideBuffer.h on most platforms (See also https://github.com/halide/Halide/pull/7189) Modify H::R::Buffer to default to using `aligned_alloc()` instead of `malloc()`, except: - If user code passes a non-null `allocate_fn` or `deallocate_fn`, we always use those (and/or malloc/free) - If the code is compiling under MSVC, never use `aligned_alloc` (Windows doesn't support it) - If HALIDE_RUNTIME_BUFFER_USE_ALIGNED_ALLOC is defined to be 0, never use `aligned_alloc` (this is to allow for usage on e.g. older Android and OSX versions which don't provide `aligned_alloc()` in the stdlib, regardless of C++ versions.) Also, as with #7189, this ensures that the allocated space has the start of the host data as 128-aligned, and also now ensures that the size allocated is 128-aligned (rounding up as needed). 07 December 2022, 17:31:01 UTC
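The size rounding mentioned in a7fa32e can be sketched like this. The names `kAlignment`, `round_up_to_alignment`, and `alloc_aligned_128` are hypothetical, and the MSVC branch (plain `malloc`, with no alignment guarantee) is a simplification for this sketch, not what HalideBuffer.h does. C11 `aligned_alloc` expects the size to be a multiple of the alignment, which is one reason to round the size up before calling it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdlib.h>

// Assumed alignment, taken from the "128-aligned" wording in the commit message.
constexpr size_t kAlignment = 128;

// Round size up to the next multiple of kAlignment (a power of two).
constexpr size_t round_up_to_alignment(size_t size) {
    return (size + kAlignment - 1) & ~(kAlignment - 1);
}

// Hypothetical allocator: memory returned here is released with free().
inline void *alloc_aligned_128(size_t size) {
    const size_t padded = round_up_to_alignment(size);
#if defined(_MSC_VER)
    // Simplification for this sketch only: no alignment guarantee here.
    return malloc(padded);
#else
    return ::aligned_alloc(kAlignment, padded);
#endif
}

// Small self-check of the rounding arithmetic.
inline void check_rounding() {
    assert(round_up_to_alignment(129) == 256);
}
```

Rounding the size up also means a full-width vector load or store that starts inside the buffer's last 128-byte block stays inside the allocation.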
8ce1212 Fix bitrot in PowerPC testing (#7211) * Fix bitrot in PowerPC testing (See #7208) - DataLayout was wrong (and has been for a long time) - simd_op_check_powerpc had errors. Some were easy to fix; the rest I commented out with a TODO since this backend doesn't appear to be in active use. (Want to fix this in preparation for fixing #7207) * Move x86 absd tests to the right place Co-authored-by: Andrew Adams <andrew.b.adams@gmail.com> 07 December 2022, 17:29:19 UTC