https://github.com/halide/Halide

Revision  Message  Commit Date
61700af Remove unused 'external_buffers' member in Codegen_Xtensa 31 March 2023, 16:48:29 UTC
9f58631 Minor improvements to Xtensa codegen (#7463) * Minor improvements to Xtensa codegen Harvesting some minor code improvements from an experiment that didn't get finished: - Move all the `is_native_xtensa_vector()` and related code from XtensaOptimize into CodegenXtensa, which is the only caller - Make these all member methods so they can just use the `get_target()` member, which was the only target they ever wanted - Simplify the implementation of them a bit, especially `is_native_xtensa_vector()`, which should still be equivalent but terser and smaller codegen - Use `halide_type_t` instead of `Halide::Type` in many places, since the former is smaller (exactly 32 bits) and is easier to hash; similarly, use `std::unordered_set` for the lookups we need. (Note that Halide::Type implicitly converts to halide_type_t at no cost.) * Update mini_webgpu.h 31 March 2023, 16:37:07 UTC
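For context on the `halide_type_t` point in the commit above: the type packs the type code, bit width, and lane count into exactly 32 bits, which is why it hashes cheaply. A minimal illustrative sketch, assuming the public `code`/`bits`/`lanes` fields; the functor names are made up and this is not the actual CodeGen_Xtensa code:

```cpp
#include <cstdint>
#include <functional>
#include <unordered_set>

#include "HalideRuntime.h"  // halide_type_t: 8-bit code, 8-bit bits, 16-bit lanes

// Illustrative functors: pack the three fields of a halide_type_t into one
// 32-bit key, then hash/compare that key.
struct HalideTypeHash {
    size_t operator()(const halide_type_t &t) const {
        uint32_t key = uint32_t(t.code) | (uint32_t(t.bits) << 8) | (uint32_t(t.lanes) << 16);
        return std::hash<uint32_t>()(key);
    }
};

struct HalideTypeEq {
    bool operator()(const halide_type_t &a, const halide_type_t &b) const {
        return a.code == b.code && a.bits == b.bits && a.lanes == b.lanes;
    }
};

// e.g. a lookup table of native vector types:
using NativeTypeSet = std::unordered_set<halide_type_t, HalideTypeHash, HalideTypeEq>;
```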
48903c4 [xtensa] fixed a few correctness issues in codegen + added new correctness tests (#7444) * [xtensa] fixed a few correctness issues * [xtensa] Fixed white space 30 March 2023, 16:50:27 UTC
33dee7f Merge branch 'main' into xtensa-codegen 29 March 2023, 21:19:55 UTC
b6b15ac Alternative approach to deprecating internal fixed-point intrinsics (#7461) 29 March 2023, 21:17:06 UTC
e0a1e65 Merge branch 'main' into xtensa-codegen 29 March 2023, 17:43:34 UTC
95b8543 Fix PseudoExpr for FuncRef (followup to #7446) (#7458) * Remove references to deprecated variants of fixed-point operators Fix PseudoExpr for FuncRef (followup to #7446) * format * Update pool_generator.cpp 29 March 2023, 17:43:12 UTC
d06498c Remove apparently pointless shift in autoscheduler tutorial (#7455) Fixes #7451 29 March 2023, 16:46:45 UTC
6865960 Remove references to deprecated variants of fixed-point operators (#7457) 29 March 2023, 03:01:34 UTC
c7fb421 Add missing store_predicated 28 March 2023, 23:06:15 UTC
7524d06 Fix is_double_native_vector_type 28 March 2023, 23:00:55 UTC
0f23f14 Merge branch 'main' into xtensa-codegen 28 March 2023, 21:37:13 UTC
7ab0e8f Skip simd_op_check for disabled targets (#7452) Co-authored-by: Steven Johnson <srj@google.com> 28 March 2023, 21:23:47 UTC
4e2aaab Remove dupe func 28 March 2023, 18:25:29 UTC
7614953 Merge branch 'main' into xtensa-codegen 28 March 2023, 18:18:38 UTC
82ae713 Fix for top-of-tree LLVM (#7453) 28 March 2023, 18:16:16 UTC
1fb9293 Remove defunct Make and .gitignore for the external_code tests, now long gone (#7449) 28 March 2023, 17:04:13 UTC
6881545 Update XtensaOptimize.cpp 28 March 2023, 01:59:23 UTC
8ebc04a WIP 28 March 2023, 01:36:58 UTC
5cb1b30 Merge branch 'main' into xtensa-codegen 28 March 2023, 01:03:25 UTC
d57b53e Fix correctness_pytorch for injection from #7443 (#7450) 28 March 2023, 01:01:33 UTC
55edef8 Use existing Halide Runtime atomic wrappers everywhere in the runtime (#7429) * Use existing Halide Runtime atomic wrappers everywhere in the runtime We currently have a set of wrappers around the __atomic/__sync primitives used by our threading model; for various reasons, we desire to use the (deprecated) __sync primitives for 32-bit builds instead of the __atomic primitives (see https://github.com/halide/Halide/pull/7427 for some discussion). This PR attempts to use these abstractions everywhere else in our runtime, for consistency, so that 64-bit builds consistently use the __atomic primitives for (e.g.) profiling and tracing too. This meant: - Splitting the wrappers into a new header (runtime_atomics.h), as synchronization_common.h can't be included into arbitrary other files, for valid reasons - Adding wrappers for the necessary primitives - Modifying the code elsewhere in the runtime Where new wrappers were needed, I generally defaulted to assuming that SEQ_CST was the safest memory order to use. Not entirely sure if this is a worthwhile goal or not, but putting this out there for consideration and discussion. * Update runtime_atomics.h 27 March 2023, 22:34:16 UTC
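As a rough illustration of the wrapper layer described in the commit above (the function names and the compile-time dispatch here are assumptions, not the contents of runtime_atomics.h):

```cpp
#include <cstdint>

// Sketch only: route 32-bit builds to the legacy __sync builtins and
// everything else to the __atomic builtins, defaulting to SEQ_CST.
template<typename T>
T atomic_fetch_add_sequentially_consistent(T *addr, T val) {
#if INTPTR_MAX == INT32_MAX
    return __sync_fetch_and_add(addr, val);
#else
    return __atomic_fetch_add(addr, val, __ATOMIC_SEQ_CST);
#endif
}

template<typename T>
bool compare_and_swap_sequentially_consistent(T *addr, T *expected, T *desired) {
#if INTPTR_MAX == INT32_MAX
    T old_val = __sync_val_compare_and_swap(addr, *expected, *desired);
    bool ok = (old_val == *expected);
    *expected = old_val;  // mirror the __atomic behavior of updating 'expected'
    return ok;
#else
    return __atomic_compare_exchange(addr, expected, desired, /*weak=*/false,
                                     __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
#endif
}
```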
46c48b7 Ensure that return values from runtime calls are checked (#7403) * Ensure that return values from runtime calls are checked Fixes a handful of places that should have checked the error-code result from explicit calls to the runtime, but weren't. Also, drive-by change to HashMap::store(), which returned an int but was incapable of returning anything but zero -- changed to just return void. * fixes * trigger buildbots * trigger buildbots 27 March 2023, 22:33:32 UTC
231c88b Cleanups in runtime/device_interface.cpp (#7408) * Cleanups in runtime/device_interface.cpp (Harvested from an experimental CL) - Add `UseModule` helper to make it easier to balance `use_module()` and `release_module()` - add `call_device_interface()` helper to make it easier to call device_interface functions safely - convert the one usage of `halide_abort_if_false()` to `halide_error() + return error` - drive-by changes from `0` to `halide_error_code_success` * trigger buildbots * trigger buildbots * Add halide_debug_assert * trigger buildbots * Update device_interface.cpp 27 March 2023, 22:28:44 UTC
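The `UseModule` helper mentioned above is presumably an RAII guard; a generic sketch of that pattern, with hypothetical free functions standing in for the real device-interface calls (this is not the device_interface.cpp code):

```cpp
#include <cstdio>

// Hypothetical stand-ins for the real refcounting calls on a device interface.
static int module_refcount = 0;
void use_module() { ++module_refcount; }      // assumed: takes a reference on the module
void release_module() { --module_refcount; }  // assumed: drops that reference

// RAII guard: release_module() runs on every exit path, including early error
// returns, so the use/release calls can never get unbalanced.
struct UseModuleGuard {
    UseModuleGuard() { use_module(); }
    ~UseModuleGuard() { release_module(); }
    UseModuleGuard(const UseModuleGuard &) = delete;
    UseModuleGuard &operator=(const UseModuleGuard &) = delete;
};

int do_device_work(bool fail_early) {
    UseModuleGuard guard;
    if (fail_early) {
        return 1;  // guard still releases the module here
    }
    // ... call into the device interface ...
    return 0;
}

int main() {
    do_device_work(true);
    do_device_work(false);
    std::printf("refcount = %d\n", module_refcount);  // prints 0: always balanced
    return 0;
}
```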
4dc9ce5 Mark the Halide pipeline structs as aligned(8) (#7428) When compiling for 32-bit, LLVM assumes that these structs are only 4-aligned (since alignof(uint64_t) == 4 for x86-32), which means some atomic operations on these structs may require library calls. Since these structs are always malloc'ed, and malloc on all our platforms will return an 8-aligned pointer, we can improve this by telling Clang that the struct will always be at least 8-aligned. 27 March 2023, 21:47:07 UTC
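A small stand-alone example of the idea in the commit above (the struct here is a made-up stand-in for the real pipeline structs):

```cpp
#include <cstdint>
#include <cstdio>
#include <cstdlib>

// On x86-32, alignof(uint64_t) is 4, so without the attribute the compiler
// must assume this struct may be only 4-aligned and may emit library calls
// for 64-bit atomics on 'device'. Declaring it aligned(8) matches the
// guarantee that malloc() returns (at least) 8-aligned storage.
struct __attribute__((aligned(8))) example_pipeline_state_t {  // alignas(8) is the standard spelling
    uint64_t device;  // 64-bit handle that may be updated atomically
    int32_t dimensions;
};

int main() {
    auto *s = static_cast<example_pipeline_state_t *>(std::malloc(sizeof(example_pipeline_state_t)));
    std::printf("alignof = %zu\n", alignof(example_pipeline_state_t));  // prints 8
    std::free(s);
    return 0;
}
```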
d92cec9 Moves OptimizeShuffles pass into separate file (#7447) * Moves OptimizeShuffles pass into separate file * Update comment * Revert changes to src/runtime/mini_webgpu.h 27 March 2023, 21:42:18 UTC
37ac255 Promote fixed-point intrinsics out of the Internal namespace (#7446) * Promote fixed-point intrinsics out of the Internal namespace and add deprecated wrappers for them in the Internal namespace so that we don't break any existing code * Pacify clang-tidy * Remove HALIDE_NO_USER_CODE_INLINE 27 March 2023, 19:34:20 UTC
32d1a29 Move common logic into visit_comparison_op 27 March 2023, 18:39:29 UTC
7976d05 Fix bugs in PyTorch codegen. (#7443) 27 March 2023, 17:00:23 UTC
ab5f042 Compute comparison masks in narrower types if possible (#7392) * Compute comparison masks in narrower types if possible * Remove reliance on infinite precision int32s * Further elaborate on comment * Lower signed saturating_add and sub to unsigned math The existing lowering was prone to overflow * cast -> reinterpret 25 March 2023, 17:43:11 UTC
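The "lower signed saturating_add and sub to unsigned math" item above refers to keeping every intermediate step free of signed overflow; a stand-alone sketch of that technique for int32 (not the actual Halide lowering):

```cpp
#include <cstdint>
#include <limits>

// Signed saturating add computed entirely in unsigned arithmetic, so no step
// relies on signed overflow (which is undefined behavior in C++).
int32_t saturating_add_i32(int32_t a, int32_t b) {
    uint32_t ua = static_cast<uint32_t>(a);
    uint32_t ub = static_cast<uint32_t>(b);
    uint32_t sum = ua + ub;  // unsigned wraparound is well-defined
    // Overflow iff a and b share a sign bit and sum's sign bit differs from it.
    bool overflow = ((~(ua ^ ub) & (ua ^ sum)) >> 31) != 0;
    if (overflow) {
        // Positive overflow -> INT32_MAX (0x7fffffff);
        // negative overflow -> INT32_MIN (0x80000000).
        sum = (ua >> 31) + static_cast<uint32_t>(std::numeric_limits<int32_t>::max());
    }
    return static_cast<int32_t>(sum);  // two's-complement reinterpretation
}
```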
2a51f71 Use pmaddubsw for non-RDom horizontal widening adds (#7440) 24 March 2023, 18:13:36 UTC
9d06135 Refactor is_native_vector_type and is_double_native_vector_type 23 March 2023, 22:27:51 UTC
9af1278 Report an error at the end of suffix_for_type 23 March 2023, 19:09:26 UTC
c4bd23e Remove commented code 23 March 2023, 19:02:02 UTC
39eced3 Merge branch 'main' into xtensa-codegen 23 March 2023, 17:13:55 UTC
83adfa3 [xtensa] Added uint32 vector mul and fixed i16 -> i32_x2 vector conversion (#7438) * [xtensa] Added uint32 mul and fixed i16 -> i32_x2 vector conversion * [xtensa] improved implementation of uint32xuint32 mul and i16->i32 conversion 23 March 2023, 16:54:04 UTC
4fa913e Add missing #include <exception> (#7445) 23 March 2023, 16:46:58 UTC
8495be1 [xtensa] Returned old free_helper (#7441) * [xtensa] Returned old free_helper due to poor performance of xtensa compiler with the new one * [xtensa] removed unneeded comment from codegen_c 22 March 2023, 18:14:39 UTC
4cb6dba Redo CPU schedule for bilateral grid (#7436) 19 March 2023, 20:55:40 UTC
ec6c234 Merge branch 'main' into xtensa-codegen 17 March 2023, 17:45:29 UTC
badf486 Disable performance_boundary_conditions under WebGPU pending #7420 (#7435) 17 March 2023, 17:41:48 UTC
658ceba Move large Xtensa-codegen source into external template files (#7430) * Move large Xtensa-codegen source into external template files * Update CodeGen_Xtensa_vectors.template.cpp * Fix sign mismatch * Update XtensaOptimize.cpp 16 March 2023, 23:25:52 UTC
bf1133a Make sure that count for load_variable is positive 16 March 2023, 22:24:23 UTC
5a025a7 Fix formatting 16 March 2023, 22:20:57 UTC
37329e1 Limit halide_xtensa_extract_*_of_* to native vectors 16 March 2023, 22:20:01 UTC
d31dcb7 Add halide_xtensa_extract_*_of_4_u16 16 March 2023, 22:18:21 UTC
d7153e3 Revert all apps/ to current top-of-tree status 16 March 2023, 21:56:36 UTC
d598e54 Merge branch 'main' into xtensa-codegen 16 March 2023, 21:56:10 UTC
643b2f1 Modify runtime calls to always return a valid halide_error_code_t value (#7404) * Modify runtime calls to always return a valid halide_error_code_t value Currently, the return values from our runtime code are a mishmash -- there's lots of code that returns any random nonzero value to indicate an error. This isn't wrong per se, but it's not clean, and it's desirable that the return values are predictable. This PR doesn't change the call signature of any (public) Halide Runtime functions, but modifies the internal logic so that all return values are valid values of `enum halide_error_code_t`. Generally, there should be minimal change to the code otherwise, although I did leave in a few drive-by changes that I couldn't resist (e.g., better error-checking when dynamically loading symbols). My long-term goal here is to eventually propose changing the signature of runtime functions that return errors to actually return `enum halide_error_code_t`; as you might imagine, making that transition might be controversial for a number of reasons. This PR is intended to be a way to make such a future transition easier to reason about, while arguably improving the code quality of the runtime slightly. * tidy * Update opencl.cpp * trigger buildbots * trigger buildbots * Fix merge mistake * Update gpu_context_common.h * Update cuda.cpp * Werror * status_ * if-with-initializer format * Update cuda.cpp * Update opencl.cpp * Update cuda.cpp * Update device_interface.h * Update hexagon_cache_allocator.cpp * Update printer.h * remove prefixes * "device field is already non-zero" * Update opencl.cpp * Update cuda.cpp * Fix error spacing * trigger buildbots 16 March 2023, 21:42:34 UTC
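Sketched on a hypothetical runtime helper (the enum names are real runtime values; the function itself is made up), the convention described in the commit above looks roughly like this:

```cpp
#include "HalideRuntime.h"  // halide_error_code_t, halide_buffer_t, ...

// Hypothetical helper: the signature still returns int (unchanged, as the
// commit notes), but every path returns a valid halide_error_code_t value
// instead of an arbitrary nonzero integer.
extern "C" int example_copy_to_device(void *user_context, halide_buffer_t *buf) {
    (void)user_context;
    if (buf == nullptr) {
        return halide_error_code_buffer_argument_is_null;
    }
    // ... perform the copy; on failure, return the specific matching code ...
    return halide_error_code_success;
}
```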
88b3ef8 Split WebGPU runtime into two variants (#7248 workaround) (#7419) * Split WebGPU runtime into two variants (#7248 workaround) Halide promises that you can crosscompile to *any* supported target from a 'stock' build of libHalide. Unfortunately, the initial landing of WebGPU support breaks that promise: we compile the webgpu runtime support (webgpu.cpp) with code that is predicated on `WITH_DAWN_NATIVE` (for Dawn vs Emscripten, respectively). This means that if you build Halide with `WITH_DAWN_NATIVE` defined, you can *only* target Dawn with that build of Halide; similarly, if you build with `WITH_DAWN_NATIVE` not-defined, you can only target Emscripten. (Trying to use the 'wrong' version will produce link-time errors.) For people who build everything from source, this isn't a big deal, but for people who just pull binary builds, this is a big problem. This PR proposes a temporary workaround until the API discrepancies are resolved: - Compile the existing webgpu.cpp runtime *both* ways - in LLVM_Runtime_Linker.cpp, select the correct variant based on whether the Target is targeting wasm or not - Profit! This is a rather ugly hack, but it should hopefully be (relatively) temporary. * A few more fixes * Update HalideGeneratorHelpers.cmake * Update interpreter.cpp * Update interpreter.cpp 16 March 2023, 21:35:52 UTC
cc74ee8 Move long boilerplate C/C++ code into template files (#7426) * Move long boilerplate C/C++ code into template files Codegen_C has a couple of long strings with boilerplate code that is conditionally emitted; at least one of these is too long for a single string literal under MSVC. Let's try moving these into standalone files instead; this may make it easier to use conventional tooling on the C++ code, and make Codegen_C easier to read and think about. (Note that the not-yet-landed Xtensa branch should also use this approach, if we decide this approach is good, since it has even more such code.) TODO: probably would be good to augment `binary2cpp` to allow option comments in the source file that are stripped in the output file (e.g. "This file is used in CodeGen_C.cpp for blah blah blah, look out for blah") * Remove detritus 16 March 2023, 18:22:23 UTC
afda48d Merge branch 'main' into xtensa-codegen 15 March 2023, 20:43:43 UTC
50f8c85 Disable correctness_atomics on Windows with Cuda, alas (#7423) (#7424) 15 March 2023, 17:16:01 UTC
ae59b91 Ignore assertions inside WebGPU kernels (#7418) This is the approach that (e.g.) the OpenCL backend takes to assertions inside kernel code. 14 March 2023, 20:56:33 UTC
e966163 Log name of failing function for !function_takes_user_context (#7417) 14 March 2023, 20:33:34 UTC
05fa61a Fix for top-of-tree LLVM (#7416) 14 March 2023, 18:40:40 UTC
9199849 A few minor cleanups in WebGPU backend (#7413) * A few minor cleanups in WebGPU backend Mostly just using using halide_error_code_t values everywhere. * trigger buildbots * Update webgpu.cpp 14 March 2023, 18:01:46 UTC
b63139e Update mini_webgpu.h with latest changes from Dawn (#7415) 14 March 2023, 17:01:25 UTC
d383cb9 Fix null device crash during WebGPU initialization (#7414) If RequestDevice fails, make sure we exit initialization early instead of trying to create a staging buffer with a nullptr device. 14 March 2023, 16:58:12 UTC
6d19cec printer.h should include HalideRuntime.h (#7412) printer.h uses uint64_t, so it needs to include something that ensures that type is defined. HalideRuntime.h is probably the right choice (since it always transitively includes runtime_internal.h when compiling the runtime). 13 March 2023, 22:36:38 UTC
078465c Add initial support for WebGPU (#6492) * [WebGPU] Add runtime stubs and codegen skeleton All runtime functions are currently unimplemented, and the shader codegen just emits an empty compute shader. * [WebGPU] Implement lazy device initialization Synchronize device access using a WebGpuContext object, as is done in the other GPU runtimes. Device initialization is asynchronous, so we rely on Emscripten's Asyncify mechanism to wait for it to complete. * [WebGPU] Implement device release Release the device and adapter. * [WebGPU] Add scoped error handling mechanism WebGPU uses asynchronous callbacks for error handling, so we need to spin-lock while waiting for them to fire in order to determine success/failure status for any WebGPU APIs that we call. * [WebGPU] Implement device malloc/free Create a WGPUBuffer with Storage|CopyDst|CopySrc usages. We'll need a staging buffer to perform host<->device transfers, as a buffer used as a storage buffer cannot be mapped. * [WebGPU] Implement basic host<->device copies Use a staging buffer to copy data from the device, since we cannot map a buffer that is used as a storage buffer. This logic will need an overhaul in order to support buffers that represent sub-regions of larger buffers. * [WebGPU] Implement halide_webgpu_device_sync Just wait for all submitted commands to complete. * [WebGPU] Implement shader compilation * [WebGPU] Implement core of WGSL codegen This implements enough of the WGSL codegen required to generate code for a 32-bit version of the apps/blur example. Buffer arguments are emitted as read_write storage buffers. Non-buffer arguments are collected into a structure and generated as a single uniform buffer. Workgroup sizes are currently required to be constant. This can be relaxed in the future by using pipeline-overridable constants. * [WebGPU] Implement Cast node WGSL uses different syntax for type casts. * [WebGPU] Implement the float_from_bits() intrinsic * [WebGPU] Implement run function Create a compute pipeline, create a uniform buffer for non-buffer arguments, and dispatch a compute command to the queue. Does not handle workgroup storage yet. * [WebGPU] Move queue into WgpuContext class The queue handle received from wgpuDeviceGetQueue needs to be released, so it makes more sense to automatically get/release this handle in the WgpuContext constructor/destructor. * [WebGPU] Add support for JIT This requires a native implementation of WebGPU (e.g. Dawn or wgpu). * [WebGPU] Enable the gpu_only AOT generator test Link generator tests against the native WebGPU library if specified. * [WebGPU] Add support for targeting dawn-native We have to make this a compile-time switch for now, as the supported APIs currently differ between Dawn and Emscripten. We should be able to remove all of these conditionals when the native API stabilizes. * [WebGPU] Add support for AOT tests when using WASM Use Dawn's node bindings to run these tests, by pre-pending some initialization code to the script that nodejs will invoke. * [WebGPU] Print explicit types for let declarations This makes the generated code a little more human-readable. * [WebGPU] Address first round of review comments * [WebGPU] Add copy of webgpu.h This is a verbatim copy of the Emscripten version, just without the include directives. * [WebGPU] Add comment about ASYNCIFY requirement * [WebGPU] Remove -Wno-atomic-alignment Tweak atomic primitive usage to avoid the need for this. 
* pacify clang-tidy * Fix more clang-tidy errors * Only use ASYNCIFY for tests when targeting WebGPU * Fix even more clang-tidy errors * [WebGPU] Add basic support to Makefile * [WebGPU] Don't wrap buffers inside structures This requirement has been removed from the WGSL specification, and the corresponding implementation changes have now landed in Dawn. * [WebGPU] Fix debug message tag * [WebGPU] Update WGPUErrorFilter enum in header * [WebGPU] Update WGSL attribute syntax The WGSL specification recently changed attribute syntax from [[attribute]] to @attribute. * [WebGPU] Add README_webgpu.md Explains how to configure Halide to target WebGPU for both Emscripten and Dawn native. Also lists several known limitations. * [WebGPU] Move native WebGPU library CMake logic This is the correct place, otherwise the link line order is wrong. * [WebGPU] Implement WGSL codegen for serial loops * [WebGPU] Implement WGSL codegen for Allocate Use array types for stack allocations. Leave GPU shared memory unimplemented for now. * [WebGPU] Implement WGSL codegen for Select Use the WGSL select builtin function, which supports bool vector conditions for component-wise selection too. * [WebGPU] Mark 64-bit types as unsupported * [WebGPU] Implement device_and_host_{malloc,free} Just use the default implementations. * [WebGPU] Fixed WGSL codegen for boolean vectors * [WebGPU] Implement f32 math intrinsics * [WebGPU] Implement inverse and inverse sqrt * [WebGPU] Fixup some errors in WGSL codegen * [WebGPU] Implement logical and/or for bool vectors * [WebGPU] Implement WGSL codegen for Broadcast node * [WebGPU] Implement WGSL codegen for Ramp node * [WebGPU] Emulate 8- and 16-bit integers Use atomics to emulate storage, and widen the values to 32-bits when operating on them. * [WebGPU] Avoid buffer name collisions Buffers are declared as global variables in WGSL, so prefix them with the kernel name to avoid collisions. * [WebGPU] Fix divide-by-power-of-two WGSL requires the RHS of a shift to be unsigned. * [WebGPU] Implement codegen for gpu_thread_barrier * [WebGPU] Implement WGSL codegen for Evaluate This fixes an issue with the halide_unused() intrinsic. * [WebGPU] Add support for shared memory This currently only supports statically-sized shared memory allocations; dynamically-sized allocations require using pipeline-overridable constants which are not yet implemented in Dawn. * [WebGPU] Fix 8/16-bit load/store emulation Loads and stores that implicitly casted to/from 32-bits were casting when they should not have been, and not casting when they should. * Use generic 64-bit support query in gpu_mixed_shared_mem_types This is more robust than checking for specific APIs. * [WebGPU] Fix object cleanup during teardown * [WebGPU] Do not re-allocate device buffers * [WebGPU] Mark maximum vector width as 4 in tests * [WebGPU] Add functions to object lifetime tracker * [WebGPU] Override FloatImm handling We need to explicitly suffix the integer literal with a `u` to make it unsigned. * [WebGPU] Scalarize predicated loads * [WebGPU] Implement if_then_else intrinsic This is generated when scalarizing predicated loads. * [WebGPU] Enable gpu_free_sync test * [WebGPU] Implement print_reinterpret Use WGSL's bitcast<> operator. * [WebGPU] Implement print_extern_call This is just a regular function call in WGSL. * Add missing include and namespace for isnan/isinf * [WebGPU] Avoid short-circuiting operators These can cause issues with WGSL's uniformity analysis. 
* [WebGPU] Use commas for struct member separators * [WebGPU] Update API headers and usages Two API functions were renamed. * [WebGPU] Use CodeGen_GPU_C base class for codegen Introduce a new enum value for WGSL's vector syntax. * [WebGPU] Add warning for emulated narrow integers * [WebGPU] Update README with latest status * [WebGPU] Add support for non-contiguous copies Also adds support for buffer cropping/slicing. * [WebGPU] Fix clang-tidy error * [WebGPU] Use atomicCmpXchg for 8/16-bit emulation Halide assumes that write-write data-races are benign when both threads are writing the same value, but this is not true when those writes are implemented using atomicXor. We need to use atomicCompareExchangeWeak in a loop to perform this emulation instead. Unfortunately this makes things even slower :-( * [WebGPU] Support non-32-bit parameter types Expand them to 32-bits in the shader and the runtime. * [WebGPU] Fix mixed types in buffers The atomic emulation of narrow types shouldn't use the allocation type, since the same buffer may be re-used for multiple types. This means we also sometimes need to perform 32-bit accesses using atomics as well. Instead of using the buffer allocation type, we now pre-traverse the IR to check for accesses that will require emulation, and mark the corresponding buffer as such. * [WebGPU] Show validation errors for failed maps * [WebGPU] Round up buffer offsets and sizes The WebGPU API requires that these are multiples of 4. * [WebGPU] Update implementation status in README * [WebGPU] Replace @stage(compute) with @compute * [WebGPU] Polyfill the pow_f32 intrinsic The pow builtin in WGSL only has the correct semantics for x>0, so we need to emulate the behavior for the other cases. * [WebGPU] Skip the gpu_allocation_cache test Allocation cache is not yet implemented for the WebGPU backend, and so this test takes forever. * [WebGPU] Use builtins for inverse hyperbolics * [WebGPU] Map rint() to round() WGSL's round() builtin has round-to-nearest-even semantics. * [WebGPU] Set device lost callback This provides more information about events that cause the device to become inaccessible. * [WebGPU] Use i32 for bool parameters The bool type cannot be used in the uniform storage class in WGSL. * [WebGPU] Raise limits for buffer size and workgroup storage * [WebGPU] Update mini_webgpu.h Disable clang-format for this file. * [WebGPU] Add support for dynamic workgroups This is implemented using pipeline-overridable constants. * [WebGPU] Avoid using 'new' as an identifier This is a reserved keyword in WGSL. * [WebGPU] Do not merge workgroup allocations Since we promote 8- and 16-bit integers to 32-bit in workgroup memory, merging multiple different GPUShared allocations into a single one can cause different regions to overlap and data races ensue. * [WebGPU] Fix Emscripten support * [WebGPU] Use const for integer immediates This avoids some MSL compiler ICEs with the code generated by Dawn, and also makes it easier for the WGSL compiler to constant-fold these values. * [WebGPU] Squelch clang-tidy error * [WebGPU] Note Dawn's dependency on go in README * [WebGPU] Add links to Emscripten vs Dawn issue * [WebGPU] Show error when finding WebGPU library fails * [WebGPU] Add link to issue about Windows support * [WebGPU] Rename roundUpToMultipleOf4 ...and use it in one place where we weren't. 
* [WebGPU] Add links to wrap_native issue * [WebGPU] Use debug_assert for some runtime errors * [WebGPU] Stop using designated initializers Add -Wc++20-designator flag to runtime build to prevent this from happening again. * [WebGPU] Update mini_webgpu.h * [WebGPU] Fix a validation issue with ToT Dawn A bitcast that produces a NaN is invalid in WGSL, so implement the `nan()` intrinsic via `float_from_bits()`. * Update README_webgpu.md * Add is_finite_f32, is_inf_f32, is_nan_f32 * Update isinf, isnan, isfinite; add inf_f32, neg_inf_f32 * correctness_isnan should be skipped for WebGPU * Update isnan.cpp * Update atomics_gpu_8_bit.cpp * Fix python_tutorial_lesson_10_aot_compilation_run * Partial fix for generator_aot_acquire_release() This adds the necessary (horrible hackery) to bring the WebGPU case in line with the other backends... but the test still fails, apparently due to the same copy-to-host bug we suspect for generator_aot_gpu_only.Pushing this anyway because it was annoying to write :-) * [WebGPU] Fix AOT test build for non-WASM * [WebGPU] Move README to root * [WebGPU] Address review comments * wip * Update CMakeLists.txt * Work-in-progress for generator_aot_gpu_multi_context_threaded * [WebGPU] Use a per-context staging buffer This fixes the generator_aot_gpu_multi_context_threaded tests. * [WebGPU] Fix clang-format issue * [WebGPU] Move staging buffer creation We shouldn't be doing this inside a callback as we use async error checking routines. * Revert "Merge branch 'webgpu' of https://github.com/jrprice/Halide into pr/6492" This reverts commit 9d79ac60153e12ed51ea9f1a91a544f3fc7ccc78, reversing changes made to 0673e6754e53e62cf6029efdb94a57447db2f03a. * Skip correctness_multi_way_select on x86 Macs (for now) * clang-format * [WebGPU] Recommit 742db3feb888394ff6529c357da3f32ae9286ea4 13 March 2023, 20:30:38 UTC
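One of the trickier items in the WebGPU commit above is the 8/16-bit store emulation via atomicCompareExchangeWeak; a C++ analogue of that read-modify-write loop (the actual WGSL the backend emits will look different):

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// C++ analogue of emulating a byte store in storage that is only addressable
// as 32-bit words: merge the byte into its word with a CAS loop, so concurrent
// writers to *other* bytes of the same word are never clobbered.
void store_u8_emulated(std::atomic<uint32_t> *words, size_t byte_index, uint8_t value) {
    std::atomic<uint32_t> &word = words[byte_index / 4];
    const uint32_t shift = 8u * (byte_index % 4);
    const uint32_t mask = 0xffu << shift;
    uint32_t old_word = word.load(std::memory_order_relaxed);
    uint32_t new_word;
    do {
        new_word = (old_word & ~mask) | (uint32_t(value) << shift);
    } while (!word.compare_exchange_weak(old_word, new_word,
                                         std::memory_order_relaxed));
}
```

A simpler xor-based merge is not enough here because, as the commit notes, Halide treats two threads writing the same value as a benign race, and an xor-based write breaks that assumption.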
613655d Add is_stack_private_to_thread() 13 March 2023, 17:35:57 UTC
08d3c37 Merge branch 'main' into xtensa-codegen 13 March 2023, 17:31:53 UTC
78097a7 Don't use negative values for exit() (#7405) * Don't use negative values for exit() A program that terminates via `exit(-1)` leaves `$? = 255` (per the exit(3) manpage), as the value is chopped with `& 0377`. This makes it hard to use `git bisect` to track down bugs, as it considers any exit code > 127 (or = 15, oddly enough) equivalent to an `abort()` and terminates the bisect. IMHO this is perverse behavior on the part of `git bisect`, but it is what it is, so let's revamp our tests to avoid calling `exit(-1);`. * trigger buildbots 13 March 2023, 17:01:35 UTC
822f5da Don't return negative values from main() (#7406) * Don't return negative values from main() A program that returns `-1` from `main()` leaves $? = 255 (per the exit(3) manpage), as the value is chopped with & 0377. This makes it hard to use git bisect to track down bugs, as it considers any exit code > 127 (or = 15, oddly enough) equivalent to an abort() and terminates the bisect. IMHO this is perverse behavior on the part of git bisect, but it is what it is, so let's revamp our tests to avoid returning `-1` from main to indicate failure, and return `1` instead. * trigger buildbots * validate_gpu_object_lifetime() handling 13 March 2023, 17:00:50 UTC
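A worked example of the exit-status arithmetic behind the two commits above: the status is truncated to its low 8 bits (& 0377, i.e. & 0xFF), so -1 shows up as 255, which git bisect lumps in with abort()-style deaths.

```cpp
#include <cstdio>

int main() {
    // -1 & 0xFF == 255, so a shell would report $? = 255 after `exit(-1)`
    // or `return -1;` from main(); 255 > 127, which git bisect treats like
    // a crash and uses as a reason to terminate the bisect.
    std::printf("-1 & 0xFF = %d\n", -1 & 0xFF);  // prints 255
    return 1;  // the convention adopted above: plain 1 indicates test failure
}
```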
f76bcc8 Cleanup of src/runtime/internal and test/runtime (#7399) * Cleanup of src/runtime/internal and test/runtime - Don't include .cpp files. - Don't use header-only "libraries" that rely on include order or being included only once (to wit: test/runtime/common.h -> common.cpp) - All files should explicitly #include what they need, even if they think it's already included (ie, order of include files should not matter) - In src/runtime/internal, change all `halide_abort_if_false` -> `halide_debug_assert` - in test/runtime, add HALIDE_CHECK to common.h and use it for tests instead of `halide_abort_if_false` * 32-bit fixes 12 March 2023, 17:49:46 UTC
0beb081 docs: Use halide.imageio instead of imageio (#7409) * docs: Use halide.imageio instead of imageio * trigger buildbots --------- Co-authored-by: Steven Johnson <srj@google.com> 11 March 2023, 20:45:17 UTC
4bf660e Fix overflow in x86 absd lowering (#7407) * Fix overflow in x86 absd lowering * Fix default lowering too 11 March 2023, 00:53:44 UTC
ead83e6 Docs/update readme python (#7402) * docs: Update simple usage in README_python.md * docs: Add imports 10 March 2023, 18:23:21 UTC
b02de4b Update all use of actions/checkout to v3 (#7400) 09 March 2023, 17:46:43 UTC
6a6c842 Backport changes to CodegenC from the xtensa branch (#7396) Backported with some trivial style changes (virtual method instead of flag, since this should never change for a given instance) 07 March 2023, 23:56:03 UTC
a2df5bd Attempt to give proper labels to pip dev builds (v2) (#7395) A much-simpler alternative to #7394 07 March 2023, 19:29:08 UTC
3a0b891 Merge branch 'main' into xtensa-codegen 06 March 2023, 21:43:48 UTC
4a80251 Destringify CanonicalizeGPUVars (#7386) * Destringify CanonicalizeGPUVars This new implementation takes the high-water marks of each type of GPU loop, instead of filtering using the prefix of the loop name. * Better comments * fix typo 03 March 2023, 20:57:47 UTC
aa8fcad hannk: Provide weak symbol functions to use op profiling (#7388) * hannk: Provide weak symbol functions to use op profiling You can add your own profiler with strong symbols. * hannk: Guard profiler feature with HANNK_PROFILER 03 March 2023, 20:56:52 UTC
5e81e91 Remove incorrect halide_xtensa_sat_narrow_u16 03 March 2023, 18:56:46 UTC
c855273 Merge branch 'main' into xtensa-codegen 03 March 2023, 18:55:38 UTC
387a19c Make README consistent on supported LLVM versions (#7390) Make README consistent with https://github.com/halide/Halide/pull/7093 03 March 2023, 01:10:50 UTC
122b5b6 hannk: Add device_sync method to hannk::Tensor class (#7387) hannk: Add device_sync method to hannk::Tensor class This method is useful to debug performance with synchronous execution. 01 March 2023, 22:22:20 UTC
303a90c Minor cleanup of GPUCompilationCache (#7376) * Minor cleanup of GPUCompilationCache While tracking down an apparently-unrelated threading bug in the webgpu backend, I made some tweaks to this code that I think are worth keeping. The main one of importance is that `release_hold()` and `release_context()` really should acquire the mutex -- they weren't before -- so now all public methods are properly mutexed. The other changes are mostly cosmetic: - Moved helper methods to be private rather than public - Changed the id value size to be `uintptr_t` rather than `uint32_t`; the space allocated for them is sizeof(void*). (Not sure this moves the needle but it felt right.) - Removed unused ctor for CachedCompilation Also, drive-by change in printer.h to capture some logging improvements. * Fix deadlock release_all() and release_context() were contending for the mutex 01 March 2023, 17:21:48 UTC
b48f78c Merge branch 'xtensa-codegen' of https://github.com/halide/Halide into xtensa-codegen 01 March 2023, 00:42:59 UTC
040d773 Merge branch 'main' into xtensa-codegen 01 March 2023, 00:42:07 UTC
fba892a Specify a full type for llvm::IRBuilder (#7384) Specify full type for llvm::IRBuilder 01 March 2023, 00:29:42 UTC
c16b5e2 [xtensa] removed tests that are failing to compile (#7362) * [xtensa] removed tests that are failing to compile due to poor support of int48 in scalarised regime * [xtensa] removed runtime generation for xtensa tests, as it is not used 28 February 2023, 17:31:38 UTC
5c02ae2 Bounds visitor for div was missing single_point mutated case (#7379) * Bounds visitor for div was missing single_point mutated case Signed-off-by: Adrian Lebioda <adrian.lebioda@hexagon.com> * Add test Signed-off-by: Adrian Lebioda <adrian.lebioda@hexagon.com> --------- Signed-off-by: Adrian Lebioda <adrian.lebioda@hexagon.com> Co-authored-by: Adrian Lebioda <adrian.lebioda@hexagon.com> 28 February 2023, 02:14:41 UTC
bdba694 Add Callable default ctor + `defined()` method (#7380) * Add Callable default ctor + `defined()` method This allows it to behave like * Add user_assert + test 28 February 2023, 02:14:24 UTC
2fb3b62 Better handling of u1 to i16 cast & clean-up 28 February 2023, 01:21:13 UTC
0091fd9 [xtensa] Limit the number of allowed DMA channels + allocate a separate channel for the output transactions (#7381) * Limit the number of allowed DMA channels + allocate a separate channel for the output transactions * Fix formatting 28 February 2023, 00:35:02 UTC
38057b8 Merge branch 'main' into xtensa-codegen 27 February 2023, 23:28:26 UTC
c42a5b2 Remove a gross hack from gpu_only_aottest (#7378) * Remove a gross hack from gpu_only_aottest Also add metal support * Add missing include 27 February 2023, 16:46:41 UTC
09400f6 Bounds visitors for min/max were missing single_point mutated case (#7377) * Bounds visitors for min/max were missing single_point mutated case Partially fixes #7374 * Add test 25 February 2023, 17:16:32 UTC
b6a18b8 Update WABT to 1.0.32; Increase stack size for WASM AOT apps (#7373) 23 February 2023, 19:36:32 UTC
144c1a4 correctness_round should use Target::supports_type() (#7372) This gives it proper support for new GPU backends 23 February 2023, 17:59:39 UTC
b17806d Use HalideFreeHelper for the register_destructor (#7371) Slightly cleaner code. Also, drive-by change of NULL -> nullptr 23 February 2023, 17:59:13 UTC
629da52 Use single-char form of `unique_name` for semaphores (#7370) The multi-char form of `unique_name` will append a `$` to the identifier, e.g. `sema$4`. This isn't really legal for a C/C++ identifier. 23 February 2023, 17:58:37 UTC
ad6c84a [xtensa] Clean up HalideFreeHelper code (#7368) * Clean up HalideFreeHelper code - Revise HalideFreeHelper to be a templated struct, to save the unnecessary stack storage for the function - Add emit_halide_free_helper() method to consolidate usage - Add a nullptr check to the `stack_is_core_private`, per comment - Fix some minor whitespace issues (If this PR is accepted here, I will of course backport the non-xtensa portions to main) * Update CodeGen_C.cpp 22 February 2023, 23:45:59 UTC
29f3f42 Merge branch 'main' into xtensa-codegen 22 February 2023, 23:44:29 UTC
386a2d1 Clean Up HalideFreeHelper code (main) (#7369) * Clean Up HalideFreeHelper code (main) - Revise HalideFreeHelper to be a templated struct, to save the unnecessary stack storage for the function - Add emit_halide_free_helper() method to consolidate usage * Update CodeGen_C.cpp 22 February 2023, 23:39:25 UTC
e69fa42 Merge branch 'main' into xtensa-codegen 22 February 2023, 18:37:34 UTC
3246844 Use a std::unique_ptr for the IR Builder (#7356) * Use a std::unique_ptr for the IR Builder instead of a raw owning pointer * Use make_unique 21 February 2023, 21:18:30 UTC
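The ownership change above, shown on a generic class rather than the actual CodeGen_LLVM member (the types here are made-up stand-ins):

```cpp
#include <memory>

// Stand-ins for llvm::IRBuilder<> and its context argument.
struct FakeContext {};
struct FakeIRBuilder {
    explicit FakeIRBuilder(FakeContext &ctx) : ctx(ctx) {}
    FakeContext &ctx;
};

class CodeGenLike {
    FakeContext context;
    // Before: FakeIRBuilder *builder;  // raw owning pointer, deleted by hand
    std::unique_ptr<FakeIRBuilder> builder;  // after: ownership is explicit

public:
    CodeGenLike() : builder(std::make_unique<FakeIRBuilder>(context)) {}
    // No hand-written destructor needed; unique_ptr deletes the builder.
};

int main() {
    CodeGenLike cg;
    return 0;
}
```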