https://github.com/halide/Halide

Revision  Message  Commit Date
61700af Remove unused 'external_buffers' member in Codegen_Xtensa 31 March 2023, 16:48:29 UTC
9f58631 Minor improvements to Xtensa codegen (#7463) * Minor improvements to Xtensa codegen Harvesting some minor code improvements from an experiment that didn't get finished: - Move all the `is_native_xtensa_vector()` and related code from XtensaOptimize into CodegenXtensa, which is the only caller - Make these all member methods so they can just use the `get_target()` member, which was the only target they ever wanted - Simplify the implementation of them a bit, especially `is_native_xtensa_vector()`, which should still be equivalent but terser and smaller codegen - Use `halide_type_t` instead of `Halide::Type` in many places, since the former is smaller (exactly 32 bits) and is easier to hash; similarly, use `std::unordered_set` for the lookups we need. (Note that Halide::Type implicitly converts to halide_type_t at no cost.) * Update mini_webgpu.h 31 March 2023, 16:37:07 UTC
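For context on the `halide_type_t` point in the commit above: the type packs the type code, bit width, and lane count into exactly 32 bits, which is why it hashes cheaply. A minimal illustrative sketch, assuming the public `code`/`bits`/`lanes` fields; the functor names are made up and this is not the actual CodeGen_Xtensa code:

```cpp
#include <cstdint>
#include <functional>
#include <unordered_set>

#include "HalideRuntime.h"  // halide_type_t: 8-bit code, 8-bit bits, 16-bit lanes

// Illustrative functors: pack the three fields of a halide_type_t into one
// 32-bit key, then hash/compare that key.
struct HalideTypeHash {
    size_t operator()(const halide_type_t &t) const {
        uint32_t key = uint32_t(t.code) | (uint32_t(t.bits) << 8) | (uint32_t(t.lanes) << 16);
        return std::hash<uint32_t>()(key);
    }
};

struct HalideTypeEq {
    bool operator()(const halide_type_t &a, const halide_type_t &b) const {
        return a.code == b.code && a.bits == b.bits && a.lanes == b.lanes;
    }
};

// e.g. a lookup table of native vector types:
using NativeTypeSet = std::unordered_set<halide_type_t, HalideTypeHash, HalideTypeEq>;
```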
48903c4 [xtensa] fixed a few correctness issues in codegen + added new correctness tests (#7444) * [xtensa] fixed a few correctness issues * [xtensa] Fixed white space 30 March 2023, 16:50:27 UTC
33dee7f Merge branch 'main' into xtensa-codegen 29 March 2023, 21:19:55 UTC
b6b15ac Alternative approach to deprecating internal fixed-point intrinsics (#7461) 29 March 2023, 21:17:06 UTC
e0a1e65 Merge branch 'main' into xtensa-codegen 29 March 2023, 17:43:34 UTC
95b8543 Fix PseudoExpr for FuncRef (followup to #7446) (#7458) * Remove references to deprecated variants of fixed-point operators Fix PseudoExpr for FuncRef (followup to #7446) * format * Update pool_generator.cpp 29 March 2023, 17:43:12 UTC
d06498c Remove apparently pointless shift in autoscheduler tutorial (#7455) Fixes #7451 29 March 2023, 16:46:45 UTC
6865960 Remove references to deprecated variants of fixed-point operators (#7457) 29 March 2023, 03:01:34 UTC
c7fb421 Add missing store_predicated 28 March 2023, 23:06:15 UTC
7524d06 Fix is_double_native_vector_type 28 March 2023, 23:00:55 UTC
0f23f14 Merge branch 'main' into xtensa-codegen 28 March 2023, 21:37:13 UTC
7ab0e8f Skip simd_op_check for disabled targets (#7452) Co-authored-by: Steven Johnson <srj@google.com> 28 March 2023, 21:23:47 UTC
4e2aaab Remove dupe func 28 March 2023, 18:25:29 UTC
7614953 Merge branch 'main' into xtensa-codegen 28 March 2023, 18:18:38 UTC
82ae713 Fix for top-of-tree LLVM (#7453) 28 March 2023, 18:16:16 UTC
1fb9293 Remove defunct Make and .gitignore for the external_code tests, now long gone (#7449) 28 March 2023, 17:04:13 UTC
6881545 Update XtensaOptimize.cpp 28 March 2023, 01:59:23 UTC
8ebc04a WIP 28 March 2023, 01:36:58 UTC
5cb1b30 Merge branch 'main' into xtensa-codegen 28 March 2023, 01:03:25 UTC
d57b53e Fix correctness_pytorch for injection from #7443 (#7450) 28 March 2023, 01:01:33 UTC
55edef8 Use existing Halide Runtime atomic wrappers everywhere in the runtime (#7429) * Use existing Halide Runtime atomic wrappers everywhere in the runtime We currently have a set of wrappers around the __atomic/__sync primitives used by our threading model; for various reasons, we desire to use the (deprecated) __sync primitives for 32-bit builds instead of the __atomic primitives (see https://github.com/halide/Halide/pull/7427 for some discussion). This PR attempts to use these abstractions everywhere else in our runtime, for consistency, so that 64-bit builds consistently use the __atomic primitives for (e.g.) profiling and tracing too. This meant: - Splitting the wrappers into a new header (runtime_atomics.h), as synchronization_common.h can't be included into arbitrary other files, for valid reasons - Adding wrappers for the necessary primitives - Modifying the code elsewhere in the runtime Where new wrappers were needed, I generally defaulted to assuming that SEQ_CST was the safest memory order to use. Not entirely sure if this is a worthwhile goal or not, but putting this out there for consideration and discussion. * Update runtime_atomics.h 27 March 2023, 22:34:16 UTC
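As a rough illustration of the wrapper layer described in the commit above (the function names and the compile-time dispatch here are assumptions, not the contents of runtime_atomics.h):

```cpp
#include <cstdint>

// Sketch only: route 32-bit builds to the legacy __sync builtins and
// everything else to the __atomic builtins, defaulting to SEQ_CST.
template<typename T>
T atomic_fetch_add_sequentially_consistent(T *addr, T val) {
#if INTPTR_MAX == INT32_MAX
    return __sync_fetch_and_add(addr, val);
#else
    return __atomic_fetch_add(addr, val, __ATOMIC_SEQ_CST);
#endif
}

template<typename T>
bool compare_and_swap_sequentially_consistent(T *addr, T *expected, T *desired) {
#if INTPTR_MAX == INT32_MAX
    T old_val = __sync_val_compare_and_swap(addr, *expected, *desired);
    bool ok = (old_val == *expected);
    *expected = old_val;  // mirror the __atomic behavior of updating 'expected'
    return ok;
#else
    return __atomic_compare_exchange(addr, expected, desired, /*weak=*/false,
                                     __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
#endif
}
```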
46c48b7 Ensure that return values from runtime calls are checked (#7403) * Ensure that return values from runtime calls are checked Fixes a handful of places that should have checked the error-code result from explicit calls to the runtime, but weren't. Also, drive-by change to HashMap::store(), which returned an int but was incapable of returning anything but zero -- changed to just return void. * fixes * trigger buildbots * trigger buildbots 27 March 2023, 22:33:32 UTC
231c88b Cleanups in runtime/device_interface.cpp (#7408) * Cleanups in runtime/device_interface.cpp (Harvested from an experimental CL) - Add `UseModule` helper to make it easier to balance `use_module()` and `release_module()` - add `call_device_interface()` helper to make it easier to call device_interface functions safely - convert the one usage of `halide_abort_if_false()` to `halide_error() + return error` - drive-by changes from `0` to `halide_error_code_success` * trigger buildbots * trigger buildbots * Add halide_debug_assert * trigger buildbots * Update device_interface.cpp 27 March 2023, 22:28:44 UTC
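The `UseModule` helper mentioned above is presumably an RAII guard; a generic sketch of that pattern, with hypothetical free functions standing in for the real device-interface calls (this is not the device_interface.cpp code):

```cpp
#include <cstdio>

// Hypothetical stand-ins for the real refcounting calls on a device interface.
static int module_refcount = 0;
void use_module() { ++module_refcount; }      // assumed: takes a reference on the module
void release_module() { --module_refcount; }  // assumed: drops that reference

// RAII guard: release_module() runs on every exit path, including early error
// returns, so the use/release calls can never get unbalanced.
struct UseModuleGuard {
    UseModuleGuard() { use_module(); }
    ~UseModuleGuard() { release_module(); }
    UseModuleGuard(const UseModuleGuard &) = delete;
    UseModuleGuard &operator=(const UseModuleGuard &) = delete;
};

int do_device_work(bool fail_early) {
    UseModuleGuard guard;
    if (fail_early) {
        return 1;  // guard still releases the module here
    }
    // ... call into the device interface ...
    return 0;
}

int main() {
    do_device_work(true);
    do_device_work(false);
    std::printf("refcount = %d\n", module_refcount);  // prints 0: always balanced
    return 0;
}
```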
4dc9ce5 Mark the Halide pipeline structs as aligned(8) (#7428) When compiling for 32-bit, LLVM assumes that these structs are only 4-aligned (since alignof(uint64_t) == 4 for x86-32), which means some atomic operations on these structs may require library calls. Since these structs are always malloc'ed, and malloc on all our platforms will return an 8-aligned pointer, we can improve this by telling Clang that the struct will always be at least 8-aligned. 27 March 2023, 21:47:07 UTC
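A small stand-alone example of the idea in the commit above (the struct here is a made-up stand-in for the real pipeline structs):

```cpp
#include <cstdint>
#include <cstdio>
#include <cstdlib>

// On x86-32, alignof(uint64_t) is 4, so without the attribute the compiler
// must assume this struct may be only 4-aligned and may emit library calls
// for 64-bit atomics on 'device'. Declaring it aligned(8) matches the
// guarantee that malloc() returns (at least) 8-aligned storage.
struct __attribute__((aligned(8))) example_pipeline_state_t {  // alignas(8) is the standard spelling
    uint64_t device;  // 64-bit handle that may be updated atomically
    int32_t dimensions;
};

int main() {
    auto *s = static_cast<example_pipeline_state_t *>(std::malloc(sizeof(example_pipeline_state_t)));
    std::printf("alignof = %zu\n", alignof(example_pipeline_state_t));  // prints 8
    std::free(s);
    return 0;
}
```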
d92cec9 Moves OptimizeShuffles pass into separate file (#7447) * Moves OptimizeShuffles pass into separate file * Update comment * Revert changes to src/runtime/mini_webgpu.h 27 March 2023, 21:42:18 UTC
37ac255 Promote fixed-point intrinsics out of the Internal namespace (#7446) * Promote fixed-point intrinsics out of the Internal namespace and add deprecated wrappers for them in the Internal namespace so that we don't break any existing code * Pacify clang-tidy * Remove HALIDE_NO_USER_CODE_INLINE 27 March 2023, 19:34:20 UTC
32d1a29 Move common logic into visit_comparison_op 27 March 2023, 18:39:29 UTC
7976d05 Fix bugs in PyTorch codegen. (#7443) 27 March 2023, 17:00:23 UTC
ab5f042 Compute comparison masks in narrower types if possible (#7392) * Compute comparison masks in narrower types if possible * Remove reliance on infinite precision int32s * Further elaborate on comment * Lower signed saturating_add and sub to unsigned math The existing lowering was prone to overflow * cast -> reinterpret 25 March 2023, 17:43:11 UTC
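The "lower signed saturating_add and sub to unsigned math" item above refers to keeping every intermediate step free of signed overflow; a stand-alone sketch of that technique for int32 (not the actual Halide lowering):

```cpp
#include <cstdint>
#include <limits>

// Signed saturating add computed entirely in unsigned arithmetic, so no step
// relies on signed overflow (which is undefined behavior in C++).
int32_t saturating_add_i32(int32_t a, int32_t b) {
    uint32_t ua = static_cast<uint32_t>(a);
    uint32_t ub = static_cast<uint32_t>(b);
    uint32_t sum = ua + ub;  // unsigned wraparound is well-defined
    // Overflow iff a and b share a sign bit and sum's sign bit differs from it.
    bool overflow = ((~(ua ^ ub) & (ua ^ sum)) >> 31) != 0;
    if (overflow) {
        // Positive overflow -> INT32_MAX (0x7fffffff);
        // negative overflow -> INT32_MIN (0x80000000).
        sum = (ua >> 31) + static_cast<uint32_t>(std::numeric_limits<int32_t>::max());
    }
    return static_cast<int32_t>(sum);  // two's-complement reinterpretation
}
```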
2a51f71 Use pmaddubsw for non-RDom horizontal widening adds (#7440) 24 March 2023, 18:13:36 UTC
9d06135 Refactor is_native_vector_type and is_double_native_vector_type 23 March 2023, 22:27:51 UTC
9af1278 Report an error at the end of suffix_for_type 23 March 2023, 19:09:26 UTC
c4bd23e Remove commented code 23 March 2023, 19:02:02 UTC
39eced3 Merge branch 'main' into xtensa-codegen 23 March 2023, 17:13:55 UTC
83adfa3 [xtensa] Added uint32 vector mul and fixed i16 -> i32_x2 vector conversion (#7438) * [xtensa] Added uint32 mul and fixed i16 -> i32_x2 vector conversion * [xtensa] improved implementation of uint32xuint32 mul and i16->i32 conversion 23 March 2023, 16:54:04 UTC
4fa913e Add missing #include <exception> (#7445) 23 March 2023, 16:46:58 UTC
8495be1 [xtensa] Returned old free_helper (#7441) * [xtensa] Returned old free_helper due to poor performance of xtensa compiler with the new one * [xtensa] removed unneeded comment from codegen_c 22 March 2023, 18:14:39 UTC
4cb6dba Redo CPU schedule for bilateral grid (#7436) 19 March 2023, 20:55:40 UTC
ec6c234 Merge branch 'main' into xtensa-codegen 17 March 2023, 17:45:29 UTC
badf486 Disable performance_boundary_conditions under WebGPU pending #7420 (#7435) 17 March 2023, 17:41:48 UTC
658ceba Move large Xtensa-codegen source into external template files (#7430) * Move large Xtensa-codegen source into external template files * Update CodeGen_Xtensa_vectors.template.cpp * Fix sign mismatch * Update XtensaOptimize.cpp 16 March 2023, 23:25:52 UTC
bf1133a Make sure that count for load_variable is positive 16 March 2023, 22:24:23 UTC
5a025a7 Fix formatting 16 March 2023, 22:20:57 UTC
37329e1 Limit halide_xtensa_extract_*_of_* to native vectors 16 March 2023, 22:20:01 UTC
d31dcb7 Add halide_xtensa_extract_*_of_4_u16 16 March 2023, 22:18:21 UTC
d7153e3 Revert all apps/ to current top-of-tree status 16 March 2023, 21:56:36 UTC
d598e54 Merge branch 'main' into xtensa-codegen 16 March 2023, 21:56:10 UTC
643b2f1 Modify runtime calls to always return a valid halide_error_code_t value (#7404) * Modify runtime calls to always return a valid halide_error_code_t value Currently, the return values from our runtime code are a mishmash -- there's lots of code that returns any random nonzero value to indicate an error. This isn't wrong per se, but it's not clean, and it's desirable that the return values are predictable. This PR doesn't change the call signature of any (public) Halide Runtime functions, but modifies the internal logic so that all return values are valid values of `enum halide_error_code_t`. Generally, there should be minimal change to the code otherwise, although I did leave in a few drive-by changes that I couldn't resist (e.g., better error-checking when dynamically loading symbols). My long-term goal here is to eventually propose changing the signature of runtime functions that return errors to actually return `enum halide_error_code_t`; as you might imagine, making that transition might be controversial for a number of reasons. This PR is intended to be a way to make such a future transition easier to reason about, while arguably improving the code quality of the runtime slightly. * tidy * Update opencl.cpp * trigger buildbots * trigger buildbots * Fix merge mistake * Update gpu_context_common.h * Update cuda.cpp * Werror * status_ * if-with-initializer format * Update cuda.cpp * Update opencl.cpp * Update cuda.cpp * Update device_interface.h * Update hexagon_cache_allocator.cpp * Update printer.h * remove prefixes * "device field is already non-zero" * Update opencl.cpp * Update cuda.cpp * Fix error spacing * trigger buildbots 16 March 2023, 21:42:34 UTC
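Sketched on a hypothetical runtime helper (the enum names are real runtime values; the function itself is made up), the convention described in the commit above looks roughly like this:

```cpp
#include "HalideRuntime.h"  // halide_error_code_t, halide_buffer_t, ...

// Hypothetical helper: the signature still returns int (unchanged, as the
// commit notes), but every path returns a valid halide_error_code_t value
// instead of an arbitrary nonzero integer.
extern "C" int example_copy_to_device(void *user_context, halide_buffer_t *buf) {
    (void)user_context;
    if (buf == nullptr) {
        return halide_error_code_buffer_argument_is_null;
    }
    // ... perform the copy; on failure, return the specific matching code ...
    return halide_error_code_success;
}
```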
88b3ef8 Split WebGPU runtime into two variants (#7248 workaround) (#7419) * Split WebGPU runtime into two variants (#7248 workaround) Halide promises that you can crosscompile to *any* supported target from a 'stock' build of libHalide. Unfortunately, the initial landing of WebGPU support breaks that promise: we compile the webgpu runtime support (webgpu.cpp) with code that is predicated on `WITH_DAWN_NATIVE` (for Dawn vs Emscripten, respectively). This means that if you build Halide with `WITH_DAWN_NATIVE` defined, you can *only* target Dawn with that build of Halide; similarly, if you build with `WITH_DAWN_NATIVE` not-defined, you can only target Emscripten. (Trying to use the 'wrong' version will produce link-time errors.) For people who build everything from source, this isn't a big deal, but for people who just pull binary builds, this is a big problem. This PR proposes a temporary workaround until the API discrepancies are resolved: - Compile the existing webgpu.cpp runtime *both* ways - in LLVM_Runtime_Linker.cpp, select the correct variant based on whether the Target is targeting wasm or not - Profit! This is a rather ugly hack, but it should hopefully be (relatively) temporary. * A few more fixes * Update HalideGeneratorHelpers.cmake * Update interpreter.cpp * Update interpreter.cpp 16 March 2023, 21:35:52 UTC
cc74ee8 Move long boilerplate C/C++ code into template files (#7426) * Move long boilerplate C/C++ code into template files Codegen_C has a couple of long strings with boilerplate code that is conditionally emitted; at least one of these is too long for a single string literal under MSVC. Let's try moving these into standalone files instead; this may make it easier to use conventional tooling on the C++ code, and make Codegen_C easier to read and think about. (Note that the not-yet-landed Xtensa branch should also use this approach, if we decide this approach is good, since it has even more such code.) TODO: probably would be good to augment `binary2cpp` to allow option comments in the source file that are stripped in the output file (e.g. "This file is used in CodeGen_C.cpp for blah blah blah, look out for blah") * Remove detritus 16 March 2023, 18:22:23 UTC
afda48d Merge branch 'main' into xtensa-codegen 15 March 2023, 20:43:43 UTC
50f8c85 Disable correctness_atomics on Windows with Cuda, alas (#7423) (#7424) 15 March 2023, 17:16:01 UTC
ae59b91 Ignore assertions inside WebGPU kernels (#7418) This is the approach that (e.g.) the OpenCL backend takes to assertions inside kernel code. 14 March 2023, 20:56:33 UTC
e966163 Log name of failing function for !function_takes_user_context (#7417) 14 March 2023, 20:33:34 UTC
05fa61a Fix for top-of-tree LLVM (#7416) 14 March 2023, 18:40:40 UTC
9199849 A few minor cleanups in WebGPU backend (#7413) * A few minor cleanups in WebGPU backend Mostly just using using halide_error_code_t values everywhere. * trigger buildbots * Update webgpu.cpp 14 March 2023, 18:01:46 UTC
b63139e Update mini_webgpu.h with latest changes from Dawn (#7415) 14 March 2023, 17:01:25 UTC
d383cb9 Fix null device crash during WebGPU initialization (#7414) If RequestDevice fails, make sure we exit initialization early instead of trying to create a staging buffer with a nullptr device. 14 March 2023, 16:58:12 UTC
6d19cec printer.h should include HalideRuntime.h (#7412) printer.h uses uint64_t, so it needs to include something that ensures that type is defined. HalideRuntime.h is probably the right choice (since it always transitively includes runtime_internal.h when compiling the runtime). 13 March 2023, 22:36:38 UTC
078465c Add initial support for WebGPU (#6492) * [WebGPU] Add runtime stubs and codegen skeleton All runtime functions are currently unimplemented, and the shader codegen just emits an empty compute shader. * [WebGPU] Implement lazy device initialization Synchronize device access using a WebGpuContext object, as is done in the other GPU runtimes. Device initialization is asynchronous, so we rely on Emscripten's Asyncify mechanism to wait for it to complete. * [WebGPU] Implement device release Release the device and adapter. * [WebGPU] Add scoped error handling mechanism WebGPU uses asynchronous callbacks for error handling, so we need to spin-lock while waiting for them to fire in order to determine success/failure status for any WebGPU APIs that we call. * [WebGPU] Implement device malloc/free Create a WGPUBuffer with Storage|CopyDst|CopySrc usages. We'll need a staging buffer to perform host<->device transfers, as a buffer used as a storage buffer cannot be mapped. * [WebGPU] Implement basic host<->device copies Use a staging buffer to copy data from the device, since we cannot map a buffer that is used as a storage buffer. This logic will need an overhaul in order to support buffers that represent sub-regions of larger buffers. * [WebGPU] Implement halide_webgpu_device_sync Just wait for all submitted commands to complete. * [WebGPU] Implement shader compilation * [WebGPU] Implement core of WGSL codegen This implements enough of the WGSL codegen required to generate code for a 32-bit version of the apps/blur example. Buffer arguments are emitted as read_write storage buffers. Non-buffer arguments are collected into a structure and generated as a single uniform buffer. Workgroup sizes are currently required to be constant. This can be relaxed in the future by using pipeline-overridable constants. * [WebGPU] Implement Cast node WGSL uses different syntax for type casts. * [WebGPU] Implement the float_from_bits() intrinsic * [WebGPU] Implement run function Create a compute pipeline, create a uniform buffer for non-buffer arguments, and dispatch a compute command to the queue. Does not handle workgroup storage yet. * [WebGPU] Move queue into WgpuContext class The queue handle received from wgpuDeviceGetQueue needs to be released, so it makes more sense to automatically get/release this handle in the WgpuContext constructor/destructor. * [WebGPU] Add support for JIT This requires a native implementation of WebGPU (e.g. Dawn or wgpu). * [WebGPU] Enable the gpu_only AOT generator test Link generator tests against the native WebGPU library if specified. * [WebGPU] Add support for targeting dawn-native We have to make this a compile-time switch for now, as the supported APIs currently differ between Dawn and Emscripten. We should be able to remove all of these conditionals when the native API stabilizes. * [WebGPU] Add support for AOT tests when using WASM Use Dawn's node bindings to run these tests, by pre-pending some initialization code to the script that nodejs will invoke. * [WebGPU] Print explicit types for let declarations This makes the generated code a little more human-readable. * [WebGPU] Address first round of review comments * [WebGPU] Add copy of webgpu.h This is a verbatim copy of the Emscripten version, just without the include directives. * [WebGPU] Add comment about ASYNCIFY requirement * [WebGPU] Remove -Wno-atomic-alignment Tweak atomic primitive usage to avoid the need for this. 
* pacify clang-tidy * Fix more clang-tidy errors * Only use ASYNCIFY for tests when targeting WebGPU * Fix even more clang-tidy errors * [WebGPU] Add basic support to Makefile * [WebGPU] Don't wrap buffers inside structures This requirement has been removed from the WGSL specification, and the corresponding implementation changes have now landed in Dawn. * [WebGPU] Fix debug message tag * [WebGPU] Update WGPUErrorFilter enum in header * [WebGPU] Update WGSL attribute syntax The WGSL specification recently changed attribute syntax from [[attribute]] to @attribute. * [WebGPU] Add README_webgpu.md Explains how to configure Halide to target WebGPU for both Emscripten and Dawn native. Also lists several known limitations. * [WebGPU] Move native WebGPU library CMake logic This is the correct place, otherwise the link line order is wrong. * [WebGPU] Implement WGSL codegen for serial loops * [WebGPU] Implement WGSL codegen for Allocate Use array types for stack allocations. Leave GPU shared memory unimplemented for now. * [WebGPU] Implement WGSL codegen for Select Use the WGSL select builtin function, which supports bool vector conditions for component-wise selection too. * [WebGPU] Mark 64-bit types as unsupported * [WebGPU] Implement device_and_host_{malloc,free} Just use the default implementations. * [WebGPU] Fixed WGSL codegen for boolean vectors * [WebGPU] Implement f32 math intrinsics * [WebGPU] Implement inverse and inverse sqrt * [WebGPU] Fixup some errors in WGSL codegen * [WebGPU] Implement logical and/or for bool vectors * [WebGPU] Implement WGSL codegen for Broadcast node * [WebGPU] Implement WGSL codegen for Ramp node * [WebGPU] Emulate 8- and 16-bit integers Use atomics to emulate storage, and widen the values to 32-bits when operating on them. * [WebGPU] Avoid buffer name collisions Buffers are declared as global variables in WGSL, so prefix them with the kernel name to avoid collisions. * [WebGPU] Fix divide-by-power-of-two WGSL requires the RHS of a shift to be unsigned. * [WebGPU] Implement codegen for gpu_thread_barrier * [WebGPU] Implement WGSL codegen for Evaluate This fixes an issue with the halide_unused() intrinsic. * [WebGPU] Add support for shared memory This currently only supports statically-sized shared memory allocations; dynamically-sized allocations require using pipeline-overridable constants which are not yet implemented in Dawn. * [WebGPU] Fix 8/16-bit load/store emulation Loads and stores that implicitly casted to/from 32-bits were casting when they should not have been, and not casting when they should. * Use generic 64-bit support query in gpu_mixed_shared_mem_types This is more robust than checking for specific APIs. * [WebGPU] Fix object cleanup during teardown * [WebGPU] Do not re-allocate device buffers * [WebGPU] Mark maximum vector width as 4 in tests * [WebGPU] Add functions to object lifetime tracker * [WebGPU] Override FloatImm handling We need to explicitly suffix the integer literal with a `u` to make it unsigned. * [WebGPU] Scalarize predicated loads * [WebGPU] Implement if_then_else intrinsic This is generated when scalarizing predicated loads. * [WebGPU] Enable gpu_free_sync test * [WebGPU] Implement print_reinterpret Use WGSL's bitcast<> operator. * [WebGPU] Implement print_extern_call This is just a regular function call in WGSL. * Add missing include and namespace for isnan/isinf * [WebGPU] Avoid short-circuiting operators These can cause issues with WGSL's uniformity analysis. 
* [WebGPU] Use commas for struct member separators * [WebGPU] Update API headers and usages Two API functions were renamed. * [WebGPU] Use CodeGen_GPU_C base class for codegen Introduce a new enum value for WGSL's vector syntax. * [WebGPU] Add warning for emulated narrow integers * [WebGPU] Update README with latest status * [WebGPU] Add support for non-contiguous copies Also adds support for buffer cropping/slicing. * [WebGPU] Fix clang-tidy error * [WebGPU] Use atomicCmpXchg for 8/16-bit emulation Halide assumes that write-write data-races are benign when both threads are writing the same value, but this is not true when those writes are implemented using atomicXor. We need to use atomicCompareExchangeWeak in a loop to perform this emulation instead. Unfortunately this makes things even slower :-( * [WebGPU] Support non-32-bit parameter types Expand them to 32-bits in the shader and the runtime. * [WebGPU] Fix mixed types in buffers The atomic emulation of narrow types shouldn't use the allocation type, since the same buffer may be re-used for multiple types. This means we also sometimes need to perform 32-bit accesses using atomics as well. Instead of using the buffer allocation type, we now pre-traverse the IR to check for accesses that will require emulation, and mark the corresponding buffer as such. * [WebGPU] Show validation errors for failed maps * [WebGPU] Round up buffer offsets and sizes The WebGPU API requires that these are multiples of 4. * [WebGPU] Update implementation status in README * [WebGPU] Replace @stage(compute) with @compute * [WebGPU] Polyfill the pow_f32 intrinsic The pow builtin in WGSL only has the correct semantics for x>0, so we need to emulate the behavior for the other cases. * [WebGPU] Skip the gpu_allocation_cache test Allocation cache is not yet implemented for the WebGPU backend, and so this test takes forever. * [WebGPU] Use builtins for inverse hyperbolics * [WebGPU] Map rint() to round() WGSL's round() builtin has round-to-nearest-even semantics. * [WebGPU] Set device lost callback This provides more information about events that cause the device to become inaccessible. * [WebGPU] Use i32 for bool parameters The bool type cannot be used in the uniform storage class in WGSL. * [WebGPU] Raise limits for buffer size and workgroup storage * [WebGPU] Update mini_webgpu.h Disable clang-format for this file. * [WebGPU] Add support for dynamic workgroups This is implemented using pipeline-overridable constants. * [WebGPU] Avoid using 'new' as an identifier This is a reserved keyword in WGSL. * [WebGPU] Do not merge workgroup allocations Since we promote 8- and 16-bit integers to 32-bit in workgroup memory, merging multiple different GPUShared allocations into a single one can cause different regions to overlap and data races ensue. * [WebGPU] Fix Emscripten support * [WebGPU] Use const for integer immediates This avoids some MSL compiler ICEs with the code generated by Dawn, and also makes it easier for the WGSL compiler to constant-fold these values. * [WebGPU] Squelch clang-tidy error * [WebGPU] Note Dawn's dependency on go in README * [WebGPU] Add links to Emscripten vs Dawn issue * [WebGPU] Show error when finding WebGPU library fails * [WebGPU] Add link to issue about Windows support * [WebGPU] Rename roundUpToMultipleOf4 ...and use it in one place where we weren't. 
* [WebGPU] Add links to wrap_native issue * [WebGPU] Use debug_assert for some runtime errors * [WebGPU] Stop using designated initializers Add -Wc++20-designator flag to runtime build to prevent this from happening again. * [WebGPU] Update mini_webgpu.h * [WebGPU] Fix a validation issue with ToT Dawn A bitcast that produces a NaN is invalid in WGSL, so implement the `nan()` intrinsic via `float_from_bits()`. * Update README_webgpu.md * Add is_finite_f32, is_inf_f32, is_nan_f32 * Update isinf, isnan, isfinite; add inf_f32, neg_inf_f32 * correctness_isnan should be skipped for WebGPU * Update isnan.cpp * Update atomics_gpu_8_bit.cpp * Fix python_tutorial_lesson_10_aot_compilation_run * Partial fix for generator_aot_acquire_release() This adds the necessary (horrible hackery) to bring the WebGPU case in line with the other backends... but the test still fails, apparently due to the same copy-to-host bug we suspect for generator_aot_gpu_only.Pushing this anyway because it was annoying to write :-) * [WebGPU] Fix AOT test build for non-WASM * [WebGPU] Move README to root * [WebGPU] Address review comments * wip * Update CMakeLists.txt * Work-in-progress for generator_aot_gpu_multi_context_threaded * [WebGPU] Use a per-context staging buffer This fixes the generator_aot_gpu_multi_context_threaded tests. * [WebGPU] Fix clang-format issue * [WebGPU] Move staging buffer creation We shouldn't be doing this inside a callback as we use async error checking routines. * Revert "Merge branch 'webgpu' of https://github.com/jrprice/Halide into pr/6492" This reverts commit 9d79ac60153e12ed51ea9f1a91a544f3fc7ccc78, reversing changes made to 0673e6754e53e62cf6029efdb94a57447db2f03a. * Skip correctness_multi_way_select on x86 Macs (for now) * clang-format * [WebGPU] Recommit 742db3feb888394ff6529c357da3f32ae9286ea4 13 March 2023, 20:30:38 UTC
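One of the trickier items in the WebGPU commit above is the 8/16-bit store emulation via atomicCompareExchangeWeak; a C++ analogue of that read-modify-write loop (the actual WGSL the backend emits will look different):

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// C++ analogue of emulating a byte store in storage that is only addressable
// as 32-bit words: merge the byte into its word with a CAS loop, so concurrent
// writers to *other* bytes of the same word are never clobbered.
void store_u8_emulated(std::atomic<uint32_t> *words, size_t byte_index, uint8_t value) {
    std::atomic<uint32_t> &word = words[byte_index / 4];
    const uint32_t shift = 8u * (byte_index % 4);
    const uint32_t mask = 0xffu << shift;
    uint32_t old_word = word.load(std::memory_order_relaxed);
    uint32_t new_word;
    do {
        new_word = (old_word & ~mask) | (uint32_t(value) << shift);
    } while (!word.compare_exchange_weak(old_word, new_word,
                                         std::memory_order_relaxed));
}
```

A simpler xor-based merge is not enough here because, as the commit notes, Halide treats two threads writing the same value as a benign race, and an xor-based write breaks that assumption.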
613655d Add is_stack_private_to_thread() 13 March 2023, 17:35:57 UTC
08d3c37 Merge branch 'main' into xtensa-codegen 13 March 2023, 17:31:53 UTC
78097a7 Don't use negative values for exit() (#7405) * Don't use negative values for exit() A program that terminates via `exit(-1)` leaves `$? = 255` (per the exit(3) manpage), as the value is chopped with `& 0377`. This makes it hard to use `git bisect` to track down bugs, as it considers any exit code > 127 (or = 15, oddly enough) equivalent to an `abort()` and terminates the bisect. IMHO this is perverse behavior on the part of `git bisect`, but it is what it is, so let's revamp our tests to avoid calling `exit(-1);`. * trigger buildbots 13 March 2023, 17:01:35 UTC
822f5da Don't return negative values from main() (#7406) * Don't return negative values from main() A program that returns `-1` from `main()` leaves $? = 255 (per the exit(3) manpage), as the value is chopped with & 0377. This makes it hard to use git bisect to track down bugs, as it considers any exit code > 127 (or = 15, oddly enough) equivalent to an abort() and terminates the bisect. IMHO this is perverse behavior on the part of git bisect, but it is what it is, so let's revamp our tests to avoid returning `-1` from main to indicate failure, and return `1` instead. * trigger buildbots * validate_gpu_object_lifetime() handling 13 March 2023, 17:00:50 UTC
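A worked example of the exit-status arithmetic behind the two commits above: the status is truncated to its low 8 bits (& 0377, i.e. & 0xFF), so -1 shows up as 255, which git bisect lumps in with abort()-style deaths.

```cpp
#include <cstdio>

int main() {
    // -1 & 0xFF == 255, so a shell would report $? = 255 after `exit(-1)`
    // or `return -1;` from main(); 255 > 127, which git bisect treats like
    // a crash and uses as a reason to terminate the bisect.
    std::printf("-1 & 0xFF = %d\n", -1 & 0xFF);  // prints 255
    return 1;  // the convention adopted above: plain 1 indicates test failure
}
```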
f76bcc8 Cleanup of src/runtime/internal and test/runtime (#7399) * Cleanup of src/runtime/internal and test/runtime - Don't include .cpp files. - Don't use header-only "libraries" that rely on include order or being included only once (to wit: test/runtime/common.h -> common.cpp) - All files should explicitly #include what they need, even if they think it's already included (ie, order of include files should not matter) - In src/runtime/internal, change all `halide_abort_if_false` -> `halide_debug_assert` - in test/runtime, add HALIDE_CHECK to common.h and use it for tests instead of `halide_abort_if_false` * 32-bit fixes 12 March 2023, 17:49:46 UTC
0beb081 docs: Use halide.imageio instead of imageio (#7409) * docs: Use halide.imageio instead of imageio * trigger buildbots --------- Co-authored-by: Steven Johnson <srj@google.com> 11 March 2023, 20:45:17 UTC
4bf660e Fix overflow in x86 absd lowering (#7407) * Fix overflow in x86 absd lowering * Fix default lowering too 11 March 2023, 00:53:44 UTC
ead83e6 Docs/update readme python (#7402) * docs: Update simple usage in README_python.md * docs: Add imports 10 March 2023, 18:23:21 UTC
b02de4b Update all use of actions/checkout to v3 (#7400) 09 March 2023, 17:46:43 UTC
6a6c842 Backport changes to CodegenC from the xtensa branch (#7396) Backported with some trivial style changes (virtual method instead of flag, since this should never change for a given instance) 07 March 2023, 23:56:03 UTC
a2df5bd Attempt to give proper labels to pip dev builds (v2) (#7395) A much-simpler alternative to #7394 07 March 2023, 19:29:08 UTC
3a0b891 Merge branch 'main' into xtensa-codegen 06 March 2023, 21:43:48 UTC
4a80251 Destringify CanonicalizeGPUVars (#7386) * Destringify CanonicalizeGPUVars This new implementation takes the high-water marks of each type of GPU loop, instead of filtering using the prefix of the loop name. * Better comments * fix typo 03 March 2023, 20:57:47 UTC
aa8fcad hannk: Provide weak symbol functions to use op profiling (#7388) * hannk: Provide weak symbol functions to use op profiling You can add your own profiler with strong symbols. * hannk: Guard profiler feature with HANNK_PROFILER 03 March 2023, 20:56:52 UTC
5e81e91 Remove incorrect halide_xtensa_sat_narrow_u16 03 March 2023, 18:56:46 UTC
c855273 Merge branch 'main' into xtensa-codegen 03 March 2023, 18:55:38 UTC
387a19c Make README consistent on supported LLVM versions (#7390) Make README consistent with https://github.com/halide/Halide/pull/7093 03 March 2023, 01:10:50 UTC
122b5b6 hannk: Add device_sync method to hannk::Tensor class (#7387) hannk: Add device_sync method to hannk::Tensor class This method is useful to debug performance with synchronous execution. 01 March 2023, 22:22:20 UTC
303a90c Minor cleanup of GPUCompilationCache (#7376) * Minor cleanup of GPUCompilationCache While tracking down an apparently-unrelated threading bug in the webgpu backend, I made some tweaks to this code that I think are worth keeping. The main one of importance is that `release_hold()` and `release_context()` really should acquire the mutex -- they weren't before -- so now all public methods are properly mutexed. The other changes are mostly cosmetic: - Moved helper methods to be private rather than public - Changed the id value size to be `uintptr_t` rather than `uint32_t`; the space allocated for them is sizeof(void*). (Not sure this moves the needle but it felt right.) - Removed unused ctor for CachedCompilation Also, drive-by change in printer.h to capture some logging improvements. * Fix deadlock release_all() and release_context() were contending for the mutex 01 March 2023, 17:21:48 UTC
b48f78c Merge branch 'xtensa-codegen' of https://github.com/halide/Halide into xtensa-codegen 01 March 2023, 00:42:59 UTC
040d773 Merge branch 'main' into xtensa-codegen 01 March 2023, 00:42:07 UTC
fba892a Specify a full type for llvm::IRBuilder (#7384) Specify full type for llvm::IRBuilder 01 March 2023, 00:29:42 UTC
c16b5e2 [xtensa] removed tests that are failing to compile (#7362) * [xtensa] removed tests that are failing to compile due to poor support of int48 in scalarised regime * [xtensa] removed runtime generation for xtensa tests, as it is not used 28 February 2023, 17:31:38 UTC
5c02ae2 Bounds visitor for div was missing single_point mutated case (#7379) * Bounds visitor for div was missing single_point mutated case Signed-off-by: Adrian Lebioda <adrian.lebioda@hexagon.com> * Add test Signed-off-by: Adrian Lebioda <adrian.lebioda@hexagon.com> --------- Signed-off-by: Adrian Lebioda <adrian.lebioda@hexagon.com> Co-authored-by: Adrian Lebioda <adrian.lebioda@hexagon.com> 28 February 2023, 02:14:41 UTC
bdba694 Add Callable default ctor + `defined()` method (#7380) * Add Callable default ctor + `defined()` method This allows it to behave like * Add user_assert + test 28 February 2023, 02:14:24 UTC
2fb3b62 Better handling of u1 to i16 cast & clean-up 28 February 2023, 01:21:13 UTC
0091fd9 [xtensa] Limit the number of allowed DMA channels + allocate a separate channel for the output transactions (#7381) * Limit the number of allowed DMA channels + allocate a separate channel for the output transactions * Fix formatting 28 February 2023, 00:35:02 UTC
38057b8 Merge branch 'main' into xtensa-codegen 27 February 2023, 23:28:26 UTC
c42a5b2 Remove a gross hack from gpu_only_aottest (#7378) * Remove a gross hack from gpu_only_aottest Also add metal support * Add missing include 27 February 2023, 16:46:41 UTC
09400f6 Bounds visitors for min/max were missing single_point mutated case (#7377) * Bounds visitors for min/max were missing single_point mutated case Partially fixes #7374 * Add test 25 February 2023, 17:16:32 UTC
b6a18b8 Update WABT to 1.0.32; Increase stack size for WASM AOT apps (#7373) 23 February 2023, 19:36:32 UTC
144c1a4 correctness_round should use Target::supports_type() (#7372) This gives it proper support for new GPU backends 23 February 2023, 17:59:39 UTC
b17806d Use HalideFreeHelper for the register_destructor (#7371) Slightly cleaner code. Also, drive-by change of NULL -> nullptr 23 February 2023, 17:59:13 UTC
629da52 Use single-char form of `unique_name` for semaphores (#7370) The multi-char form of `unique_name` will append a `$` to the identifier, e.g. `sema$4`. This isn't really legal for a C/C++ identifier. 23 February 2023, 17:58:37 UTC
ad6c84a [xtensa] Clean up HalideFreeHelper code (#7368) * Clean up HalideFreeHelper code - Revise HalideFreeHelper to be a templated struct, to save the unnecessary stack storage for the function - Add emit_halide_free_helper() method to consolidate usage - Add a nullptr check to the `stack_is_core_private`, per comment - Fix some minor whitespace issues (If this PR is accepted here, I will of course backport the non-xtensa portions to main) * Update CodeGen_C.cpp 22 February 2023, 23:45:59 UTC
29f3f42 Merge branch 'main' into xtensa-codegen 22 February 2023, 23:44:29 UTC
386a2d1 Clean Up HalideFreeHelper code (main) (#7369) * Clean Up HalideFreeHelper code (main) - Revise HalideFreeHelper to be a templated struct, to save the unnecessary stack storage for the function - Add emit_halide_free_helper() method to consolidate usage * Update CodeGen_C.cpp 22 February 2023, 23:39:25 UTC
e69fa42 Merge branch 'main' into xtensa-codegen 22 February 2023, 18:37:34 UTC
3246844 Use a std::unique_ptr for the IR Builder (#7356) * Use a std::unique_ptr for the IR Builder instead of a raw owning pointer * Use make_unique 21 February 2023, 21:18:30 UTC
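The ownership change above, shown on a generic class rather than the actual CodeGen_LLVM member (the types here are made-up stand-ins):

```cpp
#include <memory>

// Stand-ins for llvm::IRBuilder<> and its context argument.
struct FakeContext {};
struct FakeIRBuilder {
    explicit FakeIRBuilder(FakeContext &ctx) : ctx(ctx) {}
    FakeContext &ctx;
};

class CodeGenLike {
    FakeContext context;
    // Before: FakeIRBuilder *builder;  // raw owning pointer, deleted by hand
    std::unique_ptr<FakeIRBuilder> builder;  // after: ownership is explicit

public:
    CodeGenLike() : builder(std::make_unique<FakeIRBuilder>(context)) {}
    // No hand-written destructor needed; unique_ptr deletes the builder.
};

int main() {
    CodeGenLike cg;
    return 0;
}
```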