https://github.com/halide/Halide

Revision Author Date Message Commit Date
b2135e1 Merge remote-tracking branch 'origin/main' into abadams/tweak_unpack_buffers 21 February 2024, 18:53:54 UTC
c4d56c6 Small Tutorial Fix (#8111) * Update lesson_17_predicated_rdom.cpp * Update lesson_17_predicated_rdom.cpp 19 February 2024, 22:46:15 UTC
be64516 Do less redundant work in UnpackBuffers We were redundantly creating a handle Variable every time we encountered something like foo.stride.0, instead of just the first time we encounter a Variable that refers to an input Parameter/Buffer. Speeds up this already-fast lowering pass by 10% or so. No measurable impact on total lowering time. 18 February 2024, 02:54:55 UTC
4fc1e57 Fix an issue where the Halide compiler hits an internal error for bool types in widening intrinsics. (#8099) * Fix an issue where the Halide compiler hits an internal error when bool types are used with e.g. widening_mul. This situation did not arise from user code doing this directly, but rather through some chain of lowering with float16 types. The test cases added to correctness_intrinsics target the issue directly and do fail without the fix. I did not add broader coverage for bool types and intrinsics as it would require more thinking. Most of them overflow for the true/true case and thus are of questionable use, however widening operations cannot overflow... Certainly we could define the language to forbid this, but currently the frontend does not do so. As indicated above, the use case driving this was not using bool arithmetic to begin with. * Formatting. 16 February 2024, 21:58:23 UTC
d9668c5 Fix clang-tidy error in runtime.printer.h (parameter shadows member) (#8074) 15 February 2024, 17:57:16 UTC
2855ca3 Strip asserts right at the end of lowering (#8094) The simplifier exploits asserts to perform simplifications. When compiling with NoAsserts, certain assertions are never introduced, which means that the simplifier can't exploit certain things that we know to be true. Mostly this has a negative effect on code size. E.g. tail cases get generated even though they are actually dead code. This PR keeps all the assertions right until the end of lowering, when it strips them in a dedicated pass. This reduces object file size for a large production blob of Halide code by ~10%, without measurably affecting runtime. 15 February 2024, 17:06:36 UTC
e6e1b6f Ensure string(REPLACE) is called with the right number of arguments (#8097) 15 February 2024, 01:58:55 UTC
9a740b5 [Vulkan] Region allocator fixes for memory requirements and allocations (#8087) * Add region allocator tests that check alignment, nearest_multiple and collect routines * Fix can_split() routine to use conformed sizes so that split allocation matches Fix region size accounting so that coalesce never has zero size regions to merge * Fix aligned_offset() routine to check for zero alignment (which means no constraint) * Fix ifdef for internal debugging * Clean up debug internal log messages * Use memory_requirements to determine nearest_multiple during initialization Query memory_requirements for each region, and reallocate if driver requires additional device memory * Formatting pass --------- Co-authored-by: Derek Gerstmann <dgerstmann@adobe.com> 14 February 2024, 22:41:51 UTC
b582561 Fix reduce_expr_modulo of vector in Solve.cpp (#8089) * Fix reduce_expr_modulo of vector in Solve.cpp * Fix test 14 February 2024, 21:57:09 UTC
f2d750f tests: correctness/float16_t: mark `__extendhfsf2` with default visibility (#8084) ``` [2336/4154] /usr/bin/clang++-17 -DHALIDE_ENABLE_RTTI -DHALIDE_VERSION_MAJOR=17 -DHALIDE_VERSION_MINOR=0 -DHALIDE_VERSION_PATCH=0 -DHALIDE_WITH_EXCEPTIONS -I/build/halide-17.0.0/test/common -I/build/halide-17.0.0/tools -I/build/halide-17.0.0/build/stage-1/halide/include -g -fdebug-default-version=4 -fprofile-use=/build/halide-17.0.0/build-profile/default.profdata -fcs-profile-generate -Xclang -mllvm -Xclang -vp-counters-per-site=100.0 -fuse-ld=lld-17 -Wl,--build-id=sha1 -std=c++17 -flto=thin -fPIE -fvisibility=hidden -fvisibility-inlines-hidden -Winvalid-pch -Xclang -include-pch -Xclang /build/halide-17.0.0/build/stage-1/halide/test/CMakeFiles/_test_internal.dir/cmake_pch.hxx.pch -Xclang -include -Xclang /build/halide-17.0.0/build/stage-1/halide/test/CMakeFiles/_test_internal.dir/cmake_pch.hxx -MD -MT test/correctness/CMakeFiles/correctness_float16_t.dir/float16_t.cpp.o -MF test/correctness/CMakeFiles/correctness_float16_t.dir/float16_t.cpp.o.d -o test/correctness/CMakeFiles/correctness_float16_t.dir/float16_t.cpp.o -c /build/halide-17.0.0/test/correctness/float16_t.cpp <...> ld.lld-17: error: undefined hidden symbol: __extendhfsf2 >>> referenced by float16_t.cpp:391 (/build/halide-17.0.0/test/correctness/float16_t.cpp:391) >>> lto.tmp:(main) >>> did you mean: __extendbfsf2 >>> defined in: /lib/x86_64-linux-gnu/libgcc_s.so.1 clang++-17: error: linker command failed with exit code 1 (use -v to see invocation) ``` 14 February 2024, 20:35:52 UTC
40a622f clang does not support `_Float16` when targeting i386 (#8085) See https://github.com/halide/Halide/issues/7678 14 February 2024, 20:34:23 UTC
6edea16 Allow disabling of multithreading in simd op check (#8096) simd_op_check_xtensa is not threadsafe at present 14 February 2024, 20:26:27 UTC
c8f43f3 Parallelize some tests (#8078) * Parallelize some tests This reduces the time taken to run all correctness tests from 8:15 to 3:15 on my machine. * The FIXME is actually fine * Remove debug print * Fix when we're willing to run x86 code in simd_op_check * Use separate imageparams per task * Deep-copy the LoopLevels * Make float16_t neon op check test at least build * Revert accidental serialization * Throw return values from callable into the void We don't have a custom error handler in place, so they're always zero * Skip test under ASAN * Fix unintentional change to test 13 February 2024, 21:47:19 UTC
d8cfed6 Forward the partition methods from generator outputs (#8090) 13 February 2024, 21:47:09 UTC
ada6345 Fix rfactor adding too many pure loops (#8086) When you rfactor an update definition, the new update definition must use all the pure vars of the Func, even though the one you're rfactoring may not have used them all. We also want to preserve any scheduling already done to the pure vars, so we want to preserve the dims list and splits list from the original definition. The code accounted for this by checking the dims list for any missing pure vars and adding them at the end (just before Var::outermost()), but this didn't account for the fact that they may no longer exist in the dims list due to splits that didn't reuse the outer name. In these circumstances we could end up with too many pure loops. E.g. if x has been split into xo and xi, then the code was adding a loop for x even though there were already loops for xo and xi, which of course produces garbage output. This PR instead just checks which pure vars are actually used in the update definition up front, and then uses that to tell which ones should be added. Fixes #7890 12 February 2024, 18:10:00 UTC
9c3615b Add checks to prevent people from using negative split factors (#8076) * Add checks to prevent people from using negative split factors Our analysis passes assume that loop maxes are greater than loop mins, so negative split factors cause sufficient havoc that not even output bounds queries are safe. These are therefore checked on pipeline entry. This is a new way for output bounds queries to throw errors (in addition to the buffer pointers themselves being null, and maybe some buffer constraints). Testing this, I realized these errors were getting thrown twice, because the output buffer bounds query in Pipeline::realize was built around two recursive calls to realize, and both were calling the custom error handler. In addition to reporting errors in this class twice, this implies several other inefficiencies, e.g. jit call args were being prepped twice. I reworked it to be built around two calls to call_jit_code instead. Fixes #7938 * Add test to cmakelists * Remove pointless target arg to call_jit_code It has to be the same as the cached target in the receiving object anyway 11 February 2024, 18:41:01 UTC
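The reason a non-positive split factor is rejected up front can be illustrated with a standalone sketch. This is not Halide's implementation: `split_loop` is a hypothetical helper, and rejecting zero as well as negative factors is an assumption made here for the illustration. It enumerates the (outer, inner) index pairs of a split loop and refuses factors for which the outer extent would be meaningless.

```cpp
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical helper, not part of Halide: enumerate the (outer, inner)
// index pairs produced by splitting a loop of `extent` iterations by
// `factor`. Iterations past the end of the original loop are skipped,
// which mimics a guarded tail.
std::vector<std::pair<int, int>> split_loop(int extent, int factor) {
    // Assumption for this sketch: any factor below 1 is an error, because
    // the outer extent computed below only makes sense for factor >= 1.
    if (factor < 1) {
        throw std::invalid_argument("split factor must be positive");
    }
    std::vector<std::pair<int, int>> result;
    int outer_extent = (extent + factor - 1) / factor;  // ceiling division
    for (int outer = 0; outer < outer_extent; outer++) {
        for (int inner = 0; inner < factor; inner++) {
            if (outer * factor + inner < extent) {
                result.emplace_back(outer, inner);
            }
        }
    }
    return result;
}
```

Calling `split_loop(10, 4)` visits all ten original iterations, while `split_loop(10, -4)` throws before any loop bounds are computed, which is the same "fail on entry" behaviour the commit describes for pipelines.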
22581bf Remove OpenGLCompute (#8077) * Remove OpenGLCompute This was supposed to be removed in Halide 17 (oops), removing for Halide 18 * Update dynamic_allocation_in_gpu_kernel.cpp * Update dynamic_allocation_in_gpu_kernel.cpp * Update halide_ir.fbs 11 February 2024, 18:40:09 UTC
a3baa5d [WebGPU] Update to latest native headers (#8081) * [WebGPU] Update to latest native headers * Remove #ifdef for `requiredFeature[s]Count` * Pass nullptr to wgpuCreateInstance * Emscripten currently requires this * Dawn accepts it too * Use nullptr for another wgpuCreateInstance call 09 February 2024, 18:39:21 UTC
de8e39d Bump serialization version to 18.0.0 (#8080) * Bump serialization version to 18.0.0 As a matter of policy, we should probably bump the version of the serialization format for every version of Halide -- even if changes are minimal-to-nonexistent -- to reinforce the fact that this isn't intended in any way as a long-term archival format. This PR suggests that we bump the major version to match the main Halide version, but I'm open for other suggestions. * Update halide_ir.fbs 09 February 2024, 16:55:00 UTC
55dfa39 Add an easy way to print vectors in debug output. (#8072) * Add helper to print containers, or at least vectors, in debug info. * Add documentation comments. * Formatting. * Name change. 07 February 2024, 18:23:46 UTC
39e5c08 Better validation of gpu schedules (#8068) * Update makefile to use test/common/terminate_handler.cpp This means we actually print error messages when using exceptions and the makefile * Better validate of GPU schedules GPU loop constraints were checked in two different places. Checking them in ScheduleFunctions was incorrect because it didn't consider update definitions and specializations. Checking them in FuseGPUThreadLoops was too late, because the Var names have gone (they've been renamed to things like __thread_id_x). Furthermore, some problems were internal errors or runtime errors when they should have been user errors. We allowed 4d thread and block dimensions, but then hit an internal error. This PR centralizes checking of GPU loop structure in CanonicalizeGPUVars and adds more helpful error messages that print the problematic loop structure. E.g: ``` Error: GPU thread loop over f$8.s0.v0 is inside three other GPU thread loops. The maximum number of nested GPU thread loops is 3. The loop nest is: compute_at for g$8: for g$8.s0.v7: for g$8.s0.v6: for g$8.s0.v5: for g$8.s0.v4: gpu_block g$8.s0.v3: gpu_block g$8.s0.v2: gpu_thread g$8.s0.v1: gpu_thread g$8.s0.v0: store_at for f$8: compute_at for f$8: gpu_thread f$8.s0.v1: gpu_thread f$8.s0.v0: ``` Fixes the bug found in #7946 * Delete dead code * Actually clear the ostringstream 07 February 2024, 17:49:06 UTC
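The nesting rule behind the error message above can be sketched in isolation. The code below is an illustration only, not the logic in CanonicalizeGPUVars: `LoopKind` and `check_gpu_nesting` are hypothetical names, and applying the same limit of 3 to block loops as to thread loops is an assumption.

```cpp
#include <string>
#include <vector>

// Hypothetical classification of loops in a nest, outermost first.
enum class LoopKind { Serial, GPUBlock, GPUThread };

// Hypothetical sketch: walk a loop nest from outermost to innermost and
// return an error message if more than `max_depth` GPU block loops or
// GPU thread loops are nested. Returns an empty string if the nest is OK.
std::string check_gpu_nesting(const std::vector<LoopKind> &nest, int max_depth = 3) {
    int blocks = 0, threads = 0;
    for (LoopKind kind : nest) {
        if (kind == LoopKind::GPUBlock && ++blocks > max_depth) {
            return "too many nested GPU block loops";
        }
        if (kind == LoopKind::GPUThread && ++threads > max_depth) {
            return "too many nested GPU thread loops";
        }
    }
    return "";
}
```

In the loop nest printed in the commit message, the two thread loops of `f$8` sit inside the two thread loops of `g$8`, so the fourth thread loop is the one that gets reported.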
37153a9 Fix bool conversion bug in Vulkan code generator (#8067) * Fix bug in Vulkan code generator that was incorrectly passing the address of a byte vector, instead of its contents to builder.declare_constant() * Add bool_predicate_cast correctness test to verify bool conversion for Vulkan codegen works as expected --------- Co-authored-by: Derek Gerstmann <dgerstmann@adobe.com> 07 February 2024, 17:43:58 UTC
78a0762 Add hexagon_benchmarks app for CMake builds (#8069) * Add hexagon_benchmarks app for CMake builds * Removed unnecessary -lc++abi flag from GCC build 07 February 2024, 17:41:51 UTC
84fe565 Outsmart the LLVM optimizer (#8073) The old definitions of bool_1, bool_2, bool_3 in simd_op_check_x86 (etc) all referred to the same entry in in_f32; as of https://github.com/llvm/llvm-project/pull/76367, the LLVM optimizer is smart enough to realize that (eg) bool1 != bool2 by construction, and optimizes away the code that tests their conditions, such as the one for andps and orps. Initing them from different locations is enough to outsmart the compiler. (bug was only noticed in the x86 test, but I updated the other tests to guard against future improvements there too.) 07 February 2024, 17:41:21 UTC
665804c Don't require Halide_WebGPU when using wasm (#8063) (#8065) * Don't require Halide_WebGPU when using wasm (#8063) * trigger buildbots 06 February 2024, 23:34:29 UTC
93bff95 add unsafe_promise_clamped (#8071) add unsafe_promise_clamp 06 February 2024, 23:34:02 UTC
80e2081 Update makefile to use test/common/terminate_handler.cpp (#8066) This means we actually print error messages when using exceptions and the makefile 05 February 2024, 22:25:05 UTC
e2448fe Fix type error in VectorizeLoops (#8055) 01 February 2024, 17:46:10 UTC
47378ee Enable `bugprone-switch-missing-default-case` (#8048) * Upgrade clang-format and clang-tidy to use LLVM 17 * trigger buildbots * trigger buildbots * trigger buildbots * trigger buildbots * Enable `bugprone-switch-missing-default-case` ...and fix existing warnings. * Update .clang-tidy * Update Parameter.cpp * Update .clang-tidy * Update .clang-tidy * Update .clang-tidy * Update .clang-tidy * Update CPlusPlusMangle.cpp 29 January 2024, 01:28:13 UTC
4b2d211 Upgrade clang-format and clang-tidy to use LLVM 17 (#8042) * Upgrade clang-format and clang-tidy to use LLVM 17 * trigger buildbots * trigger buildbots * trigger buildbots * trigger buildbots 27 January 2024, 00:33:24 UTC
45d7850 Track whether or not let expressions failed to solve in solver (#7982) * Track whether or not let expressions failed to solve in solver After mutating an expression, the solver needs to know two things: 1) Did the expression contain the variable we're solving for 2) Was the expression successfully "solved" for the variable. I.e. the variable only appears once in the leftmost position. We need to know this to know property 1 of any subexpressions (i.e. does the right child of the expression contain the variable). This drives what transformations we do in ways that are guaranteed to terminate and not take exponential time. We were tracking property 1 through lets but not property 2, and this meant we were doing unhelpful transformations in some cases. I found a case in the wild where this made a pipeline take > 1 hour to compile (I killed it after an hour). It may have been in an infinite transformation loop, or it might have just been exponential. Not sure. * Remove surplus comma * Fix use of uninitialized value that could cause bad transformation 26 January 2024, 20:01:41 UTC
3657cf5 Fix bounds_of_nested_lanes (#8039) * Fix bounds_of_nested_lanes bounds_of_nested_lanes assumed that one layer of nested vectorization could be removed at a time. When faced with the expression: min(ramp(x8(a), x8(b), 5), x40(27)) It panicked, because on the left hand side it reduced the bounds to x8(a) ... x8(a) + x8(b) * 4, and on the right hand side it reduced the bounds to 27. It then attempted to take a min of mismatched types. In general we can't assume that binary operators on nested vectors have the same nesting structure on both sides, so I just rewrote it to reduce directly to a scalar. Fixes #8038 26 January 2024, 17:26:12 UTC
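The idea of reducing both sides directly to scalar bounds can be shown with a small standalone sketch. It is not the code in Halide: `Interval`, `bounds_of_ramp` and `bounds_of_min` are hypothetical names, plain `int` stands in for Halide expressions, and integer overflow is ignored.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical scalar interval; stands in for a pair of bound expressions.
struct Interval {
    int min, max;
};

// Scalar bounds of ramp(base, stride, lanes), whose lanes are
// base + stride * i for i in [0, lanes). A broadcast such as x8(a)
// contributes the scalar interval [a, a] for base or stride.
Interval bounds_of_ramp(Interval base, Interval stride, int lanes) {
    assert(lanes > 0);
    int last = lanes - 1;
    // The smallest lane is either lane 0 or the last lane, depending on
    // the sign of the stride; likewise for the largest lane.
    return {base.min + std::min(0, stride.min * last),
            base.max + std::max(0, stride.max * last)};
}

// Scalar bounds of min(a, b), valid whatever vector nesting a and b had.
Interval bounds_of_min(Interval a, Interval b) {
    return {std::min(a.min, b.min), std::min(a.max, b.max)};
}
```

For the expression in the commit message, the left side reduces to the scalar interval from `a` to `a + 4 * b` (for non-negative `b`) and the right side to `[27, 27]`, so taking the min no longer involves mismatched vector types.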
4590a09 Fix for llvm trunk: Force-include more runtime types (#8045) * Fix for llvm trunk: Force-include more runtime types * Include the force-include-types module first * Fix comment * Expand comment 26 January 2024, 01:07:40 UTC
c1923f3 HALIDE_VERSION_MAJOR -> 18 (#8044) 24 January 2024, 23:53:28 UTC
6177e51 Update Halide version to 18 (#8043) 24 January 2024, 20:04:19 UTC
9b9dfaf Update Makefile for llvm 19 (#8040) 24 January 2024, 19:12:17 UTC
90e909d Allow LLVM 19 in CMake (#8041) 24 January 2024, 18:44:47 UTC
e0e9f63 Tweak the Printer code in runtime for smaller code (#8023) * Tweak the Printer code in runtime for smaller code TL;DR: template expansion meant that we had more replicated code than expected from the inline expansion of code in Printer and friends. Restructured and added NEVER_INLINE to try to make the call sites as small as possible. It's a modest code-size savings but nonzero... e.g., the linux-x86-64 .o output from correct_cross_compilation drops from 164280 bytes to 162936 bytes. * Update printer.h * debug * Update HalideTestHelpers.cmake * Update printer.h * fixes 22 January 2024, 21:43:00 UTC
22f9bb9 Add test for #8029 (#8032) Tweak correctness_float16_t so that it uses one of the transcendental functions (sqrt) that were missing in Metal. 17 January 2024, 16:26:43 UTC
3a77204 Require LLVM >= 16.0 (#8003) * Require LLVM >= 16.0 Per policy, we only support top-of-tree LLVM, plus two versions back; let's update to require LLVM >= 16, and drop workarounds for older versions. * LLVM_VERSION < 170 17 January 2024, 15:35:07 UTC
d2eed57 Fix build breakage for wasm targets (#8031) Update HalideTestHelpers.cmake 16 January 2024, 20:00:36 UTC
8d3c12e adds mappings for f16 variants of halide float math (#8029) * adds mappings for f16 variants of halide float math * fix clang format errors * trigger buildbots --------- Co-authored-by: Steven Johnson <srj@google.com> 16 January 2024, 18:55:53 UTC
91b063d Stronger chain detection in LoopCarry pass (#8016) * Stronger chain detection in LoopCarry * Make sure that types are the same * Add a comment * Run CSE before calling can_prove * Test for loop carry * clang-tidy * Add missing override * Update comments 09 January 2024, 04:57:15 UTC
cdebeb8 Fix -Wstrict-prototype warnings in HalideRuntime.h (#8027) When HalideRuntime.h is included in a C file, functions that are declared with `()` instead of `(void)` for their arguments change meaning. These may cause issues downstream because different code is generated. 09 January 2024, 01:33:08 UTC
21accad Set warnings on tests as well as src (#8022) * Don't use variable-length arrays There was a rogue use of VLAs (an extension we don't want to use) in one of the runtime tests. Fixed the test. I'll follow up with a separate PR to ensure this warning is enabled everywhere to flush out other usages. * Set warnings on tests as well as src 04 January 2024, 17:04:34 UTC
daf011d Don't use variable-length arrays (#8021) There was a rogue use of VLAs (an extension we don't want to use) in one of the runtime tests. Fixed the test. I'll follow up with a separate PR to ensure this warning is enabled everywhere to flush out other usages. 04 January 2024, 17:04:18 UTC
b661c8d Quick fix for crash that is occurring in SVE2 tests. (#8020) Broken out into separate PR for ease of review and isolated test/tracking. 04 January 2024, 01:49:56 UTC
d2da007 Fix for top-of-tree LLVM (Fix #8017) (#8018) Fix for top-of-tree LLVM 03 January 2024, 20:05:37 UTC
8024bdc Don't add ring_buffer semaphores if the function is not scheduled as async (#8015) Don't add ring_buffer semaphores if the function is not scheduled as async Co-authored-by: Steven Johnson <srj@google.com> 02 January 2024, 22:52:53 UTC
6f26b04 Change startswith -> starts_with (#8013) startswith was deprecated in llvm/llvm-project#75491, which means that Halide fails to compile using LLVM 18 (deprecation warning). 02 January 2024, 18:27:51 UTC
61b8d38 Scheduling directive to support ring buffering (#7967) * Half-plumbed * Revert "Half-plumbed" This reverts commit eb9dd02c6c607f0b49c95258ae67f58fe583ff44. * Interface for double buffer * Update Provides, Calls and Realizes for double buffering * Proper sync for double buffering * Use proper name for the semaphor and use correct initial value * Rename the class * Pass expression for index * Adds storage for double buffering index * Use a separate index to go through the double buffer * Failing test * Better handling of hoisted storage in all of the async-related passes * New test and clean-up the generated IR * More tests * Allow double buffering without async and add corresponding test * Filter out incorrect double_buffer schedules * Add tests to the cmake files * Clean up * Update the comment * Clean up * Clean up * Update serialization * complete_x86_target() should enable F16C and FMA when AVX2 is present (#7971) All known AVX2-enabled architectures definitely have these features. 
* Add two new tail strategies for update definitions (#7949) * Add two new tail strategies for update definitions * Stop printing asm * Update expected number of partitions for Partition::Always * Add a comment explaining why the blend safety check is per dimension * Add serialization support for the new tail strategies * trigger buildbots * Add comment --------- Co-authored-by: Steven Johnson <srj@google.com> * Add appropriate mattrs for arm-32 extensions (#7978) * Add appropriate mattrs for arm-32 extensions Fixes #7976 * Pull clauses out of if * Move canonical version numbers into source, not build system (#7980) (#7981) * Move canonical version numbers into source, not build system (#7980) * Fixes * Silence useless "Insufficient parallelism" autoscheduler warning (#7990) * Add a notebook with a visualization of the aprrox_* functions and their errors (#7974) * Add a notebook with a visualization of the aprrox_* functions and their errors * Fix spelling error * Make narrowing float->int casts on wasm go via wider ints (#7973) Fixes #7972 * Fix handling of assert statements whose conditions get vectorized (#7989) * Fix handling of assert statements whose conditions get vectorized * Fix test name * Fix all "unscheduled update()" warnings in our code (#7991) * Fix all "unscheduled update()" warnings in our code And also fix the Mullapudi scheduler to explicitly touch all update stages. This allows us to mark this warning as an error if we so choose. 
* fixes * fixes * Update recursive_box_filters.cpp * Silence useless 'Outer dim vectorization of var' warning in Mullapudi… (#7992) Silence useless 'Outer dim vectorization of var' warning in Mullapudi scheduler * Add a tutorial for async and double_buffer * Renamed double_buffer to ring_buffer * ring_buffer() now expects an extent Expr * Actually use extent for ring_buffer() * Address some of the comments * Provide an example of the code structure for producer-consumer async example * Comments updates * Fix clang-format and clang-tidy * Add Python binding for Func::ring_buffer() * Don't use a separate index for ring buffer + add a new test * Rename the tests * Clean up the old name * Add & * Move test to the right folder * Move expr * Add comments for InjectRingBuffering * Improve ring_buffer doc * Fix comments * Comments * A better error message * Mention that extent is expected to be a positive integer * Add another code structure and explain how the indices for ring buffer are computed * Expand test comments * Fix spelling --------- Co-authored-by: Steven Johnson <srj@google.com> Co-authored-by: Andrew Adams <andrew.b.adams@gmail.com> 19 December 2023, 22:14:05 UTC
6bcb695 Update Halide version in setup.py to 17.0.0 (#8010) 15 December 2023, 00:27:56 UTC
6d29ad5 Add missing Python bindings for various recent additions to Func and Stage (#8002) * Add missing Python bindings for various recent additions to Func and Stage We have been sloppy about maintaining these. Also added a bit of testing. * Update PyEnums.cpp 13 December 2023, 17:02:37 UTC
3d5cf40 Inject profiling for function calls to 'halide_copy_to_host' and 'halide_copy_to_device'. (#7913) * Inject profiling for function calls to 'halide_copy_to_host' and 'halide_copy_to_device'. * WIP: I get segfaults. The device_interface pointer is bogus. * Figured it out... * Allow global sync on d3d12. * Cleanly time all buffer copies as well. * Cleanup old comment. * Following Andrew's suggestion for suffixing buffer copies in the profiler. * Sort the profiler report lines into three sections: funcs, buffer copy to device, and buffer copy to host. * Attempt to fix output parsing. * Fix crash for copy_to_device * halide_device_sync_global(NULL) -> success * Fixed the buffer copy bug. Added a new test that will cause buffer copies in two directions within the compiled pipeline. This will catch this better in the future. Tweaked the profile report section header printing. * Clang-format, my dear friend... 12 December 2023, 17:50:56 UTC
357e646 Do some basic validation of Target Features (#7986) (#7987) * Do some basic validation of Target Features (#7986) * Update Target.cpp * Update Target.cpp * Fixes * Update Target.cpp * Improve error messaging. * format * Update Target.cpp 08 December 2023, 19:17:30 UTC
9c099c2 Teach unrolling to exploit conditions in enclosing ifs (#7969) * Teach unrolling to exploit conditions in enclosing ifs Fixes #7968 * Handle vectorization as well * Remove unused usings * Add missing print 08 December 2023, 17:53:04 UTC
9643518 Add join_strings() call and use it from mattrs() (#7997) * Add join_strings() call and use it from mattrs() This is a super-nit kind of fix, but the fact that we had rerolled a join-strings algo in a half-dozen places made my teeth hurt, so I decided to fix it: - Add join_strings() to Util.h - revise the mattrs() calls to use it instead of the janky mess they used This doesn't move the needle on code size or speed but it is less weird. Probably other places we could/should use this too. (Does C++20 have join/split strings in the std library yet? If not, why not?) * Update Util.h * Update Util.h * clang-tidy 08 December 2023, 17:50:32 UTC
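A helper of the kind described above might look like the following. The name `join_strings` comes from the commit message, but the signature and body here are assumptions written for illustration, not a copy of what is in Util.h.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Assumed signature, for illustration only: join the elements of `parts`
// with `delim` between consecutive elements (no leading or trailing
// delimiter). Works for any element type that can be streamed.
template<typename T>
std::string join_strings(const std::vector<T> &parts, const std::string &delim) {
    std::ostringstream out;
    bool first = true;
    for (const T &part : parts) {
        if (!first) {
            out << delim;
        }
        out << part;
        first = false;
    }
    return out.str();
}
```

For an mattrs-style use, joining the strings `+avx2`, `+fma` and `+f16c` with `","` yields one comma-separated attribute string, and an empty vector yields an empty string.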
19c1c81 Make wasm +sign-ext and +nontrapping-fptoint the default (#7995) * Make wasm +sign-ext and +nontrapping-fptoint the default These have been supported in ~all wasm runtimes for a while now, and +nontrapping-fptoint in particular can make a big performance difference. We should enable these by default, and add a new backdoor (wasm_mvponly) for code paths that need to use the original wasm Minimum Viable Product spec only. * Update simd_op_check_wasm.cpp 08 December 2023, 16:50:01 UTC
5aa891a Silence useless 'Outer dim vectorization of var' warning in Mullapudi… (#7992) Silence useless 'Outer dim vectorization of var' warning in Mullapudi scheduler 07 December 2023, 18:03:06 UTC
df36139 Fix all "unscheduled update()" warnings in our code (#7991) * Fix all "unscheduled update()" warnings in our code And also fix the Mullapudi scheduler to explicitly touch all update stages. This allows us to mark this warning as an error if we so choose. * fixes * fixes * Update recursive_box_filters.cpp 07 December 2023, 18:02:42 UTC
83febb0 Fix handling of assert statements whose conditions get vectorized (#7989) * Fix handling of assert statements whose conditions get vectorized * Fix test name 07 December 2023, 17:46:27 UTC
d1ecc1f Make narrowing float->int casts on wasm go via wider ints (#7973) Fixes #7972 07 December 2023, 16:06:57 UTC
6e57d6c Add a notebook with a visualization of the approx_* functions and their errors (#7974) * Add a notebook with a visualization of the approx_* functions and their errors * Fix spelling error 07 December 2023, 16:06:31 UTC
9f6ec17 Silence useless "Insufficient parallelism" autoscheduler warning (#7990) 07 December 2023, 00:59:53 UTC
17b7366 Move canonical version numbers into source, not build system (#7980) (#7981) * Move canonical version numbers into source, not build system (#7980) * Fixes 06 December 2023, 23:03:14 UTC
209ec02 Add appropriate mattrs for arm-32 extensions (#7978) * Add appropriate mattrs for arm-32 extensions Fixes #7976 * Pull clauses out of if 05 December 2023, 22:15:23 UTC
17578a1 Add two new tail strategies for update definitions (#7949) * Add two new tail strategies for update definitions * Stop printing asm * Update expected number of partitions for Partition::Always * Add a comment explaining why the blend safety check is per dimension * Add serialization support for the new tail strategies * trigger buildbots * Add comment --------- Co-authored-by: Steven Johnson <srj@google.com> 05 December 2023, 18:08:08 UTC
dea2cf7 complete_x86_target() should enable F16C and FMA when AVX2 is present (#7971) All known AVX2-enabled architectures definitely have these features. 03 December 2023, 21:34:02 UTC
674e6cc Disallow async nestings that violate read after write dependencies (#7868) * Disallow async nestings that violate read after write dependencies Fixes #7867 * Add test * Add another failure case, and improve error message * Add some more tests * Update test * Add new test to cmakelists * Fix for llvm trunk * Always acquire the folding semaphore, even if unused * Skip async_order test under wasm * trigger buildbots --------- Co-authored-by: Volodymyr Kysenko <vksnk@google.com> Co-authored-by: Steven Johnson <srj@google.com> 01 December 2023, 21:18:20 UTC
4fc2a7d Handle many more intrinsics in Bounds.cpp (#7823) * Handle many more intrinsics in Bounds.cpp This addresses many (but not all) of the `signed integer overflow` issues we're seeing in Google due to https://github.com/halide/Halide/pull/7814 -- a lot of the issues seems to be in code that uses intrinsics that had no handling in value bounds checking, so the bounds were naively large and overflowed. - Most of the intrinsics from FindIntrinsics.h weren't handled; now they all are (most by lowering to other IR, though the halving_add variants were modeled directly because the bitwise ops don't mesh well) - strict_float() is just a pass-through - round() is a best guess (basically, if bounds exist, expand by one as a worst-case) There are definitely others we should handle here... trunc/floor/ceil probably? * Fix round() and strict_float() handling * Update Bounds.cpp * Fixes? * trigger buildbots * Revert saturating_cast handling * Update Bounds.cpp --------- Co-authored-by: Andrew Adams <andrew.b.adams@gmail.com> 01 December 2023, 00:31:48 UTC
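The "expand by one as a worst-case" guess for `round()` mentioned in the first bullet list can be sketched as follows. This only illustrates that initial description (a later bullet in the same commit revises the handling), and `FloatInterval`, `conservative_round_bounds` and `rounded_sample_in_bounds` are hypothetical names, not identifiers from Bounds.cpp.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical closed interval of floating-point values.
struct FloatInterval {
    double min, max;
};

// Conservative bounds for round(x) given bounds on x: rounding moves a
// value by at most 0.5, so widening each side by one is always safe,
// though not tight.
FloatInterval conservative_round_bounds(FloatInterval x) {
    assert(x.min <= x.max);
    return {x.min - 1.0, x.max + 1.0};
}

// Check that rounding a sample taken from x lands inside the widened bounds.
bool rounded_sample_in_bounds(double sample, FloatInterval x) {
    FloatInterval b = conservative_round_bounds(x);
    double r = std::nearbyint(sample);
    return b.min <= r && r <= b.max;
}
```

The point of having any finite bound is the one the commit makes: without handling for an intrinsic, the inferred bounds are naively large and downstream arithmetic on them can overflow.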
3136819 [serialization] Add Halide version and serialization version in serialization format (#7905) * halide version * serialization version * format * Fix Makefile * trigger buildbots --------- Co-authored-by: Andrew Adams <andrew.b.adams@gmail.com> Co-authored-by: Steven Johnson <srj@google.com> 30 November 2023, 17:59:30 UTC
ad5dd20 Update instructions to include generated schedules (#7928) The generated schedule from the auto-scheduler can no longer be copy-n-pasted to the Generator source code. Update the tutorial to show how the generated schedules can be applied and included into Generator. Use case: version control and fine tuning of schedules. Resolves: #7148 See also: #7900 Co-authored-by: Steven Johnson <srj@google.com> 29 November 2023, 17:31:12 UTC
bf5f206 Remove inadvertently added generated file (#7966) 29 November 2023, 17:19:03 UTC
68f2bbd Revise Flatbuffers codegen style (#7964) * Rename the generated Flatbuffer headers The Blaze/Bazel rules for Flatbuffers are inflexible and require this naming pattern :-/ * Also update the flags to flatc * Fix lots of stuff * exclude from clang-format * ignore again 29 November 2023, 17:06:51 UTC
b7468af Attempt to fix nested vectorization gemm performance on new build bot (#7959) * Better (simpler) schedules for nested vectorization gemm * Remove early return * Empty-Commit --------- Co-authored-by: Steven Johnson <srj@google.com> 29 November 2023, 16:39:41 UTC
5175d16 Make the fast inverse test throughput-limited rather than latency-limited (#7958) Co-authored-by: Steven Johnson <srj@google.com> 28 November 2023, 21:59:21 UTC
2b23e07 Return values from stub functions in Deserialization (#7963) Needed to prevent "error: non-void function does not return a value" 28 November 2023, 16:05:52 UTC
9ce5fd6 [WebGPU] Update to latest native headers (#7932) * [WebGPU] Update to latest native headers * Update mini_webgpu.h with latest version from Dawn * Document this process * Remove an argument from wgpuQueueOnSubmittedWorkDone Fixes #7581 * [WebGPU] Note that wgpu is not yet supported * [WebGPU] Add https:// to external links in README * update to commit b5d38fc7dc2a20081312c95e379c4a918df8b7d4 * Update mini_webgpu.h --------- Co-authored-by: Steven Johnson <srj@google.com> 28 November 2023, 14:54:03 UTC
976ea0b [serialization] Serialize stub definitions of external parameters. (#7926) * Serialize stub definitions of external parameters. Add deserialize_parameter methods to allow the user to only deserialize the mapping of external parameters (and remap them to their own user parameters) prior to deserializing the full pipeline definition. * Clang tidy/format pass --------- Co-authored-by: Derek Gerstmann <dgerstmann@adobe.com> 28 November 2023, 00:55:41 UTC
8c28a73 Improve code size and compile time for local laplacian app (#7927) Improve code size and compile time for local laplacian and interpolate apps This reduces compile time for the manual local laplacian schedule from 4.9s to 2.2s, and reduces code size from 126k to 82k Most of the reduction comes from avoiding a pointless boundary condition in the output Func. A smaller amount comes from avoiding loop partitioning using RoundUp and Partition::Never. The Partition::Never calls are responsible for a 3% reduction in code size and compile times by themselves. This has basically no effect on runtime. It seems to reduce it very slightly, but it's in the noise. 21 November 2023, 23:27:21 UTC
04c21bf Always call lower_round_to_nearest_ties_to_even on arm32 (#7957) 21 November 2023, 21:56:45 UTC
f5a4e49 Add missing condition to if renesting rule (#7952) * Add missing condition to if renesting rule * Add test * clang-format 21 November 2023, 19:23:44 UTC
ad0f24e Track likely values through lets in loop partitioning (#7930) * Track likely values through lets in loop partitioning Fixes #7929 Improves runtime of lens_blur app by ~20% * Add uncaptured likely tags to selects in boundary condition helpers Now that we look through lets, we end up in more situations where both sides have a captured likely. * Better comments 16 November 2023, 00:49:35 UTC
0f65435 More targeted fix for gather instructions being slow on intel processors (#7945) See https://github.com/llvm/llvm-project/issues/70259 14 November 2023, 19:48:34 UTC
f0cdd50 Delete unused function (#7925) 14 November 2023, 18:23:14 UTC
f25af7f Remove the deprecated API `llvm::Type::getInt8PtrTy` usage. (#7937) This API is removed in LLVM trunk now https://github.com/llvm/llvm-project/commit/7b9d73c2f90c0ed8497339a16fc39785349d9610. 09 November 2023, 05:27:20 UTC
3b4dc33 Make sure all Halide arithmetic scalar types can be named from the Generator interface. (#7934) * Make sure all Halide arithmetic scalar types can be named from the Generator interface. Specifically adding 64-bit signed and unsigned integers and making sure float16 and bfloat16 are fully supported and documented. Add a simple test for all the type names. (Don't use float16 and bfloat16 in the arithmetic as they do not compile with the C++ backend. The name mapping should still be tested but the types passed do not seem to be checked as the values are not used.) 07 November 2023, 21:23:31 UTC
256c2f2 Add missing serialization of Dim::partition_policy (#7935) add missing serialization of Dim::partition_policy 07 November 2023, 17:57:21 UTC
e5bf7ab Add special build for testing serialization via a serialization roundtrip in JIT compilation and fix serialization leaks (#7763) * add back JIT testing, enclosed in #ifdef blocks * fix typo * nits * WITH_SERIALIZATION_JIT->WITH_SERIALIZATION_JIT_ROUNDTRIP_TESTING * fix self-reference leaks: now uses weak function ptr in reverse function mappings * Move clang-tidy checks back to Linux Recent changes in the GHA runners for macOS don't play well with clang-tidy; rather than sink any more time into debugging it, I'm going to revert the relevant parts of #7746 so that it runs on the less-finicky Linux runners instead. * bogus * Update Generator.cpp * Update Generator.cpp * call copy_to_host before serializing buffers * throw an error if we serialize on-device buffer * Skip specialize_to_gpu * Update Pipeline.cpp * Skip two more tests * use serialize to memory during jit testing * makefile update * makefile fix * skip the tutorial if flatc is not there * fix * fix signature * fix makefile * trigger buildbot --------- Co-authored-by: Steven Johnson <srj@google.com> 06 November 2023, 23:36:56 UTC
e5ee753 Remove use of dynamic_cast. (#7931) Remove use of dynamic_cast to preserve compiling the Halide compiler without RTTI. 03 November 2023, 00:27:03 UTC
1865101 Loop Partitioning Policy through Stage::partition(VarOrRVar, LoopPartitionPolicy) (#7914) * Loop Partitioning Policy through Stage::partition(VarOrRVar, LoopPartitionPolicy) * Renamed LoopPartitionPolicy to Partition. Added tests in boundary_conditions to verify correctness of the code with and without loop partitioning. Added tests that validate that disabling loop partitioning works. * Include error-test for when partitioning is always requested, but none was performed. 31 October 2023, 17:38:55 UTC
0134c40 Improve the error message if you store_at without a compute_at (#7923) * Improve an error message * Clean up * Update messages 30 October 2023, 21:39:17 UTC
97573c6 Scheduling directive to hoist the storage of the function (#7915) * Minimal hoist_storage plumbing * HoistedStorage placeholder IR node * Basic hoist_storage test * Fully plumb through the HoistedStorage node * IRPrinter for HoistedStorage * Insert hoisted storage at the correct loop level * Progress * Formatted * Move out common code for creating Allocate node * Format * Emit Allocate at the HoistedStorage site * Collect all dependent vars * Basic test working * Progress * Substitute lets into allocation extents instead of lifting stuff * Infer bounds for the extents dependent on loop variables * Update tests * Remove old code * Remove old code * Better tests * More tests * Validate schedules with hoist_storage * Error test * Fix stupid mistake * More tests * Remove debug prints * Better errors * Add missing handler for inlined functions * Format * Comments * Format * Add some missing visit handlers * New line * Fix comment * Luckily we only have two build systems * Adds hoist_storage_root * Comment for IR node * Serialization support for HoistedStorage * Handle hoist_storage for tuples * Handle multiple realize nodes * Move assert up * Better error message * Better loop bounds * Format * Updated error message * Happy clang-tidy happy me * An error message when compute is inlined, but store is not inlined * Only mutate lets which are needed * Update apps to use hoist_storage Some very minor performance gains, but mostly in the noise. Also switched the apps makefiles to emit stmt html by default instead of stmt, to take advantage of the new and improved stmt html.
* Switch to stack of hoisted storages * Limit scope of lets for expansion * Break early * Skip substitute_in_all_lets * Re-use expanded min/extents * WebAssembly JIT does not support custom allocators * Change debug level to get more info about segfault * More debugging prints * Let's try aligned malloc * Revert "Change debug level to get more info about segfault" This reverts commit a5a689be8c6ad351674f3ced3bbf542335f91d75. * Revert "More debugging prints" This reverts commit bb6b8c1313cbdb9f355df20fd203ee02d485042e. --------- Co-authored-by: Andrew Adams <andrew.b.adams@gmail.com> 27 October 2023, 21:21:26 UTC
ed357c2 Fix bug mentioned by @antonysigma. (#7916) 27 October 2023, 17:23:42 UTC
cf01e97 Turn off SLP vectorization for avx512 only (#7918) Fixes #7917 27 October 2023, 17:22:31 UTC
fffb8bd Fix read-after-write hazard analysis in storage folding (#7910) Explicitly mark which loops get loop-carry-dependencies inserted by sliding window to assist storage folding. Storage folding needs to know about this so it doesn't try to fold in a way that invalidates these read-after-write dependencies. It currently tries to prove the absence of hazards with box_contains(box_provided, box_required), but this is sometimes incorrect because box_provided could be conservatively large, and the code it analyses might not actually provide (store to) all the required (loaded from) values. It's simpler for sliding window to just tell storage folding when it inserts loop-carry-dependencies, and this is most simply done directly in the IR itself. Fixes #7909 24 October 2023, 17:23:49 UTC
d023065 Hotfix reinterpret HTML (#7912) Hotfix reinterpret 22 October 2023, 19:20:47 UTC
739053d Check returned result in the test (#7911) * Check returned result of Callable * Format 22 October 2023, 19:11:00 UTC
872264c Static analysis (MSVC) fixes for device_buffer_utils.h (#7904) * Static analysis (MSVC) fixes for device_buffer_utils.h * clang-format happiness * signed integer cast 20 October 2023, 21:33:13 UTC
2918854 Highlight groups for the HTML Stmt file and tooltips to reveal types. (#7887) * Highlight groups for the HTML Stmt file and tooltips to reveal types. * Cleaned up JS using eslint. * Remove commented code. 20 October 2023, 17:37:38 UTC