a4172e1 | Steven Johnson | 09 February 2024, 17:18:03 UTC | Merge remote-tracking branch 'origin/abadams/parallel_simd_op_check' into srj/test8078 | 09 February 2024, 17:18:03 UTC |
223a44b | Andrew Adams | 09 February 2024, 06:17:34 UTC | Revert accidental serialization | 09 February 2024, 06:17:34 UTC |
596300f | Andrew Adams | 09 February 2024, 00:55:37 UTC | Make float16_t neon op check test at least build | 09 February 2024, 00:55:37 UTC |
b0a2b7f | Andrew Adams | 08 February 2024, 23:17:48 UTC | Deep-copy the LoopLevels | 08 February 2024, 23:17:48 UTC |
12ed341 | Andrew Adams | 08 February 2024, 23:17:41 UTC | Use separate imageparams per task | 08 February 2024, 23:17:41 UTC |
8400ad9 | Andrew Adams | 08 February 2024, 01:54:38 UTC | Fix when we're willing to run x86 code in simd_op_check | 08 February 2024, 01:54:38 UTC |
0b9dd99 | Andrew Adams | 08 February 2024, 01:50:07 UTC | Remove debug print | 08 February 2024, 01:50:07 UTC |
3f281f0 | Andrew Adams | 08 February 2024, 01:49:51 UTC | The FIXME is actually fine | 08 February 2024, 01:49:51 UTC |
60ce495 | Andrew Adams | 08 February 2024, 01:49:00 UTC | Parallelize some tests This reduces the time taken to run all correctness tests from 8:15 to 3:15 on my machine. | 08 February 2024, 01:49:00 UTC |
77fc71b | Steven Johnson | 07 February 2024, 18:10:25 UTC | Merge branch 'xtensa-codegen' of https://github.com/halide/Halide into xtensa-codegen | 07 February 2024, 18:10:25 UTC |
3285236 | Steven Johnson | 07 February 2024, 18:10:20 UTC | Merge branch 'main' into xtensa-codegen | 07 February 2024, 18:10:20 UTC |
ea03af7 | Misha Gutman | 07 February 2024, 18:09:41 UTC | [xtensa] Added int32<->float vector reinterprets (#8070) | 07 February 2024, 18:09:41 UTC |
39e5c08 | Andrew Adams | 07 February 2024, 17:49:06 UTC | Better validation of gpu schedules (#8068) * Update makefile to use test/common/terminate_handler.cpp This means we actually print error messages when using exceptions and the makefile * Better validation of GPU schedules GPU loop constraints were checked in two different places. Checking them in ScheduleFunctions was incorrect because it didn't consider update definitions and specializations. Checking them in FuseGPUThreadLoops was too late, because the Var names have gone (they've been renamed to things like __thread_id_x). Furthermore, some problems were internal errors or runtime errors when they should have been user errors. We allowed 4d thread and block dimensions, but then hit an internal error. This PR centralizes checking of GPU loop structure in CanonicalizeGPUVars and adds more helpful error messages that print the problematic loop structure. E.g: ``` Error: GPU thread loop over f$8.s0.v0 is inside three other GPU thread loops. The maximum number of nested GPU thread loops is 3. The loop nest is: compute_at for g$8: for g$8.s0.v7: for g$8.s0.v6: for g$8.s0.v5: for g$8.s0.v4: gpu_block g$8.s0.v3: gpu_block g$8.s0.v2: gpu_thread g$8.s0.v1: gpu_thread g$8.s0.v0: store_at for f$8: compute_at for f$8: gpu_thread f$8.s0.v1: gpu_thread f$8.s0.v0: ``` Fixes the bug found in #7946 * Delete dead code * Actually clear the ostringstream | 07 February 2024, 17:49:06 UTC |
37153a9 | Derek Gerstmann | 07 February 2024, 17:43:58 UTC | Fix bool conversion bug in Vulkan code generator (#8067) * Fix bug in Vulkan code generator that was incorrectly passing the address of a byte vector, instead of its contents to builder.declare_constant() * Add bool_predicate_cast correctness test to verify bool conversion for Vulkan codegen works as expected --------- Co-authored-by: Derek Gerstmann <dgerstmann@adobe.com> | 07 February 2024, 17:43:58 UTC |
78a0762 | Prasoon Mishra | 07 February 2024, 17:41:51 UTC | Add hexagon_benchmarks app for CMake builds (#8069) * Add hexagon_benchmarks app for CMake builds * Removed unnecessary -lc++abi flag from GCC build | 07 February 2024, 17:41:51 UTC |
84fe565 | Steven Johnson | 07 February 2024, 17:41:21 UTC | Outsmart the LLVM optimizer (#8073) The old definitions of bool_1, bool_2, bool_3 in simd_op_check_x86 (etc) all referred to the same entry in in_f32; as of https://github.com/llvm/llvm-project/pull/76367, the LLVM optimizer is smart enough to realize that (eg) bool1 != bool2 by construction, and optimizes away the code that tests their conditions, such as the one for andps and orps. Initing them from different locations is enough to outsmart the compiler. (bug was only noticed in the x86 test, but I updated the other tests to guard against future improvements there too.) | 07 February 2024, 17:41:21 UTC |
665804c | Steven Johnson | 06 February 2024, 23:34:29 UTC | Don't require Halide_WebGPU when using wasm (#8063) (#8065) * Don't require Halide_WebGPU when using wasm (#8063) * trigger buildbots | 06 February 2024, 23:34:29 UTC |
93bff95 | Teo | 06 February 2024, 23:34:02 UTC | add unsafe_promise_clamped (#8071) add unsafe_promise_clamped | 06 February 2024, 23:34:02 UTC |
feb0b93 | Misha Gutman | 06 February 2024, 19:22:14 UTC | [xtensa] Added int32 by int16 vector division + maintenance work (#8058) * [xtensa] Renamed SEL instructions to semantically correct * [xtensa] updated the types definitions in halide_xtensa_div32 * [xtensa] added int32 by int16 vector division * [xtensa] replaced convert int16->int32_x2->int16 to two interleaves for better efficiency | 06 February 2024, 19:22:14 UTC |
80e2081 | Andrew Adams | 05 February 2024, 22:25:05 UTC | Update makefile to use test/common/terminate_handler.cpp (#8066) This means we actually print error messages when using exceptions and the makefile | 05 February 2024, 22:25:05 UTC |
8c7d78c | Volodymyr Kysenko | 01 February 2024, 19:14:09 UTC | Fix warning | 01 February 2024, 19:14:09 UTC |
35e97c6 | Volodymyr Kysenko | 01 February 2024, 19:11:51 UTC | Merge branch 'main' into xtensa-codegen | 01 February 2024, 19:11:51 UTC |
f32f95e | Misha Gutman | 01 February 2024, 18:40:58 UTC | [xtensa] added vector load_predicated for f16 (#8057) | 01 February 2024, 18:40:58 UTC |
e2448fe | Andrew Adams | 01 February 2024, 17:46:10 UTC | Fix type error in VectorizeLoops (#8055) | 01 February 2024, 17:46:10 UTC |
9e17fc7 | Misha Gutman | 29 January 2024, 18:12:01 UTC | [xtensa] Added float16 interleaves (#8050) | 29 January 2024, 18:12:01 UTC |
47378ee | Steven Johnson | 29 January 2024, 01:28:13 UTC | Enable `bugprone-switch-missing-default-case` (#8048) * Upgrade clang-format and clang-tidy to use LLVM 17 * trigger buildbots * trigger buildbots * trigger buildbots * trigger buildbots * Enable `bugprone-switch-missing-default-case` ...and fix existing warnings. * Update .clang-tidy * Update Parameter.cpp * Update .clang-tidy * Update .clang-tidy * Update .clang-tidy * Update .clang-tidy * Update CPlusPlusMangle.cpp | 29 January 2024, 01:28:13 UTC |
4b2d211 | Steven Johnson | 27 January 2024, 00:33:24 UTC | Upgrade clang-format and clang-tidy to use LLVM 17 (#8042) * Upgrade clang-format and clang-tidy to use LLVM 17 * trigger buildbots * trigger buildbots * trigger buildbots * trigger buildbots | 27 January 2024, 00:33:24 UTC |
45d7850 | Andrew Adams | 26 January 2024, 20:01:41 UTC | Track whether or not let expressions failed to solve in solver (#7982) * Track whether or not let expressions failed to solve in solver After mutating an expression, the solver needs to know two things: 1) Did the expression contain the variable we're solving for 2) Was the expression successfully "solved" for the variable. I.e. the variable only appears once in the leftmost position. We need to know this to know property 1 of any subexpressions (i.e. does the right child of the expression contain the variable). This drives what transformations we do in ways that are guaranteed to terminate and not take exponential time. We were tracking property 1 through lets but not property 2, and this meant we were doing unhelpful transformations in some cases. I found a case in the wild where this made a pipeline take > 1 hour to compile (I killed it after an hour). It may have been in an infinite transformation loop, or it might have just been exponential. Not sure. * Remove surplus comma * Fix use of uninitialized value that could cause bad transformation | 26 January 2024, 20:01:41 UTC |
3657cf5 | Andrew Adams | 26 January 2024, 17:26:12 UTC | Fix bounds_of_nested_lanes (#8039) * Fix bounds_of_nested_lanes bounds_of_nested_lanes assumed that one layer of nested vectorization could be removed at a time. When faced with the expression: min(ramp(x8(a), x8(b), 5), x40(27)) It panicked, because on the left hand side it reduced the bounds to x8(a) ... x8(a) + x8(b) * 4, and on the right hand side it reduced the bounds to 27. It then attempted to take a min of mismatched types. In general we can't assume that binary operators on nested vectors have the same nesting structure on both sides, so I just rewrote it to reduce directly to a scalar. Fixes #8038 | 26 January 2024, 17:26:12 UTC |
4590a09 | Andrew Adams | 26 January 2024, 01:07:40 UTC | Fix for llvm trunk: Force-include more runtime types (#8045) * Fix for llvm trunk: Force-include more runtime types * Include the force-include-types module first * Fix comment * Expand comment | 26 January 2024, 01:07:40 UTC |
c1923f3 | Steven Johnson | 24 January 2024, 23:53:28 UTC | HALIDE_VERSION_MAJOR -> 18 (#8044) | 24 January 2024, 23:53:28 UTC |
6177e51 | Steven Johnson | 24 January 2024, 20:04:19 UTC | Update Halide version to 18 (#8043) | 24 January 2024, 20:04:19 UTC |
9b9dfaf | Andrew Adams | 24 January 2024, 19:12:17 UTC | Update Makefile for llvm 19 (#8040) | 24 January 2024, 19:12:17 UTC |
90e909d | Steven Johnson | 24 January 2024, 18:44:47 UTC | Allow LLVM 19 in CMake (#8041) | 24 January 2024, 18:44:47 UTC |
958037a | Misha Gutman | 23 January 2024, 17:19:16 UTC | [xtensa] Added efficient gather load to Q7 (#8026) Added efficient gather load to Q7 | 23 January 2024, 17:19:16 UTC |
e0e9f63 | Steven Johnson | 22 January 2024, 21:43:00 UTC | Tweak the Printer code in runtime for smaller code (#8023) * Tweak the Printer code in runtime for smaller code TL;DR: template expansion meant that we had more replicated code than expected from the inline expansion of code in Printer and friends. Restructured and added NEVER_INLINE to try to make the call sites as small as possible. It's a modest code-size savings but nonzero... e.g., the linux-x86-64 .o output from correct_cross_compilation drops from 164280 bytes to 162936 bytes. * Update printer.h * debug * Update HalideTestHelpers.cmake * Update printer.h * fixes | 22 January 2024, 21:43:00 UTC |
05d4412 | Volodymyr Kysenko | 19 January 2024, 23:33:25 UTC | Skip the double buffering for DMA if the allocation and compute is at the same level | 19 January 2024, 23:33:25 UTC |
4a3378f | Misha Gutman | 17 January 2024, 23:05:41 UTC | [xtensa] adjusted the tests to be launchable for Q8 (#8011) * [xtensa] adjusted the tests to be launchable for Q8 * Style fixes + C++-17 compliance | 17 January 2024, 23:05:41 UTC |
e5d4a57 | Steven Johnson | 17 January 2024, 23:05:20 UTC | Fix clang-tidy errors in InjectDmaTransfer (#8033) | 17 January 2024, 23:05:20 UTC |
a925471 | Steven Johnson | 17 January 2024, 19:28:50 UTC | Merge branch 'main' into xtensa-codegen | 17 January 2024, 19:28:50 UTC |
22f9bb9 | Steven Johnson | 17 January 2024, 16:26:43 UTC | Add test for #8029 (#8032) Tweak correctness_float16_t so that it uses one of the transcendental functions (sqrt) that were missing in Metal. | 17 January 2024, 16:26:43 UTC |
3a77204 | Steven Johnson | 17 January 2024, 15:35:07 UTC | Require LLVM >= 16.0 (#8003) * Require LLVM >= 16.0 Per policy, we only support top-of-tree LLVM, plus two versions back; let's update to require LLVM >= 16, and drop workarounds for older versions. * LLVM_VERSION < 170 | 17 January 2024, 15:35:07 UTC |
d2eed57 | Steven Johnson | 16 January 2024, 20:00:36 UTC | Fix build breakage for wasm targets (#8031) Update HalideTestHelpers.cmake | 16 January 2024, 20:00:36 UTC |
8d3c12e | Mike Woodworth | 16 January 2024, 18:55:53 UTC | adds mappings for f16 variants of halide float math (#8029) * adds mappings for f16 variants of halide float math * fix clang format errors * trigger buildbots --------- Co-authored-by: Steven Johnson <srj@google.com> | 16 January 2024, 18:55:53 UTC |
91b063d | Volodymyr Kysenko | 09 January 2024, 04:57:15 UTC | Stronger chain detection in LoopCarry pass (#8016) * Stronger chain detection in LoopCarry * Make sure that types are the same * Add a comment * Run CSE before calling can_prove * Test for loop carry * clang-tidy * Add missing override * Update comments | 09 January 2024, 04:57:15 UTC |
cdebeb8 | Tom Westerhout | 09 January 2024, 01:33:08 UTC | Fix -Wstrict-prototype warnings in HalideRuntime.h (#8027) When HalideRuntime.h is included in a C file, functions that are declared with `()` instead of `(void)` for their arguments change meaning. These may cause issues downstream because different code is generated. | 09 January 2024, 01:33:08 UTC |
21accad | Steven Johnson | 04 January 2024, 17:04:34 UTC | Set warnings on tests as well as src (#8022) * Don't use variable-length arrays There was a rogue use of VLAs (an extension we don't want to use) in one of the runtime tests. Fixed the test. I'll follow up with a separate PR to ensure this warning is enabled everywhere to flush out other usages. * Set warnings on tests as well as src | 04 January 2024, 17:04:34 UTC |
daf011d | Steven Johnson | 04 January 2024, 17:04:18 UTC | Don't use variable-length arrays (#8021) There was a rogue use of VLAs (an extension we don't want to use) in one of the runtime tests. Fixed the test. I'll follow up with a separate PR to ensure this warning is enabled everywhere to flush out other usages. | 04 January 2024, 17:04:18 UTC |
b661c8d | Zalman Stern | 04 January 2024, 01:49:56 UTC | Quick fix for crash that is occurring in SVE2 tests. (#8020) Broken out into separate PR for ease of review and isolated test/tracking. | 04 January 2024, 01:49:56 UTC |
d2da007 | Steven Johnson | 03 January 2024, 20:05:37 UTC | Fix for top-of-tree LLVM (Fix #8017) (#8018) Fix for top-of-tree LLVM | 03 January 2024, 20:05:37 UTC |
846ac52 | Volodymyr Kysenko | 03 January 2024, 04:47:14 UTC | Schedule ahead DMA copy if ring_buffer is defined | 03 January 2024, 04:47:14 UTC |
b12448e | Volodymyr Kysenko | 03 January 2024, 03:48:15 UTC | Add runtime function to wait for specific dma transaction | 03 January 2024, 03:48:15 UTC |
76d8e37 | Volodymyr Kysenko | 03 January 2024, 02:55:48 UTC | Swap loop_carry and align_loads | 03 January 2024, 02:55:48 UTC |
e2a58dd | Volodymyr Kysenko | 03 January 2024, 02:51:44 UTC | Merge branch 'main' into xtensa-codegen | 03 January 2024, 02:51:44 UTC |
8024bdc | Volodymyr Kysenko | 02 January 2024, 22:52:53 UTC | Don't add ring_buffer semaphores if the function is not scheduled as async (#8015) Don't add ring_buffer semaphores if the function is not scheduled as async Co-authored-by: Steven Johnson <srj@google.com> | 02 January 2024, 22:52:53 UTC |
6f26b04 | Tyler Hou | 02 January 2024, 18:27:51 UTC | Change startswith -> starts_with (#8013) startswith was deprecated in llvm/llvm-project#75491, which means that Halide fails to compile using LLVM 18 (deprecation warning). | 02 January 2024, 18:27:51 UTC |
43df465 | Aelphy | 21 December 2023, 19:39:18 UTC | [xtensa] undo disabling of ConvertGatherLoadIndex | 21 December 2023, 19:39:18 UTC |
aec7d7b | Aelphy | 21 December 2023, 19:30:22 UTC | [xtensa] index cast to uint16 for gath_load is at least sometimes wrong | 21 December 2023, 19:30:22 UTC |
5212015 | Volodymyr Kysenko | 19 December 2023, 22:21:45 UTC | Formatting fixes | 19 December 2023, 22:21:45 UTC |
4306918 | Volodymyr Kysenko | 19 December 2023, 22:18:33 UTC | Merge branch 'main' into xtensa-codegen | 19 December 2023, 22:18:33 UTC |
61b8d38 | Volodymyr Kysenko | 19 December 2023, 22:14:05 UTC | Scheduling directive to support ring buffering (#7967) * Half-plumbed * Revert "Half-plumbed" This reverts commit eb9dd02c6c607f0b49c95258ae67f58fe583ff44. * Interface for double buffer * Update Provides, Calls and Realizes for double buffering * Proper sync for double buffering * Use proper name for the semaphore and use correct initial value * Rename the class * Pass expression for index * Adds storage for double buffering index * Use a separate index to go through the double buffer * Failing test * Better handling of hoisted storage in all of the async-related passes * New test and clean-up the generated IR * More tests * Allow double buffering without async and add corresponding test * Filter out incorrect double_buffer schedules * Add tests to the cmake files * Clean up * Update the comment * Clean up * Clean up * Update serialization * complete_x86_target() should enable F16C and FMA when AVX2 is present (#7971) All known AVX2-enabled architectures definitely have these features. 
* Add two new tail strategies for update definitions (#7949) * Add two new tail strategies for update definitions * Stop printing asm * Update expected number of partitions for Partition::Always * Add a comment explaining why the blend safety check is per dimension * Add serialization support for the new tail strategies * trigger buildbots * Add comment --------- Co-authored-by: Steven Johnson <srj@google.com> * Add appropriate mattrs for arm-32 extensions (#7978) * Add appropriate mattrs for arm-32 extensions Fixes #7976 * Pull clauses out of if * Move canonical version numbers into source, not build system (#7980) (#7981) * Move canonical version numbers into source, not build system (#7980) * Fixes * Silence useless "Insufficient parallelism" autoscheduler warning (#7990) * Add a notebook with a visualization of the approx_* functions and their errors (#7974) * Add a notebook with a visualization of the approx_* functions and their errors * Fix spelling error * Make narrowing float->int casts on wasm go via wider ints (#7973) Fixes #7972 * Fix handling of assert statements whose conditions get vectorized (#7989) * Fix handling of assert statements whose conditions get vectorized * Fix test name * Fix all "unscheduled update()" warnings in our code (#7991) * Fix all "unscheduled update()" warnings in our code And also fix the Mullapudi scheduler to explicitly touch all update stages. This allows us to mark this warning as an error if we so choose. 
* fixes * fixes * Update recursive_box_filters.cpp * Silence useless 'Outer dim vectorization of var' warning in Mullapudi… (#7992) Silence useless 'Outer dim vectorization of var' warning in Mullapudi scheduler * Add a tutorial for async and double_buffer * Renamed double_buffer to ring_buffer * ring_buffer() now expects an extent Expr * Actually use extent for ring_buffer() * Address some of the comments * Provide an example of the code structure for producer-consumer async example * Comments updates * Fix clang-format and clang-tidy * Add Python binding for Func::ring_buffer() * Don't use a separate index for ring buffer + add a new test * Rename the tests * Clean up the old name * Add & * Move test to the right folder * Move expr * Add comments for InjectRingBuffering * Improve ring_buffer doc * Fix comments * Comments * A better error message * Mention that extent is expected to be a positive integer * Add another code structure and explain how the indices for ring buffer are computed * Expand test comments * Fix spelling --------- Co-authored-by: Steven Johnson <srj@google.com> Co-authored-by: Andrew Adams <andrew.b.adams@gmail.com> | 19 December 2023, 22:14:05 UTC |
a20dbef | Volodymyr Kysenko | 15 December 2023, 21:20:37 UTC | Interleave functions for fp16 | 15 December 2023, 21:20:37 UTC |
7d83daf | Volodymyr Kysenko | 15 December 2023, 20:54:36 UTC | Merge branch 'main' into xtensa-codegen | 15 December 2023, 20:54:36 UTC |
6bcb695 | Steven Johnson | 15 December 2023, 00:27:56 UTC | Update Halide version in setup.py to 17.0.0 (#8010) | 15 December 2023, 00:27:56 UTC |
e1e4193 | Steven Johnson | 13 December 2023, 17:16:13 UTC | Merge branch 'main' into xtensa-codegen | 13 December 2023, 17:16:13 UTC |
6d29ad5 | Steven Johnson | 13 December 2023, 17:02:37 UTC | Add missing Python bindings for various recent additions to Func and Stage (#8002) * Add missing Python bindings for various recent additions to Func and Stage We have been sloppy about maintaining these. Also added a bit of testing. * Update PyEnums.cpp | 13 December 2023, 17:02:37 UTC |
b1dd8de | Steven Johnson | 13 December 2023, 00:12:59 UTC | Merge branch 'main' into xtensa-codegen | 13 December 2023, 00:12:59 UTC |
3d5cf40 | Martijn Courteaux | 12 December 2023, 17:50:56 UTC | Inject profiling for function calls to 'halide_copy_to_host' and 'halide_copy_to_device'. (#7913) * Inject profiling for function calls to 'halide_copy_to_host' and 'halide_copy_to_device'. * WIP: I get segfaults. The device_interface pointer is bogus. * Figured it out... * Allow global sync on d3d12. * Cleanly time all buffer copies as well. * Cleanup old comment. * Following Andrew's suggestion for suffixing buffer copies in the profiler. * Sort the profiler report lines into three sections: funcs, buffer copy to device, and buffer copy to host. * Inject profiling for function calls to 'halide_copy_to_host' and 'halide_copy_to_device'. * WIP: I get segfaults. The device_interface pointer is bogus. * Figured it out... * Allow global sync on d3d12. * Cleanly time all buffer copies as well. * Cleanup old comment. * Following Andrew's suggestion for suffixing buffer copies in the profiler. * Sort the profiler report lines into three sections: funcs, buffer copy to device, and buffer copy to host. * Attempt to fix output parsing. * Fix crash for copy_to_device * halide_device_sync_global(NULL) -> success * Fixed the buffer copy bug. Added a new test that will cause buffer copies in two directions within the compiled pipeline. This will catch this better in the future. Tweaked the profile report section header printing. * Clang-format, my dear friend... | 12 December 2023, 17:50:56 UTC |
2c48ba8 | Volodymyr Kysenko | 08 December 2023, 21:58:37 UTC | Fix boolean Or for q8 and add support of boolean Add and Not | 08 December 2023, 21:58:37 UTC |
d84e3a6 | Steven Johnson | 08 December 2023, 19:18:04 UTC | Merge branch 'main' into xtensa-codegen | 08 December 2023, 19:18:04 UTC |
357e646 | Steven Johnson | 08 December 2023, 19:17:30 UTC | Do some basic validation of Target Features (#7986) (#7987) * Do some basic validation of Target Features (#7986) * Update Target.cpp * Update Target.cpp * Fixes * Update Target.cpp * Improve error messaging. * format * Update Target.cpp | 08 December 2023, 19:17:30 UTC |
9c099c2 | Andrew Adams | 08 December 2023, 17:53:04 UTC | Teach unrolling to exploit conditions in enclosing ifs (#7969) * Teach unrolling to exploit conditions in enclosing ifs Fixes #7968 * Handle vectorization as well * Remove unused usings * Add missing print | 08 December 2023, 17:53:04 UTC |
9643518 | Steven Johnson | 08 December 2023, 17:50:32 UTC | Add join_strings() call and use it from mattrs() (#7997) * Add join_strings() call and use it from mattrs() This is a super-nit kind of fix, but the fact that we had rerolled a join-strings algo in a half-dozen places made my teeth hurt, so I decided to fix it: - Add join_strings() to Util.h - revise the mattrs() calls to use it instead of the janky mess they used This doesn't move the needle on code size or speed but it is less weird. Probably other places we could/should use this too. (Does C++20 have join/split strings in the std library yet? If not, why not?) * Update Util.h * Update Util.h * clang-tidy | 08 December 2023, 17:50:32 UTC |
19c1c81 | Steven Johnson | 08 December 2023, 16:50:01 UTC | Make wasm +sign-ext and +nontrapping-fptoint the default (#7995) * Make wasm +sign-ext and +nontrapping-fptoint the default These have been supported in ~all wasm runtimes for a while now, and +nontrapping-fptoint in particular can make a big performance difference. We should enable these by default, and add a new backdoor (wasm_mvponly) for code paths that need to use the original wasm Minimum Viable Product spec only. * Update simd_op_check_wasm.cpp | 08 December 2023, 16:50:01 UTC |
5aa891a | Steven Johnson | 07 December 2023, 18:03:06 UTC | Silence useless 'Outer dim vectorization of var' warning in Mullapudi… (#7992) Silence useless 'Outer dim vectorization of var' warning in Mullapudi scheduler | 07 December 2023, 18:03:06 UTC |
df36139 | Steven Johnson | 07 December 2023, 18:02:42 UTC | Fix all "unscheduled update()" warnings in our code (#7991) * Fix all "unscheduled update()" warnings in our code And also fix the Mullapudi scheduler to explicitly touch all update stages. This allows us to mark this warning as an error if we so choose. * fixes * fixes * Update recursive_box_filters.cpp | 07 December 2023, 18:02:42 UTC |
83febb0 | Andrew Adams | 07 December 2023, 17:46:27 UTC | Fix handling of assert statements whose conditions get vectorized (#7989) * Fix handling of assert statements whose conditions get vectorized * Fix test name | 07 December 2023, 17:46:27 UTC |
d1ecc1f | Andrew Adams | 07 December 2023, 16:06:57 UTC | Make narrowing float->int casts on wasm go via wider ints (#7973) Fixes #7972 | 07 December 2023, 16:06:57 UTC |
6e57d6c | Volodymyr Kysenko | 07 December 2023, 16:06:31 UTC | Add a notebook with a visualization of the approx_* functions and their errors (#7974) * Add a notebook with a visualization of the approx_* functions and their errors * Fix spelling error | 07 December 2023, 16:06:31 UTC |
9f6ec17 | Steven Johnson | 07 December 2023, 00:59:53 UTC | Silence useless "Insufficient parallelism" autoscheduler warning (#7990) | 07 December 2023, 00:59:53 UTC |
17b7366 | Steven Johnson | 06 December 2023, 23:03:14 UTC | Move canonical version numbers into source, not build system (#7980) (#7981) * Move canonical version numbers into source, not build system (#7980) * Fixes | 06 December 2023, 23:03:14 UTC |
d95b65a | Steven Johnson | 05 December 2023, 22:15:51 UTC | Merge branch 'main' into xtensa-codegen | 05 December 2023, 22:15:51 UTC |
209ec02 | Andrew Adams | 05 December 2023, 22:15:23 UTC | Add appropriate mattrs for arm-32 extensions (#7978) * Add appropriate mattrs for arm-32 extensions Fixes #7976 * Pull clauses out of if | 05 December 2023, 22:15:23 UTC |
17578a1 | Andrew Adams | 05 December 2023, 18:08:08 UTC | Add two new tail strategies for update definitions (#7949) * Add two new tail strategies for update definitions * Stop printing asm * Update expected number of partitions for Partition::Always * Add a comment explaining why the blend safety check is per dimension * Add serialization support for the new tail strategies * trigger buildbots * Add comment --------- Co-authored-by: Steven Johnson <srj@google.com> | 05 December 2023, 18:08:08 UTC |
dea2cf7 | Steven Johnson | 03 December 2023, 21:34:02 UTC | complete_x86_target() should enable F16C and FMA when AVX2 is present (#7971) All known AVX2-enabled architectures definitely have these features. | 03 December 2023, 21:34:02 UTC |
674e6cc | Andrew Adams | 01 December 2023, 21:18:20 UTC | Disallow async nestings that violate read after write dependencies (#7868) * Disallow async nestings that violate read after write dependencies Fixes #7867 * Add test * Add another failure case, and improve error message * Add some more tests * Update test * Add new test to cmakelists * Fix for llvm trunk * Always acquire the folding semaphore, even if unused * Skip async_order test under wasm * trigger buildbots --------- Co-authored-by: Volodymyr Kysenko <vksnk@google.com> Co-authored-by: Steven Johnson <srj@google.com> | 01 December 2023, 21:18:20 UTC |
4fc2a7d | Steven Johnson | 01 December 2023, 00:31:48 UTC | Handle many more intrinsics in Bounds.cpp (#7823) * Handle many more intrinsics in Bounds.cpp This addresses many (but not all) of the `signed integer overflow` issues we're seeing in Google due to https://github.com/halide/Halide/pull/7814 -- a lot of the issues seem to be in code that uses intrinsics that had no handling in value bounds checking, so the bounds were naively large and overflowed. - Most of the intrinsics from FindIntrinsics.h weren't handled; now they all are (most by lowering to other IR, though the halving_add variants were modeled directly because the bitwise ops don't mesh well) - strict_float() is just a pass-through - round() is a best guess (basically, if bounds exist, expand by one as a worst-case) There are definitely others we should handle here... trunc/floor/ceil probably? * Fix round() and strict_float() handling * Update Bounds.cpp * Fixes? * trigger buildbots * Revert saturating_cast handling * Update Bounds.cpp --------- Co-authored-by: Andrew Adams <andrew.b.adams@gmail.com> | 01 December 2023, 00:31:48 UTC |
3136819 | Xuanda Yang | 30 November 2023, 17:59:30 UTC | [serialization] Add Halide version and serialization version in serialization format (#7905) * halide version * serialization version * format * Fix Makefile * trigger buildbots --------- Co-authored-by: Andrew Adams <andrew.b.adams@gmail.com> Co-authored-by: Steven Johnson <srj@google.com> | 30 November 2023, 17:59:30 UTC |
ad5dd20 | antonysigma | 29 November 2023, 17:31:12 UTC | Update instructions to include generated schedules (#7928) The generated schedule from the auto-scheduler can no longer be copy-n-pasted to the Generator source code. Update the tutorial to show how the generated schedules can be applied and included into Generator. Use case: version control and fine tuning of schedules. Resolves: #7148 See also: #7900 Co-authored-by: Steven Johnson <srj@google.com> | 29 November 2023, 17:31:12 UTC |
fadcbeb | Misha Gutman | 29 November 2023, 17:30:15 UTC | [xtensa] Clean up (#7961) | 29 November 2023, 17:30:15 UTC |
bf5f206 | Steven Johnson | 29 November 2023, 17:19:03 UTC | Remove inadvertently added generated file (#7966) | 29 November 2023, 17:19:03 UTC |
fc4ff80 | Steven Johnson | 29 November 2023, 17:07:09 UTC | Merge branch 'main' into xtensa-codegen | 29 November 2023, 17:07:09 UTC |
68f2bbd | Steven Johnson | 29 November 2023, 17:06:51 UTC | Revise Flatbuffers codegen style (#7964) * Rename the generated Flatbuffer headers The Blaze/Bazel rules for Flatbuffers are inflexible and require this naming pattern :-/ * Also update the flags to flatc * Fix lots of stuff * exclude from clang-format * ignore again | 29 November 2023, 17:06:51 UTC |
b7468af | Andrew Adams | 29 November 2023, 16:39:41 UTC | Attempt to fix nested vectorization gemm performance on new build bot (#7959) * Better (simpler) schedules for nested vectorization gemm * Remove early return * Empty-Commit --------- Co-authored-by: Steven Johnson <srj@google.com> | 29 November 2023, 16:39:41 UTC |
80e8daa | Steven Johnson | 28 November 2023, 23:31:28 UTC | Merge branch 'main' into xtensa-codegen | 28 November 2023, 23:31:28 UTC |
5175d16 | Andrew Adams | 28 November 2023, 21:59:21 UTC | Make the fast inverse test throughput-limited rather than latency-limited (#7958) Co-authored-by: Steven Johnson <srj@google.com> | 28 November 2023, 21:59:21 UTC |
2b23e07 | Steven Johnson | 28 November 2023, 16:05:52 UTC | Return values from stub functions in Deserialization (#7963) Needed to prevent "error: non-void function does not return a value" | 28 November 2023, 16:05:52 UTC |
68a6652 | Steven Johnson | 28 November 2023, 15:29:28 UTC | Merge branch 'main' into xtensa-codegen | 28 November 2023, 15:29:28 UTC |
9ce5fd6 | James Price | 28 November 2023, 14:54:03 UTC | [WebGPU] Update to latest native headers (#7932) * [WebGPU] Update to latest native headers * Update mini_webgpu.h with latest version from Dawn * Document this process * Remove an argument from wgpuQueueOnSubmittedWorkDone Fixes #7581 * [WebGPU] Note that wgpu is not yet supported * [WebGPU] Add https:// to external links in README * update to commit b5d38fc7dc2a20081312c95e379c4a918df8b7d4 * Update mini_webgpu.h --------- Co-authored-by: Steven Johnson <srj@google.com> | 28 November 2023, 14:54:03 UTC |
71e2728 | Steven Johnson | 28 November 2023, 01:19:04 UTC | Merge branch 'main' into xtensa-codegen | 28 November 2023, 01:19:04 UTC |