Revision history - refs/heads/srj/test8078 - origin: https://github.com/halide/Halide

Revision	Author	Date	Message	Commit Date
a4172e1	Steven Johnson	09 February 2024, 17:18:03 UTC	Merge remote-tracking branch 'origin/abadams/parallel_simd_op_check' into srj/test8078	09 February 2024, 17:18:03 UTC
223a44b	Andrew Adams	09 February 2024, 06:17:34 UTC	Revert accidental serialization	09 February 2024, 06:17:34 UTC
596300f	Andrew Adams	09 February 2024, 00:55:37 UTC	Make float16_t neon op check test at least build	09 February 2024, 00:55:37 UTC
b0a2b7f	Andrew Adams	08 February 2024, 23:17:48 UTC	Deep-copy the LoopLevels	08 February 2024, 23:17:48 UTC
12ed341	Andrew Adams	08 February 2024, 23:17:41 UTC	Use separate imageparams per task	08 February 2024, 23:17:41 UTC
8400ad9	Andrew Adams	08 February 2024, 01:54:38 UTC	Fix when we're willing to run x86 code in simd_op_check	08 February 2024, 01:54:38 UTC
0b9dd99	Andrew Adams	08 February 2024, 01:50:07 UTC	Remove debug print	08 February 2024, 01:50:07 UTC
3f281f0	Andrew Adams	08 February 2024, 01:49:51 UTC	The FIXME is actually fine	08 February 2024, 01:49:51 UTC
60ce495	Andrew Adams	08 February 2024, 01:49:00 UTC	Parallelize some tests This reduces the time taken to run all correctness tests from 8:15 to 3:15 on my machine.	08 February 2024, 01:49:00 UTC
77fc71b	Steven Johnson	07 February 2024, 18:10:25 UTC	Merge branch 'xtensa-codegen' of https://github.com/halide/Halide into xtensa-codegen	07 February 2024, 18:10:25 UTC
3285236	Steven Johnson	07 February 2024, 18:10:20 UTC	Merge branch 'main' into xtensa-codegen	07 February 2024, 18:10:20 UTC
ea03af7	Misha Gutman	07 February 2024, 18:09:41 UTC	[xtensa] Added int32<->float vector reinterprets (#8070)	07 February 2024, 18:09:41 UTC
39e5c08	Andrew Adams	07 February 2024, 17:49:06 UTC	Better validation of gpu schedules (#8068) * Update makefile to use test/common/terminate_handler.cpp This means we actually print error messages when using exceptions and the makefile * Better validate of GPU schedules GPU loop constraints were checked in two different places. Checking them in ScheduleFunctions was incorrect because it didn't consider update definitions and specializations. Checking them in FuseGPUThreadLoops was too late, because the Var names have gone (they've been renamed to things like __thread_id_x). Furthermore, some problems were internal errors or runtime errors when they should have been user errors. We allowed 4d thread and block dimensions, but then hit an internal error. This PR centralizes checking of GPU loop structure in CanonicalizeGPUVars and adds more helpful error messages that print the problematic loop structure. E.g: ``` Error: GPU thread loop over f$8.s0.v0 is inside three other GPU thread loops. The maximum number of nested GPU thread loops is 3. The loop nest is: compute_at for g$8: for g$8.s0.v7: for g$8.s0.v6: for g$8.s0.v5: for g$8.s0.v4: gpu_block g$8.s0.v3: gpu_block g$8.s0.v2: gpu_thread g$8.s0.v1: gpu_thread g$8.s0.v0: store_at for f$8: compute_at for f$8: gpu_thread f$8.s0.v1: gpu_thread f$8.s0.v0: ``` Fixes the bug found in #7946 * Delete dead code * Actually clear the ostringstream	07 February 2024, 17:49:06 UTC
37153a9	Derek Gerstmann	07 February 2024, 17:43:58 UTC	Fix bool conversion bug in Vulkan code generator (#8067) * Fix bug in Vulkan code generator that was incorrectly passing the address of a byte vector, instead of its contents to builder.declare_constant() * Add bool_predicate_cast correctness test to verify bool conversion for Vulkan codegen works as expected --------- Co-authored-by: Derek Gerstmann <dgerstmann@adobe.com>	07 February 2024, 17:43:58 UTC
78a0762	Prasoon Mishra	07 February 2024, 17:41:51 UTC	Add hexagon_benchmarks app for CMake builds (#8069) * Add hexagon_benchmarks app for CMake builds * Removed unnecessary -lc++abi flag from GCC build	07 February 2024, 17:41:51 UTC
84fe565	Steven Johnson	07 February 2024, 17:41:21 UTC	Outsmart the LLVM optimizer (#8073) The old definitions of bool_1, bool_2, bool_3 in simd_op_check_x86 (etc) all referred to the same entry in in_f32; as of https://github.com/llvm/llvm-project/pull/76367, the LLVM optimizer is smart enough to realize that (eg) bool1 != bool2 by construction, and optimizes away the code that tests their conditions, such as the one for andps and orps. Initing them from different locations is enough to outsmart the compiler. (bug was only noticed in the x86 test, but I updated the other tests to guard against future improvements there too.)	07 February 2024, 17:41:21 UTC
665804c	Steven Johnson	06 February 2024, 23:34:29 UTC	Don't require Halide_WebGPU when using wasm (#8063) (#8065) * Don't require Halide_WebGPU when using wasm (#8063) * trigger buildbots	06 February 2024, 23:34:29 UTC
93bff95	Teo	06 February 2024, 23:34:02 UTC	add unsafe_promise_clamped (#8071) add unsafe_promise_clamp	06 February 2024, 23:34:02 UTC
feb0b93	Misha Gutman	06 February 2024, 19:22:14 UTC	[xtensa] Added int32 by int16 vector division + maintenance work (#8058) * [xtensa] Renamed SEL instructions to semantically correct * [xtensa] updated the types definitions in halide_xtensa_div32 * [xtensa] added int32 by int16 vector division * [xtensa] replaced convert int16->int32_x2->int16 to two interleavs for better efficiency	06 February 2024, 19:22:14 UTC
80e2081	Andrew Adams	05 February 2024, 22:25:05 UTC	Update makefile to use test/common/terminate_handler.cpp (#8066) This means we actually print error messages when using exceptions and the makefile	05 February 2024, 22:25:05 UTC
8c7d78c	Volodymyr Kysenko	01 February 2024, 19:14:09 UTC	Fix warning	01 February 2024, 19:14:09 UTC
35e97c6	Volodymyr Kysenko	01 February 2024, 19:11:51 UTC	Merge branch 'main' into xtensa-codegen	01 February 2024, 19:11:51 UTC
f32f95e	Misha Gutman	01 February 2024, 18:40:58 UTC	[xtensa] added vector load_predicated for f16 (#8057)	01 February 2024, 18:40:58 UTC
e2448fe	Andrew Adams	01 February 2024, 17:46:10 UTC	Fix type error in VectorizeLoops (#8055)	01 February 2024, 17:46:10 UTC
9e17fc7	Misha Gutman	29 January 2024, 18:12:01 UTC	[xtensa] Added float16 interleaves (#8050)	29 January 2024, 18:12:01 UTC
47378ee	Steven Johnson	29 January 2024, 01:28:13 UTC	Enable `bugprone-switch-missing-default-case` (#8048) * Upgrade clang-format and clang-tidy to use LLVM 17 * trigger buildbots * trigger buildbots * trigger buildbots * trigger buildbots * Enable `bugprone-switch-missing-default-case` ...and fix existing warnings. * Update .clang-tidy * Update Parameter.cpp * Update .clang-tidy * Update .clang-tidy * Update .clang-tidy * Update .clang-tidy * Update CPlusPlusMangle.cpp	29 January 2024, 01:28:13 UTC
4b2d211	Steven Johnson	27 January 2024, 00:33:24 UTC	Upgrade clang-format and clang-tidy to use LLVM 17 (#8042) * Upgrade clang-format and clang-tidy to use LLVM 17 * trigger buildbots * trigger buildbots * trigger buildbots * trigger buildbots	27 January 2024, 00:33:24 UTC
45d7850	Andrew Adams	26 January 2024, 20:01:41 UTC	Track whether or not let expressions failed to solve in solver (#7982) * Track whether or not let expressions failed to solve in solver After mutating an expression, the solver needs to know two things: 1) Did the expression contain the variable we're solving for 2) Was the expression successfully "solved" for the variable. I.e. the variable only appears once in the leftmost position. We need to know this to know property 1 of any subexpressions (i.e. does the right child of the expression contain the variable). This drives what transformations we do in ways that are guaranteed to terminate and not take exponential time. We were tracking property 1 through lets but not property 2, and this meant we were doing unhelpful transformations in some cases. I found a case in the wild where this made a pipeline take > 1 hour to compile (I killed it after an hour). It may have been in an infinite transformation loop, or it might have just been exponential. Not sure. * Remove surplus comma * Fix use of uninitialized value that could cause bad transformation	26 January 2024, 20:01:41 UTC
3657cf5	Andrew Adams	26 January 2024, 17:26:12 UTC	Fix bounds_of_nested_lanes (#8039) * Fix bounds_of_nested_lanes bounds_of_nested_lanes assumed that one layer of nested vectorization could be removed at a time. When faced with the expression: min(ramp(x8(a), x8(b), 5), x40(27)) It panicked, because on the left hand side it reduced the bounds to x8(a) ... x8(a) + x8(b) * 4, and on the right hand side it reduced the bounds to 27. It then attempted to take a min of mismatched types. In general we can't assume that binary operators on nested vectors have the same nesting structure on both sides, so I just rewrote it to reduce directly to a scalar. Fixes #8038	26 January 2024, 17:26:12 UTC
4590a09	Andrew Adams	26 January 2024, 01:07:40 UTC	Fix for llvm trunk: Force-include more runtime types (#8045) * Fix for llvm trunk: Force-include more runtime types * Include the force-include-types module first * Fix comment * Expand comment	26 January 2024, 01:07:40 UTC
c1923f3	Steven Johnson	24 January 2024, 23:53:28 UTC	HALIDE_VERSION_MAJOR -> 18 (#8044)	24 January 2024, 23:53:28 UTC
6177e51	Steven Johnson	24 January 2024, 20:04:19 UTC	Update Halide version to 18 (#8043)	24 January 2024, 20:04:19 UTC
9b9dfaf	Andrew Adams	24 January 2024, 19:12:17 UTC	Update Makefile for llvm 19 (#8040)	24 January 2024, 19:12:17 UTC
90e909d	Steven Johnson	24 January 2024, 18:44:47 UTC	Allow LLVM 19 in CMake (#8041)	24 January 2024, 18:44:47 UTC
958037a	Misha Gutman	23 January 2024, 17:19:16 UTC	[xtensa] Added efficient gather load to Q7 (#8026) Added efficient gather load to Q7	23 January 2024, 17:19:16 UTC
e0e9f63	Steven Johnson	22 January 2024, 21:43:00 UTC	Tweak the Printer code in runtime for smaller code (#8023) * Tweak the Printer code in runtime for smaller code TL;DR: template expansion meant that we had more replicated code than expected from the inline expansion of code in Printer and friends. Restructured and added NEVER_INLINE to try to make the call sites as small as possible. It's a modest code-size savings but nonzero... e.g., the linux-x86-64 .o output from correct_cross_compilation drops from 164280 bytes to 162936 bytes. * Update printer.h * debug * Update HalideTestHelpers.cmake * Update printer.h * fixes	22 January 2024, 21:43:00 UTC
05d4412	Volodymyr Kysenko	19 January 2024, 23:33:25 UTC	Skip the double buffering for DMA if the allocation and compute is at the same level	19 January 2024, 23:33:25 UTC
4a3378f	Misha Gutman	17 January 2024, 23:05:41 UTC	[xtensa] adjusted the tests to be launchable for Q8 (#8011) * [xtensa] adjusted the tests to be launchable for Q8 * Style fixes + C++-17 compliance	17 January 2024, 23:05:41 UTC
e5d4a57	Steven Johnson	17 January 2024, 23:05:20 UTC	Fix clang-tidy errors in InjectDmaTransfer (#8033)	17 January 2024, 23:05:20 UTC
a925471	Steven Johnson	17 January 2024, 19:28:50 UTC	Merge branch 'main' into xtensa-codegen	17 January 2024, 19:28:50 UTC