Revision Author Date Message Commit Date
3a0b891 Merge branch 'main' into xtensa-codegen 06 March 2023, 21:43:48 UTC
4a80251 Destringify CanonicalizeGPUVars (#7386) * Destringify CanonicalizeGPUVars This new implementation takes the high-water marks of each type of GPU loop, instead of filtering using the prefix of the loop name. * Better comments * fix typo 03 March 2023, 20:57:47 UTC
aa8fcad hannk: Provide weak symbol functions to use op profiling (#7388) * hannk: Provide weak symbol functions to use op profiling You can add your own profiler with strong symbols. * hannk: Guard profiler feature with HANNK_PROFILER 03 March 2023, 20:56:52 UTC
5e81e91 Remove incorrect halide_xtensa_sat_narrow_u16 03 March 2023, 18:56:46 UTC
c855273 Merge branch 'main' into xtensa-codegen 03 March 2023, 18:55:38 UTC
387a19c Make README consistent on supported LLVM versions (#7390) Make README consistent with https://github.com/halide/Halide/pull/7093 03 March 2023, 01:10:50 UTC
122b5b6 hannk: Add device_sync method hannk::Tensor class (#7387) hannk: Add device_sync method hannk::Tensor class This method is useful to debug performance with synchronous execution. 01 March 2023, 22:22:20 UTC
303a90c Minor cleanup of GPUCompilationCache (#7376) * Minor cleanup of GPUCompilationCache While tracking down an apparently-unrelated threading bug in the webgpu backend, I made some tweaks to this code that I think are worth keeping. The main one of importance is that `release_hold()` and `release_context()` really should acquire the mutex -- they weren't before -- so now all public methods are properly mutexed. The other changes are mostly cosmetic: - Moved helper methods to be private rather than public - Changed the id value size to be `uintptr_t` rather than `uint32_t`; the space allocated for them is sizeof(void*). (Not sure this moves the needle but it felt right.) - Removed unused ctor for CachedCompilation Also, drive-by change in printer.h to capture some logging improvements. * Fix deadlock release_all() and release_context() were contending for the mutex 01 March 2023, 17:21:48 UTC
b48f78c Merge branch 'xtensa-codegen' of https://github.com/halide/Halide into xtensa-codegen 01 March 2023, 00:42:59 UTC
040d773 Merge branch 'main' into xtensa-codegen 01 March 2023, 00:42:07 UTC
fba892a Specify a full type for llvm::IRBuilder (#7384) Specify full type for llvm::IRBuilder 01 March 2023, 00:29:42 UTC
c16b5e2 [xtensa] removed tests that are failing to compile (#7362) * [xtensa] removed tests that are failing to compile due to poor support of int48 in scalarised regime * [xtensa] removed runtime generation for xtensa tests, as it is not used 28 February 2023, 17:31:38 UTC
5c02ae2 Bounds visitor for div was missing single_point mutated case (#7379) * Bounds visitor for div was missing single_point mutated case Signed-off-by: Adrian Lebioda <adrian.lebioda@hexagon.com> * Add test Signed-off-by: Adrian Lebioda <adrian.lebioda@hexagon.com> --------- Signed-off-by: Adrian Lebioda <adrian.lebioda@hexagon.com> Co-authored-by: Adrian Lebioda <adrian.lebioda@hexagon.com> 28 February 2023, 02:14:41 UTC
bdba694 Add Callable default ctor + `defined()` method (#7380) * Add Callable default ctor + `defined()` method This allows it to behave like * Add user_assert + test 28 February 2023, 02:14:24 UTC
2fb3b62 Better handling of u1 to i16 cast & clean-up 28 February 2023, 01:21:13 UTC
0091fd9 [xtensa] Limit the number of allowed DMA channels + allocate a separate channel for the output transactions (#7381) * Limit the number of allowed DMA channels + allocate a separate channel for the output transactions * Fix formatting 28 February 2023, 00:35:02 UTC
38057b8 Merge branch 'main' into xtensa-codegen 27 February 2023, 23:28:26 UTC
c42a5b2 Remove a gross hack from gpu_only_aottest (#7378) * Remove a gross hack from gpu_only_aottest Also add metal support * Add missing include 27 February 2023, 16:46:41 UTC
09400f6 Bounds visitors for min/max were missing single_point mutated case (#7377) * Bounds visitors for min/max were missing single_point mutated case Partially fixes #7374 * Add test 25 February 2023, 17:16:32 UTC
b6a18b8 Update WABT to 1.0.32; Increase stack size for WASM AOT apps (#7373) 23 February 2023, 19:36:32 UTC
144c1a4 correctness_round should use Target::supports_type() (#7372) This gives it proper support for new GPU backends 23 February 2023, 17:59:39 UTC
b17806d Use HalideFreeHelper for the register_destructor (#7371) Slightly cleaner code. Also, drive-by change of NULL -> nullptr 23 February 2023, 17:59:13 UTC
629da52 Use single-char form of `unique_name` for semaphores (#7370) The multi-char form of `unique_name` will append a `$` to the identifier, e.g. `sema$4`. This isn't really legal for a C/C++ identifier. 23 February 2023, 17:58:37 UTC
ad6c84a [xtensa] Clean up HalideFreeHelper code (#7368) * Clean up HalideFreeHelper code - Revise HalideFreeHelper to be a templated struct, to save the unnecessary stack storage for the function - Add emit_halide_free_helper() method to consolidate usage - Add a nullptr check to the `stack_is_core_private`, per comment - Fix some minor whitespace issues (If this PR is accepted here, I will of course backport the non-xtensa portions to main) * Update CodeGen_C.cpp 22 February 2023, 23:45:59 UTC
29f3f42 Merge branch 'main' into xtensa-codegen 22 February 2023, 23:44:29 UTC
386a2d1 Clean Up HalideFreeHelper code (main) (#7369) * Clean Up HalideFreeHelper code (main) - Revise HalideFreeHelper to be a templated struct, to save the unnecessary stack storage for the function - Add emit_halide_free_helper() method to consolidate usage * Update CodeGen_C.cpp 22 February 2023, 23:39:25 UTC
e69fa42 Merge branch 'main' into xtensa-codegen 22 February 2023, 18:37:34 UTC
3246844 Use a std::unique_ptr for the IR Builder (#7356) * Use a std::unique_ptr for the IR Builder instead of a raw owning pointer * Use make_unique 21 February 2023, 21:18:30 UTC
e19a036 Overflow on casts is fine for ints < 32 bits (#7366) 21 February 2023, 17:28:43 UTC
ccc085a Update CMakePresets.json to use VS2022 instead of VS2019 (main) (#7359) 16 February 2023, 19:54:27 UTC
21d7637 Merge branch 'main' into xtensa-codegen 15 February 2023, 18:51:05 UTC
038d325 Add missing convert<native_vector_u16_x2, native_vector_i24> 15 February 2023, 18:39:31 UTC
310f641 [xtensa] Improvements to CodeGen_Xtensa (#7328) * [xtensa] Fixed common_int and common_uint for Q8. Added new types support for load_predicated, store_predicated, halide_xtensa_interleave (for u16 also added Q8 support for native_vector_u16_x3). Improved convert f32 to u32 vectors with proper intrinsic. Cleaned up. * changed "typedef" to "using" and refactored Div visit for better readability 15 February 2023, 18:25:34 UTC
b65ea62 Remove unused code in VectorizeLoops (#7354) 15 February 2023, 18:25:04 UTC
f6731b0 Merge branch 'main' into xtensa-codegen 15 February 2023, 17:42:01 UTC
18eb7d8 Permit vectorization of non-recursive atomic operations (#7346) * Vectorization of non-recursive atomic operations * Remove dead Vars 15 February 2023, 17:18:56 UTC
49e7d35 [Xtensa] 8-bit arithmetic improvements + some other smaller changes (#7294) * [xtensa] Adopted the changes from Cadence * reverted changes in xtensa_dma * fixed few simd_op_check_xtensa tests that are no longer failing * added two more previously failing tests in simd_op_check_xtensa * Removed empty line and commented failing tests back due to poor support of int48 15 February 2023, 17:15:09 UTC
e5ed226 Fix Python error handling (#7352) * Fix Python error handling Error handling in the Python bindings wasn't quite right for JIT: We previously replaced halide_error() to throw a C++ exception. Sounds good, but unfortunately, doesn't work reliably: if called from jitted code (which doesn't know about C++ exceptions), the throw statement may be unable to find the enclosing try block (which is outside jitted code), meaning it will call std::terminate. Now, instead, we just leave the JIT error handler unset, and call with an explicit JITUserContext with a custom print handler; in theory, this meant that the code in JITFuncCallContext::finalize() would check for an error after the call into jitted code, and call `halide_runtime_error` if so (which would then trigger an all-in-C++-exception). Unfortunately... (2) JITFuncCallContext is broken by design; it mutates the input JITUserContext, so that trying to use the same JITUserContext for two calls in a row leaves you with a JITUserContext with (at least) the error_handler set. Since at least one of the realize() calls does this twice (once for bounds query, once for execution), this means that an error in the second call would never be seen, since finalize() only reported errors if there wasn't a custom error handler on input. Per @abadams suggestion, we work around this by treating 'JITErrorBuffer::handler' as 'no custom error handler', which is mostly true. (But really, JITFuncCallContext and JITUserContext are a hard-to-reason-about mess and arguably need to be rethought entirely.) (3) Removed entirely-unnecessary overrides of runtime print and error handlers from PyStubImpl; despite the comments, this code is unnecessary. * format 15 February 2023, 17:03:41 UTC
ec1159e [xtensa] Remove __restrict from print_assignment (#7351) Remove __restrict from print_assignment 14 February 2023, 23:59:06 UTC
b467f28 [xtensa] Generate PACKVRNR for i16(i32(i48x) >> wild_i32) (#7349) Generate PACKVRNR for i16(i32(i48x) >> wild_i32) 14 February 2023, 20:44:00 UTC
7963cd4 Change early-bound default args in Python bindings to late-bound (#7347) In PyBind11, if you specify a default argument for a method, it is evaluated when the Python module is initialized, *not* when the method is called (as you might expect in C++). For defaults that are just constants/literals, this is no big deal, but when calling get_*_target_from_environment, this means it is called at module init time -- also normally not a big deal (since the values ~never change at runtime anyway), with one big exception (no pun intended): if the function throws an exception (e.g. via calling user_assert() or similar), that exception is thrown at Module-initialization time, which is a much more inscrutable crash, and one that is very hard to recover from. This may seem unlikely, but can happen pretty easily if you set (say) HL_JIT_TARGET=host-cuda (or other gpu) and the given GPU runtime isn't present on the given system; the current behavior is basically "make it impossible for the libHalidePython bindings to run", whereas what we want is "runtime exception thrown when you call the method". This changes the relevant methods to use `Target()` as the default value, and inside the method wrapper, if the value passed equals `Target()`, it replaces the value with the right `get_*_target_from_environment()` call. (This turned up while doing some testing of https://github.com/halide/Halide/pull/6924 on a system without Vulkan available) 14 February 2023, 17:14:05 UTC
8bd07fb Fix tuple output bounds checks (#7345) Fix #7343 Tuple outputs weren't getting appropriate bounds checks due to overzealous culling of uninteresting code in the add_image_checks pass. 14 February 2023, 01:52:53 UTC
7ea67a6 Remove unnecessary overrides in Codegen_Xtensa (#7342) 11 February 2023, 02:43:14 UTC
858ee30 Merge branch 'main' into xtensa-codegen 11 February 2023, 02:42:40 UTC
22aed20 Devirtualize the protected compile() methods in Codegen_C (#7341) With the addition of `preprocess_function_body()`, neither of these need to be virtual, and devirtualizing them avoids `hidden overloaded virtual function` warnings in subclasses that don't override them 11 February 2023, 02:42:05 UTC
6c5ca8e Tiny improvements in codegen in C backend (#7337) * Tiny improvements in codegen in C backend (1) Emit `true` or `false` instead of `(bool)(0ull)` etc for bool literals (2) Avoid redundant temporaries in print_cast_expr(), which occur in a small but nonzero number of cases Basically this means that code currently like ``` bool _523 = (bool)(0ull); bool _524 = (bool)(_523); ... foo(_524); ``` becomes ``` foo(false); ``` ...I'm sure this has no output on final object code, but it makes the generated C code less weird to read. * Also avoid extra intermediates for typed nullptr * Also use std::isnan() and std::isinf() * Update CodeGen_C.cpp 11 February 2023, 00:34:41 UTC
3e6a2c6 Remove Xtensa::compile(LoweredFunc), add Xtensa::preprocess_function_body() (#7340) This removes a nice chunk of redundant code (and adds some corner cases that were missing from the Xtensa version). 10 February 2023, 23:56:32 UTC
31e557f Add [SKIP] to correctness_simd_op_check_xtensa 10 February 2023, 21:55:08 UTC
a0ac990 Merge branch 'main' into xtensa-codegen 10 February 2023, 21:29:21 UTC
a6c5be7 Add a hook to Codegen_C::compile() (#7335) At least one subclass of Codegen_C currently has to replicate ~all of the compile(LoweredFunc) method, with the result that it has often gone stale (and still is stale) wrt changes in the base; this adds an optional method to allow some modifications to the function body just before it is printed, to avoid redundant code. 10 February 2023, 21:28:51 UTC
88d40c2 Fix issue in find_package in cross-compilation for no OS (#7282) When using toolchain where Threads libs are not available, which is the case in baremetal target cross-compilation, we were not able to load even HalideHelpers package. Co-authored-by: Alex Reinking <alex.reinking@gmail.com> 10 February 2023, 18:07:56 UTC
76dc02a Merge branch 'main' into xtensa-codegen 10 February 2023, 00:23:11 UTC
35322c3 Fix a subtle uninitialized-memory-read in Buffer::for_each_value() (#7330) * Fix a subtle uninitialized-memory-read in Buffer::for_each_value() When we flattened dimensions in for_each_value_prep(), we would copy from one past the end, meaning the last element contained uninitialized garbage. (This wasn't noticed as an out-of-bounds read because we overallocated in structure in for_each_value_impl()). This garbage stride was later used to advance ptrs in for_each_value_helper()... but only on the final iteration, so even if the ptr was wrong, it didn't matter, as the ptr was never used again. Under certain MSAN configurations, though, the read would be (correctly) flagged as uninitialized. This fixes the MSAN bug, and also (slightly) improves the efficiency by returning the post-flattened number of dimensions, potentially reducing the number of iterations of for_each_value_helper() needed. * Oopsie * Update HalideBuffer.h * Update HalideBuffer.h 10 February 2023, 00:22:10 UTC
8efaae9 [xtensa] A few minor Xtensa fixes (#7333) * Remove unnecessary special-casing of vector-size in camera-pipe * Update camera_pipe_generator.cpp * Remove unnecessary special-casing of vector-size conv_layer 10 February 2023, 00:21:00 UTC
ae3f401 Explicitly remove -D_GLIBCXX_ASSERTIONS from LLVM definitions (#7332) Explicitly remove -D_GLIBCXX_ASSERTIONS from LLVM definitions as a workaround for https://reviews.llvm.org/D142279 09 February 2023, 23:08:37 UTC
4156c5a Allow _Float16 as alias for float16_t in halide_type_of<>() (#7325) (#7326) 09 February 2023, 21:51:22 UTC
734e34a Remove deprecated `HVX_shared_object` feature (#7331) This has been marked 'deprecated' for quite a while, and has no effect on codegen or, well, anything else. Let's remove it. 09 February 2023, 17:59:12 UTC
0f6003e Float16: Remove unused header dependency (#7324) IRMutator.h is not needed for the Float16.h. 08 February 2023, 20:26:31 UTC
c3f3318 Fixes for top-of-tree LLVM (#7329) * Fixes for top-of-tree LLVM * fix * times ten * Update LLVM_Output.cpp 08 February 2023, 20:25:37 UTC
7e93e0a Merge branch 'main' into xtensa-codegen 08 February 2023, 01:14:57 UTC
ddb515a Improve support for Arm baremetal compilation and runtime (#7286) * Improve support for Arm baremetal compilation and runtime - Add Target feature "semihosting" mode for baremetal runtime - Fix error of aligned_alloc() when compiled by Arm GNU toolchain * Modify comments for Target feature semihosting * Add an example app to guide cross-compilation for baremetal target * Update build steps in HelloBaremetal * Fix line-ending * Set CMake variable BAREMETAL in toolchain file 07 February 2023, 18:41:04 UTC
34d256f Make auto scheduler libs available in HalideHelpers package (#7285) * Make auto scheduler libs available in HalideHelpers package find_package(HalideHelpers) allows us to use add_halide_library(). But auto scheduler libs are not available unless they are in Halide-Interfaces.cmake. Note: Those libraries are not actually linked to the target application, but need to be available for add_custom_command call. 07 February 2023, 18:40:22 UTC
0c7722f Add buffer sync methods hannk::Tensor class (#7323) Add few methods for GPU memory interaction. 07 February 2023, 17:37:09 UTC
0b7379f Warn emulated float16 equivalent is generated (#7307) * Warn emulated float16 equivalent is generated 07 February 2023, 17:08:32 UTC
a55a09a Fix Halide cross-compilation (#7073) Use CMAKE_CROSSCOMPILING_EMULATOR for llvm-as and clang imported targets 07 February 2023, 14:17:58 UTC
1ad328a Fix LLVM 17+ build integration on 32-bit systems (#7322) * Fix LLVM 17+ build integration on 32-bit systems Fixes #7319 * add detail and precision to comment 07 February 2023, 01:18:21 UTC
91f3ac0 Fix segfault by nonconstant bound in Adams2019 (#7321) Fix segmentation fault in Adams2019 in case the estimate or bound of Func is set to nonconstant Expr. 06 February 2023, 22:23:47 UTC
01f9e2d Replace some push_backs with emplace_back (#7317) 06 February 2023, 19:05:01 UTC
9ab7a4d Merge branch 'main' into xtensa-codegen 03 February 2023, 21:18:43 UTC
e9aecee Make visit_leaf() public in hannk/ops.h (#7318) * Make visit_leaf() public in hannk/ops.h This makes it easier for downstream code to experiment with adding ops * Update ops.h 03 February 2023, 21:17:47 UTC
0782d80 Make Callable::call_argv_fast public (#7315) * Make Callable::call_argv_fast public * Add rough specification of the calling convention * Fix a typo 01 February 2023, 18:24:01 UTC
234bf6e Merge branch 'main' into xtensa-codegen 31 January 2023, 23:39:51 UTC
beba53a halide_popcount<uint64_t> is broken (#7313) Would not compile for Win32 or any other compiler without __builtin_popcountll available. (How did this get checked in without being tested on MSVC?) 31 January 2023, 21:08:15 UTC
78dc6a0 Pattern for narrow_i48_with_rounding_shift_i16 31 January 2023, 20:05:27 UTC
0a1b5a1 Replace widening_shift_left with signed widening_mul when possible 31 January 2023, 20:02:33 UTC
fae28de Patterns for quad widening add + minor clean-up 31 January 2023, 19:57:47 UTC
6c521e0 Add __attribute__((malloc)) to halide_tcm_malloc 31 January 2023, 18:45:43 UTC
f338197 Do not inline generic gather_load + specialization for gather of native_vector_f32_x2 31 January 2023, 18:44:08 UTC
63175ac Improved halide_xtensa_sat_narrow_i16 31 January 2023, 18:42:21 UTC
75de60a Add widening quad add 31 January 2023, 18:39:37 UTC
e989a3a Add store<native_vector_i16_2x> implementation 31 January 2023, 18:36:17 UTC
1589189 Merge branch 'main' into xtensa-codegen 31 January 2023, 18:34:53 UTC
fe76ab2 Minimal updates to allow Halide building with LLVM17 (#7309) * Minimal updates to allow Halide building with LLVM17 (Opening as draft initially until Buildbots build the new LLVM versions) * trigger buildbots 30 January 2023, 22:01:42 UTC
f52351f [xtensa] added code for running tests and commented failing i48 tests (#7303) * [xtensa] added infrastructure code for running tests * moved google related calls to CL 26 January 2023, 17:21:33 UTC
dd973f4 Improved halide_popcount (#7225) * Improved halide_popcount * reused popcount64 from Utils.cpp in CodeGen_C * Fixed comment for popcount 25 January 2023, 21:40:54 UTC
4605ac6 [xtensa] Minor DMA improvements (#7304) * handle min/max expressions in strides calculations * more robust check for nested loops 25 January 2023, 00:58:08 UTC
23da552 Merge branch 'main' into xtensa-codegen 23 January 2023, 17:46:55 UTC
810bd0b Hoist vector slices using rewrite rules (#7243) * Hoist slices using rewrite rules This lets us add associative variants more easily, which are helpful in the work on staging strided loads. * Don't hoist extract_element shuffles The Shuffle visitor wants to sink them * Add some static asserts * Add explanatory comment on shuffle hoisting * Fix comment * add lanes predicate to slice hoisting * add vector slice hoisting test cases Co-authored-by: Steven Johnson <srj@google.com> Co-authored-by: Alexander <ajroot@stanford.edu> 21 January 2023, 22:08:30 UTC
562d045 Post changes from #7291 to Codegen_Xtensa (#7299) 20 January 2023, 23:00:46 UTC
bafd60f [x86 & wasm] Split up double saturating-narrows from i32 (#7280) * better x86 double sat-cast + add test * fix wasm too + test Co-authored-by: Steven Johnson <srj@google.com> 20 January 2023, 18:03:25 UTC
4023640 Merge branch 'main' into xtensa-codegen 20 January 2023, 17:54:15 UTC
c601e4e Add workaround for the const-or-not user_context issue (#635) (#7291) Add a workaround for the const-or-not user_context issue (https://github.com/halide/Halide/issues/635) 20 January 2023, 17:43:56 UTC
2cc0468 Fix issue in add_halide_runtime in cross-compilation (#7284) * Fix issue in add_halide_runtime in cross-compilation add_halide_runtime() tries to build generator executable, but it fails if we are working with cross-compiler toolchain. By using existing generator set as "FROM", we can work around this. 20 January 2023, 17:39:41 UTC
d44e99d Fix error of add_halide_generator in cross-compilation (#7283) In case the project name is CamelCase, add_halide_generator() was not able to find the generator package, because CMake searches <name>Config.cmake or <lower-case-name>-config.cmake 20 January 2023, 13:12:30 UTC
147ff48 Remove dependency on platform threads library (#7297) * Refactor internal ThreadPool.h into halide_thread_pool.h tool * Drop dependency of libHalide on threads library * Remove other redundant uses of Threads::Threads * Update CMake documentation. 20 January 2023, 12:54:34 UTC
314b2fd [HVX] Fix EliminateInterleaves (#7279) * fix EliminateInterleaves Co-authored-by: Steven Johnson <srj@google.com> 20 January 2023, 00:35:14 UTC
c9f3602 Remove the watchdog timer from generator_main(). It was intended to k… (#7295) Remove the watchdog timer from generator_main(). It was intended to kill pathologically slow builds, but in the environment it was added for (Google build servers), it ended up being redundant to existing mechanisms, and removing it allows us to remove a dependency on threading libraries in libHalide. 19 January 2023, 23:48:26 UTC
51a4f6c Emit prototypes for destructor functions in C Backend (#7296) We gathered up the destructors, but only emitted the prototypes if there was at least one non-C++ function declaration needed -- so if you built with cpp_name_mangling enabled, you might omit the right prototype. Fixed and added the right flag to a Generator test to tickle this behavior. 19 January 2023, 23:36:47 UTC
e8e1481 Drop support for MIPS (#7287) (#7289) * Drop support for MIPS (#7287) * Update Target.cpp 18 January 2023, 21:56:14 UTC
2e9ae6a Merge branch 'main' into xtensa-codegen 18 January 2023, 18:51:45 UTC