be6f0f7 | Steven Johnson | 14 February 2023, 19:47:46 UTC | wip | 14 February 2023, 19:47:46 UTC |
f918585 | Steven Johnson | 14 February 2023, 18:20:57 UTC | Merge branch 'main' into vulkan-phase2-runtime | 14 February 2023, 18:20:57 UTC |
7963cd4 | Steven Johnson | 14 February 2023, 17:14:05 UTC | Change early-bound default args in Python bindings to late-bound (#7347) In PyBind11, if you specify a default argument for a method, it is evaluated when the Python module is initialized, *not* when the method is called (as you might expect in C++). For defaults that are just constants/literals, this is no big deal, but when calling get_*_target_from_environment, this means it is called at module init time -- also normally not a big deal (since the values ~never change at runtime anyway), with one big exception (no pun intended): if the function throws an exception (e.g. via calling user_assert() or similar), that exception is thrown at Module-initialization time, which is a much more inscrutable crash, and one that is very hard to recover from. This may seem unlikely, but can happen pretty easily if you set (say) HL_JIT_TARGET=host-cuda (or other gpu) and the given GPU runtime isn't present on the given system; the current behavior is basically "make it impossible for the libHalidePython bindings to run", whereas what we want is "runtime exception thrown when you call the method". This changes the relevant methods to use `Target()` as the default value, and inside the method wrapper, if the value passed equals `Target()`, it replaces the value with the right `get_*_target_from_environment()` call. (This turned up while doing some testing of https://github.com/halide/Halide/pull/6924 on a system without Vulkan available) | 14 February 2023, 17:14:05 UTC |
8bd07fb | Andrew Adams | 14 February 2023, 01:52:53 UTC | Fix tuple output bounds checks (#7345) Fix #7343 Tuple outputs weren't getting appropriate bounds checks due to overzealous culling of uninteresting code in the add_image_checks pass. | 14 February 2023, 01:52:53 UTC |
f4c4212 | Steven Johnson | 13 February 2023, 18:42:38 UTC | Merge branch 'main' into vulkan-phase2-runtime | 13 February 2023, 18:42:38 UTC |
22aed20 | Steven Johnson | 11 February 2023, 02:42:05 UTC | Devirtualize the protected compile() methods in Codegen_C (#7341) With the addition of `preprocess_function_body()`, neither of these needs to be virtual, and devirtualizing them avoids `hidden overloaded virtual function` warnings in subclasses that don't override them | 11 February 2023, 02:42:05 UTC |
6c5ca8e | Steven Johnson | 11 February 2023, 00:34:41 UTC | Tiny improvements in codegen in C backend (#7337) * Tiny improvements in codegen in C backend (1) Emit `true` or `false` instead of `(bool)(0ull)` etc for bool literals (2) Avoid redundant temporaries in print_cast_expr(), which occur in a small but nonzero number of cases Basically this means that code currently like ``` bool _523 = (bool)(0ull); bool _524 = (bool)(_523); ... foo(_524); ``` becomes ``` foo(false); ``` ...I'm sure this has no effect on final object code, but it makes the generated C code less weird to read. * Also avoid extra intermediates for typed nullptr * Also use std::isnan() and std::isinf() * Update CodeGen_C.cpp | 11 February 2023, 00:34:41 UTC |
a6c5be7 | Steven Johnson | 10 February 2023, 21:28:51 UTC | Add a hook to Codegen_C::compile() (#7335) At least one subclass of Codegen_C currently has to replicate ~all of the compile(LoweredFunc) method, with the result that it has often gone stale (and still is stale) wrt changes in the base; this adds an optional method to allow some modifications to the function body just before it is printed, to avoid redundant code. | 10 February 2023, 21:28:51 UTC |
88d40c2 | Steve Suzuki | 10 February 2023, 18:07:56 UTC | Fix issue in find_package in cross-compilation for no OS (#7282) When using a toolchain where Threads libs are not available, which is the case in baremetal target cross-compilation, we were not able to load even the HalideHelpers package. Co-authored-by: Alex Reinking <alex.reinking@gmail.com> | 10 February 2023, 18:07:56 UTC |
35322c3 | Steven Johnson | 10 February 2023, 00:22:10 UTC | Fix a subtle uninitialized-memory-read in Buffer::for_each_value() (#7330) * Fix a subtle uninitialized-memory-read in Buffer::for_each_value() When we flattened dimensions in for_each_value_prep(), we would copy from one past the end, meaning the last element contained uninitialized garbage. (This wasn't noticed as an out-of-bounds read because we overallocated the structure in for_each_value_impl()). This garbage stride was later used to advance ptrs in for_each_value_helper()... but only on the final iteration, so even if the ptr was wrong, it didn't matter, as the ptr was never used again. Under certain MSAN configurations, though, the read would be (correctly) flagged as uninitialized. This fixes the MSAN bug, and also (slightly) improves the efficiency by returning the post-flattened number of dimensions, potentially reducing the number of iterations of for_each_value_helper() needed. * Oopsie * Update HalideBuffer.h * Update HalideBuffer.h | 10 February 2023, 00:22:10 UTC |
ae3f401 | Steven Johnson | 09 February 2023, 23:08:37 UTC | Explicitly remove -D_GLIBCXX_ASSERTIONS from LLVM definitions (#7332) Explicitly remove -D_GLIBCXX_ASSERTIONS from LLVM definitions as a workaround for https://reviews.llvm.org/D142279 | 09 February 2023, 23:08:37 UTC |
4156c5a | Steven Johnson | 09 February 2023, 21:51:22 UTC | Allow _Float16 as alias for float16_t in halide_type_of<>() (#7325) (#7326) | 09 February 2023, 21:51:22 UTC |
a452b9b | Derek Gerstmann | 09 February 2023, 19:25:46 UTC | Fix split and allocate methods in region allocator to fix issues with alignment constraints - discovered a hang if requested size couldn't be fulfilled after adjusting to aligned sizes - cause was incorrect splitting of existing regions Cleanup region allocator iteration, cleanup and shutdown Added maximum_pool_size configuration option to Vulkan Memory Allocator to restrict pool sizes | 09 February 2023, 19:25:46 UTC |
734e34a | Steven Johnson | 09 February 2023, 17:59:12 UTC | Remove deprecated `HVX_shared_object` feature (#7331) This has been marked 'deprecated' for quite a while, and has no effect on codegen or, well, anything else. Let's remove it. | 09 February 2023, 17:59:12 UTC |
0f6003e | Terry Heo | 08 February 2023, 20:26:31 UTC | Float16: Remove unused header dependency (#7324) IRMutator.h is not needed for the Float16.h. | 08 February 2023, 20:26:31 UTC |
c3f3318 | Steven Johnson | 08 February 2023, 20:25:37 UTC | Fixes for top-of-tree LLVM (#7329) * Fixes for top-of-tree LLVM * fix * times ten * Update LLVM_Output.cpp | 08 February 2023, 20:25:37 UTC |
ddb515a | Steve Suzuki | 07 February 2023, 18:41:04 UTC | Improve support for Arm baremetal compilation and runtime (#7286) * Improve support for Arm baremetal compilation and runtime - Add Target feature "semihosting" mode for baremetal runtime - Fix error of aligned_alloc() when compiled by Arm GNU toolchain * Modify comments for Target feature semihosting * Add an example app to guide cross-compilation for baremetal target * Update build steps in HelloBaremetal * Fix line-ending * Set CMake variable BAREMETAL in toolchain file | 07 February 2023, 18:41:04 UTC |
34d256f | Steve Suzuki | 07 February 2023, 18:40:22 UTC | Make auto scheduler libs available in HalideHelpers package (#7285) * Make auto scheduler libs available in HalideHelpers package find_package(HalideHelpers) allows us to use add_halide_library(). But auto scheduler libs are not available unless they are in Halide-Interfaces.cmake. Note: Those libraries are not actually linked to the target application, but need to be available for add_custom_command call. | 07 February 2023, 18:40:22 UTC |
0c7722f | Terry Heo | 07 February 2023, 17:37:09 UTC | Add buffer sync methods to hannk::Tensor class (#7323) Add a few methods for GPU memory interaction. | 07 February 2023, 17:37:09 UTC |
0b7379f | Steve Suzuki | 07 February 2023, 17:08:32 UTC | Warn emulated float16 equivalent is generated (#7307) * Warn emulated float16 equivalent is generated | 07 February 2023, 17:08:32 UTC |
a55a09a | Dmitry Kurtaev | 07 February 2023, 14:17:58 UTC | Fix Halide cross-compilation (#7073) Use CMAKE_CROSSCOMPILING_EMULATOR for llvm-as and clang imported targets | 07 February 2023, 14:17:58 UTC |
1ad328a | Alex Reinking | 07 February 2023, 01:18:21 UTC | Fix LLVM 17+ build integration on 32-bit systems (#7322) * Fix LLVM 17+ build integration on 32-bit systems Fixes #7319 * add detail and precision to comment | 07 February 2023, 01:18:21 UTC |
91f3ac0 | Steve Suzuki | 06 February 2023, 22:23:47 UTC | Fix segfault by nonconstant bound in Adams2019 (#7321) Fix segmentation fault in Adams2019 in case the estimate or bound of Func is set to nonconstant Expr. | 06 February 2023, 22:23:47 UTC |
01f9e2d | Andrew Adams | 06 February 2023, 19:05:01 UTC | Replace some push_backs with emplace_back (#7317) | 06 February 2023, 19:05:01 UTC |
14ef177 | Derek Gerstmann | 03 February 2023, 23:30:01 UTC | Cleanup debug output for buffer related updates | 03 February 2023, 23:30:01 UTC |
e9aecee | Steven Johnson | 03 February 2023, 21:17:47 UTC | Make visit_leaf() public in hannk/ops.h (#7318) * Make visit_leaf() public in hannk/ops.h This makes it easier for downstream code to experiment with adding ops * Update ops.h | 03 February 2023, 21:17:47 UTC |
d69e36c | Derek Gerstmann | 03 February 2023, 17:13:36 UTC | Remove accidentally uncommented debug statements | 03 February 2023, 17:13:36 UTC |
ec62988 | Derek Gerstmann | 03 February 2023, 17:11:49 UTC | Fix logic for locating entry point shader binding ... assume exact match for entry point name Cleanup entry point binding variables and clarify usage | 03 February 2023, 17:11:49 UTC |
0782d80 | Tom Westerhout | 01 February 2023, 18:24:01 UTC | Make Callable::call_argv_fast public (#7315) * Make Callable::call_argv_fast public * Add rough specification of the calling convention * Fix a typo | 01 February 2023, 18:24:01 UTC |
beba53a | Steven Johnson | 31 January 2023, 21:08:15 UTC | halide_popcount<uint64_t> is broken (#7313) Would not compile for Win32 or any other compiler without __builtin_popcountll available. (How did this get checked in without being tested on MSVC?) | 31 January 2023, 21:08:15 UTC |
ad3742e | Derek Gerstmann | 31 January 2023, 00:09:47 UTC | Clang format & tidy pass | 31 January 2023, 00:09:47 UTC |
3bddbfc | Derek Gerstmann | 31 January 2023, 00:01:39 UTC | Fix shutdown sequence to iterate over descriptor sets Avoid bug in validation layer by reordering destruction sequence | 31 January 2023, 00:01:39 UTC |
9ca31fe | Derek Gerstmann | 31 January 2023, 00:00:49 UTC | Query for device limits to enforce min alignment constraints for storage and uniform buffers | 31 January 2023, 00:00:49 UTC |
0be26d7 | Derek Gerstmann | 30 January 2023, 23:58:42 UTC | Use VK_WHOLE_SIZE for setting buffer (to pass validation ... otherwise size has to be a multiple of alignment) Remove useless debug asserts for static variables Fix debug logging messages for allocations of scalars (which may not have a dim array) | 30 January 2023, 23:58:42 UTC |
eb8a0ae | Derek Gerstmann | 30 January 2023, 23:57:32 UTC | Query for 8-bit and 16-bit uniform and storage access support. Enable these as part of the device feature query chain. | 30 January 2023, 23:57:32 UTC |
4491f78 | Derek Gerstmann | 30 January 2023, 23:50:29 UTC | Add VkPhysicalDevice8BitStorageFeaturesKHR and related constants | 30 January 2023, 23:50:29 UTC |
069b294 | Derek Gerstmann | 30 January 2023, 23:48:35 UTC | Add required capability flags for 8-bit and 16-bit uniform and storage buffer access Handle casts for GLSL ops (spec requires all args to be the same type as the return type) | 30 January 2023, 23:48:35 UTC |
fe76ab2 | Steven Johnson | 30 January 2023, 22:01:42 UTC | Minimal updates to allow Halide building with LLVM17 (#7309) * Minimal updates to allow Halide building with LLVM17 (Opening as draft initially until Buildbots build the new LLVM versions) * trigger buildbots | 30 January 2023, 22:01:42 UTC |
dd973f4 | Mikhail Usvyatsov | 25 January 2023, 21:40:54 UTC | Improved halide_popcount (#7225) * Improved halide_popcount * reused popcount64 from Utils.cpp in CodeGen_C * Fixed comment for popcount | 25 January 2023, 21:40:54 UTC |
810bd0b | Andrew Adams | 21 January 2023, 22:08:30 UTC | Hoist vector slices using rewrite rules (#7243) * Hoist slices using rewrite rules This lets us add associative variants more easily, which are helpful in the work on staging strided loads. * Don't hoist extract_element shuffles The Shuffle visitor wants to sink them * Add some static asserts * Add explanatory comment on shuffle hoisting * Fix comment * add lanes predicate to slice hoisting * add vector slice hoisting test cases Co-authored-by: Steven Johnson <srj@google.com> Co-authored-by: Alexander <ajroot@stanford.edu> | 21 January 2023, 22:08:30 UTC |
bafd60f | Alexander Root | 20 January 2023, 18:03:25 UTC | [x86 & wasm] Split up double saturating-narrows from i32 (#7280) * better x86 double sat-cast + add test * fix wasm too + test Co-authored-by: Steven Johnson <srj@google.com> | 20 January 2023, 18:03:25 UTC |
c601e4e | Steven Johnson | 20 January 2023, 17:43:56 UTC | Add workaround for the const-or-not user_context issue (#635) (#7291) Add a workaround for the const-or-not user_context issue (https://github.com/halide/Halide/issues/635) | 20 January 2023, 17:43:56 UTC |
2cc0468 | Steve Suzuki | 20 January 2023, 17:39:41 UTC | Fix issue in add_halide_runtime in cross-compilation (#7284) * Fix issue in add_halide_runtime in cross-compilation add_halide_runtime() tries to build the generator executable, but it fails if we are working with a cross-compiler toolchain. By using an existing generator set as "FROM", we can work around this. | 20 January 2023, 17:39:41 UTC |
d44e99d | Steve Suzuki | 20 January 2023, 13:12:30 UTC | Fix error of add_halide_generator in cross-compilation (#7283) In case the project name is CamelCase, add_halide_generator() was not able to find the generator package, because CMake searches <name>Config.cmake or <lower-case-name>-config.cmake | 20 January 2023, 13:12:30 UTC |
147ff48 | Alex Reinking | 20 January 2023, 12:54:34 UTC | Remove dependency on platform threads library (#7297) * Refactor internal ThreadPool.h into halide_thread_pool.h tool * Drop dependency of libHalide on threads library * Remove other redundant uses of Threads::Threads * Update CMake documentation. | 20 January 2023, 12:54:34 UTC |
314b2fd | Alexander Root | 20 January 2023, 00:35:14 UTC | [HVX] Fix EliminateInterleaves (#7279) * fix EliminateInterleaves Co-authored-by: Steven Johnson <srj@google.com> | 20 January 2023, 00:35:14 UTC |
c9f3602 | Steven Johnson | 19 January 2023, 23:48:26 UTC | Remove the watchdog timer from generator_main(). It was intended to k… (#7295) Remove the watchdog timer from generator_main(). It was intended to kill pathologically slow builds, but in the environment it was added for (Google build servers), it ended up being redundant to existing mechanisms, and removing it allows us to remove a dependency on threading libraries in libHalide. | 19 January 2023, 23:48:26 UTC |
51a4f6c | Steven Johnson | 19 January 2023, 23:36:47 UTC | Emit prototypes for destructor functions in C Backend (#7296) We gathered up the destructors, but only emitted the prototypes if there was at least one non-C++ function declaration needed -- so if you built with cpp_name_mangling enabled, you might omit the right prototype. Fixed and added the right flag to a Generator test to tickle this behavior. | 19 January 2023, 23:36:47 UTC |
e8e1481 | Steven Johnson | 18 January 2023, 21:56:14 UTC | Drop support for MIPS (#7287) (#7289) * Drop support for MIPS (#7287) * Update Target.cpp | 18 January 2023, 21:56:14 UTC |
888c41c | Steven Johnson | 18 January 2023, 00:47:52 UTC | Add CMake support for C++ backend in test/generator (#7274) * Add support for C++ backend in test/generator When the CMake rules were rewritten a while back, the support for building/testing generators with the C++ backend (instead of the standard LLVM, etc) got lost. This adds it back in. Also made some drive-by fixes to the Makefile to enable some tests there that work correctly now. Also made a drive-by fix in Codegen_C to fix allocation nodes that were just wrappers around buffer_get_host -- this prevented the cleanup_on_error test from building with the C++ backend. | 18 January 2023, 00:47:52 UTC |
0d43318 | Steven Johnson | 10 January 2023, 19:23:58 UTC | Optimize Module::compile() for some edge cases (#7269) * Optimize Module::compile() for some edge cases Avoid redundant `compile_to_buffer()` calls for output requests that can't possibly ever need them. * Avoid mutation | 10 January 2023, 19:23:58 UTC |
a8d88bb | Steven Johnson | 10 January 2023, 17:51:38 UTC | Use ::aligned_alloc() instead of std::aligned_alloc() in HalideBuffer.h (#7268) | 10 January 2023, 17:51:38 UTC |
eea7696 | Steven Johnson | 09 January 2023, 17:58:04 UTC | Update README_python.md (#7266) | 09 January 2023, 17:58:04 UTC |
c070bb8 | Alina Sbirlea | 06 January 2023, 00:00:34 UTC | Update change following LLVM WASM change f841ad30d77eeb4c51663e68efefdb734c7a3d07 (#7264) * Update change following LLVM WASM change https://github.com/llvm/llvm-project/commit/f841ad30d77eeb4c51663e68efefdb734c7a3d07 * Update checks conditional on LLVM version. | 06 January 2023, 00:00:34 UTC |
4b74049 | Andrew Adams | 05 January 2023, 21:12:50 UTC | Inline into extern function args during bounds inference (#7261) * Inline into extern function args during bounds inference Fixes #7260 * Run CSE once at the end * Actually recursively inline * clang-tidy * trigger buildbots * Make test invariant to the number of times the warning is printed as long as it's at least once Co-authored-by: Steven Johnson <srj@google.com> | 05 January 2023, 21:12:50 UTC |
04bb986 | Steven Johnson | 28 December 2022, 17:31:01 UTC | Conditional allocations shouldn't fail for size=0 in C++ backend (#7255) (#7256) * Conditional allocations shouldn't fail for size=0 in C++ backend (#7255) Allocations can be conditional; if the condition evaluates to false, we end up calling `halide_malloc(0)` (or `halide_tcm_malloc(0)` in the xtensa branch). Since it's legal via spec for `malloc(0)` to return nullptr, we need to be cautious here: if we are compiling with assertions enabled, *and* have a malloc() (etc) implementation that returns nullptr for alloc(0), we need to skip the assertion check, since we know the result won't be used. Note: a similar check will be inserted in the xtensa branch separately. Note 2: LLVM backend already has this check via Codegen_Posix.cpp * Update CodeGen_C.cpp | 28 December 2022, 17:31:01 UTC |
ade8b56 | Steven Johnson | 20 December 2022, 20:05:19 UTC | Remove deprecated halide_target_feature_disable_llvm_loop_opt (#7247) * Remove deprecated halide_target_feature_disable_llvm_loop_opt Was deprecated in Halide 15; let's remove in Halide 16 * trigger buildbots * trigger buildbots * Update CodeGen_LLVM.cpp | 20 December 2022, 20:05:19 UTC |
10345d4 | Andrew Adams | 16 December 2022, 17:56:08 UTC | Explicitly stage strided loads (#7230) * Add a pass to do explicit densification of strided loads * densify more types of strided load * Reorder downsample in local laplacian for slightly better performance * Move allocation padding into the IR. Still WIP. * Simplify concat_bits handling * Use evidence from parent scopes to densify * Disallow padding allocations with custom new expressions * Add test for parent scopes * Remove debugging prints. Avoid nested ramps. * Avoid parent scope loops * Update cmakefiles * Fix for large_buffers * Pad stack allocations too * Restore vld2/3/4 generation on non-Apple ARM chips * Appease clang-format and clang-tidy * Silence clang-tidy * Better comments * Comment improvements * Nuke code that reads out of bounds * Fix stage_strided_loads test * Change strategy for loads from external buffers Some backends don't like non-power-of-two vectors. Do two overlapping half-sized loads and shuffle instead of one funny-sized load. * Add explanatory comment to ARM backend * Fix cpp backend shuffling * Fix missing msan annotations * Magnify heap cost effect in stack_vs_heap performance test * Address review comments * clang-tidy * Fix for when same load node occurs in two different allocate nodes | 16 December 2022, 17:56:08 UTC |
382f813 | Steven Johnson | 16 December 2022, 17:54:21 UTC | Fix "may be used uninitialized" warnings in Codegen_C::print_scalarized_expr() (#7244) | 16 December 2022, 17:54:21 UTC |
f191715 | Derek Gerstmann | 15 December 2022, 21:32:11 UTC | Remove empty lines in main | 15 December 2022, 21:32:11 UTC |
fcf0b50 | Derek Gerstmann | 15 December 2022, 21:31:06 UTC | Hookup API methods for get/set alloc_config when initializing the VulkanMemoryAllocator | 15 December 2022, 21:31:06 UTC |
3ca8870 | Derek Gerstmann | 15 December 2022, 21:30:38 UTC | Remove leftover debug ifdef | 15 December 2022, 21:30:38 UTC |
e08c646 | Derek Gerstmann | 15 December 2022, 21:29:08 UTC | Add get/set alloc_config methods and API hooks for configuring the VulkanMemoryAllocator | 15 December 2022, 21:29:08 UTC |
14c4363 | Derek Gerstmann | 15 December 2022, 21:27:36 UTC | Support any arbitrary number of devices and queues for context creation Fix typos in comments | 15 December 2022, 21:27:36 UTC |
14d3ab7 | Derek Gerstmann | 15 December 2022, 21:26:47 UTC | Handle error case for uninitialized buffer allocation (rather than abort) Fix typos in comments | 15 December 2022, 21:26:47 UTC |
acd5ea5 | Derek Gerstmann | 15 December 2022, 21:25:15 UTC | Rename copy_upto(...) method to be copy_up_to(...) | 15 December 2022, 21:25:15 UTC |
fea02d5 | Derek Gerstmann | 15 December 2022, 21:24:53 UTC | Fix typo in comments | 15 December 2022, 21:24:53 UTC |
b4c9bea | Derek Gerstmann | 15 December 2022, 21:24:24 UTC | Remove leftover debug ifdef | 15 December 2022, 21:24:24 UTC |
f5d70e8 | Derek Gerstmann | 15 December 2022, 21:23:53 UTC | Fix typos and logic for Vulkan capabilities | 15 December 2022, 21:23:53 UTC |
c526891 | Derek Gerstmann | 15 December 2022, 21:20:55 UTC | Add static_assert to rotl to make compilation errors clearer (instead of using enable_if) Fix debug(3) formatting to avoid super long messages Use lookup table for SPIR-V op code names | 15 December 2022, 21:20:55 UTC |
93c5df5 | Derek Gerstmann | 15 December 2022, 21:18:57 UTC | Fix typos in comments | 15 December 2022, 21:18:57 UTC |
3f731c2 | Derek Gerstmann | 15 December 2022, 21:18:11 UTC | Change value casts to match Halide conventions | 15 December 2022, 21:18:11 UTC |
9a54485 | Derek Gerstmann | 15 December 2022, 21:17:24 UTC | Fix typos and address review comments for Vulkan readme | 15 December 2022, 21:17:24 UTC |
446b34f | Derek Gerstmann | 15 December 2022, 21:16:39 UTC | Cleanup formatting for Halide version info in Makefile | 15 December 2022, 21:16:39 UTC |
da6746e | Steven Johnson | 14 December 2022, 05:29:45 UTC | correctness/exception.cpp needs to check HALIDE_WITH_EXCEPTIONS (fixes #7240) (#7241) correctness/exception.cpp needs to check HALIDE_WITH_EXCEPTIONS | 14 December 2022, 05:29:45 UTC |
1a4a469 | Andrew Adams | 13 December 2022, 16:11:54 UTC | Fix some sources of signed integer overflow in the compiler (#7231) * Fix some sources of signed integer overflow in the compiler Also, use compiler intrinsics when possible to handle overflow, as it generates faster code. * Fix msvc macro * Must use result * Actually perform the requested operation | 13 December 2022, 16:11:54 UTC |
533e6e5 | Steven Johnson | 12 December 2022, 20:54:46 UTC | Remove rogue string suffix in simd_op_check_arm.cpp (#7227) * Remove rogue string suffix in simd_op_check_arm.cpp Interestingly, it compiles here, but in some compilers it will fail with "unexpected token". * Update simd_op_check_arm.cpp | 12 December 2022, 20:54:46 UTC |
6ecdcbd | Steven Johnson | 11 December 2022, 18:05:55 UTC | Tighten alignment promises for halide_malloc() (#7222) This makes a couple of changes to the behavior/implementation of `halide_malloc()`: * Currently, halide_malloc must return a pointer aligned to the maximum meaningful alignment for the platform for the purpose of vector loads and stores. This PR also adds the requirement that the memory returned must be legal to access in an integral multiple of alignment >= the requested size (in other words: you should be able to do vector load/stores "off the end" without causing any faults). * Currently, the `halide_malloc_alignment()` function is used to determine the default alignment; this cannot be overridden by user code (well, it can be, but the override will have no useful effect). It is intended to be "internal only" but is used in at least one place outside the runtime (apps/hannk). This change removes the call entirely, in favor of a call that is harder to access from outside the runtime and much less likely for end users to attempt to call. (It also changes apps/hannk to stop using it.) | 11 December 2022, 18:05:55 UTC |
16421a7 | Steven Johnson | 09 December 2022, 17:21:30 UTC | Revise simd_op_check tests to ignore HL_TARGET (#7207) (#7216) * Revise simd_op_check tests to ignore HL_TARGET (#7207) The simd_op_check tests have historically only run using the value of HL_TARGET, which meant that the coverage they had was low (since HL_TARGET is only set to values that are runnable on at least one buildbot). This change completely disconnects these tests from HL_TARGET; instead, each test now tests for a range of targets appropriate to the architecture being tested. On all platforms, they still compile to assembly and verify that the correct instructions are generated; additionally, if the host platform can JIT for the given target, it verifies that the results are as expected. * Update simd_op_check_riscv.cpp * Update simd_op_check_x86.cpp * Update simd_op_check_x86.cpp * Update simd_op_check_arm.cpp * Add more features that must match; re-enable the bfloat instructions * Update simd_op_check_x86.cpp * Update simd_op_check_riscv.cpp * trigger buildbots * Fix simd_op_check_wasm | 09 December 2022, 17:21:30 UTC |
ba31688 | Steven Johnson | 09 December 2022, 01:22:54 UTC | Increase __clang_major__ check in Float16.h to 16 (#7224) | 09 December 2022, 01:22:54 UTC |
066559b | Steven Johnson | 08 December 2022, 23:35:01 UTC | Remove check_jit_user_context() from V8 bindings (#7220) Obsolete code from early V8 work, it can trigger inappropriately in some corner-case scenarios. Remove it entirely to avoid false errors. | 08 December 2022, 23:35:01 UTC |
8fa8221 | Steven Johnson | 08 December 2022, 04:34:13 UTC | Fix bonehead version-checking test in HalideBuffer.h for Apple (#7218) | 08 December 2022, 04:34:13 UTC |
e8615bb | Steven Johnson | 08 December 2022, 01:22:38 UTC | clang-tidy: add [[maybe_unused]] to the DECLARE_NO_INITMOD stubs. (#7215) | 08 December 2022, 01:22:38 UTC |
a7fa32e | Steven Johnson | 07 December 2022, 17:31:01 UTC | Use aligned_alloc() as default allocator for HalideBuffer.h on most platforms (#7190) Use aligned_alloc() as default allocator for HalideBuffer.h on most platforms (See also https://github.com/halide/Halide/pull/7189) Modify H::R::Buffer to default to using `aligned_alloc()` instead of `malloc()`, except: - If user code passes a non-null `allocate_fn` or `deallocate_fn`, we always use those (and/or malloc/free) - If the code is compiling under MSVC, never use `aligned_alloc` (Windows doesn't support it) - If HALIDE_RUNTIME_BUFFER_USE_ALIGNED_ALLOC is defined to be 0, never use `aligned_alloc` (this is to allow for usage on e.g. older Android and OSX versions which don't provide `aligned_alloc()` in the stdlib, regardless of C++ versions.) Also, as with #7189, this ensures that the allocated space has the start of the host data as 128-aligned, and also now ensures that the size allocated is 128-aligned (rounding up as needed). | 07 December 2022, 17:31:01 UTC |
8ce1212 | Steven Johnson | 07 December 2022, 17:29:19 UTC | Fix bitrot in PowerPC testing (#7211) * Fix bitrot in PowerPC testing (See #7208) - DataLayout was wrong (and has been for a long time) - simd_op_check_powerpc had errors. Some were easy to fix; the rest I commented out with a TODO since this backend doesn't appear to be in active use. (Want to fix this in preparation for fixing #7207) * Move x86 absd tests to the right place Co-authored-by: Andrew Adams <andrew.b.adams@gmail.com> | 07 December 2022, 17:29:19 UTC |
35020c5 | Zalman Stern | 07 December 2022, 07:15:46 UTC | Extend LLVM IR type mangling to handle scalars. (#7212) Extend LLVM IR type mangling to handle scalars and use this in vector predication intrinsic codegen. Fixes an error generating vector predicated strided stores. | 07 December 2022, 07:15:46 UTC |
d4b4c50 | Zalman Stern | 07 December 2022, 07:15:10 UTC | Add RISC V zvl flag for LLVM version 16 or greater. (#7209) | 07 December 2022, 07:15:10 UTC |
e0d1e15 | Zalman Stern | 07 December 2022, 00:58:28 UTC | Fix issue with vector predicated comparison and select instructions. (#7205) Fix invalid LLVM IR issues with vector predicated comparison and select instructions. Add start of RISC V simd_op_check test. | 07 December 2022, 00:58:28 UTC |
59f5412 | Zalman Stern | 06 December 2022, 22:57:24 UTC | Add bridging for clang _Float16 type. (#7201) Add type bridging between Halide::float16_t and _Float16 if the compiler supports the latter. Testing is done using clang specific logic and may need to be extended for other compilers. I chose not to add support for __fp16 and __bf16 right now as __fp16 is less useful in being storage only and __bf16 also only supports a subset of operations and was running into undefined symbols during compilation that did not look promising. Co-authored-by: Steven Johnson <srj@google.com> | 06 December 2022, 22:57:24 UTC |
90459b0 | Steven Johnson | 06 December 2022, 00:53:05 UTC | Revert "Fix for top-of-tree LLVM" (#7200) Revert "Fix for top-of-tree LLVM (#7194)" This reverts commit a9ea9b565018774e52bb4028cbc91e14cb86959e. | 06 December 2022, 00:53:05 UTC |
09908f3 | Derek Gerstmann | 05 December 2022, 22:34:35 UTC | Clang format pass | 05 December 2022, 22:34:35 UTC |
f752734 | Derek Gerstmann | 05 December 2022, 22:29:20 UTC | Update Vulkan readme with latest status. Everything works! More or less. =) | 05 December 2022, 22:29:20 UTC |
be6b83d | Derek Gerstmann | 05 December 2022, 22:25:28 UTC | Merge branch 'main' into vulkan-phase2-runtime | 05 December 2022, 22:25:28 UTC |
f0cc13b | Derek Gerstmann | 05 December 2022, 22:23:42 UTC | Disable Vulkan from python AOT tests and tutorials (since it requires linkage against the vulkan loader system library). | 05 December 2022, 22:23:42 UTC |
9805a29 | Derek Gerstmann | 05 December 2022, 22:22:53 UTC | Disable Vulkan performance test for async gpu (for now). | 05 December 2022, 22:22:53 UTC |
f1c004d | Derek Gerstmann | 05 December 2022, 22:22:31 UTC | Enable Vulkan async_device_copy test. | 05 December 2022, 22:22:31 UTC |
a6ee0c3 | Derek Gerstmann | 03 December 2022, 01:08:05 UTC | Add support for dynamic shared memory allocations for Vulkan Add dynamic workgroup dispatching to Vulkan Add optional feature flags for Vulkan capabilities Add Vulkan API version flags for target features Enable v1.3 path if requested Re-enable tests for added features Update Vulkan docs with status updates and feature flags | 03 December 2022, 01:08:05 UTC |
52982ab | Derek Gerstmann | 03 December 2022, 00:56:50 UTC | Update SPIR-V headers to v1.6 | 03 December 2022, 00:56:50 UTC |
345cf18 | Steven Johnson | 02 December 2022, 20:48:57 UTC | Don't attempt to use makecontext()/swapcontext() on Android (#7196) Despite being 'posixy', it doesn't actually implement these calls. | 02 December 2022, 20:48:57 UTC |
a9ea9b5 | Steven Johnson | 02 December 2022, 00:17:48 UTC | Fix for top-of-tree LLVM (#7194) | 02 December 2022, 00:17:48 UTC |