https://github.com/halide/Halide

Revision Author Date Message Commit Date
c8dcb4c Fix for top-of-tree LLVM (#8421) * Fix for top-of-tree LLVM * Update simd_op_check_sve2.cpp 17 September 2024, 15:37:41 UTC
4d368bf Reschedule the matrix multiply performance app (#8418) Someone was using this as a reference expert schedule, but it was stale and a bit simplistic for large matrices. I rescheduled it to get a better fraction of peak. This also now demonstrates how to use rfactor to block an sgemm over the k axis. 15 September 2024, 14:33:28 UTC
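The k-axis blocking mentioned in 4d368bf can be illustrated outside Halide. What follows is a minimal pure-Python sketch, not the rescheduled app and not Halide's `rfactor` API: the function names and the `k_block` parameter are hypothetical. It only shows that accumulating per-block partial sums over k gives the same result as one long reduction.

```python
def matmul_reference(A, B):
    """Plain triple-loop matrix multiply: one long reduction over k."""
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


def matmul_k_blocked(A, B, k_block):
    """Same product, but the k reduction is split into blocks of k_block.

    Each block produces a partial sum that is then folded into C. This is
    the reassociation that a blocked reduction relies on.
    """
    n, k, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    for k0 in range(0, k, k_block):              # outer loop over k blocks
        k1 = min(k0 + k_block, k)
        for i in range(n):
            for j in range(m):
                partial = 0
                for p in range(k0, k1):          # inner reduction in a block
                    partial += A[i][p] * B[p][j]
                C[i][j] += partial               # combine partial results
    return C


A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 8], [9, 10], [11, 12]]
print(matmul_k_blocked(A, B, k_block=2) == matmul_reference(A, B))
```

The sketch uses integers so the two results compare exactly; with floating-point data the reassociated sum can differ in the last bits.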
6fb13b7 Add missing backslash (#8419) 15 September 2024, 14:32:24 UTC
a65221b Include our Markdown documentation in the Doxygen site. (#8417) A few quirks in the Markdown parser were worked around here. The most notable is that the sequence `]:` causes Doxygen to interpret a would-be link as a trailing reference even if it is not at the start of a line. Duplicating the single bracket reference is a portable workaround, i.e. [winget] ~> [winget][winget] It also doesn't stop interpreting `@` directives inside inline code, so it warns about our use of the `@` as a decorator symbol inside Python.md. 10 September 2024, 02:54:29 UTC
3e6e7e0 Link to PyPI from Doxygen index.html (#8415) 09 September 2024, 23:08:26 UTC
07fecc9 Make run-clang-tidy.sh work on macOS (#8416) 09 September 2024, 23:08:09 UTC
f658eec Fix classifier spelling (#8413) PyPI rejected this because of a spacing issue. 07 September 2024, 17:03:44 UTC
37300e3 Merge pull request #8412 * Update pip package metadata * Link to the CMake package docs from Doxygen * Fix invalid Doxygen annotation in Serialization.h 06 September 2024, 19:33:54 UTC
63609cc Document how to find Halide from a pip installation (#8411) 06 September 2024, 18:19:27 UTC
3a34741 Big documentation update (#8410) 05 September 2024, 21:17:03 UTC
95ebd01 Pip packaging at last! (#8405) 04 September 2024, 03:43:07 UTC
b3c8c8b Support CMAKE_OSX_ARCHITECTURES (#8390) 04 September 2024, 00:53:03 UTC
97eeaf0 Update README.md (#8404) The instructions for which llvm to acquire were stale 02 September 2024, 04:32:50 UTC
b87f2b1 Fix _Float16 detection on ARM64 GCC<13 (#8401) GCC 12 only supports _Float16 on x86. Support for ARM was added in GCC 13. This causes a build failure in the manylinux_2_28 images. 29 August 2024, 16:54:41 UTC
45518ac Fix incorrect std::array sizes in Target.cpp (#8396) 23 August 2024, 17:00:23 UTC
9864bd4 Fix bundling error on buildbots (#8392) LLVM as it is built on the buildbots depends on `-lrt`, which is not a target. Filter out non-target dependencies from consideration. 16 August 2024, 20:51:21 UTC
4f30d2b Partially apply clang-tidy fixes we don't enforce yet (#8376) * Partially apply clang-tidy fixes we don't use yet - Put a bunch of stuff into anonymous namespaces - Delete some redundant casts (e.g. casting an int to int) - Add some const refs to avoid copies - Remove meaningless inline qualifiers on in-class definitions and constexpr functions - Remove return-with-value from functions returning void - Delete a little dead code - Use std::min/max where appropriate - Don't use a variable after std::forwarding it. It may have been moved from. - Use std::string::empty instead of comparing length to zero * Undo unintentional formatting change * Restore some necessary casts * Add NOLINT to silence older clang-tidy 16 August 2024, 18:41:55 UTC
818f42d Fix for the removed DataLayout constructor. (#8391) * Fix for the removed DataLayout constructor. * Update CodeGen_LLVM.cpp * Update CodeGen_LLVM.cpp * Update CodeGen_LLVM.cpp --------- Co-authored-by: Steven Johnson <srj@google.com> 13 August 2024, 22:03:31 UTC
6dcdfb5 Support using vcpkg to build dependencies on all platforms (#8387) This PR adds support for using vcpkg to acquire and build Halide's dependencies on all platforms. It adds a top-level `vcpkg.json` file that explains the relationship between Halide's features and its dependencies. These features include the various LLVM `target-`s (which merely imply a dependency on the corresponding LLVM backend), `serialization` (flatbuffers), the `python-bindings` (pybind11), the `wasm-executor` (wabt), and a few meta-features: * `jit`: enables LLVM targets corresponding to the host system * `target-all`: enables all LLVM targets * `tests`: depends on everything needed for the tests and apps * `developer`: includes all other features All of these are optional (since x86 and WebAssembly are forced), but `jit` and `serialization` are on by default. vcpkg is intended to be an eventual replacement for FetchContent, at least on the buildbots. It will accelerate builds beyond ccache by directly restoring binary caches for our dependencies. Unlike FetchContent, it does not pollute our build with third-party CMake code. Indeed, our build has no idea at all when vcpkg is in use. The primary drawback is that vcpkg installation happens during (or ahead of) configuration time, so there is some initial wait. ## Try it! I have provided many CMake presets to ease adoption. As long as you have `VCPKG_ROOT` set to a fresh clone of `vcpkg`, they should work. They come in two flavors: * `vcpkg`: this acquires dependencies from the main vcpkg registry, but applies our own overlay, which disables building Python 3 (really!) and LLVM. The system is searched for these as usual. * `vcpkg-full`: this disables the Halide overlay and attempts to build ALL dependencies. All these presets enable the `developer` feature in `VCPKG_MANIFEST_FEATURES`, which can be overridden in the usual way. 
Here are the commands you should use to try it locally: * On Linux or Windows: `cmake --preset release-vcpkg` * On macOS: `cmake --preset macOS-vcpkg` * To use Visual Studio: `cmake --preset win32`. Here, `vcpkg` is implied and `-vcpkg-full` can be added to build LLVM. 12 August 2024, 02:25:00 UTC
6dc2b3e Rewrite bundle_static to be much more efficient. (#8386) The `bundle_static` function now detects the private static dependencies on the given target (in our case, always Halide) and uses the platform librarian tool to merge static dependencies into a static library. It picks which tool to use by checking, in order: * When targeting Windows, it looks for `lib.exe`. * When targeting macOS, it checks if `libtool` is the Apple libtool. * Whether `ar` is GNU ar and if so, generates an MRI script. * Otherwise, a `FATAL_ERROR` is issued. To mark a static library for bundling, we link privately and use the `$<BUILD_LOCAL_INTERFACE:...>` generator expression. This prevents it from being exported, too. The generator expression that implements this logic is quite complex. It involves meta-programming generator expressions during evaluation and then evaluating them. Even so, this saves a considerable amount of time unpacking LLVM into a temporary directory and adding the objects to the link line (the previous approach). 12 August 2024, 02:24:25 UTC
3cdeb53 Scan generated export files to determine dependencies. (#8385) This commit contains a module for declaring that an export file might depend on another CMake package that was found by find_package. Such dependencies are collected in a project-wide property (rather than a variable) along with a snippet of code that reconstructs the original call. Then, after we have installed an export file via install(EXPORT), we can call a helper to add install rules that will read the file as-generated by CMake to check whether any of these packages could be required. CMake does not like to expose this information, in part because generator expressions make computing the eventual link set undecidable. Even so, for our purposes if Pkg:: appears in our link-libraries list, then we need to find_package(Pkg). This module implements that heuristic. So why is this hard? It's because checking whether a dependency is actually included is very complicated. A library will appear if: 1. It is SHARED or MODULE 2. It linked privately to a STATIC target - These appear as $<LINK_ONLY:${dep}> 3. It is STATIC and linked publicly to a SHARED target; 4. It is INTERFACE or ALIAS and linked publicly 5. It is included transitively via (4) and meets (1), (2), or (3) 6. I am not sure this set of rules is exhaustive. There is an experimental feature in CMake 3.30 that will some day replace this module. 10 August 2024, 02:55:07 UTC
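The heuristic in 3cdeb53 ("if Pkg:: appears in our link-libraries list, then we need to find_package(Pkg)") can be sketched in a few lines. This is an illustrative Python version, not the CMake module from the commit: the function name, the sample export text, and the list of known packages are all hypothetical.

```python
import re


def packages_needed(export_text, known_packages):
    """Return the known packages whose `Pkg::` namespace appears in the text.

    This mirrors the commit's heuristic only: it does not evaluate generator
    expressions or work out the real link set.
    """
    namespaces = set(re.findall(r"\b([A-Za-z_][A-Za-z0-9_]*)::", export_text))
    return sorted(pkg for pkg in known_packages if pkg in namespaces)


# Hypothetical fragment of a generated export file.
sample = (
    'set_target_properties(Halide::Halide PROPERTIES '
    'INTERFACE_LINK_LIBRARIES '
    '"$<LINK_ONLY:flatbuffers::flatbuffers>;Threads::Threads")'
)

print(packages_needed(sample, ["flatbuffers", "Threads", "wabt"]))
```

Filtering against a list of packages that were actually found keeps the project's own `Halide::` namespace from being reported as a dependency.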
7b53a88 Introduce HalideFeatures system for optional components (#8384) Previously, our `option()` declarations were scattered and not well documented. They certainly weren't self-documenting. Some of them depended on other options and used various ways to handle conflicts. Sometimes inconsistencies were handled with fatal errors, other times by silently overriding an option. With this PR, we introduce a new `Halide_feature` function that is designed to handle interdependent options and default initialization in a much more regular way. It behaves very much like option in its first three parameters: Halide_feature(CMAKE_FLAG "documentation string" DEFAULT_VALUE) Only now `DEFAULT_VALUE` can be more intelligent than simply `ON` or `OFF`. It can also be `TOP_LEVEL`, which is `ON` iff `CMAKE_PROJECT_TOP_LEVEL` is true. It can also be `AUTO` which is `ON` iff the `DEPENDS` clause is defined and true. For example, Halide_feature(WITH_TEST_RUNTIME "Build runtime tests" AUTO DEPENDS NOT MSVC) If a feature is set to `ON` but its `DEPENDS` clause is false, a warning will be issued and the feature will be forced `OFF` in the cache. 
Furthermore, these features register their documentation strings with the built-in `FeatureSummary` system so now instead of a stream of easy-to-miss messages, the configuration ends with a summary of what is enabled and disabled: -- The following features have been enabled: * Halide_ENABLE_EXCEPTIONS, Enable exceptions in Halide * Halide_ENABLE_RTTI, Enable RTTI in Halide * WITH_AUTOSCHEDULERS, Build the Halide autoschedulers * WITH_PACKAGING, Halide's CMake package install rules * WITH_PYTHON_BINDINGS, Halide's native Python module (not the whole pip package) * WITH_SERIALIZATION, Include experimental Serialization/Deserialization code * WITH_TESTS, Halide's unit test suite * WITH_TUTORIALS, Halide's tutorial code * WITH_UTILS, Optional utility programs for Halide, including HalideTraceViz * WITH_TEST_AUTO_SCHEDULE, Build autoscheduler tests * WITH_TEST_CORRECTNESS, Build correctness tests * WITH_TEST_ERROR, Build error tests * WITH_TEST_WARNING, Build warning tests * WITH_TEST_PERFORMANCE, Build performance tests * WITH_TEST_GENERATOR, Build generator tests * WITH_TEST_RUNTIME, Build runtime tests -- The following features have been disabled: * WITH_DOCS, Halide's Doxygen documentation * WITH_TEST_FUZZ, Build fuzz tests A feature may be marked as `ADVANCED`, which excludes it from the feature summary unless the log level is set to verbose. It also marks it as advanced in the cache, which hides it from the default view in the CMake GUI and the curses-TUI. Finally, features are computed early in the build so that subdirectories see a consistent view. Some generator tests that were broken under static Halide (meaning no autoschedulers) are now properly skipped by directly checking `WITH_AUTOSCHEDULERS`. 09 August 2024, 22:30:51 UTC
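The default-value rules described in 7b53a88 can be modelled as a small decision function. The sketch below is written in Python purely for illustration and is not the `Halide_feature` implementation; the function name, parameters, and warning text are hypothetical, and only the rules stated in the commit message are encoded.

```python
def resolve_feature(default, depends=None, top_level=True, user_value=None):
    """Model how a feature's final ON/OFF state might be decided.

    default:    "ON", "OFF", "TOP_LEVEL", or "AUTO"
    depends:    None if no DEPENDS clause was given, else its truth value
    top_level:  stands in for CMAKE_PROJECT_TOP_LEVEL
    user_value: explicit ON/OFF (True/False) from the user, if any
    Returns (enabled, warnings).
    """
    warnings = []
    if user_value is not None:
        enabled = user_value
    elif default == "TOP_LEVEL":
        enabled = top_level                 # ON iff top-level project
    elif default == "AUTO":
        enabled = depends is not None and bool(depends)
    else:
        enabled = default == "ON"

    # A feature that is ON while its DEPENDS clause is false is forced OFF.
    if enabled and depends is False:
        warnings.append("feature forced OFF: DEPENDS clause is false")
        enabled = False
    return enabled, warnings


print(resolve_feature("AUTO", depends=True))
print(resolve_feature("ON", depends=False))
```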
ff538b1 Reflow src/CMakeLists.txt in logical groups (#8383) * style: move core features closer to library definition * style: move target export script to its own section * style: group LLVM and GPU backends together 09 August 2024, 19:10:20 UTC
ba08522 Remove vestigial AMDGPU backend (#8382) The backend was started in 2018 but never completed. Removing the stale references reduces confusion. 09 August 2024, 16:58:42 UTC
0058528 Rework LLVM into Find module and enact new component policy. (#8379) Our usage of LLVM now requires at least the X86 and WebAssembly backends. We also now unconditionally enable all backends supported by the LLVM we found. 09 August 2024, 16:25:32 UTC
8643007 Fix Numpy 2.0 compatibility bug in lesson 10 (#8381) Numpy 2.0 no longer performs narrowing conversions automatically. We manually mask here instead. Fixes #8380 09 August 2024, 16:23:15 UTC
6f650c6 Two more build fixes (#8371) * Integration test: do forward C/CXX compiler to the inner CMake invocation * `_Float16`: on i386, needs gcc14 + SSE2 It is not known by GCC13: https://ci.debian.net/packages/h/halide/testing/i386/50047733/ and fails with ``` /usr/bin/g++ -DHALIDE_ENABLE_RTTI -DHALIDE_VERSION_MAJOR=18 -DHALIDE_VERSION_MINOR=0 -DHALIDE_VERSION_PATCH=0 -DHALIDE_WITH_EXCEPTIONS -isystem /usr/include/halide18 -O3 -DNDEBUG -MD -MT CMakeFiles/main.dir/main.cpp.o -MF CMakeFiles/main.dir/main.cpp.o.d -o CMakeFiles/main.dir/main.cpp.o -c /tmp/autopkgtest.pviDWM/build.Sjp/src/test/integration/jit/main.cpp In file included from /tmp/autopkgtest.pviDWM/build.Sjp/src/test/integration/jit/main.cpp:1: /usr/include/halide18/Halide.h: In member function ‘Halide::float16_t::operator _Float16() const’: /usr/include/halide18/Halide.h:3054:40: error: SSE register return with SSE2 disabled 3054 | explicit operator _Float16() const { | ^ /usr/include/halide18/Halide.h:3057:16: error: SSE register return with SSE2 disabled 3057 | return result; | ^~~~~~ /usr/include/halide18/Halide.h: In constructor ‘Halide::Expr::Expr(_Float16)’: /usr/include/halide18/Halide.h:4679:64: error: invalid conversion from type ‘_Float16’ without option ‘-msse2’ 4679 | : IRHandle(Internal::FloatImm::make(Float(16), (double)x)) { | ^ ninja: build stopped: subcommand failed. ``` with GCC14. 09 August 2024, 16:18:19 UTC
56f14c8 Replace FetchContent with a custom dependency provider (#8378) The build no longer uses FetchContent, instead using find_package always and everywhere. When Halide is the top-level project, it will (by default) inject a dependency provider that overrides the wabt, flatbuffers, and pybind11 packages with FetchContent. Users can opt out by setting Halide_USE_FETCHCONTENT=NO. This also bumps the required wabt version to the latest release (1.0.36). This version includes a patch I submitted that fixes the CMake package when wabt is built with OpenSSL rather than picosha2. Here are relevant links to the docs: * https://cmake.org/cmake/help/latest/guide/using-dependencies/index.html#dependency-providers * https://cmake.org/cmake/help/latest/command/cmake_language.html#dependency-providers 09 August 2024, 01:40:20 UTC
3ed55b4 Move dependencies/wasm to use sites (#8377) Also replace WITH_WABT and WITH_V8 with Halide_WASM_BACKEND, which can be either wabt, V8, or a CMake false-y value such as OFF. Deprecation notices are provided to ease user transitions. 08 August 2024, 18:23:44 UTC
8feee81 Use a Find module for NodeJS (#8374) 08 August 2024, 03:39:50 UTC
206c03f Use a Find module for V8 (#8373) This also adjusts the cache variable names to follow the conventions set forth in the CMake documentation, here: https://cmake.org/cmake/help/latest/manual/cmake-developer.7.html#standard-variable-names 08 August 2024, 03:39:31 UTC
59da730 Clean up autoscheduler dependencies (#8372) 07 August 2024, 22:31:19 UTC
40ab265 List headers with target_sources FILE_SETS (#8370) Removes instances of target_include_directories and installation rules based on those directories. These are now automatically computed from the BASE_DIRS (defaults to current source dir) argument to target_sources. This models the build more accurately and avoids accidental installation of unwanted headers. Also forces us to think about the linking relationships between components; ideally this will result in a more accurate build graph. 07 August 2024, 14:57:28 UTC
17bd517 Clean up serialization build code (#8369) 06 August 2024, 21:44:19 UTC
37ab461 Distribute GenGen as a static library (#8367) Also use a mutex and timestamp checking to ensure that multiple generators in the same directory do not race to place Halide.dll next to them on Windows. 06 August 2024, 15:36:19 UTC
1a7b914 Quick CMake fixes enabled by 3.28 (#8365) * Use FindCUDAToolkit in apps/cuda_mat_mul * Replace dummy FindHalide.cmake with pkg redirects Having the dummy file on disk is confusing and is easy to accidentally install when modifying CMake install rules. Better to use the CMake 3.24+ feature of the package redirects dir to truly disable find_package for Halide inside the build. * Avoid creating a dummy file for Halide_Python * Fix formatting in CMakeLists.txt * Add SpirvIR.h to the list of Halide sources * Use BUILD_LOCAL_INTERFACE in PyStubs * Consistently use HALIDE_H variable * Comment overriding POSITION_INDEPENDENT_CODE * Check Halide_STATIC_DEFINE at configure time. This avoids sending a generator expression downstream. These are functionally identical, but it's just one less thing to evaluate. * Use BUILD_LOCAL_INTERFACE for SPIRV-Headers 02 August 2024, 18:45:58 UTC
14035e3 Make pybind11 minimum version check compatible with pybind11 v3. (#8366) Concretely: https://github.com/pybind/pybind11/blob/48f25275c44d52d0ceade122e328dc1f2e48ef44/include/pybind11/detail/common.h#L12-L14 This is needed for a Google-internal deployment, but is a useful fix regardless. 02 August 2024, 17:29:08 UTC
1872788 Add helper functions to query properties of the lowered Target (#8192) (#8359) * Add helper functions to query properties of the lowered Target (#8192) * Add Python bindings * clang-format * clang-format * Add comments 01 August 2024, 16:11:07 UTC
837308f Bump CMake minimum version to 3.28 (#8363) This is in line with our policy to track the version included in the latest Ubuntu LTS. Version 24.04 LTS now includes CMake 3.28. 01 August 2024, 13:41:55 UTC
77e5dd1 Remove warning for unsupported compilers (#8362) 01 August 2024, 02:25:59 UTC
423df3c `Python_bindings`-test-as-installed (#8355) Support not building python bindings, while running python tests against installed halide, and call `enable_testing()` there so that `ctest` can work. 31 July 2024, 16:15:47 UTC
15c181f Drop support for LLVM 16 in main (#8358) * Drop support for LLVM 16 in main Per policy, Halide 19 will support LLVM 17, 18, 19 (plus top-of-tree which is 20) * clang-format 29 July 2024, 21:04:49 UTC
c7e1b99 Bump Halide version to 19 in main branch (#8357) * Bump Halide version to 19 in main branch * Update setup.py 29 July 2024, 20:52:40 UTC
e9b9bdc Don't use le32/le64 (#8344) Use i386/x86-64 and wasm32/wasm64 targets instead of le32/le64 for the runtime. 26 July 2024, 18:13:49 UTC
5d1472f Allow LLVM 20 (#8352) 23 July 2024, 21:56:54 UTC
bebb888 Add ARMv8.x feature flags (#4489) * Add ARMv8.3a feature flag This allows selecting the ARMv8.3a feature set via a new Feature flag. We don't (yet) add any specialization to our codegen (beyond what LLVM will do for us under the hood). * Update CodeGen_ARM.cpp * Update CodeGen_ARM.cpp * Update CodeGen_ARM.cpp * Add Features for all the ARM v8.x architectures * Update CodeGen_ARM.cpp * Fixes * get_runtime_compatible_target() should use meet * Add ARMv8a * trigger buildbots 23 July 2024, 17:37:37 UTC
b741d9c Adaptive Dark colorscheme for Stmt HTML. Ability to programmatically export conceptual stmt files. (#8327) * A few color tweaks for a darker colorscheme. * Dark color scheme for Stmt HTML. Ability to programmatically export the conceptual stmt files. * Toolbar for HTML Stmt viewer with various settings. * Cleanup. 16 July 2024, 15:34:53 UTC
a05f459 Fix injection of GPU buffers that do not go by a Func name (i.e. alloc groups). (#8333) * Fix injection of GPU buffers that do not go by a Func name (i.e. alloc groups). * Cleanup 16 July 2024, 15:34:08 UTC
0f34e2f Detect ARM CPU features for host target and in runtime (#8298) Adds feature detection for ARM CPUs to the runtime library and to the host target feature computation. Supports Windows, macOS, Linux, iOS, and Android. Also fix bug in Type::max() and Type::min() for float16. Fixes #4727 Fixes #6106 Fixes #7901 Fixes #7979 Fixes #8340 15 July 2024, 16:15:40 UTC
461c128 Fix incorrect output in Python tutorial, lesson 5 (#8331) 28 June 2024, 05:21:44 UTC
a6f5ca4 Remove remaining dregs of tuple_select (oops) (#8329) * Remove remaining dregs of tuple_select (oops) * Update tuple_select.py 27 June 2024, 00:01:10 UTC
a4a7531 Consider *all* Exprs a func uses, not just the RHS, in Li2018 (#8326) Fixes #8312 26 June 2024, 21:30:20 UTC
cab27d8 Fix horrifying bug in lossless_cast of a subtract (#8155) * Fix horrifying bug in lossless_cast of a subtract * Use constant integer intervals to analyze safety for lossless_cast TODO: - Dedup the constant integer code with the same code in the simplifier. - Move constant interval arithmetic operations out of the class. - Make the ConstantInterval part of the return type of lossless_cast (and turn it into an inner helper) so that it isn't constantly recomputed. * Fix ARM and HVX instruction selection Also added more TODOs * Using constant_integer_bounds to strengthen FindIntrinsics In particular, we can do better instruction selection for pmulhrsw * Move new classes to new files Also fix up Monotonic.cpp * Make the simplifier use ConstantInterval * Handle bounds of narrower types in the simplifier too * Fix * operator. Add min/max/mod * Add cache for constant bounds queries * Fix ConstantInterval multiplication * Add a simplifier rule which is apparently now necessary * Misc cleanups and test improvements * Add missing files * Account for more aggressive simplification in fuse test * Remove redundant helpers * Add missing comment * clear_bounds_info -> clear_expr_info * Remove bad TODO I can't think of a single case that could cause this * It's too late to change the semantics of fixed point intrinsics * Fix some UB * Stronger assert in Simplify_Div * Delete bad rewrite rules * Fix bad test when lowering mul_shift_right b_shift + b_shift < missing_q * Avoid UB in lowering of rounding_shift_right/left * Add shifts to the lossless cast fuzzer This required a more careful signed-integer-overflow detection routine * Fix bug in lossless_negate * Add constant interval test * Rework find_mpy_ops to handle more structures * Fix bugs in lossless_cast * Fix mul_shift_right expansion * Delete commented-out code * Don't introduce out-of-range shifts in lossless_cast * Some constant folding only happens after lowering intrinsics in codegen --------- Co-authored-by: 
Steven Johnson <srj@google.com> 26 June 2024, 16:08:15 UTC
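To see why constant integer intervals are a useful tool for deciding whether a cast is lossless, consider a subtraction of two 8-bit values. The following Python sketch is not Halide's `ConstantInterval` class; `Interval` and `fits` are hypothetical names, and the code makes no claim about what the original bug was.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed integer interval [lo, hi]."""
    lo: int
    hi: int

    def __sub__(self, other):
        # Smallest result: our minimum minus their maximum, and vice versa.
        return Interval(self.lo - other.hi, self.hi - other.lo)


def fits(interval, bits, signed):
    """True if every value in the interval is representable in the type."""
    if signed:
        lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    else:
        lo, hi = 0, 2 ** bits - 1
    return lo <= interval.lo and interval.hi <= hi


u8 = Interval(0, 255)
diff = u8 - u8
print(diff)
print(fits(diff, 8, signed=False), fits(diff, 8, signed=True),
      fits(diff, 16, signed=True))
```

The difference of two unsigned 8-bit values spans [-255, 255], so it is representable in neither 8-bit type but is in a signed 16-bit one.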
9b703f3 Provide a minimum OS version for MachO objects (#8323) This gives LLVM enough information to generate a "platform load-command" in the object file. Fixes #7941 25 June 2024, 23:59:43 UTC
dd6c98b Correct the Halide version number in setup.py (#8325) 25 June 2024, 23:02:16 UTC
3d20677 Remove deprecated operators (#8321) tuple_select and the Internal versions of various fixed-point helpers were deprecated in Halide 17; we should remove them entirely for Halide 18. 25 June 2024, 22:24:12 UTC
a0e1dc0 Fix device slices for Buffer with fixed dimensionality in template. (#8313) Co-authored-by: Steven Johnson <srj@google.com> 25 June 2024, 22:23:57 UTC
8ff261e Per-pipeline-invocation profiling (#8153) * Profiler tracks per-invocation state, instead of global state This should give better results when multiple Halide pipelines are running at the same time. * Profiler improvements - Don't profile bounds queries - Simplify layout calculation - Bill time after decrementing main thread as overhead, not waiting on parallel tasks - Change waiting on parallel tasks label * name hygiene * Fix signature * Fix tracking of pipeline-level memory statistics * Address review comments * Pacify clang-tidy * [Hexagon] Profiling changes for abadams/per_instance_profiling (#8187) * Get abadams/per_instance_profiling working on hvx * More changes * Add Hexagon libraries * Fix multiple instances of profiler_state * Update hexagon libraries * clang-format --------- Co-authored-by: Steven Johnson <srj@google.com> Co-authored-by: aankit-quic <166656642+aankit-quic@users.noreply.github.com> 25 June 2024, 20:10:43 UTC
1449692 Remove Introspection (#8273) * Remove Introspection Introspection (to provide better error messages + automatic var/func/etc names) has always been kinda handy but kinda fragile, and with the evolution of the DWARF standard it's become broken for newer compilers. We don't have the bandwidth to fix it, and many large customers (e.g. Google) have never been able to rely on it, and given that it can cause crashes in some unusual situations (e.g. when embedded inside a Go app), it's time to say goodbye. Alas! Poor Introspection. I knew him, Horatio. A feature of infinite jest, of most excellent fancy. It hath borne me on his back a thousand times. * Update Deserialization.cpp --------- Co-authored-by: Andrew Adams <andrew.b.adams@gmail.com> 25 June 2024, 17:30:08 UTC
8c836b3 Update README_cmake.md (#8322) The requirements.txt is in the root of the repository now. 25 June 2024, 16:44:18 UTC
84bb8ee Fixes for top-of-tree LLVM (#8314) 24 June 2024, 23:15:59 UTC
5f6fc26 [vulkan] Dynamically load Vulkan loader library. Avoid Validation Layer crash on exit. (#8289) * Remove the compile-time link dependency for the Vulkan loader, and resolve the instance methods dynamically. Update the Vulkan readme to match the latest information regarding the SDK packages. * Formatting pass. * Add runtime check to verify shared memory amount used in pipeline can be run on device * Fix platform ifdefs for Vulkan library names (normal ones aren't defined when the runtime is compiled). * Detect if VK_LAYER_KHRONOS_validation is enabled, and bypass the module destructor which calls halide_vulkan_device_release() to avoid a segfault (at the cost of leaking!). See https://github.com/halide/Halide/issues/8290. Refactor and cleanup halide_vulkan_device_release(). Add vk_destroy_context() methods. * Fix GPU object lifetime AOT test to use TEST_VULKAN macro. * Fix clang tidy warning for usage of static in anonymous namespace * Disable Vulkan validation layer for leak tests (or we'll leak). * Add vk_validate_shader_for_device() method to check shader bindings against device limits prior to compiling to verify shader compatibility. --------- Co-authored-by: Derek Gerstmann <dgerstmann@adobe.com> Co-authored-by: Andrew Adams <andrew.b.adams@gmail.com> 23 June 2024, 21:00:55 UTC
61df9ba Add ability to pass explicit RDom to Function::define_update (#8284) * Add ability to pass explicit RDom to Function::define_update And use it in rfactor. There are cases where an RDom is attached to the original Func but not actually referred to in the LHS or RHS. Fixes #8282 * Fix comment 23 June 2024, 02:37:18 UTC
22367de Don't try to codegen predicated atomic stores (#8285) * Don't try to codegen predicated atomic stores By disabling predication if an Atomic node is found. Fixes #8280. * Add clarifying comment 23 June 2024, 02:37:05 UTC
155d693 Fix incorrect type in emulation of float16 is_inf/nan (#8310) Fixes #8309 22 June 2024, 23:15:23 UTC
198c25e scoped_truth for the loop variable being always less than the loop extent. (#8306) * scoped_truth for the loop variable being always less than the loop extent. * Correctify the range. * Complementary scoped_truth for the loop lower bound. 21 June 2024, 23:03:52 UTC
b921710 Fix OpenCL positive and negative INF constants. (#8266) 20 June 2024, 19:12:15 UTC
ea775cc Use upstream interface for consuming SPIR-V (#8265) 20 June 2024, 15:59:42 UTC
f9ccd5c No longer silently hide errors in Metal completion handlers (alternative approach) (#8240) * No longer silently hide errors in Metal completion handlers * Actually implement alternative * clang-format * Implement new API * Implement test and refine the API * Format. * Remove some debug code * Add missing includes. * Add comment noting why we manually null-terminate after strncpy * Reverse engineer Objective-C API for passing void* in a block; it turns out to be much simpler than I thought * Formatting * Don't add const-ness to declaration. --------- Co-authored-by: Steven Johnson <srj@google.com> 14 June 2024, 18:24:17 UTC
6c8a491 Fix typo in Simplify_Let.cpp (#8274) 10 June 2024, 16:38:57 UTC
340136f Stop region costs from complaining about new intrinsics (#8262) Now by default it will treat them as cost one, unless you tell it otherwise. 07 June 2024, 18:28:21 UTC
4b67712 [vulkan] Fix Vulkan SIMT mappings for GPU loop vars. (#8259) * Fix Vulkan SIMT mappings for GPU loop vars. Previous refactoring accidentally used the fully qualified var name rather than the categorized vulkan intrinsic name. * Avoid formatting the GPU kernel to a string for Vulkan (since its binary SPIR-V needs to remain intact). --------- Co-authored-by: Derek Gerstmann <dgerstmann@adobe.com> Co-authored-by: Steven Johnson <srj@google.com> 05 June 2024, 21:43:13 UTC
74b9044 It's generally a bad idea for simplifier rules to multiply constants (#8234) Fixes #8227 but may break other things. Needs thorough testing. Also, there are more rules like this lurking. 05 June 2024, 16:24:06 UTC
46e866d Report useful error to user if the promise_clamp all fails to losslessly cast. (#8238) Co-authored-by: Steven Johnson <srj@google.com> 04 June 2024, 16:32:54 UTC
775bfbf Python binding support for int64 literals (#8254) This makes >32bit python integers get mapped to `hl.i64` implicitly. Fixes #8224 04 June 2024, 16:31:30 UTC
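The mapping described in 775bfbf can be pictured as a range check on the Python integer. The sketch below is a hypothetical pure-Python model, not the binding code: the function name is invented, the assumption that small integers default to a 32-bit type is mine, and raising `OverflowError` beyond 64 bits is likewise an assumption rather than documented behaviour.

```python
INT32_MIN, INT32_MAX = -(2 ** 31), 2 ** 31 - 1
INT64_MIN, INT64_MAX = -(2 ** 63), 2 ** 63 - 1


def literal_type_name(value: int) -> str:
    """Pick a type name for an integer literal by range (illustrative only)."""
    if INT32_MIN <= value <= INT32_MAX:
        return "i32"          # assumed default for small literals
    if INT64_MIN <= value <= INT64_MAX:
        return "i64"          # wider than 32 bits: promoted implicitly
    # Assumption: values wider than 64 bits are rejected.
    raise OverflowError(f"{value} does not fit in 64 bits")


print(literal_type_name(1), literal_type_name(2 ** 31))
```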
9c75554 Fix Metal handling for float16 literals (#8260) * Fix Metal handling of float16 from bits, infinity, neg infinity, and nans * Disable test for OpenCL half for now * Formatting 04 June 2024, 15:21:04 UTC
7ca95d8 Expose BFloat in Python bindings (#8255) There are two parts to support for BFloat16 in Python: 1) Ability to define kernels and AOT compile them [fixed in this PR] 2) Ability to call kernels from Python This fixes part 1, which is what I need for my use case. Part 2 is blocked on bfloat16 support in Python buffer protocols. See #6849 for more details. 02 June 2024, 21:39:44 UTC
7cf2951 Remove max size assert from Anderson2021 (#8253) Fixes #8252 02 June 2024, 21:34:36 UTC
a9b8fbf Rework the simplifier to use ConstantInterval for bounds (#8222) * Update the simplifier to use ConstantInterval and track the bounds through more types * Move the simplify fuzzer back to a correctness test * Make debug_indent not static Otherwise it causes a race condition in any parallel tests * Track expr info on non-overflowing casts to int * Delete commented-out code * clang-tidy * Delete unused member * Fix cmakelists for the fuzzer removal * Handle contradictions more gracefully in learn_true The contradiction was arising from: if (extent > 0) { ... } else { for (x = 0; x < extent; x++) { In here we can assume extent > 0, but we also know from the if statement that extent <= 0 } } * Better comments * Address review comments * Fix failure to pop loop var info 02 June 2024, 21:33:45 UTC
35143d2 Mark host_dirty() and device_dirty() with no_discard. (#8248) Co-authored-by: Steven Johnson <srj@google.com> 02 June 2024, 21:19:04 UTC
711dc88 Add HVX_v68 target to support Hexagon HVX v68. (#8232) 31 May 2024, 17:53:47 UTC
33d5ba9 Fix saturating add matching in associativity checking (#8220) * Fix saturating add matching in associativity checking The associative ops table defined saturating add as saturating_narrow(widen(x + y)), instead of saturating_narrow(widen(x) + y) 24 May 2024, 19:56:03 UTC
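The two expressions in 33d5ba9 differ in where the addition happens: before or after widening. A minimal Python sketch for unsigned 8-bit operands shows the difference; the helper names below are stand-ins written for this example, not Halide intrinsics, and 8-bit wraparound is modelled explicitly because Python integers do not overflow.

```python
def wrap_u8(v):
    """Model 8-bit unsigned wraparound."""
    return v & 0xFF


def saturating_narrow_u8(v):
    """Clamp a wider value into the 8-bit unsigned range."""
    return max(0, min(255, v))


def add_then_widen(x, y):
    # saturating_narrow(widen(x + y)): the add wraps in 8 bits first,
    # so the overflow is already lost before widening.
    return saturating_narrow_u8(wrap_u8(x + y))


def widen_then_add(x, y):
    # saturating_narrow(widen(x) + y): the add happens in the wider type,
    # so the clamp sees the true sum.
    return saturating_narrow_u8(x + y)


print(add_then_widen(200, 100), widen_then_add(200, 100))
```

For 200 + 100 the first form wraps to 44 while the second saturates to 255, which is why only the second describes a saturating add.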
b5f5065 Add some EVAL_IN_LAMBDAs to Simplify_Sub.cpp (#8230) Massively reduces compile time and peak cl.exe memory consumption on Windows (from 9.5 GB down to 2.3 GB). Simplify_LT.cpp has these same EVAL_IN_LAMBDAs, which is probably why it hasn't been causing build problems. 23 May 2024, 18:17:49 UTC
e9f8b04 Fix for top-of-tree LLVM (#8223) * Fix for top-of-tree LLVM * Update LLVM_Runtime_Linker.cpp 15 May 2024, 21:43:17 UTC
16d77e9 Fix give-up case in ModulusRemainder (#8221) A default-constructed ModulusRemainder means no information, which is what we want here. ModulusRemainder{0, 1} means the constant one! 15 May 2024, 17:43:34 UTC
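The distinction in 16d77e9 is easier to see with the semantics written out. The sketch below is a Python model under stated assumptions, not Halide code: it assumes a (modulus, remainder) pair describes values congruent to `remainder` modulo `modulus`, that a modulus of 0 denotes the single constant `remainder`, and that the default-constructed pair is (1, 0). The function name is hypothetical.

```python
def describes(modulus, remainder, value):
    """True if `value` is consistent with the (modulus, remainder) pair."""
    if modulus == 0:
        # Assumed convention: modulus 0 pins the value to one constant.
        return value == remainder
    return value % modulus == remainder % modulus


DEFAULT = (1, 0)        # assumed default: every integer is 0 mod 1
CONSTANT_ONE = (0, 1)   # the pair the commit warns about

print(all(describes(*DEFAULT, v) for v in range(-5, 6)))
print([v for v in range(-5, 6) if describes(*CONSTANT_ONE, v)])
```

Under these assumptions the default pair admits every integer (no information), whereas {0, 1} admits only the value 1.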
211bafa Fix Reinterpret cmp in IREquality (#8217) fix Reinterpret cmp 14 May 2024, 20:15:57 UTC
dfaf6ad Insert apparently-missing `break;` in IREquality.cpp (#8211) * Insert apparently-missing `break;` in IREquality.cpp * Enable -Wimplicit-fallthrough * Also add -Wimplicit-fallthrough to runtime builds * Add missing break to runtime/webgpu.cpp * Also add flag to Makefile --------- Co-authored-by: Andrew Adams <andrew.b.adams@gmail.com> 30 April 2024, 15:08:26 UTC
8141197 [x86 & HVX & WASM] Use bounds inference for saturating_narrow instruction selection (#7805) * x86 bounds inference for saturating_narrow * bounds inference for HVX too * use can_represent(ConstantInterval) + clang-format * use bounds inference for WASM IS too + add tests * add tracking issue for scoped constant bounds * add TODO about lossless_cast usage --------- Co-authored-by: Steven Johnson <srj@google.com> 30 April 2024, 13:38:30 UTC
d55d82b Update debug_to_file API to remove type_code (#8183) * Add .npy support to halide_image_io The .npy format is NumPy's native format for storing multidimensional arrays (aka tensors/buffers). Being able to load/save in this format makes it (potentially) a lot easier to interchange data with the Python ecosystem, as well as providing a file format that supports floating-point data more robustly than any of the others that we currently support. This adds load/save support for a useful subset: - We support the int/uint/float types common in Halide (except for f16/bf16 for now) - We don't support reading or writing files that are in `fortran_order` - We don't support any object/struct/etc files, only numeric primitives - We only support loading files that are in the host's endianness (typically little-endian) Note that at present this doesn't support f16 / bf16 formats, but that could likely be added with minimal difficulty. The tricky bit of this is that the reading code has to parse a (limited) Python dict in text form. Please review that part carefully. TODO: we could probably add this as an option for `debug_to_file()` without too much pain in a followup PR. * clang-tidy * clang-tidy * Address review comments * Allow for "keys" as well as 'keys' * Add .npy support to debug_to_file() Built on top of https://github.com/halide/Halide/pull/8175, this adds .npy as an option. This is actually pretty great because it's easy to do something like ``` ss = numpy.load("my_file.npy") print(ss) ``` in Python and get nicely-formatted output, which can sometimes be a lot easier for debugging than inserting lots of print() statements (see https://github.com/halide/Halide/issues/8176) Did a drive-by change to the correctness test to use this format instead of .mat. 
* Add float16 support * Add support for Float16 images in npy * Assume little-endian * Remove redundant halide_error() * naming convention * naming convention * Test both mat and npy * Don't call halide_error() * Use old-school parser * clang-tidy * Update debug_to_file API to remove type_code * Clean up into single table * Update CodeGen_LLVM.cpp * Fix tmp codes * Update InjectHostDevBufferCopies.cpp * Update InjectHostDevBufferCopies.cpp * trigger buildbots 29 April 2024, 16:38:30 UTC
8202163 More aggressively unify duplicate lets (#8204) * Make unify_duplicate_lets more aggressive The simplifier can also clean up most of these, but it's harder for it because it has to consider that other mutations may have taken place. Beefing this up has no impact on lowering times for most apps, but something pathological was going on for local_laplacian. At 20 pyramid levels, this speeds up lowering by 1.3x. At 50 pyramid levels it's 2.3x. At 100 pyramid levels it's 4.1x. It also slightly reduces binary size. * Clarify comment; Avoid double-lookup into the scope Looking up with an Expr key and deep equality is expensive, so this was bad. * Add a std::move 28 April 2024, 21:39:41 UTC
64caf31 Faster vars used tracking in simplify let visitor (#8205) * Speed up the vars_used visitor in the simplifier let visitor This visitor shows up as the main cost of lowering in very large pipelines. This visitor is for tracking which lets are actually used for real inside the body of a let block (as opposed to the tracking we do when mutating, which is approximate, because we could construct an Expr that uses a Var and then discard it in a later mutation). The old implementation made a map of all variables referenced, and then checked each let name against that map one by one. If there are a small number of lets outside a huge Stmt, this is bad, because the data structure has to hold a number of names proportional to the stmt size instead of proportional to the number of lets. This new implementation instead makes a hash set of the let names, and then traverses the Stmt, removing names from the set as they are encountered. This is a big speed-up. We then make the speed-up larger by about the same factor again by doing the following: 1) Only add names to the map that might be used based on the recursive mutate call. These are very very likely to be used, because we saw them at least once, and mutations that remove *all* uses of a Var are rare. 2) The visitor should early out when the map becomes empty. The let variables are often all used immediately, so this is frequent. Speeds up lowering of local laplacian by 1.44x, 2.6x, and 4.8x respectively for 20, 50, and 100 pyramid levels. Speeds up lowering of resnet50 by 1.04x. Speeds up lowering of lens blur by 1.06x * Exploit the ref count of the replacement Expr * Fix is_sole_reference logic in Simplify_Let.cpp * Reduce hash map size 28 April 2024, 21:38:54 UTC
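The set-based traversal described in 64caf31 can be sketched roughly as follows. The tree encoding (a string is a variable reference, a tuple is a node holding child nodes) and the function name are assumptions made for illustration; the real visitor walks Halide IR.

```python
def unused_lets(let_names, stmt):
    """Return the let names that are never referenced inside stmt.

    Toy encoding (hypothetical): a str is a variable reference and a tuple
    is an interior node whose elements are its children.
    """
    # The working set is proportional to the number of lets,
    # not to the size of the statement.
    pending = set(let_names)
    stack = [stmt]
    # Early out: stop walking as soon as every let has been seen.
    while stack and pending:
        node = stack.pop()
        if isinstance(node, str):
            pending.discard(node)
        else:
            stack.extend(node)
    return pending
```

For example, `unused_lets({"a", "b", "c"}, (("a", "x"), ("b", ("x", "y"))))` leaves only `"c"` in the set, and a statement that uses every let up front ends the walk after a handful of nodes.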
302aa1c Refactor ConstantInterval (#8179) * Make ConstantInterval more of a first-class thing and use it in Monotonic.cpp * Restore bound_correlated_differences calls * Elaborate on TODO * Handle some TODOs Also explicitly ignore lossless_cast bugs that will be fixed in #8155 * Fix constant interval mod, clean up constant interval saturating cast * Improve comment * Avoid unsigned overflow * Fix the most obvious bug in lossless_cast, to make the fuzzer pass more * Skip over pipelines that fail the lossless_cast check * Drop iteration count on lossless_cast test * Add test to CMakeLists.txt * Avoid UB in constant_interval test (signed integer overflow of the scalars) * Restore accidentally-deleted line from CMakeLists.txt * Print on success * Handle Lets in constant_integer_bounds Also, plumb the cache through the recursive calls * Delete duplicate operator<< * Just always cast the bounds back to the range of the op type * Address review comments * Redo operator<< for ConstantIntervals * Improve comment; disable buggy code for now 25 April 2024, 18:58:23 UTC
e39497b Make Interval::is_single_point check for deep equality (#8202) * Make is_single_point compare min and max by deep equality Interval::is_single_point() used to only compare expressions by shallow equality to see if they are the same Expr object. However, bounds_of_expr_in_scope is really improved if it uses deep equality instead, so it has a prepass that goes over the provided scope, calls equal(min, max) on everything, and fixes up anything where deep equality is true but shallow equality is not. This prepass costs O(n) for n things in scope, regardless of how complex the expression being analyzed is. So if you ask for the bounds of '4' say in a context where there are lots of things in the scope, it's absurdly slow. We were doing this! BoxTouched calls bounds_of_expr_in_scope lots of times on small index Exprs within the same very large scope. It's better to just make Interval::is_single_point() check deep equality. This speeds up local laplacian lowering by 1.1x, and resnet50 lowering by 1.5x. There were also places where intervals that were a single point were diverging due to carelessly written code. E.g. the interval [40*8, 40*8], where both of those 40*8s are the same Mul node, was being simplified like this: interval.min = simplify(interval.min); interval.max = simplify(interval.max); Not only does this do double the simplification work it should, but it also caused something that was a single point to diverge into not being a single point, because the repeated constant-folding creates a new Expr. With the new is_single_point this matters a lot less, but even so, I centralized simplification of intervals into a single helper that doesn't do the pointless double-simplification for single points. Some of these shallowly-unequal but deeply-equal Intervals were being created in bounds inference itself after the prepass, which may have been generating suboptimal bounds. This change should fix that in addition to the compile-time benefits. 
Also added a simplify call in SkipStages because I noticed when it processed specializations it was creating things like (condition) || (!condition). 21 April 2024, 03:43:38 UTC
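The divergence described in e39497b, and the shape of a helper that avoids it, can be illustrated with a toy node type. `Mul`, `fold`, and `simplify_interval` are hypothetical names; `fold` stands in for a simplifier call that allocates a fresh node each time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mul:
    """Toy expression node; dataclass equality compares fields (deep)."""
    a: int
    b: int


def fold(e):
    """Stand-in for simplify(): constant-folds into a freshly built node."""
    return Mul(e.a * e.b, 1)


def is_single_point_shallow(lo, hi):
    return lo is hi     # same object


def is_single_point_deep(lo, hi):
    return lo == hi     # same structure


def simplify_interval(lo, hi):
    """Simplify once and share the result when the interval is a single point."""
    if lo is hi:
        folded = fold(lo)
        return folded, folded
    return fold(lo), fold(hi)


shared = Mul(40, 8)
# Simplifying min and max separately makes two distinct but equal nodes.
lo, hi = fold(shared), fold(shared)
```

Here `lo` and `hi` pass the deep check but fail the shallow one, while `simplify_interval(shared, shared)` returns the same object twice and does half the folding work.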
31c52ab Faster substitute_facts (#8200) * Fix computational complexity of substitute_facts It was O(n) for n facts. This makes it O(log(n)) This was particularly bad for pipelines with lots of inputs or outputs, because those pipelines have lots of asserts, which make for lots of facts to substitute in. Speeds up lowering of local laplacian with 20 pyramid levels (which has only one input and one output) by 1.09x Speeds up lowering of the adams 2019 cost model training pipeline (lots of weight inputs and lots of outputs due to derivatives) by 1.5x Speeds up resnet50 (tons of weight inputs) lowering by 7.3x! * Add missing switch breaks * Add missing comments * Elaborate on why we treat NaNs as equal 19 April 2024, 19:59:34 UTC
dd1d0e8 [HEXAGON] Keep support for hexagon_remote/Makefile (#8186) Update hexagon_remote/Makefile 19 April 2024, 17:33:44 UTC
4e0b313 Rewrite IREquality to use a more compact stack instead of deep recursion (#8198) * Rewrite IREquality to use a more compact stack instead of deep recursion Deletes a bunch of code and speeds up lowering time of local laplacian with 20 pyramid levels by ~2.5% * clang-tidy * Fold in the version of equal in IRMatch.h/cpp * Add missing switch breaks * Add missing comments * Elaborate on why we treat NaNs as equal 18 April 2024, 19:48:59 UTC
7994e70 Fix corner case in if_then_else simplification (#8189) Co-authored-by: Steven Johnson <srj@google.com> 16 April 2024, 21:27:43 UTC
f4c7831 Don't print on parallel task entry/exit with -debug flag (#8185) Fixes #8184 11 April 2024, 22:07:20 UTC
dc83707 Add .npy support to debug_to_file() (#8177) * Add .npy support to halide_image_io The .npy format is NumPy's native format for storing multidimensional arrays (aka tensors/buffers). Being able to load/save in this format makes it (potentially) a lot easier to interchange data with the Python ecosystem, as well as providing a file format that supports floating-point data more robustly than any of the others that we currently support. This adds load/save support for a useful subset: - We support the int/uint/float types common in Halide (except for f16/bf16 for now) - We don't support reading or writing files that are in `fortran_order` - We don't support any object/struct/etc files, only numeric primitives - We only support loading files that are in the host's endianness (typically little-endian) Note that at present this doesn't support f16 / bf16 formats, but that could likely be added with minimal difficulty. The tricky bit of this is that the reading code has to parse a (limited) Python dict in text form. Please review that part carefully. TODO: we could probably add this as an option for `debug_to_file()` without too much pain in a followup PR. * clang-tidy * clang-tidy * Address review comments * Allow for "keys" as well as 'keys' * Add .npy support to debug_to_file() Built on top of https://github.com/halide/Halide/pull/8175, this adds .npy as an option. This is actually pretty great because it's easy to do something like ``` ss = numpy.load("my_file.npy") print(ss) ``` in Python and get nicely-formatted output, which can sometimes be a lot easier for debugging than inserting lots of print() statements (see https://github.com/halide/Halide/issues/8176) Did a drive-by change to the correctness test to use this format instead of .mat. 
* Add float16 support * Add support for Float16 images in npy * Assume little-endian * Remove redundant halide_error() * naming convention * naming convention * Test both mat and npy * Don't call halide_error() * Use old-school parser * clang-tidy 11 April 2024, 18:04:42 UTC
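The "limited Python dict in text form" mentioned in dc83707 is the header of a .npy file. As a rough illustration of that layout, the sketch below builds and parses a version 1.0 header with the Python standard library only. It is not Halide's reader, which is C++ and uses a hand-written parser; the function names are hypothetical and `ast.literal_eval` is a swapped-in shortcut.

```python
import ast
import struct

MAGIC = b"\x93NUMPY"


def build_npy_v1(descr, shape, payload):
    """Assemble a version 1.0 .npy byte string around raw payload bytes."""
    header = repr({"descr": descr, "fortran_order": False, "shape": shape})
    # Preamble: 6 magic bytes, 2 version bytes, 2 length bytes. Pad the header
    # with spaces so the data starts on a 64-byte boundary, then end with "\n".
    unpadded = len(MAGIC) + 2 + 2 + len(header) + 1
    header += " " * (-unpadded % 64) + "\n"
    encoded = header.encode("latin1")
    return MAGIC + bytes([1, 0]) + struct.pack("<H", len(encoded)) + encoded + payload


def parse_npy_header(data):
    """Return (descr, fortran_order, shape, payload_offset) for a v1.0 file."""
    if data[:6] != MAGIC or data[6:8] != bytes([1, 0]):
        raise ValueError("not a version 1.0 .npy file")
    (header_len,) = struct.unpack_from("<H", data, 8)
    text = data[10:10 + header_len].decode("latin1")
    fields = ast.literal_eval(text)  # the header is a Python dict literal
    return fields["descr"], fields["fortran_order"], fields["shape"], 10 + header_len


# Six little-endian float32 values arranged as a 2x3 array.
payload = struct.pack("<6f", *range(6))
blob = build_npy_v1("<f4", (2, 3), payload)
descr, fortran_order, shape, offset = parse_npy_header(blob)
```

A reader restricted to the subset listed in the commit would then reject the file if `fortran_order` is true or if `descr` names an unsupported type or a non-host byte order.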
8f3f6cf Update Hexagon Install Instructions (#8182) update Hexagon install instructions 11 April 2024, 16:58:36 UTC