https://github.com/halide/Halide

Revision Message Commit Date
92e4385 Merge branch 'main' into srj/tls-4 21 September 2022, 23:20:58 UTC
b4b27b2 Revert "Temporarily disable testing for apps/fft (#7033)" (#7040) Revert "Temporarily disable testing for apps/fft (#7033) (#7035)" This reverts commit 48d56d8066f322e016d60d486613837e3670dd00. 21 September 2022, 23:20:06 UTC
33f6a0f Handle widen_right_* intrinsics in bounds inference (#7039) 21 September 2022, 22:01:10 UTC
376ca19 Update mini_qurt.h 21 September 2022, 21:42:59 UTC
623dd5b Add QURT version 21 September 2022, 21:36:23 UTC
596d5ba Add Windows HalideContext, use ScopedLock elsewhere 21 September 2022, 20:33:42 UTC
beaa2f8 Update posix_halide_context.cpp 21 September 2022, 19:40:20 UTC
358b513 Merge branch 'main' into srj/tls-4 21 September 2022, 19:00:13 UTC
e41a97d fixes 21 September 2022, 19:00:09 UTC
7ec2a4e Add stack-size-canary test to apps/fft's CMake file (#7034) * Add stack-size-canary test to apps/fft's CMake file This was apparently meant as a canary for stack size usage, but the necessary setting only happened in the Makefile, not the CMake file. Also, drive-by fix in Makefile to ignore warnings about `-ObjC++` being ignored, which apparently can be the case with current AppleClang configs. * Update CMakeLists.txt 21 September 2022, 17:20:03 UTC
985aac6 Merge branch 'srj/terminate-handler' into srj/tls-4 21 September 2022, 03:05:19 UTC
aae4db0 Merge branch 'main' into srj/terminate-handler 21 September 2022, 01:12:21 UTC
cc62ba5 Merge branch 'main' into srj/tls-4 21 September 2022, 00:56:37 UTC
0d02a0b Fix Wasm BulkMemory Codegen + Minor fixes to apps/HelloWasm (#7026) * Minor fixes to apps/HelloWasm * trigger buildbots * Fix Codegen * Update CMakeLists.txt 21 September 2022, 00:55:56 UTC
b0efc39 Update Error.cpp 20 September 2022, 23:22:26 UTC
adb2c2c Merge branch 'srj/HelloWasm' into srj/tls-4 20 September 2022, 23:00:50 UTC
47a02be Merge branch 'main' into srj/tls-4 20 September 2022, 23:00:39 UTC
b85aa68 Add a terminate_handler to try to report unhandled exceptions This uses an ugly trick: since a terminate_handler doesn't see the exception that was unhandled, we use a try/(re)throw/catch inside the terminate_handler to sniff at it. This is not standard C++, but apparently GCC and Clang (at least) support it, and it dramatically improves the quality of error reporting for our tests when Halide is built with exceptions enabled, so let's see how it works in practice. 20 September 2022, 22:59:22 UTC
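The trick described in b85aa68 can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: it assumes `std::current_exception` and `std::rethrow_exception` as the way to perform the rethrow, and the names `describe_current_exception`, `reporting_terminate_handler`, `install_reporting_terminate_handler`, and `demo_description` are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical helper: rethrow whatever exception is currently in flight
// so that a catch clause can inspect it.
std::string describe_current_exception() {
    std::exception_ptr ep = std::current_exception();
    if (!ep) {
        return "no active exception";
    }
    try {
        std::rethrow_exception(ep);
    } catch (const std::exception &e) {
        return std::string("unhandled exception: ") + e.what();
    } catch (...) {
        return "unhandled exception of unknown type";
    }
}

// Hypothetical handler: report what we can, then abort as terminate requires.
[[noreturn]] void reporting_terminate_handler() {
    std::fputs(describe_current_exception().c_str(), stderr);
    std::fputs("\n", stderr);
    std::abort();
}

// Hypothetical installer; std::set_terminate is the standard hook.
void install_reporting_terminate_handler() {
    std::set_terminate(reporting_terminate_handler);
}

// Usage sketch: the helper can also be exercised from an ordinary catch block.
std::string demo_description() {
    try {
        throw std::runtime_error("boom");
    } catch (...) {
        return describe_current_exception();
    }
}
```

Whether an exception is visible from inside a terminate handler depends on the toolchain, as the commit message itself cautions, so the handler falls back to a generic message when nothing can be recovered.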
7351070 Codegen_C for user_context (#7031) Codegen_C fixes 20 September 2022, 22:52:26 UTC
997881a Update CMakeLists.txt 20 September 2022, 22:41:37 UTC
ba53b93 Add reinterpret simplifications (#7029) Co-authored-by: Steven Johnson <srj@google.com> 20 September 2022, 22:00:14 UTC
cf1b311 Merge branch 'main' into srj/HelloWasm 20 September 2022, 20:23:28 UTC
2281db6 Merge branch 'main' into srj/tls-4 20 September 2022, 19:15:44 UTC
48d56d8 Temporarily disable testing for apps/fft (#7033) (#7035) Want to avoid reporting this known bug while fix is investigated 20 September 2022, 19:08:29 UTC
ae32106 fixes 20 September 2022, 17:26:17 UTC
4addf2a format 20 September 2022, 00:49:58 UTC
e6c41c1 Update fake_halide_context.cpp 20 September 2022, 00:48:34 UTC
b442ee1 Update CMakeLists.txt 20 September 2022, 00:47:13 UTC
88f35c7 wip 20 September 2022, 00:44:53 UTC
6ffd29c Fix Codegen 20 September 2022, 00:03:08 UTC
4e4e62d Merge branch 'main' into srj/HelloWasm 19 September 2022, 23:32:31 UTC
1702fa6 trigger buildbots 19 September 2022, 23:31:46 UTC
36d601f Don't use `-g` for EMCC (#7025) * Don't use `-g` for EMCC Combining `-g` with other default EMCC flags will now emit warning/error messages regarding binaryen optimization, so, don't use that flag in the default settings. * Update Error.cpp 19 September 2022, 23:18:19 UTC
62389a3 Minor fixes to apps/HelloWasm 19 September 2022, 22:07:39 UTC
3a5941d Appease Python linter (#7022) Apparently it's "more Pythonic" to use "not in" vs "not ... in", etc. 16 September 2022, 23:38:19 UTC
ef10a42 Define a Generator framework in Python (#6764) * Define a Generator framework in Python 16 September 2022, 21:06:17 UTC
5ff15bb Fix SpecificExpr canonicalization (#7016) fix SpecificExpr canonicalization 15 September 2022, 01:54:29 UTC
18b06f0 Revert "[HVX] Simplify constant factor before distributing" (#7013) Revert "[HVX] Simplify constant factor before distributing (#7009)" This reverts commit 69b50af793d6eb850eea3dccb16426479edbff9d. 14 September 2022, 20:49:20 UTC
ace8028 Add minimum GitHub token permissions for workflow (#7011) Signed-off-by: Varun Sharma <varunsh@stepsecurity.io> 14 September 2022, 14:04:29 UTC
655211e Rework Python Extension C++ code (again) (#7010) * Rework Python Extension C++ code (again) My previous effort was too clever for itself: while it worked for Halide's build systems, some other build systems (e.g. Blaze) are much more finicky about the C++ files you build Python extensions from, and re-using the same C++ files with different preprocessor settings turns out to be too problematic there, for reasons that aren't important here. Anyway, the important part here was to rework so that (1) All the C++ source files needed are compiled exactly once (2) All the C++ source files needed can be compiled with the same set of preprocessor definitions To that end, I have extended GenGen's `-r` flag to allow using `-e python_extension`; this emits the bare module-registration code by itself. So now, we generate the Python Extension code as before, but define HALIDE_PYTHON_EXTENSION_OMIT_MODULE_DEFINITION to defeat the standalone module registration for each one, then also compile in the new 'standalone' registration, with HALIDE_PYTHON_EXTENSION_MODULE and HALIDE_PYTHON_EXTENSION_FUNCTIONS defined to fill in the blanks. Also, a little drive-by cleanup in CodeGen_C to make extern "C" blocks more findable, and some restructuring in PyExtGen. * Update user_context_generator.cpp * Dummy source file * IF NOT EXISTS before file(WRITE) 14 September 2022, 02:05:02 UTC
27b8a7d Add one-sided widening intrinsics. (#6967) * implement widen_right_ ops * update HVX patterns with one-sided widening intrinsics * remove unused HVX pattern flags * strengthen logic for finding rounding shifts Co-authored-by: Steven Johnson <srj@google.com> 13 September 2022, 18:14:42 UTC
69b50af [HVX] Simplify constant factor before distributing (#7009) * simplify constant factor before distributing * add simd_op_check test 12 September 2022, 23:51:33 UTC
ff47ab0 Fix some bugs in div_round_to_zero (#7008) * Fix some bugs in div_round_to_zero ... and fast_integer_divide_round_to_zero These were never adequately tested, and there were a few issues. * Add missing print 12 September 2022, 16:25:26 UTC
a4f86de Fix Python handling of boolean buffers (#7006) The Python Extension code didn't handle boolean buffers correctly, making it impossible to construct one in Python and pass it thru to Halide-generated code. This fixes that, and also fixes the test that just expected it to fail (!). 11 September 2022, 23:57:33 UTC
c98f193 Couple small fixes to update RISC V to current LLVM flags and enable vscale use. (#6995) Couple small fixes to update RISC V to current LLVM flags and enable vscale use. Co-authored-by: Steven Johnson <srj@google.com> 10 September 2022, 16:02:48 UTC
a0a1d09 Prohibit C99 VLA usage in runtime code (#7005) * Prohibit C99 VLA usage in runtime code AFAICT we aren't doing this in Halide at present, but some experimental code in Google runtime was doing so; this caused some issues with some experimental Clang patches, but also was never really intended to be used in the first place. Adding the flag here to be sure no unintended use creeps back in. While I was there, took the time to ensure that the flags for runtime are unified across CMake and Make. * oops 10 September 2022, 00:07:12 UTC
4e352d3 Clean up Adams2019 CMake file (#7003) 09 September 2022, 16:49:26 UTC
bd74f94 Log target info in performance_fast_pow (#6997) (#6998) Try to gather info to track down heisenbug 08 September 2022, 00:40:05 UTC
e5069ef Apply _Halide_place_dll() to _Halide_gengen (#6999) (#7000) 07 September 2022, 21:15:58 UTC
1644e64 [Codegen] Adapt ModuleAddressSanitizerPass/ModuleSanitizerCoveragePass renaming (#6996) https://github.com/llvm/llvm-project/commit/93600eb50ceeec83c488ded24fa0fd25f997fec6 renamed ModuleAddressSanitizerPass to AddressSanitizerPass. https://github.com/llvm/llvm-project/commit/4c18670776cd6ac31099a455b2b22b38b0408006 renamed ModuleSanitizerCoveragePass. 07 September 2022, 18:28:08 UTC
cbe2e63 Fix compiler warnings in Elf.cpp (#6992) * Fix compiler warnings in Elf.cpp Some versions of GCC will complain that there is a possible use of uninitialized field `Sym<>::st_info` here; that's technically true, in that it is a bitfield that we previously set via two calls, so it temporarily could use uninitialized bits, but those would immediately be overwritten by well-defined bits. That said, the API could have been misused, so I collapsed Sym::set_type and Sym::set_bindings into a single call to avoid this warning. While I was there, I did a little hygiene on Rel<> and Rela<> as well, as there was an unused-but-similarly-dubious API there. Also added some C++17 `if constexpr` love. * Removed constexpr 06 September 2022, 17:18:50 UTC
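The idea in cbe2e63 of setting both halves of a packed field in one call can be sketched as below. This is only an illustration under stated assumptions: the struct `SymInfo` and the method `set_binding_and_type` are hypothetical names, not the actual `Sym<>` API in Elf.cpp. The packing (binding in the high four bits, type in the low four bits of `st_info`) follows the standard ELF symbol layout.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the st_info byte of an ELF symbol entry.
struct SymInfo {
    uint8_t st_info = 0;

    // Write binding and type together, so st_info is never observed with
    // only one half populated.
    void set_binding_and_type(uint8_t binding, uint8_t type) {
        assert(binding <= 0x0f && type <= 0x0f);
        st_info = static_cast<uint8_t>((binding << 4) | (type & 0x0f));
    }

    uint8_t binding() const {
        return static_cast<uint8_t>(st_info >> 4);
    }

    uint8_t type() const {
        return static_cast<uint8_t>(st_info & 0x0f);
    }
};
```

With two separate setters, each one has to read the other half of the byte before writing, which is what can trigger a "possibly uninitialized" diagnostic; a single setter writes the whole byte at once.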
8b9c081 Fixes for Xcode "new" build system. (#6993) 1. TargetExportScript was running into an Xcode bug with its handling of linker flags. Now using XCODE_ATTRIBUTE_EXPORTED_SYMBOLS_LIST as a workaround. 2. Added a missing dependency in Python module definition code. Fixes #6987 02 September 2022, 01:55:43 UTC
ce2e7f3 Refactor buffer-unpacking code in PythonExtensionGen (#6991) This moves most of the interesting code into the common module block, so we don't risk duplicating code for extensions that contain multiple function definitions. 01 September 2022, 17:58:57 UTC
95e37ee Improve error-handling in Python Extensions (#6986) * Improve error-handling in Python Extensions Currently, Python Extensions don't make any effort to override `halide_error`, so the default (which aborts) is generally used... this is very unfriendly. This modifies the standard Python Extension glue code to hook halide_error, saving the text in a thread local, and then throwing a Python exception after the extension's AOT call is finished (if an error occurred, of course). Also does a drive-by default hooking of `halide_print` to ensure that it goes to whatever Python thinks that `stdout` is. (Note that it would be really nice if we could use closures of some sort for halide_error, halide_print, etc so that we could save context in the actual Python module, rather than in a thread-local global var, but this currently isn't possible without nontrivial refactoring in the Halide runtime.) * Make Windows happy * Remove dangling code bits * Allow defeating of error-handler via HALIDE_PYTHON_EXTENSION_OMIT_ERROR_AND_PRINT_HANDLERS 31 August 2022, 21:44:17 UTC
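The pattern described in 95e37ee (record the error text in a thread-local, then raise once the call has returned) might look roughly like this in plain C++. This is a sketch, not the generated glue code: `last_error`, `record_error`, and `call_and_check` are hypothetical names, and a C++ exception stands in for the Python exception that the real extension raises.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical per-thread slot for the most recent error text.
thread_local std::string last_error;

// Hypothetical hook taking a user context pointer and a message, as an
// error callback would: it records the message instead of aborting.
void record_error(void * /*user_context*/, const char *msg) {
    last_error = msg;
}

// Hypothetical wrapper: run the call, then convert any recorded error into
// an exception after the call has finished.
template<typename Fn>
int call_and_check(Fn &&fn) {
    last_error.clear();
    int result = fn();
    if (!last_error.empty()) {
        throw std::runtime_error(last_error);
    }
    return result;
}
```

Deferring the throw until after the call returns keeps the exception from unwinding through the generated code, at the cost of keeping state in a thread-local rather than in the module, which is the limitation the commit message points out.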
e531e24 Fix markdown links (#6988) 31 August 2022, 16:55:51 UTC
386e1ee Add `add_halide_runtime` rule (#6985) Fixes #6981 30 August 2022, 20:19:58 UTC
e7c1c86 Add test for _Halide_target_export_single_symbol (#6983) Add test for _Halide_target_export_single_symbol 29 August 2022, 23:09:20 UTC
c24b406 Add add_halide_python_extension_library() rule (#6979) * Add add_halide_python_extension_library() rule This adds a rule to create a single Python extension library from one (or more) halide_library rules. This allows you to package multiple Halide filters into a single Python module, which is nice because (1) being able to organize is good, and (2) all the filters in a single Python extension module share the same Halide runtime, including (e.g.) thread pools and method overrides. (It also removes the just-recently-added PYTHON_EXTENSION_LIBRARY option from the add_halide_library rule, as this new rule is better and more flexible in pretty much every way.) This modifies the content of our `python_extension` output in such a way that existing uses should be completely unaffected, but defining the right preprocessor macros allows us to split the function wrappers up from the method-definition declaration, so we don't have to generate any new code artifacts to make this work. Partially addresses #6956. * Omits -D in target_compile_definitions * be explicit about setting to empty * Add quotes * Add comments re BUILD_INTERFACE * Add MODULE_NAME comment * Remove "defined in HalideGeneratorHelpers.cmake" * Add comment re add_halide_runtime() * osx, macos, darwin, oh m * blankity blank blank * Use OBJECT library instead * Add comment about X-macros * Update HalideGeneratorHelpers.cmake 29 August 2022, 20:48:23 UTC
5a09dda Fix XCode by wrapping weights in an OBJECT library (#6977) The XCode "new build system" doesn't like generated source files to be associated with more than one target. Going through an OBJECT library like this fixes that problem, but also saves us a compilation, so it's a good thing to do anyway. Fixes #6976 26 August 2022, 00:42:26 UTC
50018c4 Small refactor to remove confusion between CodeGen_LLVM and CodeGen_Internal. (#6973) * Small refactor to remove confusion between CodeGen_LLVM and CodeGen_Internal. 25 August 2022, 23:01:39 UTC
4877b9f Lower saturating_cast in bounds inference (#6970) * lower saturating_cast in bounds inference * openGL fix to saturating_cast 24 August 2022, 16:21:24 UTC
068f1e2 Don't cache Halide_ASAN_ENABLED (#6969) check_cxx_symbol_exists saves its output in the cache and does not run if its destination variable is defined. This is OK when used to test something that necessitates a totally fresh configuration, like any property of the target architecture, which would require changing the toolchain file. However, the result here can change if someone just modifies CMAKE_CXX_FLAGS, so it can get out of sync in some cases. 24 August 2022, 00:35:10 UTC
eacadb6 Use CMake target to handle vendored SPIRV headers (#6968) 24 August 2022, 00:35:00 UTC
2f0957b CMake packaging fixes (#6966) * Add Halide_ASAN_ENABLED to package. * Fix handling of optional components. Make PNG/JPEG optional. * Make it easier to find HalideHelpers Before this change, users would either need to set Halide_ROOT to a Halide installation path or add said path to CMAKE_PREFIX_PATH. If they tried to use a different mechanism, like setting Halide_DIR or directly annotating their find_package call with HINTS or PATHS, then it would fail to find HalideHelpers.cmake. Adding this hint inside HalideConfig.cmake makes the package more robust, while still respecting the more powerful Halide_ROOT and CMAKE_PREFIX_PATH variables. * Delete undocumented variables in HalideConfig Some of our package's internal variables and macros leak out into user builds. We don't want users to use any of these. We might hit Hyrum's Law here, but I hope not. Users of these variables and macros should seek other means. 23 August 2022, 19:38:51 UTC
ca6319b Some minor top-level CMakeLists.txt reorganization (#6957) * Disables the usage warning when CMAKE_BUILD_TYPE is defined, but explicitly empty. * Overrides C++ standard variables using the cache (CMake 3.21+) * Allows including projects to build our tests, etc. but disables by default via PROJECT_IS_TOP_LEVEL (CMake 3.21+) * Removes misleading distrib target. * Removes deprecated clang-format target (use ./run-clang-format.sh instead) 23 August 2022, 04:07:09 UTC
37f7514 Python: don't crash for repr(Expr()) (#6962) 23 August 2022, 00:40:17 UTC
671b26d Enable deprecations warnings (#6555) * Enable deprecations warnings We currently disable deprecation warnings inside Halide. This re-enables them there, and also inside add_halide_generator(). 23 August 2022, 00:39:28 UTC
f7a30e0 Fix RPATH for Python wheels on macOS (#6958) 22 August 2022, 18:35:26 UTC
fd3bec3 [HVX] Fix state_var issue (#6894) * fix HVX state_var issue * abort if host is nullptr 22 August 2022, 16:08:34 UTC
4bcd6fa Remove add_python_stub_extension(), adding the functionality to add_halide_generator() instead (#6952) * Remove add_python_aot_extension() rule in CMake Move it into `add_halide_library` instead, as another output option. (add_python_stub_extension will likely be moved as well, in a subsequent PR) * Update README_cmake.md * Skip Python tests when compiling for WASM * PyStubs * Update README_cmake.md * Update Generator.h * Update CMakeLists.txt * Revert "Update CMakeLists.txt" This reverts commit ed5bb00283f0e4fbdbea74adf497d4ff93b0c8d1. * fixes * fixes * Update CMakeLists.txt * fixes * fixes * fixes * Remove LIBRARY DESTINATION * Update CMakeLists.txt * fixup packaging Co-authored-by: Alex Reinking <reinking@google.com> 19 August 2022, 21:17:21 UTC
a1cd71c Build fixes for manylinux2014 (#6953) 19 August 2022, 03:55:04 UTC
1068403 Remove add_python_aot_extension() rule in CMake (#6949) * Remove add_python_aot_extension() rule in CMake Move it into `add_halide_library` instead, as another output option. (add_python_stub_extension will likely be moved as well, in a subsequent PR) * Update README_cmake.md * Skip Python tests when compiling for WASM 17 August 2022, 23:20:43 UTC
dd5fe8d Two quick build fixes (#6950) * ASLog is linked to autoscheduler MODULES; needs PIC * Out-of-source Python bindings just need libHalide, not imageio * Fixes to setup.py 17 August 2022, 22:17:40 UTC
807d988 Handle saturating_cast in compute_expr_cost() (#6947) 17 August 2022, 19:33:38 UTC
5100ad6 Don't throw an exception from generate_filter_main (#6946) 17 August 2022, 01:06:52 UTC
4f5c53c Add/update Python Readme (#6939) * Add/update Python Readme This moves the Python README to the toplevel and reworks it considerably, adding details and updating various bits. Note that the Python documentation here is still incomplete; this is intended as a prelude to adding documentation for Python Generators in a future PR. 16 August 2022, 22:15:53 UTC
63d563f Export HalidePythonExtensionHelpers.cmake for installs (#6941) * Export HalidePythonExtensionHelpers.cmake for installs * oops * fixes * Fix broken code in target_export_script() * oops #2 * Add WITH_SOABI to stubs as well as AOT * More fixes * Update CMakePresets.json * Update CMakePresets.json 16 August 2022, 21:38:42 UTC
52b91a4 Add minimal useful implementation of extracting and concatenating bits (#6928) * Minimal approach to making Deinterleave correct for Reinterpret * Add minimal useful implementation of extracting and concatenating bits * clang-tidy * More clang-tidy fixes * Add missing error message * Add low-bit-depth noise test * Add test to cmake build * Fix power-of-two check * Remove dead object * Add little-endian comment to reinterpret IR node * Simplify concat_bits of single arg * Add missing second arg * Fix concat_bits call Co-authored-by: Andrew Adams <anadams@adobe.com> 14 August 2022, 17:24:30 UTC
f60a8fb Fix badly-merged CMakePresets.json file (#6936) 12 August 2022, 20:45:37 UTC
5e8f97b Add ASAN support to CMake via toolchain file (#6920) Add ASAN support Co-authored-by: Alex Reinking <reinking@google.com> 11 August 2022, 22:57:54 UTC
b734957 Add build & test presets for release and debug CMake builds (#6934) Also, drive-by rename of 'default' to 'base' to better imply the inheritance 11 August 2022, 19:35:28 UTC
4cdc2a1 Tutorial 10 needs to be skipped for Python when targeting Wasm (just as non-Python does) (#6932) * Tutorial 10 needs to be skipped for Python when targeting Wasm (just as non-Python does) * fixes * Update CMakeLists.txt 11 August 2022, 19:30:57 UTC
43e6a26 Rework internal PYTHONPATH maintenance (#6922) * Rework PYTHONPATH * Move pure-Python file copying logic to build time. * Use TARGET_RUNTIME_DLLS to copy all DLLs instead of just Halide. * Ensure that the last path component for Halide_Python is always `halide` * Simplify __init__.py now that it's copied to build tree * Add helper to de-duplicate PYTHONPATH test logic Fixes #6870 Co-authored-by: Alex Reinking <alex.reinking@gmail.com> Co-authored-by: Alex Reinking <reinking@google.com> 10 August 2022, 22:05:11 UTC
92de4a1 Halide::Error should not extend std::runtime_error (#6927) * Halide::Error should not extend std::runtime_error Unfortunately, the std error/exception classes aren't marked for DLLEXPORT under MSVC; we need our Error classes to be DLLEXPORT for libHalide (and python bindings). The current situation basically causes MSVC to generate another version of `std::runtime_error` marked for DLLEXPORT, which can lead to ODR violations, which are bad. AFAICT we don't really rely on this inheritance anywhere, so this just eliminates the inheritance entirely. (Note that I can't point to a specific malfunction resulting from this, but casual googling based on the many warnings MSVC emits about the current situation has me convinced that it needs addressing.) * noexcept 10 August 2022, 17:30:46 UTC
1bf1599 Make saturating_cast an intrinsic (#6900) * Make saturating_cast an intrinsic * handle saturating_cast in Bounds.cpp + add bounds tests * update saturating_cast CodeGen * with_lanes should work on intrinsics as well * lift to saturating_cast in FindIntrinsics * update intrinsics test for u16_sat * better sat_cast(widen(expr)) handling in find_intrinsics * simplify bounds of saturating_cast + update is_monotonic 08 August 2022, 21:24:16 UTC
8794fac Make use of CMake 3.22 features (#6919) * Remove AddCudaToTarget.cmake * Remove MakeShellPath.cmake * Use CheckLinkerFlag in TargetExportScript * Use DEPFILE for all generators * Use REQUIRED with find_program, where applicable * Use REQUIRED with find_library, where applicable * Use CMake 3.21 cache behavior in HalideTargetHelpers.cmake * Replace uses of get_filename_component with cmake_path * Rework BLAS detection in linear_algebra app * Drive-by: fix autotune_loop.sh install rule. * Fix CBLAS header in linear_algebra test_halide_blas 08 August 2022, 20:44:32 UTC
9ca7560 Fix wrong install path for *.py files (#6921) * Fix wrong install path for *.py files We were looking in a nonexistent dir, so we never copied `__init__.py` as we should have. * Update CMakeLists.txt 05 August 2022, 02:20:29 UTC
256c4d9 Fix bug when realize condition depends on tuple call (#6915) If the realization is tuple-valued, and the condition on the realization uses a tuple call (index != 0), then the condition wasn't getting resolved during the split_tuples pass. The cause was a missing mutate call. 04 August 2022, 20:21:01 UTC
ffa2c36 Fix two warnings found with clang 16 (#6918) - variable 'count' set but not used - warning: use of bitwise '|' with boolean operands 04 August 2022, 20:10:37 UTC
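The second warning in ffa2c36 concerns `|` applied to boolean operands. The commit does not show the offending code, so the following is only a generic illustration with hypothetical function names. It shows why the distinction matters: `||` short-circuits, while `|` always evaluates both sides.

```cpp
#include <cassert>

// Hypothetical predicate that counts how many times it is evaluated.
bool is_positive(int x, int &evaluations) {
    ++evaluations;
    return x > 0;
}

// Logical OR: the right operand is skipped when the left one is true.
int evaluations_with_logical_or(int a, int b) {
    int evaluations = 0;
    bool result = is_positive(a, evaluations) || is_positive(b, evaluations);
    (void)result;
    return evaluations;
}

// Bitwise OR: both operands are always evaluated. The explicit casts state
// that the bitwise form is intentional.
int evaluations_with_bitwise_or(int a, int b) {
    int evaluations = 0;
    bool result = (static_cast<int>(is_positive(a, evaluations)) |
                   static_cast<int>(is_positive(b, evaluations))) != 0;
    (void)result;
    return evaluations;
}
```

When neither operand has side effects the two forms give the same value, so a fix is usually a matter of choosing whichever form states the intent.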
3a04fc0 Remove unused GHA and packaging workflows. (#6917) 04 August 2022, 14:00:45 UTC
cc44ee5 Upgrade CMake minimum version to 3.22 (#6916) Fixes #6910. 04 August 2022, 01:40:22 UTC
857b045 LICENSE.txt: add BLAS license. (#6914) 03 August 2022, 22:24:43 UTC
a893d5e LICENSE.txt: add spirv license (#6913) 03 August 2022, 22:23:44 UTC
0072946 LICENSE.txt: Include full text of Apache 2.0 license (not just the 'header' version) (#6912) 03 August 2022, 22:23:28 UTC
88e7229 Start developing pip package (#6886) Co-authored-by: Lukas Trümper <lukas.truemper@outlook.de> 02 August 2022, 20:55:53 UTC
dd391e6 Fix broken Makefile rules for autoschedulers on OSX (#6906) * Fix broken Makefile rules for autoschedulers on OSX A few issues here: - Make was building the plugins as .dylib on OSX, but they should have been .so to match Linux (and just on general principles) - On OSX, explicitly linking libHalide.dylib into a plugin means that it will load its own copy of libHalide, which is bad, because it means the plugin doesn't share the same set of globals. We need to omit that explicit dependency and allow it to just find the exported symbols at load time. - Add a test to verify the fix; run it everywhere even though it should only have been failing for Make-build OSX builds. Finally, let me add that we really need to set a sunset date for supporting Make in Halide. The Makefiles aren't really maintained properly anymore, and when something subtle goes wrong, it takes an unreasonable amount of time to debug for something that is no longer our canonical build tool. * Use order-only prerequisites * Remove new load_plugin.cpp test Not worth the complexity for the extra test coverage. 02 August 2022, 20:32:46 UTC
2239119 Fix autoscheduling trivial lut wrappers (#6905) * Fix autoscheduling trivial lut wrappers Fixes #6899 * trigger buildbots Co-authored-by: Steven Johnson <srj@google.com> 02 August 2022, 15:24:39 UTC
8871404 Allow AMX instructions with K dimension larger than 4 bytes (#6582) * recognize the patterns used for the RHS matrix * make 1d tile matcher more robust * put getting rhs tile's index into a separate func * expand the tests used in correctness check * add exclamation mark * remove unused vars * run format and tidy * check for null before using IR in the next step * check if the broadcast was found * llvm below 13 is no longer supported * replace single pattern with commutative permutations * check if the stride is an `IntImm`, otherwise reject pattern * apply clang-format-13 * rename wild_i32 -> v2 * check if v1 could be the stride value * add more detail to a receiving a bad type * added short explanation of the right-hand matrix layout * added explanation for where the 4 comes from * provide further documentation as to the layout of AMX * add comments for expected patterns to get_3d_rhs_tile_index * Document the matched pattern Co-authored-by: Steven Johnson <srj@google.com> 01 August 2022, 23:23:28 UTC
703a738 Upgrade clang-format and clang-tidy to v14 (v2) (#6902) 01 August 2022, 20:08:13 UTC
e35654b Don't try to fold saturating_sub of VectorReduce (#6896) don't fold saturating_sub of VectorReduce 01 August 2022, 17:35:48 UTC