c48a820 | Volodymyr Kysenko | 14 December 2022, 00:29:39 UTC | Address review comments | 14 December 2022, 00:29:39 UTC |
036d326 | Volodymyr Kysenko | 13 December 2022, 23:41:35 UTC | Handle an error in halide_init_dma | 13 December 2022, 23:41:35 UTC |
1ee8030 | Volodymyr Kysenko | 13 December 2022, 22:55:35 UTC | Fix review comments | 13 December 2022, 22:55:35 UTC |
cc9a142 | Volodymyr Kysenko | 13 December 2022, 22:13:11 UTC | [xtensa] DMA support improvements. This includes multiple related changes: * all transactions are 2D. * Each buffer will use a separate DMA channel. * For the case when the destination is an output buffer, we can delay the wait for completion until the beginning of its producer. | 13 December 2022, 22:13:11 UTC |
d0f0027 | Steven Johnson | 13 December 2022, 20:52:30 UTC | [xtensa] Add xtensa_io.cpp (#7233) * [xtensa] Add xtensa_io.cpp This is a better option than posix_io.cpp on Xtensa * Update xtensa_io.cpp | 13 December 2022, 20:52:30 UTC |
128be06 | Steven Johnson | 13 December 2022, 18:04:46 UTC | Merge branch 'main' into xtensa-codegen | 13 December 2022, 18:04:46 UTC |
1a4a469 | Andrew Adams | 13 December 2022, 16:11:54 UTC | Fix some sources of signed integer overflow in the compiler (#7231) * Fix some sources of signed integer overflow in the compiler Also, use compiler intrinsics when possible to handle overflow, as it generates faster code. * Fix msvc macro * Must use result * Actually perform the requested operation | 13 December 2022, 16:11:54 UTC |
33f57b5 | Steven Johnson | 12 December 2022, 20:55:18 UTC | Merge branch 'main' into xtensa-codegen | 12 December 2022, 20:55:18 UTC |
533e6e5 | Steven Johnson | 12 December 2022, 20:54:46 UTC | Remove rogue string suffix in simd_op_check_arm.cpp (#7227) * Remove rogue string suffix in simd_op_check_arm.cpp Interestingly, it compiles here, but in some compilers it will fail with "unexpected token". * Update simd_op_check_arm.cpp | 12 December 2022, 20:54:46 UTC |
c4d1781 | Volodymyr Kysenko | 12 December 2022, 18:29:38 UTC | Reorder convert<> function | 12 December 2022, 18:29:38 UTC |
145c2b6 | Steven Johnson | 12 December 2022, 17:01:46 UTC | Merge branch 'main' into xtensa-codegen | 12 December 2022, 17:01:46 UTC |
6ecdcbd | Steven Johnson | 11 December 2022, 18:05:55 UTC | Tighten alignment promises for halide_malloc() (#7222) This makes a couple of changes to the behavior/implementation of `halide_malloc()`: * Currently, halide_malloc must return a pointer aligned to the maximum meaningful alignment for the platform for the purpose of vector loads and stores. This PR also adds the requirement that the memory returned must be legal to access in an integral multiple of alignment >= the requested size (in other words: you should be able to do vector load/stores "off the end" without causing any faults). * Currently, the `halide_malloc_alignment()` function is used to determine the default alignment; this cannot be overridden by user code (well, it can be, but the override will have no useful effect). It is intended to be "internal only" but is used in at least one place outside the runtime (apps/hannk). This change removes the call entirely, in favor of a call that is harder to access from outside the runtime and much less likely for end users to attempt to call. (It also changes apps/hannk to stop using it.) | 11 December 2022, 18:05:55 UTC |
df2430e | Volodymyr Kysenko | 09 December 2022, 19:40:13 UTC | Add missing halide_xtensa_deinterleave_odd_u16 | 09 December 2022, 19:40:13 UTC |
43966f5 | Mikhail Usvyatsov | 09 December 2022, 18:54:09 UTC | [xtensa] Fixed xtensa simd correctness testing (#7214) * Commented failing tests out * [xtensa] fixed most of failing tests * [xtensa] added sanitized op name check to simd_op_check_xtensa * [xtensa] Made `serialize` to be a pure function, fixed IVP_MULN_2X32 test | 09 December 2022, 18:54:09 UTC |
9f3c4b2 | Mikhail Usvyatsov | 09 December 2022, 18:52:58 UTC | [xtensa] Added initial support for float16_t (#7198) * [xtensa] Added initial support for float16_t * Added SELECT support for float16_t * [xtensa] added conversions between float16_t and int32_t | 09 December 2022, 18:52:58 UTC |
f7b3fec | Steven Johnson | 09 December 2022, 18:43:28 UTC | [xtensa] Also special-case WEAK_INLINE for xtensa (#7226) * [xtensa] Also special-case WEAK_INLINE for xtensa * Update runtime_internal.h * Also use the version of `halide_malloc_alignment()` from runtime_internal.h instead of an extern decl, so we can inline it | 09 December 2022, 18:43:28 UTC |
5afd708 | Steven Johnson | 09 December 2022, 17:34:22 UTC | Merge branch 'main' into xtensa-codegen | 09 December 2022, 17:34:22 UTC |
16421a7 | Steven Johnson | 09 December 2022, 17:21:30 UTC | Revise simd_op_check tests to ignore HL_TARGET (#7207) (#7216) * Revise simd_op_check tests to ignore HL_TARGET (#7207) The simd_op_check tests have historically only run using the value of HL_TARGET, which meant that the coverage they had was low (since HL_TARGET is only set to values that are runnable on at least one buildbot). This change completely disconnects these tests from HL_TARGET; instead, each test now tests for a range of targets appropriate to the architecture being tested. On all platforms, they still compile to assembly and verify that the correct instructions are generated; additionally, if the host platform can JIT for the given target, it verifies that the results are as expected. * Update simd_op_check_riscv.cpp * Update simd_op_check_x86.cpp * Update simd_op_check_x86.cpp * Update simd_op_check_arm.cpp * Add more features that must match; re-enable the bfloat instructions * Update simd_op_check_x86.cpp * Update simd_op_check_riscv.cpp * trigger buildbots * Fix simd_op_check_wasm | 09 December 2022, 17:21:30 UTC |
ba31688 | Steven Johnson | 09 December 2022, 01:22:54 UTC | Increase __clang_major__ check in Float16.h to 16 (#7224) | 09 December 2022, 01:22:54 UTC |
88b5063 | Steven Johnson | 09 December 2022, 00:30:10 UTC | Merge branch 'main' into xtensa-codegen | 09 December 2022, 00:30:10 UTC |
066559b | Steven Johnson | 08 December 2022, 23:35:01 UTC | Remove check_jit_user_context() from V8 bindings (#7220) Obsolete code from early V8 work, it can trigger inappropriately in some corner-case scenarios. Remove it entirely to avoid false errors. | 08 December 2022, 23:35:01 UTC |
480bcbd | Steven Johnson | 08 December 2022, 23:30:25 UTC | [xtensa] Remove xtensa_allocator.cpp (#7221) It's functionally identical to posix_allocator.cpp and the WEAK issue should be resolved by now. | 08 December 2022, 23:30:25 UTC |
4403d48 | Steven Johnson | 08 December 2022, 04:34:47 UTC | Merge branch 'main' into xtensa-codegen | 08 December 2022, 04:34:47 UTC |
8fa8221 | Steven Johnson | 08 December 2022, 04:34:13 UTC | Fix bonehead version-checking test in HalideBuffer.h for Apple (#7218) | 08 December 2022, 04:34:13 UTC |
e8615bb | Steven Johnson | 08 December 2022, 01:22:38 UTC | clang-tidy: add [[maybe-unused]] to the DECLARE_NO_INITMOD stubs. (#7215) | 08 December 2022, 01:22:38 UTC |
ba4a7f6 | Steven Johnson | 07 December 2022, 17:32:01 UTC | Merge branch 'main' into xtensa-codegen | 07 December 2022, 17:32:01 UTC |
a7fa32e | Steven Johnson | 07 December 2022, 17:31:01 UTC | Use aligned_alloc() as default allocator for HalideBuffer.h on most platforms (#7190) Use aligned_alloc() as default allocator for HalideBuffer.h on most platforms (See also https://github.com/halide/Halide/pull/7189) Modify H::R::Buffer to default to using `aligned_alloc()` instead of `malloc()`, except: - If user code passes a non-null `allocate_fn` or `deallocate_fn`, we always use those (and/or malloc/free) - If the code is compiling under MSVC, never use `aligned_alloc` (Windows doesn't support it) - If HALIDE_RUNTIME_BUFFER_USE_ALIGNED_ALLOC is defined to be 0, never use `aligned_alloc` (this is to allow for usage on e.g. older Android and OSX versions which don't provide `aligned_alloc()` in the stdlib, regardless of C++ versions.) Also, as with #7189, this ensures that the allocated space has the start of the host data as 128-aligned, and also now ensures that the size allocated is 128-aligned (rounding up as needed). | 07 December 2022, 17:31:01 UTC |
8ce1212 | Steven Johnson | 07 December 2022, 17:29:19 UTC | Fix bitrot in PowerPC testing (#7211) * Fix bitrot in PowerPC testing (See #7208) - DataLayout was wrong (and has been for a long time) - simd_op_check_powerpc had errors. Some were easy to fix; the rest I commented out with a TODO since this backend doesn't appear to be in active use. (Want to fix this in preparation for fixing #7207) * Move x86 absd tests to the right place Co-authored-by: Andrew Adams <andrew.b.adams@gmail.com> | 07 December 2022, 17:29:19 UTC |
35020c5 | Zalman Stern | 07 December 2022, 07:15:46 UTC | Extend LLVM IR type mangling to handle scalars. (#7212) Extend LLVM IR type mangling to handle scalars and use this in vector predication intrinsic codegen. Fixes an error generating vector predicated strided stores. | 07 December 2022, 07:15:46 UTC |
d4b4c50 | Zalman Stern | 07 December 2022, 07:15:10 UTC | Add RISC V zvl flag for LLVM version 16 or greater. (#7209) | 07 December 2022, 07:15:10 UTC |
e0d1e15 | Zalman Stern | 07 December 2022, 00:58:28 UTC | Fix issue with vector predicated comparison and select instructions. (#7205) Fix invalid LLVM IR issues with vector predicated comparison and select instructions. Add start of RISC V simd_op_check test. | 07 December 2022, 00:58:28 UTC |
59f5412 | Zalman Stern | 06 December 2022, 22:57:24 UTC | Add bridging for clang _Float16 type. (#7201) Add type bridging between Halide::float16_t and _Float16 if the compiler supports the latter. Testing is done using clang specific logic and may need to be extended for other compilers. I chose not to add support for __fp16 and __bf16 right now as __fp16 is less useful in being storage only and __bf16 also only supports a subset of operations and was running into undefined symbols during compilation that did not look promising. Co-authored-by: Steven Johnson <srj@google.com> | 06 December 2022, 22:57:24 UTC |
90459b0 | Steven Johnson | 06 December 2022, 00:53:05 UTC | Revert "Fix for top-of-tree LLVM" (#7200) Revert "Fix for top-of-tree LLVM (#7194)" This reverts commit a9ea9b565018774e52bb4028cbc91e14cb86959e. | 06 December 2022, 00:53:05 UTC |
c25b7e2 | Steven Johnson | 03 December 2022, 00:02:17 UTC | Merge branch 'main' into xtensa-codegen | 03 December 2022, 00:02:17 UTC |
dbdfedf | Steven Johnson | 03 December 2022, 00:02:12 UTC | Merge branch 'xtensa-codegen' of https://github.com/halide/Halide into xtensa-codegen | 03 December 2022, 00:02:12 UTC |
345cf18 | Steven Johnson | 02 December 2022, 20:48:57 UTC | Don't attempt to use makecontext()/swapcontext() on Android (#7196) Despite being 'posixy', it doesn't actually implement these calls. | 02 December 2022, 20:48:57 UTC |
c38cb5d | Mikhail Usvyatsov | 02 December 2022, 17:17:45 UTC | [xtensa] Added add_platform_headers hook in CodeGen_C and relocated the common Xtensa code there. (#7186) * Added add_platform_headers hook in CodeGen_C and relocated the common Xtensa code there. * Fixed spelling mistake in the comment and improved the function naming to add_platform_prologue | 02 December 2022, 17:17:45 UTC |
80b9a1f | Mikhail Usvyatsov | 02 December 2022, 01:04:16 UTC | [xtensa] Added missing types to CodeGen_Xtensa.cpp and fixed the issues with 0_off_3 functions. (#7184) * Added missing types to CodeGen_Xtensa.cpp and fixed the issues with 0_off_3 functions * improved is_extract_0_of_3 variable naming | 02 December 2022, 01:04:16 UTC |
d072099 | Mikhail Usvyatsov | 02 December 2022, 01:03:34 UTC | [xtensa] Improved gather_load with IVP_GATHER (#7187) * Improved gather_load with IVP_GATHER * Improved gather_load specialization | 02 December 2022, 01:03:34 UTC |
a9ea9b5 | Steven Johnson | 02 December 2022, 00:17:48 UTC | Fix for top-of-tree LLVM (#7194) | 02 December 2022, 00:17:48 UTC |
02a096b | Steven Johnson | 01 December 2022, 19:15:09 UTC | Merge branch 'main' into xtensa-codegen | 01 December 2022, 19:15:09 UTC |
43911f4 | Steven Johnson | 01 December 2022, 18:20:03 UTC | Add a -v flag to generator_main() (#7193) This is a simple thing that just logs the path to all generated file(s) to stdout if `-v=1` is specified. It's intended for people running Generators directly from the commandline, and is intended as a more user-friendly alternative to HL_DEBUG_CODEGEN=1. No makefiles, etc specify it at present, but I anticipate using it in some tooling in the future. Example usage: ``` $ resize_image_bilinear.generator_binary -v 1 -o /tmp -g resize_image_bilinear -n resize_image_bilinear_uint16 -f resize_image_bilinear_uint16 -e assembly,c_header,llvm_assembly,registration,static_library,stmt 'target=arm-64-android' 'input.type=uint16' 'output.type=uint16' Generated file: /tmp/resize_image_bilinear_uint16.s Generated file: /tmp/resize_image_bilinear_uint16.h Generated file: /tmp/resize_image_bilinear_uint16.ll Generated file: /tmp/resize_image_bilinear_uint16.registration.cpp Generated file: /tmp/resize_image_bilinear_uint16.a Generated file: /tmp/resize_image_bilinear_uint16.stmt ``` | 01 December 2022, 18:20:03 UTC |
5a8c324 | Steven Johnson | 30 November 2022, 01:44:30 UTC | Fix metadata generation for multitarget Generators (#7181) Fix metadata generation for multitarget Generators We had a mechanism in place to ensure that Outputs that got renamed during lowering still emitted the proper names in the metadata... but this didn't work reliably for Multitarget generation. Now it does. | 30 November 2022, 01:44:30 UTC |
46e6831 | Steven Johnson | 29 November 2022, 21:48:44 UTC | [xtensa] Add support for `extract_n_of-4` for float32 (#7185) | 29 November 2022, 21:48:44 UTC |
33b51e3 | Steven Johnson | 29 November 2022, 19:19:44 UTC | Merge branch 'main' into xtensa-codegen | 29 November 2022, 19:19:44 UTC |
caf4b71 | Steven Johnson | 29 November 2022, 17:45:49 UTC | Disable unreachable-code clang-tidy warnings (#7182) Some configurations of clang-tidy will (correctly) complain that the code inside the `if` clauses here will never be executed, since it ends up as something like `if (strcmp("foo", "foo"))`... but for testing purposes, we want to keep it, for obvious reasons. It's hard to construct a string-compare here as constexpr, so I'm just going to NOLINT it. Also changed the `count_buffers()` check to a static_assert for simplicity. | 29 November 2022, 17:45:49 UTC |
c99e61f | Steven Johnson | 28 November 2022, 22:23:53 UTC | Merge branch 'main' into xtensa-codegen | 28 November 2022, 22:23:53 UTC |
2cfc315 | Steven Johnson | 28 November 2022, 22:11:35 UTC | Tweak the import paths in Python apps & tests (#7179) * Tweak the import paths in Python apps & tests This change makes it a bit easier for me to transform the import paths when merging into Google: we can't set PYTHONPATH, and calling `sys.path.append()` is frowned upon. This should have no effect on the GitHub repo but will make my life easier downstream. * More tweaks * force builds * Update Generator.cpp | 28 November 2022, 22:11:35 UTC |
73c61c3 | Steven Johnson | 28 November 2022, 21:52:51 UTC | Add optional "function_info" header output (#7170) Add optional "function_info" header output At first glance, this looks like a subset of what is already provided by the `_metadata()` functionality: describing the argument attributes of an AOT-generated Halide function. However, _metadata() is suboptimal for some use cases: Because it's expressed as ordinary data, we can only process it at runtime; the new functionality is expressed as a `constexpr` data structure, meaning we can process it at *compile* time if we so choose. (This is quite useful for producing automatic call wrappers, etc). At first I considered adding this to the normal `.h` file, but moving it into a new file is cleaner in a few ways: - It maintains the 'C-only' nature of the existing .h files (adding this would have imposed a C++17-only section on them) - Splitting into a new file means no existing users are affected by this change at all Note also that this is deliberately not replicating all of the existing `_metadata()` functionality (it's just the argument signature, but no e.g. estimates or default values, etc). This approach means that it is probably more sensible to add several separate constexpr "getters" to this file, rather than trying to mash everything together into one clumsy structure. (With _metadata(), there was an incentive to keep the surface area of the API small, even if that meant combining somewhat-unrelated concerns; there is no such incentive here.) | 28 November 2022, 21:52:51 UTC |
3ff9e66 | Dmitry Kurtaev | 28 November 2022, 20:00:58 UTC | Use n32:64 in RISC-V data layout (#7175) * Use n32:64 in RISC-V data layout * Remove unused LLVM header | 28 November 2022, 20:00:58 UTC |
81c79d5 | Steven Johnson | 28 November 2022, 19:12:11 UTC | README_python.md should be installed with other READMEs (#7177) | 28 November 2022, 19:12:11 UTC |
8be767a | Steven Johnson | 18 November 2022, 18:53:10 UTC | Merge branch 'main' into xtensa-codegen | 18 November 2022, 18:53:10 UTC |
270c24a | Dmitry Kurtaev | 18 November 2022, 18:52:51 UTC | Migrate from MCJIT to ORC JIT (#7166) * Migrate from MCJIT to ORC JIT | 18 November 2022, 18:52:51 UTC |
7b0fdf5 | Steven Johnson | 18 November 2022, 17:08:56 UTC | Add fopen() bottleneck to runtime (#7171) * Add fopen() bottleneck to runtime Prefer using `fopen64()` on Linux systems. Also, drive-by sorting of the list of initmods that was supposed to be kept sorted. * fopen_32 -> fopen, fopen_64 -> fopen_lfs | 18 November 2022, 17:08:56 UTC |
0453cad | Steven Johnson | 17 November 2022, 18:46:09 UTC | Merge branch 'main' into xtensa-codegen | 17 November 2022, 18:46:09 UTC |
be055a8 | Andrew Adams | 16 November 2022, 01:02:37 UTC | Slightly improve error message for non-integer RDom min/extent (#7151) Improve error message for non-integer RDom min/extent Co-authored-by: Steven Johnson <srj@google.com> | 16 November 2022, 01:02:37 UTC |
e4423c5 | Steven Johnson | 11 November 2022, 22:08:37 UTC | Upgrade Poor Man's Profile + add predefined_vectors | 11 November 2022, 22:08:37 UTC |
41fe8b3 | Zalman Stern | 11 November 2022, 03:32:07 UTC | Factor simd_op_check into separate files by architecture. (#7163) | 11 November 2022, 03:32:07 UTC |
9916b4e | Steven Johnson | 08 November 2022, 16:54:25 UTC | Add `bfloat` support to `halide_type_to_string()` (#7154) | 08 November 2022, 16:54:25 UTC |
58421be | Steven Johnson | 08 November 2022, 16:54:15 UTC | Call cache.clear between internal functions in CG_C (#7155) We didn't call cache.clear() between internal functions in the C backend, so the cache could try to re-use something declared in a previous (internal, closure) function and would fail to compile. Easy fix. (I'm surprised we haven't seen this fail before now.) | 08 November 2022, 16:54:15 UTC |
c6815b0 | Steven Johnson | 08 November 2022, 16:53:43 UTC | C Backend should call halide_buffer_to_string() (#7156) Just assume that this is present and call it for stringify() on buffers in the C backend. (If it's missing, the user will be expected to provide an implementation, as is usual for runtime with the C backend.) | 08 November 2022, 16:53:43 UTC |
102c059 | Andrew Adams | 08 November 2022, 00:38:44 UTC | Fix readnone attribute for llvm 16 (#7152) * Fix readnone attribute for llvm 16 The readnone flag was changed to memory(none) when applied to functions. llvm-as dynamically upgrades readnone applied to functions, so our .ll is fine for now, but there were places in the compiler we were manually sticking 'readnone' on a function. Also did a driveby makefile fix to remove some vestigial wasm stuff that was throwing errors with newer versions of llvm-config * Revert formatting changes | 08 November 2022, 00:38:44 UTC |
8f8edeb | Steven Johnson | 03 November 2022, 20:37:51 UTC | Don't use TF_LITE_KERNEL_LOG in apps/hannk (#7147) TF_LITE_KERNEL_LOG was intended for TFLite Micro but usage leaked out into example code; we should use ReportError instead. | 03 November 2022, 20:37:51 UTC |
1230042 | Steven Johnson | 02 November 2022, 00:10:51 UTC | Fix Python wheel-building (#7144) Various bits of code rearrangement had invalidated some of the build scripts for Python wheels for our bindings; this fixes that, and also subtracts some other irrelevant stuff that was getting included (e.g. the stub directory). Also updated the "long description" to use README_python.md rather than README.md. | 02 November 2022, 00:10:51 UTC |
d3e9d85 | Steven Johnson | 01 November 2022, 22:11:16 UTC | Upgrade some Actions in pip.yml (#7141) Needed to avoid deprecation warnings | 01 November 2022, 22:11:16 UTC |
b676567 | Steven Johnson | 01 November 2022, 22:10:52 UTC | Bump Halide version in main's setup.py to 16 (#7142) | 01 November 2022, 22:10:52 UTC |
bb7715a | Steven Johnson | 01 November 2022, 20:13:13 UTC | Move Python apps to toplevel of python_bindings -- they don't belong … (#7140) * Move Python apps to toplevel of python_bindings -- they don't belong under test/ * Update CMakeLists.txt | 01 November 2022, 20:13:13 UTC |
115f67a | Steven Johnson | 01 November 2022, 16:12:26 UTC | Give pip.yml permission to read packages (#7139) | 01 November 2022, 16:12:26 UTC |
97b40c2 | Steven Johnson | 31 October 2022, 22:35:17 UTC | Merge branch 'main' into xtensa-codegen | 31 October 2022, 22:35:17 UTC |
4987365 | Steven Johnson | 31 October 2022, 22:22:26 UTC | Rewrite python_bindings/apps (#7133) * apps * wip * WIP 2 * Fix comments * _GPU_SCHEDULE_ENUM_MAP * Update blur_generator.py * Add hl.funcs, hl.vars, plus formatting tweaks | 31 October 2022, 22:22:26 UTC |
e6066ac | Steven Johnson | 31 October 2022, 20:07:41 UTC | halide.imageio needs to support arbitrary bufferviews (#7137) * halide.imageio needs to support arbitrary bufferviews As written, the helper code assumed that everything passed in was a numpy array of some sort; this meant that passing hl.Buffer didn't work. Restructured so that we only assume that the objects passed in satisfy the Python buffer protocol, so this should now work very generically. * Update imageio.py * More fixes | 31 October 2022, 20:07:41 UTC |
5da5dfd | Alexander Root | 31 October 2022, 18:36:41 UTC | [x86] Generate AVX512 fixed-point instructions (#7129) * clean-up abs and saturating_pmulhrs, fix AVX512 saturating_ ops * add test coverage for AVX512 fp ops * generate vpabs on AVX512 * faster AVX2 lowering of saturating_pmulhrs | 31 October 2022, 18:36:41 UTC |
bad945f | Steven Johnson | 31 October 2022, 16:57:09 UTC | Apply 'Black' formatter to py/test/correctness and py/test/generators (#7135) * Apply 'Black' formatter to py/test/correctness and py/test/generators Trying to regularize all our Python code to a common style. Should be no functional changes here, just autoformatting + a few tweaks. * Update complexpy_generator.py | 31 October 2022, 16:57:09 UTC |
0c03ff8 | Alex | 31 October 2022, 16:22:53 UTC | GitHub Workflows security hardening (#7136) build: harden pip.yml permissions Signed-off-by: Alex <aleksandrosansan@gmail.com> Signed-off-by: Alex <aleksandrosansan@gmail.com> | 31 October 2022, 16:22:53 UTC |
bd15cee | Alexander Root | 29 October 2022, 21:19:47 UTC | [WASM] Use rounding_mul_shift_right for q15mulr_sat_s pattern (#7134) Use rounding_mul_shift_right for WASM q15mulr_sat_s pattern | 29 October 2022, 21:19:47 UTC |
2f1587e | Steven Johnson | 28 October 2022, 00:18:41 UTC | Fix Python buffer handling (#7125) * Fix Python buffer handling In the category of "how did this ever work"... TL;DR: in general, Halide Buffers have the opposite axis ordering from Python/NumPy buffers; in Halide, the most-frequently-varying dimension comes first, while in Python, it comes last. This isn't surprising, though, since Halide's indexing scheme is effectively column-major while NumPy's is row-major. Anyway: what we *should* have done was to reverse the order of dimensions when converting to/from Halide Buffers vs Python buffers; instead, we kept the same order, then jumped thru hoops to rearrange buffers to fit this setup. This PR does the appropriate axis reordering, fixing the apps and tests as needed. It also adds some helper code for image reading and writing; by default, we use `imageio` for this, but imageio ~always wants RGB/RGBA images to be interleaved (vs the planar that Halide prefers). So, I added the `halide.imageio` package, that has wrapper functions to quietly convert to/from planar as needed. Needless to say, this change is likely to break existing code that is using 3d buffers in Halide, but I think it's the right long-term thing to do. Opinions greatly welcomed here. * Update PyBuffer.cpp * -"for better vectorization" * public halide.imageio utilities should copy() buffers * PEP8 * Update imageio.py * Update imageio.py * add 'reverse_axes' options to Buffer conversions (#7127) * add 'reverse_axes' options to Buffer conversions | 28 October 2022, 00:18:41 UTC |
48345d9 | Steven Johnson | 27 October 2022, 02:22:38 UTC | Add range-checking to Buffer objects in Python (#7128) using () to get or set a Buffer element wasn't being checked at runtime for Python, but it clearly should be, because Python. (Note that in C++ we don't always range-check for these operations -- it's limited to `assert()` checks -- but in Python the expectations are clearly different.) | 27 October 2022, 02:22:38 UTC |
2479825 | Steven Johnson | 26 October 2022, 22:14:18 UTC | Merge branch 'main' into xtensa-codegen | 26 October 2022, 22:14:18 UTC |
da87cb2 | Zalman Stern | 26 October 2022, 22:12:28 UTC | RISC V vector predication support intrinsics support (#7119) Turn on vector predication support for RISC V. (First architecture to use this code. Bug fixes included here.) Add architecture specific vector intrinsics support as well. Should not affect anything outside of RISC V. | 26 October 2022, 22:12:28 UTC |
fd63349 | Steven Johnson | 25 October 2022, 17:29:07 UTC | Require Python 3.8+ in CMake build (#7117) * Require Python 3.8+ in CMake build * Update CMakeLists.txt * Update CMakeLists.txt | 25 October 2022, 17:29:07 UTC |
9163310 | Zalman Stern | 25 October 2022, 05:24:32 UTC | Add support for generating LLVM vector predication intrinsics. (#7111) Add support for generating llvm.vp.* intrinsics. This is particularly useful for RISC V, but it may be a simpler, better optimized path, for Halide vector operations in general. Add support for a maximum vector size that might be larger than the native vector size. RISC V vector LMUL support is an example of an architecture supporting this. | 25 October 2022, 05:24:32 UTC |
44102c0 | Steven Johnson | 24 October 2022, 16:37:40 UTC | Add evaluate() and evaluate_may_gpu() to Python bindings (#7108) * Add evaluate() and evaluate_may_gpu() to Python bindings * pacify clang-tidy | 24 October 2022, 16:37:40 UTC |
5ade1fb | Steven Johnson | 21 October 2022, 17:26:49 UTC | Attempt to fix pip build issues (#7098) | 21 October 2022, 17:26:49 UTC |
8204b05 | Steven Johnson | 21 October 2022, 17:17:53 UTC | Minor updates to apps/hannk (#7110) * Update hannk to TFLite 2.8.3 * Newer Android NDK use llvm-ar * Avoid 'unscheduled' warning for Elementwise | 21 October 2022, 17:17:53 UTC |
da22f6f | Andrew Adams | 20 October 2022, 19:26:46 UTC | Fix some dead links to the 'master' branch (#7107) | 20 October 2022, 19:26:46 UTC |
4f7100b | Steven Johnson | 20 October 2022, 00:42:58 UTC | Fix subtle CMake Install bugs (#7103) * Update CMakeLists.txt * Update CMakeLists.txt | 20 October 2022, 00:42:58 UTC |
8c0b937 | Volodymyr Kysenko | 20 October 2022, 00:17:36 UTC | Fix stupid mistake | 20 October 2022, 00:17:36 UTC |
d3c72f4 | Volodymyr Kysenko | 19 October 2022, 17:41:03 UTC | Merge branch 'main' into xtensa-codegen | 19 October 2022, 17:41:03 UTC |
5256aa6 | Steven Johnson | 18 October 2022, 04:14:26 UTC | Remove HALIDE_ALLOW_LEGACY_AUTOSCHEDULER_API (#7096) * Remove HALIDE_ALLOW_GENERATOR_EXTERNAL_CODE * Remove HALIDE_ALLOW_LEGACY_AUTOSCHEDULER_API * clang-format * Update CMakeLists.txt | 18 October 2022, 04:14:26 UTC |
83ccd8e | Steven Johnson | 18 October 2022, 04:14:07 UTC | Revert "Update pip.yml to use LLVM 15.0.2" (#7099) Revert "Update pip.yml to use LLVM 15.0.2 (#7097)" This reverts commit 26b1f3c938b07e6fc73f34e5af223c0eaf8e909f. | 18 October 2022, 04:14:07 UTC |
8f210f1 | Steven Johnson | 17 October 2022, 23:02:40 UTC | Remove HALIDE_ALLOW_GENERATOR_EXTERNAL_CODE (#7094) | 17 October 2022, 23:02:40 UTC |
bb6092b | Steven Johnson | 17 October 2022, 23:01:13 UTC | Remove everything flagged with HALIDE_ATTRIBUTE_DEPRECATED (#7095) | 17 October 2022, 23:01:13 UTC |
6702d86 | Steven Johnson | 17 October 2022, 22:52:03 UTC | Update Halide main branch to v16 (#7093) * Update Halide main branch to v16 * Drop support for LLVM13 in main Now that release/15.x has branched, main is now Halide 16 and no longer needs to support LLVM13. Update the docs, prune the requirements, eliminate old special cases we don't need anymore. * Revert mistaken changes * Update LLVM_Headers.h | 17 October 2022, 22:52:03 UTC |
26b1f3c | Steven Johnson | 17 October 2022, 22:37:48 UTC | Update pip.yml to use LLVM 15.0.2 (#7097) Newly released since this script was written | 17 October 2022, 22:37:48 UTC |
e70b7d9 | Volodymyr Kysenko | 17 October 2022, 22:07:53 UTC | Generate dot() in the Metal backend (#7085) * dot() support for Metal backend * Restrict dot() to floats | 17 October 2022, 22:07:53 UTC |
0a04beb | Steven Johnson | 17 October 2022, 18:31:15 UTC | Update README.md (#7091) | 17 October 2022, 18:31:15 UTC |
eb2e336 | Steven Johnson | 17 October 2022, 17:11:05 UTC | Fix #7076, #7077 (#7080) * Fix issue 7076 * fixes * Fixes * Update ScheduleFunctions.cpp * Update ScheduleFunctions.cpp * Update ScheduleFunctions.cpp | 17 October 2022, 17:11:05 UTC |
92b2227 | Volodymyr Kysenko | 14 October 2022, 21:02:11 UTC | Merge branch 'xtensa-codegen' of https://github.com/halide/Halide into xtensa-codegen | 14 October 2022, 21:02:11 UTC |
abd4fa9 | Volodymyr Kysenko | 14 October 2022, 21:00:26 UTC | Initial support for Q8 | 14 October 2022, 21:00:26 UTC |
aec5afd | Volodymyr Kysenko | 14 October 2022, 20:47:54 UTC | Merge branch 'main' into xtensa-codegen | 14 October 2022, 20:47:54 UTC |