https://github.com/halide/Halide

Revision Message Commit Date
27ee253 Use print_type function when printing vector type definitions, which can be overridden in derived classes if needed. 17 March 2021, 21:24:04 UTC
b549d18 Grab-bag of minor fixes (#5813) * All TraceViz scripts should set pipefail (at least) * Use ".ldscript" for linker version scripts * More TraceViz fixes 16 March 2021, 21:05:33 UTC
e1c0fd3 Conform to CMP0116 when using CMake 3.20+ (#5810) 16 March 2021, 20:19:05 UTC
5aa1a65 Add Halide_CCACHE_BUILD option for CMake (#5804) * Add Halide_CCACHE_BUILD option for CMake This replicates the approach LLVM already has of baking a use-cache option directly into the CMake options, without having to futz with changing CC/CXX/etc. * Update CMakeLists.txt * Add CCACHE_SLOPPINESS=pch_defines,time_macros * Update CMakeLists.txt 16 March 2021, 03:16:33 UTC
b061a32 Avoid printing the statement when it is unchanged (#5809) * Only print statement if it changed. * Move simplification after unroll from lower to after unrolling a loop. * Avoid making new IR when not necessary. * Use a small helper class instead. * clang-format. Co-authored-by: Steven Johnson <srj@google.com> 15 March 2021, 21:58:31 UTC
c49b4af support building for m1 with only AARCH target (#5802) move test for ARM + metal to AARCH64 Co-authored-by: Steven Johnson <srj@google.com> 15 March 2021, 20:12:57 UTC
28c742f Set CMAKE_CROSSCOMPILING_EMULATOR in toolchain.linux-arm32.cmake (#5808) Also add comments 15 March 2021, 19:21:36 UTC
d523b83 Fix deprecation warnings for trunk LLVM (#5803) (1) Both the 'deprecated' and new, non-deprecated variants existed back to at least LLVM10, and the deprecated variant was commented as deprecated at that point as well; the change in LLVM13 is that they are now annotated with LLVM_ATTRIBUTE_DEPRECATED so we get compiler warnings (and thus errors). (2) The fixes are simply replicating what the old, deprecated methods did internally. 13 March 2021, 00:54:23 UTC
c3882a5 Change warmup strategy for sliding window (#5755) * Implement sliding window warmups by backing up the loop min. * Fix indirect sliding windows. * Improve is_monotonic. * Small cleanups. * Avoid generating vector valued bounds. * Fix build error on some compilers. * Fix loop bounds. * Don't try to slide things that should just be compute_at the store_at location. * Print condition when printing boxes. * Less things broken. * Add/fix comments. * Comments * Fix async by moving if inside consume (and so inside acquires). * Fix division. * This doesn't work on master either. * Add TODO * Acquire is not a no-op. * Add comment about unfortunate simplification. * Remove debug(0) * Add simplification of for { acquire { noop } } * Fix folding factors finally! * Update storage_folding test. * Fix bug when cloning a semaphore used more than once. * Disable failing test. * Work around bad complexity in is_monotonic. * Fix sub bug * Significantly faster schedule for blur. * Update tracing test. * New simplifications that help with upsampled and downsampled sliding windows. * This doesn't need explicit folding any more. * Fix new simplifier rules. * Fix simplifier div rule * Remove ancient brittle test. * Fix simplify rule again * More LT -> EQ rules for mod * Fix nested sliding windows with upsamples. * Replace hack with better solution. * Add missing override * Don't rewrite loop variable if the min doesn't change. * Refactor sliding window lowering. * Fixed bounds growing redundantly for independent producers. * Don't take the union unless possibly needed. * Respect conditional provide/required. * Add missing overrides * Much better schedule. * Use a smaller image for blur benchmarking so that different schedules have different perf * Replace Interval with ConstantInterval for is_monotonic. * Don't try to handle unsigned deltas. * Add failing test. * Remove unused new code. * Remove weird debugging code. 
* Avoid expanding bounds of split producers * Remove stray likely_if_innermost. * Remove old autotune tests. * Update test for guarded producers. * Reenable test. * Update trace for guarding producers. * Don't overwrite required.used * Handle LE/LT in bounds of lanes in vectorize * Fix acquire and release of warmups * Earlier fix for multiply cloned acquires was wrong. * Handle nested vectorization. * clang-format * Remove autotune_bug_* tests * Fix shadowing error on some compilers. * Appease overzealous clang-tidy warning. * clang-format * Don't use silly hack. * clang-tidy... * It's no longer safe to assume monotonic means bounds_of_expr_in_scope is exact * Address review comments * Add comment * Add missing override. * Fix constant interval issues. * Revert and remove empty interval * Fix multiply!? * Reduce need for simplifications. * Simplifications from dsharletg/sliding-window branch * Don't learn likely(x) and x. * Add comment * Add some min/max rules. * Also substitute facts from asserts * Remove is_empty from header too. * More rules * Add double stairstep rule. * Disable rule that uncovers bugs. * Consider anded expressions as if they were independent nested ifs. * Add promise_clamped to producer guards. * Revert "Consider anded expressions as if they were independent nested ifs." This reverts commit 03efb3f784b3078b64961c98edde383f4de04fb4. * Don't combine ifs, split them instead. * Update trace * clang-tidy/clang-format * Remove splitting of ifs, it breaks brittle tests. * Safer check on old conditions. * Fix producer guard condition. * Interval fixes. * Handle sliding backwards * Handle transitive dependencies. * Backport abadams' fix from abadams/slide_over_split_loop * Fix select visitor. * More simplifier rules. * Bring back old logic as a fallback. * Avoid specializations corrupting sliding * Fix boneheaded rule errors. * Fix slightly conservative bounds at the max for split case. * This pattern is too sensitive to the simplifier. 
In a real use case, it's just a sum, and the result can be subtracted after doing a reduction. * Add missing clamp rule * Don't count unlikely loops as inner loops for likely_if_innermost * Use <= instead of == to solve for the new loop min Useful when the warmup is a partial vector or something * Verify simplifier changes and add variants as suggested by synthesizer * Make implicit assumption explicit, for clarity * Use find_constant_bounds * Guard against expanded bounds more effectively. * Update tracing test * Small cleanup. * Don't simplify/prove using lets that might change value. * Stronger solving without expanding lets. * New simplifier rule for alignment * Fix case where no warmup needed * Add some useful rules. * Add safety check on when we can use the new loop min. * Better proof to avoid hacky condition that is hard to prove. * Small cleanup and use the nice new folding factors. * Bring back unrolled producer test. * clang-format * Expand comment. * Fix sliding backwards condition. * min(new_loop_min, loop_min) isn't needed any more. * We need that min, but we can be more conservative about it. * Stronger handling of previous loop mins. * Remove unused is_monotonic_strong. * Remove ConstantInterval::make_intersection. * Avoid need to handle uint specially. * Add cache for depends_on. * Reduce unnecessarily large cache scope * The first part of the key is always the same Co-authored-by: Andrew Adams <andrew.b.adams@gmail.com> 11 March 2021, 23:14:36 UTC
c2a0db1 Fix correctness_memoize on arm32 (#5799) Conversion from a pointer to an integer is "implementation defined"; in general, conversion to `uintptr_t` is reliable, but other conversions aren't. It turns out that for our arm32 compiler, casting a pointer to a `uint64_t` sign-extends, rather than zero-extends, causing unexpected behavior in memoize cache eviction *if* the pointer is in the top half of memory. The fix here is simple (cast to `uintptr_t` directly), but it brings up the question of where else in our codebase we might be doing direct conversions without noticing the potential UB. 10 March 2021, 23:39:27 UTC
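The sign-extension hazard described in c2a0db1 can be illustrated with a small sketch. The function names below are hypothetical and are not taken from the Halide source; the 32-bit address is simulated with a `uint32_t` so that the example behaves the same on any host.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: widen a pointer to a 64-bit key by going through
// uintptr_t first, so the widening step is an unsigned (zero-extending) one.
inline uint64_t pointer_to_key(const void *p) {
    return static_cast<uint64_t>(reinterpret_cast<uintptr_t>(p));
}

// Simulation of the problematic widening on a 32-bit target: the address is
// treated as a signed 32-bit value before being widened to 64 bits.
inline uint64_t sign_extended_key(uint32_t addr32) {
    return static_cast<uint64_t>(static_cast<int64_t>(static_cast<int32_t>(addr32)));
}

// Simulation of the intended widening: the upper 32 bits are always zero.
inline uint64_t zero_extended_key(uint32_t addr32) {
    return static_cast<uint64_t>(addr32);
}

// Usage check: the two widenings only disagree for addresses whose top bit
// is set, i.e. pointers in the top half of a 32-bit address space.
inline void check_top_half_address() {
    assert(sign_extended_key(0x7FFF0000u) == zero_extended_key(0x7FFF0000u));
    assert(sign_extended_key(0x80001000u) != zero_extended_key(0x80001000u));
}
```

Because the disagreement only appears for addresses with the top bit set, a bug of this kind can stay hidden until an allocation happens to land in the upper half of memory.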
7a8c771 Fix wasm-interp ucon error (#5797) This is a subtle error that only shows up in builds with assertions enabled in wabt; the issue is that we ask for an int64 value from a wabt `Value` struct, but that struct only has an int32, so an assertion can fail. (Note that there is always storage for both, and the values are constrained and unused inside the JIT anyway, so this is mostly just a cosmetic fix.) Also added a .gitignore entry. 10 March 2021, 22:53:40 UTC
34c402f Fix bug found by asan from #5784 (#5798) 10 March 2021, 21:54:51 UTC
c67f486 Move where intrinsic function attributes are set (#5795) * Make declare_intrin_overload return LLVM function * Make names same as elsewhere * Remove unneeded enum name * Set moved attributes in Hexagon backend * Use declare_intrin_overload for ARM vabdl * Fix ARM vabdl intrinsic types * Format and clang-tidy * Rename intrinsic to widening_absd 10 March 2021, 18:51:33 UTC
78ff307 Display the GPU device code as a string in the C/C++ backend (#5757) * Added support for an OpenCL backend through the C backend Now correctly handles assert of kernel call Fixed clang-formatter Clang-tidy fixes * Remove references to alternative OpenclHost code * Only initialize GPU context once (removes name conflicts) squash commit * (Pretty) Print the kernels in the C backend clang formatting remove virtual * Fixes after review fixes after review 2.0 fixes 10 March 2021, 18:35:50 UTC
e75d9fb Fix out of bounds reads in strided ARM loads (#5784) * Safer version of vldN code generation. * Only be more conservative with alignment for external buffers. * Add tolerance to allocation size tests. * Remove old comments. * Improve ARM alignment and vldN code generation. * Remove merge straggler * Fix alignment condition (again). * Fix alignment. * Avoid divide by zero. * Move CodeGen_ARM's logic for strided loads to CodeGen_LLVM. * Fix comment. * clang-format. * Remove sketchy alignment check. 09 March 2021, 22:16:21 UTC
a83ab23 Simplifier rules for nested broadcasts (#5794) * Handle some reassociation when simplifying nested broadcasts. * clang-format. 08 March 2021, 19:24:07 UTC
89f5ee7 Add missing min/max/+/- rules (#5788) These are almost all of the rules with <= four leaves, in terms of min/max/add/sub ops, that simplify to something with <= three leaves. I left out things of the form: max(max(x, -x), 0) -> max(x - x) While correct, that transformation actually hurts our ability to analyze that expression. 05 March 2021, 20:00:57 UTC
199b873 Upgrade WABT version to 1.0.21 (#5782) 05 March 2021, 17:45:48 UTC
06b208f Move codegen backends into anonymous namespaces in source files and don't build them if not enabled (#5776) * Remove unused vertex buffer parameters. * Offload GPU code in a lowering pass instead of via CodeGen_GPU_Host. Fixes #5650, fixes #2797, fixes #2084, now #1971 is more relevant. * clang-format. * clang-format sorting is case sensitive!? * clang-tidy * Move codegen backends into anonymous namespaces in source files. * clang-format * Pass type arguments correctly. * Update OffloadGPULoops.cpp * trigger buildbots * trigger buildbots * Hack around tests that rely on the IR for offloaded GPU loops. * Fix missing include. * Remove unused include. * clang-tidy * Use custom lowering pass to see code before GPU offloading * Speculative fix for segfault * Fix const correctness * Fix error on unused variables in generated code. Co-authored-by: Steven Johnson <srj@google.com> 04 March 2021, 18:11:57 UTC
abf0f69 cmake: respect find_package QUIET option (#5785) Co-authored-by: Steven Johnson <srj@google.com> 04 March 2021, 08:16:21 UTC
a0c5380 Fixes for macos on arm (#5787) * Remove type checking on ARM-64 instructions in simd op check * Makefile fixes for M1 * Don't assume OSX is x86 * Change xml2 linking flag * M1 has a really fast div instruction 04 March 2021, 06:29:04 UTC
74f40fd Track time spent in malloc/free when profiling (#5763) * Track time spent in malloc/free when profiling * Appease clang tidy * Remove unnecessary asserts 03 March 2021, 20:26:03 UTC
acebd50 Various simplifier improvements from dsharletg/sliding-window (#5771) * Pull simplifier changes from dsharletg/sliding-window * Bring over test changes too. * Fix typo * Remove done TODO. * trigger buildbots * This pattern is too sensitive to the simplifier. In a real use case, it's just a sum, and the result can be subtracted after doing a reduction. Co-authored-by: Steven Johnson <srj@google.com> 03 March 2021, 20:14:29 UTC
5da8044 Various bug fixes and improvements from dsharletg/sliding-window (#5772) * Fix bug when a semaphore is cloned more than once. * (Originally by abadams) Don't count unlikely loops as inner loops for likely_if_innermost. * Ignore promise_clamped when solving. * Acquires are not no-ops. * Fix test name * Handle nested vectors in bounds_of_lanes and (by abadams) Handle LE/LT in bounds of lanes in vectorize * Fix test name. * Allow any level of nested vectorization. * trigger buildbots * Grammar Co-authored-by: Steven Johnson <srj@google.com> 03 March 2021, 20:14:11 UTC
7493c09 Move CodeGen_GPU_Host to a lowering pass (#5775) * Remove unused vertex buffer parameters. * Offload GPU code in a lowering pass instead of via CodeGen_GPU_Host. Fixes #5650, fixes #2797, fixes #2084, now #1971 is more relevant. * clang-format. * clang-format sorting is case sensitive!? * clang-tidy * Pass type arguments correctly. * Update OffloadGPULoops.cpp * trigger buildbots * Hack around tests that rely on the IR for offloaded GPU loops. * clang-tidy * Use custom lowering pass to see code before GPU offloading * Speculative fix for segfault * Fix const correctness * Fix error on unused variables in generated code. * Remove unnecessary space * Use helper. Co-authored-by: Steven Johnson <srj@google.com> 02 March 2021, 21:07:14 UTC
01f5e73 Update simd_op_check for the final wasm simd128 spec (#5779) The final revision of the wasm simd128 spec (https://github.com/WebAssembly/simd/blob/master/proposals/simd/SIMD.md) added some ops and tweaked some others. This just augments the wasm section of simd_op_check so that all the ops are referenced, with the new (or still-unimplemented) ops commented out with TODOs. Most of the new ops will likely be implemented via pattern matching in Codegen_WebAssembly, but before that can happen in any efficient way, WABT needs to be updated to recognize all the ops in the final spec (it doesn't currently, and so we can't even load code with those ops without cratering). See https://github.com/WebAssembly/wabt/issues/1617 for tracking bug. (Also: drive-by fix in CodegenLLVM to fix prettyprinting of some debug output) 02 March 2021, 18:06:11 UTC
47d0594 Fix for trunk LLVM (#5778) 28 February 2021, 19:35:27 UTC
fdfb393 Fix for trunk LLVM (#5774) 26 February 2021, 18:05:34 UTC
d48fbde Fix for top-of-tree LLVM (#5768) * Fix for top-of-tree LLVM * oops 25 February 2021, 00:35:34 UTC
6bf0c0a Fix autoscheduler breakage from #5766 (Fix #5769) (#5770) 25 February 2021, 00:35:15 UTC
14bab78 Constrain calling convention for PyStub (#5761) There are two calling conventions that are sensible when calling a Generator from Python: - Inputs are specified via keyword arguments (verbose but more readable) - Inputs are specified via positional arguments (terse but more like C++) However, the current PyStub implementation also allows for mixing positional and keyword forms of input (as long as no input is specified multiple times). This change removes that possibility (requiring either all-keyword or all-positional for inputs). I'm not sure why I thought this was a good option originally, but upon reflection (and examination of existing code), calls written in this way tend to be a confusing mess that require too much care when reading, and I think we'd be better off just forbidding this entirely. Note that this technically is a breaking API change; any user code that used this technique previously would now throw an exception and need a (trivial) update. That said, this is (AFAIK) a very rarely used API, and I rather suspect that existing code that relies on being able to mix positional and keyword inputs in this way is likely to be doing so inadvertently rather than deliberately. 24 February 2021, 19:14:54 UTC
1be92c2 Grab-bag of minor Generator-related cleanups (#5766) * Grab-bag of minor Generator-related cleanups * Appease clang-tidy; rename local 'cerr' * Update PyStubImpl.cpp * Update PyStub.cpp * appease clang-tidy 23 February 2021, 00:27:18 UTC
a6ba311 Ensure that PyStub handles dynamically-added inputs and outputs, and add test for it (#5760) 22 February 2021, 23:26:13 UTC
b7da2df Fix misspelling of an ABI name (#5764) named ABIs are defined in https://github.com/riscv/riscv-elf-psabi-doc/blob/master/riscv-elf.md#named-abis 22 February 2021, 18:20:38 UTC
4394fa2 Fix a few warnings from lgtm.com (#5765) 21 February 2021, 20:19:05 UTC
5545e61 Add initial support for RISCV RVV features (#5758) 1. `mabi` is introduced because target triple doesn't encode ABI info. 2. Use "+m,+a,+f,+d,+c" as default RISCV mattrs if it is not explicitly specified. It is equivalent to "rv[32|64]gc" configuration. 3. HL_LLVM_ARGS="-riscv-v-vector-bits-min 128" is required to enable fixed-length vectors to RVV codegen. 19 February 2021, 18:03:40 UTC
a0fde42 Eradicate reinterpret() from runtime_internal.h (#5748) * Eradicate reinterpret() from runtime_internal.h The reinterpret<>() function in runtime_internal.h is dangerous and should not be allowed to exist: it copies from one type to another, copying the minimum of the two sizes, but with no allowance for ensuring that any "extra" bits in the destination are filled in with, well, anything. I have no evidence of a bug specifically caused by this, but a usage such as `reinterpret<uint64_t>(some_ptr)` on a 32-bit runtime will leave the upper 32 bits of the result undefined. Maybe they'll never get used, but this function is a bug waiting to happen. Fortunately, the only usages of this evil thing are all in hexagon_host.cpp, and all are used to convert between a uint64_t and an ion_device_handle* (which is sometimes represented as a void*). Replacing the handful of usages with bespoke functions to do these conversions seems much safer. 17 February 2021, 22:48:03 UTC
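The hazard described in a0fde42 can be reproduced with a small sketch. `min_size_copy` below is a reconstruction written from the commit message alone, not the removed Halide function, and `handle_to_u64` / `u64_to_handle` are hypothetical names for the kind of bespoke conversion the commit mentions. The `stale` parameter stands in for whatever bits happened to be in the destination beforehand.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Reconstruction of the hazard: copy min(sizeof(To), sizeof(From)) bytes and
// leave the remaining bytes of the destination untouched.
template<typename To, typename From>
To min_size_copy(const From &from, To stale) {
    To result = stale;
    const std::size_t n = sizeof(To) < sizeof(From) ? sizeof(To) : sizeof(From);
    std::memcpy(&result, &from, n);
    return result;
}

// Hypothetical bespoke conversion: every bit of the result is defined.
inline uint64_t handle_to_u64(void *handle) {
    return static_cast<uint64_t>(reinterpret_cast<uintptr_t>(handle));
}

// Hypothetical inverse conversion.
inline void *u64_to_handle(uint64_t value) {
    // Precondition: the value must fit in a pointer on this platform.
    assert(value <= UINTPTR_MAX);
    return reinterpret_cast<void *>(static_cast<uintptr_t>(value));
}
```

Copying a 32-bit source into a 64-bit destination this way leaves four bytes of `stale` in the result, which is why the explicit conversions through `uintptr_t` are the safer choice.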
be8c727 Enable bugprone-macro-parentheses in clang-tidy, adding parens or annotating exceptions with NOLINT where necessary. (#5749) 17 February 2021, 22:16:39 UTC
75f2da2 Enable bugprone-incorrect-roundings for clang-tidy (#5750) * Enable bugprone-incorrect-roundings for clang-tidy, and fix the offending code. * Fix * lround -> llround in a couple of places 17 February 2021, 22:16:31 UTC
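The pattern that `bugprone-incorrect-roundings` flags, and the `llround` replacement mentioned in 75f2da2, can be compared in a minimal sketch. The function names are hypothetical and the code is not taken from the Halide source.

```cpp
#include <cassert>
#include <cmath>

// The flagged pattern: add 0.5 and truncate. Truncation goes toward zero,
// so negative inputs are rounded in the wrong direction.
inline long long truncating_round(double x) {
    return static_cast<long long>(x + 0.5);
}

// Replacement: llround rounds to the nearest integer, with halfway cases
// rounded away from zero.
inline long long nearest_round(double x) {
    assert(std::isfinite(x));
    return std::llround(x);
}
```

For example, `truncating_round(-1.7)` computes `-1.2` and truncates it to `-1`, whereas `nearest_round(-1.7)` yields `-2`.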
df28da9 Allow device copies inside a loop (#5741) Enable schedules that do device copies inside loops inside the compute_at location, for example due to sliding windows. 17 February 2021, 18:58:29 UTC
4f10827 Enable modernize-redundant-void-arg in clang-tidy and apply autofixes (#5752) 17 February 2021, 18:21:11 UTC
703531d Workaround for issue #5738 (#5739) * Don't run test on flaky bot * Add TODO 17 February 2021, 17:43:58 UTC
9586b93 Enable modernize-use-bool-literals for clang-tidy and apply automatic fixes. (#5751) 17 February 2021, 17:30:47 UTC
d053463 Fix subsetted build issues with arm32/64, hexagon. Fixes #5744. Fixes #5628 (#5745) 17 February 2021, 06:12:43 UTC
5a8432b Fix clang-tidy warnings in src/autoschedulers (#5746) * Upgrade clang-tidy rules to use v11 rather than v10. * Fix clang-tidy warnings in src/autoschedulers A handful of minor fixes allows us to remove the custom .clang-tidy for src/autoschedulers entirely. (Note that this PR is additive to https://github.com/halide/Halide/pull/5743, which must land first.) 17 February 2021, 01:58:38 UTC
336a623 Upgrade clang-tidy rules to use v11 rather than v10. (#5743) 17 February 2021, 01:57:36 UTC
2daef0c Upgrade clang-format rules to use v11 rather than v10. Reformat code as necessary due to trivial differences in v10 vs v11. (#5742) 16 February 2021, 23:27:55 UTC
11e7946 Fix downsample boundary condition, optimize schedule, generate other outputs. (#5737) 16 February 2021, 18:52:10 UTC
222776f Do something like sqrt-free Cholesky for BGU (#5281) * Do something like sqrt-free Cholesky for BGU This produces fewer update definitions and less total math in the solve step, saving some code size. No significant impact on performance for this app, because the solve step is done at low res, but it's theoretically more satisfying, and it's nice to have the symmetric version of the matrix solve available as a reference. 16 February 2021, 06:04:08 UTC
b4c4c73 Fix goofy local laplacian upsample (#5736) 15 February 2021, 21:10:27 UTC
f819d73 Better handling of sparse blurs using udot/sdot (#5735) Tweak vector deinterleaving to recognize a new pattern that comes up when you want to use udot/sdot in sparse convolution 14 February 2021, 17:37:18 UTC
dfbe346 A few small debug output improvements (#5732) * A few small debug output improvements. * Fix dumb mistake * clang-format * Trying to avoid wiping out my compiler cache is a bad idea 13 February 2021, 00:10:30 UTC
39e81fd Simplify some Generator code (#5731) * Simplify some Generator code * fix typo 11 February 2021, 22:32:51 UTC
584a1f4 Remove unnecessary .PHONY from python_bindings/Makefile (#5729) * Remove unnecessary .PHONY from python_bindings/Makefile * Silence compiler warning 11 February 2021, 17:37:18 UTC
fa92b83 Minor refactoring in Generator code (#5728) 11 February 2021, 01:42:45 UTC
d92ad17 Add support for AVX512 VNNI instructions (#5725) 11 February 2021, 00:57:16 UTC
c8ca344 Treat the cost of a mux as just the cost of evaluating the args. (#5727) This pessimistically assumes the mux doesn't get unrolled. 10 February 2021, 22:23:51 UTC
d9cd751 Fix regression in 32d5f71cef5ecea883c6fc26305181e555716d01 (#5726) 10 February 2021, 20:37:12 UTC
935b91e runtime: fix pointer calculations to avoid overflows. Fixes #5713. (#5716) * runtime: fix pointer calculations to avoid overflows. Fixes #5713. * HalideBuffer: add explicit parenthesis in pointer arithmetic Not needed for correctness, but @dsharletg feels it adds clarity. 10 February 2021, 18:06:18 UTC
aa67665 Fix apps/HelloPyTorch (#5722) 10 February 2021, 07:26:02 UTC
4f61b2e Add more debug output to mul_div_mod in hopes of tracking down #5634 (#5724) 10 February 2021, 04:53:04 UTC
fe0888b Refactor code for dealing with default values of scalar Params (#5720) The "default" value for a scalar param is rarely used -- it's currently only possible to specify for an Input in a Generator, and that value only shows up in the generated metadata for AOT compilation. This PR refactors this so that instead of being maintained solely as a hack in Generator data structures, it's moved into Parameter as its own field. This seems like a lot of work for something of marginal use, but I'm reluctant to suggest removing it entirely (it's possible it could break someone's code), and refactoring it in this way will make some subsequent Generator refactoring easier to understand and review. 09 February 2021, 18:00:58 UTC
3fbb12a Add support for AVX512 BF16 dot product (#5712) * Add support for AVX512 BF16 dot product * Match on f32*f32 * Remove f32 check 09 February 2021, 18:00:02 UTC
3e034d6 Remove unnecessary #include from RDom.cpp (#5718) 08 February 2021, 23:05:30 UTC
1b22dfe Add support for AVX512 f32x32 to bf16x32 conversion (#5711) The vcvtne2ps2bf16 instruction combines two f32x16 vectors and converts them to one bf16x32 vector. We can use this to support converting a f32x32 vector to bf16x32 vector by splitting the input vector into two. 05 February 2021, 20:06:34 UTC
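As a rough scalar model of the conversion in 1b22dfe, the sketch below rounds each `float` to bfloat16 with round-to-nearest-even and processes a 32-element input as two 16-element halves. It is an illustration only: the function names are hypothetical, and it makes no attempt to reproduce hardware details such as operand ordering or denormal handling of the actual instruction.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4, "this sketch assumes 32-bit IEEE floats");

// Round one float to bfloat16 (the upper 16 bits of the float encoding),
// using round-to-nearest-even on the discarded lower 16 bits.
inline uint16_t f32_to_bf16(float value) {
    uint32_t bits;
    std::memcpy(&bits, &value, sizeof(bits));
    if ((bits & 0x7fffffffu) > 0x7f800000u) {
        // NaN: keep sign and exponent, force a quiet-NaN mantissa bit.
        return static_cast<uint16_t>((bits >> 16) | 0x0040u);
    }
    // Adding 0x7fff plus the lowest kept bit makes exact ties round to even.
    bits += 0x7fffu + ((bits >> 16) & 1u);
    return static_cast<uint16_t>(bits >> 16);
}

// Convert 32 floats by handling the two 16-element halves separately,
// mirroring the split of one f32x32 vector into two f32x16 vectors.
inline void f32x32_to_bf16x32(const float (&in)[32], uint16_t (&out)[32]) {
    for (int half = 0; half < 2; half++) {
        for (int lane = 0; lane < 16; lane++) {
            out[half * 16 + lane] = f32_to_bf16(in[half * 16 + lane]);
        }
    }
}
```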
8ee7f4c Add mux intrinsic (#5707) Add mux intrinsic 05 February 2021, 20:01:32 UTC
27be859 Deprecate old-style realize() methods (#5676) * Deprecate old-style realize() methods We had 5 extra variants of realize() (for 0-dim thru 4-dim cases); these are a holdover from both having a limit of 4 dimensions (ie, pre-halide_buffer_t) and also from pre-C++11 (ie, passing in initializer-lists of int was less convenient). Let's deprecate these for Halide 12 and remove them in Halide 13. 05 February 2021, 19:58:54 UTC
32d5f71 Don't make new IR nodes if nothing changed when simplifying (#5698) * Don't make new IR nodes if nothing changed Some statements and expressions get crafted anew each time when repeatedly resimplified. This PR fixes all the cases I could find. 05 February 2021, 00:44:07 UTC
3c0b9e4 Provide wrapper around 128 bit cvtneps2bf16 (#5704) * Provide wrapper around 128bit cvtneps2bf16 * Include module with AVX512 feature Co-authored-by: Steven Johnson <srj@google.com> 04 February 2021, 18:43:22 UTC
793b7f6 HalideBuffer: cast offset operand to ptrdiff_t to avoid overflow (#5706) * HalideBuffer: cast offset operand to ptrdiff_t to avoid overflow * HalideBuffer: cast offset_of operand to ptrdiff_t to avoid overflow * HalideRuntime: ptrdiff_t casts to avoid overflow 04 February 2021, 17:40:41 UTC
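The kind of overflow that 793b7f6 guards against can be sketched as follows. The helper names are hypothetical and the code is not the HalideBuffer implementation; the point is only that the operands are widened to `ptrdiff_t` before the multiplication, so that the product is computed at pointer width rather than in 32-bit `int`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Widen each operand to ptrdiff_t before multiplying. Multiplying two
// int32_t values first would overflow for large index * stride products.
inline std::ptrdiff_t element_offset(int32_t index, int32_t stride) {
    return static_cast<std::ptrdiff_t>(index) * static_cast<std::ptrdiff_t>(stride);
}

// Address of element `index` in a strided one-dimensional view.
template<typename T>
T *address_of(T *base, int32_t index, int32_t stride) {
    assert(base != nullptr);
    return base + element_offset(index, stride);
}
```

On a platform with a 64-bit `ptrdiff_t`, `element_offset(100000, 100000)` evaluates to 10000000000, a value that does not fit in 32 bits.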
a0fc7fa Add fixes to overflow analysis in bounds inference (#5618) * add fixes to overflow analysis in bounds inference Co-authored-by: Steven Johnson <srj@google.com> 04 February 2021, 01:13:53 UTC
dadbcbf Check Sapphire Rapids AVX512BF16 support in runtime (#5702) The Sapphire Rapids target feature controls whether BF16 and VNNI X86 instructions are emitted. Support for both of these is checked in the Halide compiler, but BF16 is not checked in the runtime as it required extending the cpuid functionality. Now that we have that cpuid functionality we can add the BF16 check. 03 February 2021, 21:58:00 UTC
f64525e [HVX] Correct simd-op-check-hvx (#5703) Add "+hvxv6x" to mattrs Correct isa_version in simd_op_check_hvx Co-authored-by: Ankit Aggarwal <aankit@quicinc.com> 03 February 2021, 19:29:29 UTC
d8c95dd Capture Exprs by ref in IRMatch (#5696) * Capture Exprs by ref in IRMatch * Forbid rvalue Exprs passed to IRMatcher nodes * Clarify what is_const refers to * Fix min int constant * Add explanatory comment * Remove assert that was blowing up simplifier stack frames 03 February 2021, 17:38:21 UTC
c3cb54b Add missing headers (#5700) 03 February 2021, 16:49:38 UTC
bdfa994 async deserves its own line (#5701) 03 February 2021, 16:47:16 UTC
3ba8691 fix lens_blur estimates (#5694) 03 February 2021, 00:21:48 UTC
265f2c7 Add initial support for Sapphire Rapids AVX512 features (#5677) * Add Sapphire Rapids target feature * Add initial avx512_sr test * Guard against earlier LLVM versions * Move feature to other AVX512 features * Set earlier features when SapphireRapids selected * trigger buildbots * Add SapphireRapids to get_runtime_compatible_target * Add issue link to TODO comments * Add user errors if using unsupported feature 02 February 2021, 22:04:41 UTC
89329d3 Disable generator_aot_gpu_multi_context_threaded under wasm (#5692) Re-enabled wasm testing, which revealed breakage of the generator_aot_gpu_multi_context_threaded target (the runtime isn't being linked properly). Since this test isn't useful under wasm at the present time anyway (relies on GPU support), just skipping it entirely for those targets. 02 February 2021, 19:36:21 UTC
99c5583 [adams2019] Restructure autoscheduler + add timer (#5654) * add feature caching and block caching to adams2019 autoscheduler * added caching verification for features * clean up TODOs and commented out src * add docstring * clean up final TODOs * rm double declaration * make clang format happy * rm stats that caused linker error * fix cmake * fix clang format too * remove caching from adams2019 restructuring * move unnecessary member functions + add descriptions for State member functions * fix clang tidy in LoopNest.* * add top level comment to State struct 02 February 2021, 17:49:31 UTC
100bc76 Add ecx support for runtime X86 cpuid (#5684) * Add ecx support for runtime X86 cpuid Some newer X86 extensions require setting ecx when calling cpuid, for example AVX512BF16 support is queried using cpuid(eax=7, ecx=1). * trigger buildbots Co-authored-by: Steven Johnson <srj@google.com> 02 February 2021, 17:47:34 UTC
f941376 Lower halving_* intrinsics without widening (#5686) * Lower halving_* intrinsics without widening * Fix typo * Extend the test to check for correctness of the halving_* lowering * Change vector to array 02 February 2021, 04:12:21 UTC
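One well-known way to compute a halving add without a wider intermediate type is the bitwise identity sketched below. This is an illustration of the general idea behind f941376, not necessarily the lowering Halide emits, and the function names are hypothetical. (C++ integer promotion means the arithmetic here still runs in `int`; what matters is that no intermediate value exceeds 255.)

```cpp
#include <cassert>
#include <cstdint>

// Reference: widen to 16 bits, add, then shift right by one.
inline uint8_t halving_add_widening(uint8_t a, uint8_t b) {
    return static_cast<uint8_t>((static_cast<uint16_t>(a) + static_cast<uint16_t>(b)) >> 1);
}

// Non-widening form: bits set in both inputs contribute fully, bits set in
// exactly one input contribute half. No intermediate exceeds 255.
inline uint8_t halving_add_narrow(uint8_t a, uint8_t b) {
    return static_cast<uint8_t>((a & b) + ((a ^ b) >> 1));
}

// Exhaustive comparison over all pairs of 8-bit inputs.
inline bool halving_add_matches_reference() {
    for (int a = 0; a < 256; a++) {
        for (int b = 0; b < 256; b++) {
            const uint8_t x = static_cast<uint8_t>(a);
            const uint8_t y = static_cast<uint8_t>(b);
            if (halving_add_narrow(x, y) != halving_add_widening(x, y)) {
                return false;
            }
        }
    }
    return true;
}
```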
25eeea1 ScopedFile in write_debug_image needs sanity check (#5689) Specifically, don't call fclose() on a null ptr, as that can crash. Also added some code to avoid implicit int->bool conversions. 01 February 2021, 23:11:24 UTC
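A guard of the kind described in 25eeea1 might look like the following sketch. The struct below is written from the commit message alone and is not the Halide runtime's `ScopedFile`; member names such as `is_open` and the helper `can_open_for_reading` are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdio>

// Minimal RAII wrapper around a C FILE handle.
struct ScopedFile {
    std::FILE *f;

    ScopedFile(const char *filename, const char *mode)
        : f(std::fopen(filename, mode)) {
    }

    ~ScopedFile() {
        // fopen returns a null pointer on failure, and passing a null
        // pointer to fclose is not allowed, so check explicitly.
        if (f != nullptr) {
            std::fclose(f);
        }
    }

    ScopedFile(const ScopedFile &) = delete;
    ScopedFile &operator=(const ScopedFile &) = delete;

    // Explicit comparison rather than an implicit pointer-to-bool conversion.
    bool is_open() const {
        return f != nullptr;
    }
};

// Usage: returns false, without crashing, when the file cannot be opened.
inline bool can_open_for_reading(const char *path) {
    assert(path != nullptr);
    ScopedFile file(path, "rb");
    return file.is_open();
}
```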
a471e59 Add a spin before the cond_wait in the thread pool (#5408) * Add a spin before the cond_wait in the thread pool * Spin on a mutex even if someone is parked 01 February 2021, 21:01:56 UTC
9743fca Make GPU kernel compilation caching consistent across GPU backends. (#5546) Make GPU context handling more consistent and use a common compilation cache for kernels. Introduces a finalization routine for kernel compilation to indicate when kernels are not strictly required to be defined, thus allowing them to be unloaded or discarded, but not while they are needed. Co-authored-by: Steven Johnson <srj@google.com> Co-authored-by: Marcos Slomp <slomp@adobe.com> 01 February 2021, 19:50:40 UTC
e517946 Delete llvm_builder.yml (#5685) 01 February 2021, 19:27:24 UTC
1155bda Avoid bogus out-of-memory error for multiple_scatter under wasm (#5681) 29 January 2021, 21:31:06 UTC
6118a62 Disable a few more wasm-simd ops in simd_op_check (#5679) Recent changes to the final wasm-simd spec mean that some instructions aren't being generated (and may not even exist in the same form). Commented out for now; we need to revisit this once the LLVM backend for wasm gets closer to up-to-date with the final spec. 29 January 2021, 17:27:30 UTC
288526c Encapsulate a few more symbols (#5672) * Encapsulate more symbols. 28 January 2021, 17:52:16 UTC
f427ad1 Remove deprecated variants of infer_input_bounds() in the Python bindings (#5673) * Remove deprecated variants of infer_input_bounds() in the Python bindings The C++ versions were removed for Halide 12 already, but I missed the Python wrappers. * trigger buildbots 28 January 2021, 17:51:11 UTC
813eadc Fix target detection for i686 (#5675) 28 January 2021, 09:41:44 UTC
a8299b5 Allow LLVM-13 and Clang-13 (#5674) 27 January 2021, 21:22:40 UTC
6e3fb56 Fix intermittent OSX Python crash (#5667) * Fix intermittent OSX Python crash The OSX buildbot has been crashing intermittently on some python tests; debugging showed that in some situations, Introspection's calls to `backtrace()` include bogus addresses (eg 0x08), which cause segfaults when you try to inspect memory near them. The reasons for this aren't entirely clear -- for instance, it only seems to repeat reliably when using the Makefile rather than CMake, and only when doing an 'out-of-tree' build. Rather than try to run this to ground further, this PR just checks for address fields that seem obviously unreasonable (first 256 bytes of address space) and ignores them. * Add -fno-omit-frame-pointer, update sanity check * Update Introspection.cpp 26 January 2021, 04:09:53 UTC
d4c27ca Lower saturating arithmetic without widening (#5662) * Lower saturating arithmetic without widening, and handle it in lower_intrinsic. * clang-format, fix saturating sub * cout -> cerr * trigger buildbots Co-authored-by: Alex Reinking <alex.reinking@gmail.com> Co-authored-by: Steven Johnson <srj@google.com> 25 January 2021, 21:51:57 UTC
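Saturating arithmetic can be expressed without a wider intermediate type by detecting wraparound after the fact. The sketch below shows the unsigned 8-bit case as an illustration of the idea behind d4c27ca; it is not necessarily the lowering Halide emits, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned saturating add: compute the sum modulo 256, then clamp if the
// result wrapped around (a wrapped sum is always smaller than either input).
inline uint8_t saturating_add_u8(uint8_t a, uint8_t b) {
    const uint8_t wrapped = static_cast<uint8_t>(a + b);
    return wrapped < a ? UINT8_MAX : wrapped;
}

// Unsigned saturating subtract: clamp at zero instead of wrapping.
inline uint8_t saturating_sub_u8(uint8_t a, uint8_t b) {
    return a > b ? static_cast<uint8_t>(a - b) : static_cast<uint8_t>(0);
}

// Exhaustive comparison against a reference computed in a wider type.
inline bool saturating_ops_match_reference() {
    for (int a = 0; a < 256; a++) {
        for (int b = 0; b < 256; b++) {
            const int expected_add = (a + b > 255) ? 255 : a + b;
            const int expected_sub = (a - b < 0) ? 0 : a - b;
            const uint8_t x = static_cast<uint8_t>(a);
            const uint8_t y = static_cast<uint8_t>(b);
            if (saturating_add_u8(x, y) != expected_add ||
                saturating_sub_u8(x, y) != expected_sub) {
                return false;
            }
        }
    }
    return true;
}
```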
38be3e3 Add rounding shift right instructions (#5664) Co-authored-by: Ankit Aggarwal <aankit@quicinc.com> 23 January 2021, 01:06:20 UTC
7cff481 Fix VSX min/max intrinsics. Fixes #5661. (#5663) 22 January 2021, 21:18:47 UTC
6b398a3 Better codegen for switch-statement-like if-else chains (#5595) Better codegen for switch-statement-like if-else chains And added a test that demonstrates writing a little interpreter in Halide and scheduling it. 22 January 2021, 18:23:48 UTC
8c57a1a Use linker tools on OSX & Linux to limit exports (#4651) (#5659) * Use linker scripts on OSX & Linux to limit exports * Write script to detect appropriate linker flags. Co-authored-by: Alex Reinking <alex.reinking@gmail.com> 21 January 2021, 22:07:20 UTC
be7a6a3 is_positive_const and is_negative_const broken for (some) casts (#5615) * let signed_const checkers fail for non-widening integral casts Co-authored-by: Steven Johnson <srj@google.com> 21 January 2021, 21:23:43 UTC
0ca0415 Remove all deprecated methods for Halide 12 (#5656) * Remove all deprecated methods for Halide 12 These were all marked as deprecated in Halide 11 (and probably Halide 10 too); let's go ahead and remove them in Halide 12. * Remove function bodies too 21 January 2021, 01:59:10 UTC