6a6d3ab | Steven Johnson | 20 September 2021, 19:50:17 UTC | foo | 20 September 2021, 19:50:17 UTC |
0c9f853 | Steven Johnson | 20 September 2021, 19:50:10 UTC | foo | 20 September 2021, 19:50:10 UTC |
4f9e6cc | Steven Johnson | 20 September 2021, 19:47:55 UTC | Revert "wip" This reverts commit 96cac00961fa73daff23b1b2ab4fc2eaae3bb2f0. | 20 September 2021, 19:47:55 UTC |
96cac00 | Steven Johnson | 20 September 2021, 19:30:24 UTC | wip | 20 September 2021, 19:30:24 UTC |
fc322d8 | Steven Johnson | 20 September 2021, 19:29:01 UTC | Update run-iwyu.sh | 20 September 2021, 19:29:01 UTC |
9885ae3 | Steven Johnson | 20 September 2021, 19:28:30 UTC | Update run-iwyu.sh | 20 September 2021, 19:28:30 UTC |
f638c71 | Steven Johnson | 20 September 2021, 19:20:27 UTC | Update PyHalide.h | 20 September 2021, 19:20:27 UTC |
180bf94 | Steven Johnson | 20 September 2021, 19:16:17 UTC | Update Generator.h | 20 September 2021, 19:16:17 UTC |
4ae6aa3 | Steven Johnson | 20 September 2021, 19:07:24 UTC | Update WasmExecutor.cpp | 20 September 2021, 19:07:24 UTC |
b2071f1 | Steven Johnson | 20 September 2021, 19:06:17 UTC | Update Generator.h | 20 September 2021, 19:06:17 UTC |
66e7557 | Steven Johnson | 20 September 2021, 19:01:33 UTC | Update halide_iwyu_mapping.imp | 20 September 2021, 19:01:33 UTC |
6f13843 | Steven Johnson | 20 September 2021, 18:59:53 UTC | wer | 20 September 2021, 18:59:53 UTC |
c19367b | Steven Johnson | 20 September 2021, 18:50:15 UTC | Update halide_iwyu_mapping.imp | 20 September 2021, 18:50:15 UTC |
c6209ad | Steven Johnson | 20 September 2021, 18:46:36 UTC | Update halide_iwyu_mapping.imp | 20 September 2021, 18:46:36 UTC |
169e673 | Steven Johnson | 20 September 2021, 18:38:00 UTC | Update halide_iwyu_mapping.imp | 20 September 2021, 18:38:00 UTC |
c89f092 | Steven Johnson | 20 September 2021, 18:35:42 UTC | Update halide_iwyu_mapping.imp | 20 September 2021, 18:35:42 UTC |
b127d71 | Steven Johnson | 20 September 2021, 18:33:22 UTC | Update WasmExecutor.cpp | 20 September 2021, 18:33:22 UTC |
8c76872 | Steven Johnson | 20 September 2021, 18:26:05 UTC | Update halide_iwyu_mapping.imp | 20 September 2021, 18:26:05 UTC |
350c0d2 | Steven Johnson | 20 September 2021, 18:24:59 UTC | Update InjectHostDevBufferCopies.cpp | 20 September 2021, 18:24:59 UTC |
4e09994 | Steven Johnson | 20 September 2021, 18:07:53 UTC | Update LLVM_Headers.h | 20 September 2021, 18:07:53 UTC |
d8ce121 | Steven Johnson | 20 September 2021, 18:07:44 UTC | Update halide_iwyu_mapping.imp | 20 September 2021, 18:07:44 UTC |
c883b37 | Steven Johnson | 20 September 2021, 18:01:40 UTC | wer | 20 September 2021, 18:01:40 UTC |
4dbc4fb | Steven Johnson | 20 September 2021, 17:31:15 UTC | Update halide_iwyu_mapping.imp | 20 September 2021, 17:31:15 UTC |
0c4d1d4 | Steven Johnson | 20 September 2021, 17:19:51 UTC | wer | 20 September 2021, 17:19:51 UTC |
aae6baa | Steven Johnson | 18 September 2021, 02:05:51 UTC | Update LLVM_Headers.h | 18 September 2021, 02:05:51 UTC |
6d00ed2 | Steven Johnson | 18 September 2021, 02:04:16 UTC | Update LLVM_Headers.h | 18 September 2021, 02:04:16 UTC |
56c3f7b | Steven Johnson | 18 September 2021, 01:55:24 UTC | Update run-iwyu.sh | 18 September 2021, 01:55:24 UTC |
1ffea30 | Steven Johnson | 18 September 2021, 01:52:37 UTC | Update run-iwyu.sh | 18 September 2021, 01:52:37 UTC |
c26b0fc | Steven Johnson | 18 September 2021, 01:41:47 UTC | Update iwyu.imp | 18 September 2021, 01:41:47 UTC |
c4ba20d | Steven Johnson | 18 September 2021, 01:41:22 UTC | Update iwyu.imp | 18 September 2021, 01:41:22 UTC |
9e7e41f | Steven Johnson | 18 September 2021, 01:39:34 UTC | Update run-iwyu.sh | 18 September 2021, 01:39:34 UTC |
8f93f88 | Steven Johnson | 18 September 2021, 01:33:09 UTC | Update iwyu.imp | 18 September 2021, 01:33:09 UTC |
c1d1e35 | Steven Johnson | 18 September 2021, 01:26:57 UTC | Update run-iwyu.sh | 18 September 2021, 01:26:57 UTC |
ca31b52 | Steven Johnson | 18 September 2021, 01:24:31 UTC | Update run-iwyu.sh | 18 September 2021, 01:24:31 UTC |
ec3071c | Steven Johnson | 18 September 2021, 01:23:15 UTC | Update run-iwyu.sh | 18 September 2021, 01:23:15 UTC |
b0dde51 | Steven Johnson | 18 September 2021, 01:20:17 UTC | sfd | 18 September 2021, 01:20:17 UTC |
6655301 | Steven Johnson | 18 September 2021, 00:42:33 UTC | Update run-iwyu.sh | 18 September 2021, 00:42:33 UTC |
2963816 | Steven Johnson | 18 September 2021, 00:37:41 UTC | Update run-iwyu.sh | 18 September 2021, 00:37:41 UTC |
288892a | Steven Johnson | 18 September 2021, 00:21:42 UTC | Update run-iwyu.sh | 18 September 2021, 00:21:42 UTC |
52a4f38 | Steven Johnson | 18 September 2021, 00:07:18 UTC | Create run-iwyu.sh | 18 September 2021, 00:07:18 UTC |
11ec1dc | Steven Johnson | 17 September 2021, 21:14:21 UTC | Remove remaining bits of LLVM10 support (#6245) Update various parts of the source that still assume LLVM10 is supported. (We probably need to update the buildbots as well before this lands.) | 17 September 2021, 21:14:21 UTC |
95198a7 | Michael Gharbi | 15 September 2021, 21:47:31 UTC | Remove torch/extension dependency and fix PyTorch example, adding the correct (GPU) autoscheduler (#6234) * fix static op warning * add backward * removes "torch/extension.h" dependency * suppress grad check warnings * simplifies CUDA PyTorch bindings, remove HL_PT_CUDA define * fix gradient for scalar parameter * fix style issues * Update CodeGen_PyTorch.cpp | 15 September 2021, 21:47:31 UTC |
2d0ac72 | Steven Johnson | 14 September 2021, 23:41:33 UTC | Modify hexagon_remote/Makefile to allow defining C++ #defines on the command line (#6243) * Modify hexagon_remote/Makefile to allow defining C++ #defines on the command line * Update Makefile | 14 September 2021, 23:41:33 UTC |
fdcf140 | Dillon Sharlet | 14 September 2021, 20:27:54 UTC | Improve the code quality of scalarized code (#6218) * Improve the quality of scalarized predicated loads. * Add assert * clang-format * We can't assume this assert won't be hit. * Fix implementation of single lane shuffles in deinterleave. | 14 September 2021, 20:27:54 UTC |
10e4e8e | Dillon Sharlet | 13 September 2021, 18:27:59 UTC | Fall through to LLVM for unknown shuffles on Hexagon (#6237) * Fall through to LLVM for unknown shuffles on Hexagon. * Don't fall back to LLVM, use vdelta instead. * Fix vdelta patterns | 13 September 2021, 18:27:59 UTC |
d80e5d8 | Andrew Adams | 10 September 2021, 19:31:17 UTC | Join generator watchdog timer if exception thrown (#6240) Make sure to join the generator watchdog timer when an exception is thrown. Otherwise the error message gets swallowed entirely and you just get "terminate called without an active exception" Also rejiggered the watchdog to do a timed wait on a condition variable instead of polling every 100ms. | 10 September 2021, 19:31:17 UTC |
b78b205 | Steven Johnson | 02 September 2021, 17:58:52 UTC | Upgrade clang-format and clang-tidy to LLVM-12 (#6233) | 02 September 2021, 17:58:52 UTC |
24d6bd6 | Steven Johnson | 29 August 2021, 20:02:34 UTC | Hoist unrolled prefetches to top of the block (#6230) * Hoist unrolled prefetches to top of the block When a loop with prefetch is unrolled, the prefetch instructions getting scattered through the loop can cause LLVM codegen issues in some cases (see https://bugs.llvm.org/show_bug.cgi?id=51172). As a partial mitigation for that issue, this PR adds a pass to hoist all prefetch instructions to the top of their loop. This is still a bit experimental; it definitely addresses the codegen issues we see, but makes the use of prefetch potentially less effective (since the hoisted prefetch may be too far from the eventual use to be effective). * appease clang-tidy * Avoid quadratic behavior * Use template instead of std::function * Require prefetch offset to be pure | 29 August 2021, 20:02:34 UTC |
085e11e | Andrew Adams | 27 August 2021, 21:00:14 UTC | Rename inner version of bounds_of_expr_in_scope (#6232) It's not in the explicit namespace that it's requested in (Halide::Internal), so turning on that debugging code results in compile failures. I just gave it a different name to disambiguate. | 27 August 2021, 21:00:14 UTC |
c860cab | Steven Johnson | 25 August 2021, 17:36:14 UTC | Add modernize-make-shared and modernize-make-unique to .clang-tidy and fix warnings (#6222) * Add modernize-make-shared and modernize-make-unique to .clang-tidy and fix warnings * std::initializer_list instead of std::vector * Update Pipeline.h | 25 August 2021, 17:36:14 UTC |
f43f016 | Steven Johnson | 24 August 2021, 22:25:57 UTC | More prefetch fixes (#6226) * More prefetch fixes - Arguments to Call::prefetch() must be scalars, not vectors - Add more testcases to correctness_prefetch Addresses more of #6219 (but still not the title issue, i.e. ignoring offset) * Fix horrific bug * Have CodeGen_C emit the same arguments for __builtin_prefetch() as the runtime module * Minor cleanup * Explicitly pass target thru * Fix correctness_prefetch for host-hvx * Add comments | 24 August 2021, 22:25:57 UTC |
d507b9a | Steven Johnson | 23 August 2021, 21:46:26 UTC | Fix bug in prefetch() (#6225) In #6155, we incorrectly assume that we can qualify the 'from' prefetch var by just adding 'prefix'; this isn't true if (e.g.) there are any splits involved. Instead, we need to walk through the active loops to find a suitable match. In addition, if no match is found, we now fail with an error (rather than quietly doing something undefined), as the 'from' var is required to be from an active loop. (Addresses some-but-not-all of #6219) | 23 August 2021, 21:46:26 UTC |
30040cd | Steven Johnson | 23 August 2021, 16:31:16 UTC | Prefetch cleanup (#6220) * Use std::move where appropriate * Prefetch cleanup This is (mostly) a cleanup pass to make the flow of Prefetch injection & lowering more obvious to the reader of the code (via commenting and minor code restructuring). Notable exception: the HVX backend processing of Call::prefetch (and relevant runtime code) was refactored to make it (IMHO) less janky. (Also some drive-by insertions of std::move where appropriate) | 23 August 2021, 16:31:16 UTC |
7c437e4 | Alexander Root | 22 August 2021, 20:27:39 UTC | fix #6207 (#6214) Co-authored-by: Steven Johnson <srj@google.com> | 22 August 2021, 20:27:39 UTC |
7aafbb9 | Steven Johnson | 20 August 2021, 22:53:06 UTC | Fix wasm simd issues (#6217) - f64x2.convert_low_i32x4_s/u are now generating properly at top-of-tree, so re-enable them - f64x2.promote_low_f32x4 is temporarily broken for larger vector widths, so disable it for now (issue is reported and fix is underway) Also, driveby change to .gitignore. | 20 August 2021, 22:53:06 UTC |
ed5e1e1 | Steven Johnson | 20 August 2021, 22:25:50 UTC | Use C++17 structured binding instead of std::tie (#6213) * Use C++17 structured binding instead of std::tie * appease clang-tidy | 20 August 2021, 22:25:50 UTC |
7079ff2 | Steven Johnson | 20 August 2021, 19:11:41 UTC | Fix for upcoming LLVM API change (#6212) | 20 August 2021, 19:11:41 UTC |
06e8865 | Dillon Sharlet | 20 August 2021, 02:11:04 UTC | Fix issues with predicated interleaving stores on Hexagon (#6211) * Fix issues with predicated interleaving stores on Hexagon. * Fix buffer API usage issue. * Add default device API support for Hexagon. * More DeviceAPI support | 20 August 2021, 02:11:04 UTC |
c61a930 | Alexander Root | 19 August 2021, 17:06:04 UTC | Fix unroll failures from adams2019 when the Expr depends on estimates (#6200) * track depends_on_estimate in BoundsInfo - fix bounds_are_constant | 19 August 2021, 17:06:04 UTC |
d653a73 | Steven Johnson | 17 August 2021, 21:15:38 UTC | Add IRMutator::mutate_exprs() (#6203) * Add IRMutator::mutate_exprs() There's a common pattern in many IRMutators that is "mutate a vector<Expr> and optionally let me know if anything is different". (Note, this uses C++17 structured-binding syntax, which we previously weren't using in Halide. Objections?) This adds a shared utility method (well, two, thanks to VariadicVisitor) and plugs in the usage in all the places that seemed obvious. I doubt this moves the needle on speed in either direction, but makes for smaller code. * Silence warnings * Update Inline.cpp * Update ParallelRVar.cpp * Update SplitTuples.cpp * Update StorageFlattening.cpp * Revisions | 17 August 2021, 21:15:38 UTC |
d811a3f | Steven Johnson | 16 August 2021, 20:34:58 UTC | More augmentation of debugging code (#6185) * More augmentation of debugging code This expands on #6182 by adding tracking for BoxesTouched, and integrating the nesting levels with the previous code. This allows a more complete vision of what's happening during bounds calculation. * Minor fixes * Unexpose indent | 16 August 2021, 20:34:58 UTC |
72284a2 | Steven Johnson | 13 August 2021, 19:12:07 UTC | unsafe_promise_clamped() should be pure (#6199) As discussed in https://github.com/halide/Halide/pull/6189, this intrinsic should probably be Pure. | 13 August 2021, 19:12:07 UTC |
a081660 | Zalman Stern | 12 August 2021, 21:06:38 UTC | Add information to comment on ```align_loads```. (#6196) * Add information to comment. * Wording improvement. | 12 August 2021, 21:06:38 UTC |
3b7e1ba | Steven Johnson | 12 August 2021, 20:55:14 UTC | [hannk] Remove alignment requirements for shallow DepthwiseConv ops (#6198) * [hannk] Remove alignment requirements for shallow DepthwiseConv ops * Update depthwise_conv_generator.cpp | 12 August 2021, 20:55:14 UTC |
6229afa | Steven Johnson | 12 August 2021, 20:15:28 UTC | Upgrade apps/hannk to TFLite 2.6 (#6197) * Upgrade apps/hannk to TFLite 2.6 * Remove scalpel left in patient | 12 August 2021, 20:15:28 UTC |
69075b4 | Steven Johnson | 12 August 2021, 19:51:15 UTC | Internal::promise_clamped() should be pure (Fixes #6186) (#6189) * ApplySplit should use pure promise_clamped() (Fixes #6186) * Make all promise_clamped calls pure * pure_promise_clamped -> promise_clamped | 12 August 2021, 19:51:15 UTC |
2394250 | aankit-ca | 12 August 2021, 17:06:00 UTC | [Hexagon] Do not pattern match inside if_then_else block (#6194) * Do not pattern match inside if_then_else block Resolves the compilation error below, seen while generating hannk::upsample_channels_uint8 from hannk/depthwise_conv.generator: Unknown intrinsic dynamic_shuffle The problem occurs when we pattern match hvx intrinsics inside if_then_else nodes and try to scalarize them later. In the patch we prevent matching these intrinsics inside if_then_else blocks. * Do not match for only vector types * pattern match for scalars and scalar-broadcasts Co-authored-by: Ankit Aggarwal <aankit@quicinc.com> | 12 August 2021, 17:06:00 UTC |
43b412b | Steven Johnson | 12 August 2021, 17:02:52 UTC | Update WABT version to latest release (1.0.24) (#6193) | 12 August 2021, 17:02:52 UTC |
b7fa882 | Steven Johnson | 11 August 2021, 21:28:10 UTC | Fix unused-variable warning-as-error (#6192) The latest Emscripten compilers will complain about this. | 11 August 2021, 21:28:10 UTC |
67802cf | Steven Johnson | 11 August 2021, 18:53:36 UTC | Add memmove to WasmExecutor callbacks (#6191) Some not-yet-landed variants of the wasm toolchain+runtime environments need this. | 11 August 2021, 18:53:36 UTC |
e8b5837 | Steven Johnson | 11 August 2021, 17:33:13 UTC | Add a watchdog timer to Generator (#6184) * Add a watchdog timer to Generator In degenerate conditions (eg, bugs in Halide or LLVM, or pathological user code), running a Generator can take arbitrarily long times (we recently found some buildbots that had Generators that had been running for several days). This adds a simple background thread to generator_main() to ensure that compilations don't take unreasonable lengths of time. It defaults to 15 minutes of wall-clock time, but can be customized by the -t flag. * Update Generator.cpp | 11 August 2021, 17:33:13 UTC |
fb44637 | Steven Johnson | 10 August 2021, 22:26:10 UTC | Augment debugging code (#6182) * Augment debugging code I upgraded some debugging code in AddImageChecks and Bounds while tracking down a bug, and I think the upgrades are worth keeping for future use. * clang-format * Minor changes per comments * clang-format | 10 August 2021, 22:26:10 UTC |
d249fa0 | Andrew Adams | 06 August 2021, 19:24:14 UTC | Update tutorial todos (#6161) This is based on our discussion in the dev meeting. Feel free to suggest changes. | 06 August 2021, 19:24:14 UTC |
451cfa8 | Steven Johnson | 06 August 2021, 15:58:33 UTC | Add argv and metadata support to C++ backend (Issue #2071) (#6179) * Add argv and metadata support to C++ backend (Issue #2071) * legalize_name-> c_print_name * Fix user_context handling | 06 August 2021, 15:58:33 UTC |
2e229f5 | Evan Lee | 04 August 2021, 04:54:30 UTC | Rewrite Rules Evaluation Project - Merging Relevant Synthesized Rewrite Rules (#6174) Conducted experiments to analyze the performance effects of adding 4000+ synthesized rewrite rules to Halide. Narrowed down the rules to 11 rewrite rules whose associative & commutative variants are added in this PR. With these rewrite rules, Halide achieves >10% peak memory reductions in 192 cases in apps including camera_pipe, harris, nl_means, and stencil_chain, which is similar to the results (with all 4000+ rules) from this paper - https://dl.acm.org/doi/pdf/10.1145/3428234 | 04 August 2021, 04:54:30 UTC |
8b26454 | Steven Johnson | 03 August 2021, 20:06:53 UTC | Add more fine-grained prefetch() directive (Issue #3735) (#6155) Add more fine-grained prefetch() directive (Issue #3735) | 03 August 2021, 20:06:53 UTC |
4f8629c | Steven Johnson | 03 August 2021, 00:49:54 UTC | Fix broken wasm-simd extmul instructions due to changes from https://reviews.llvm.org/D106724 (#6177) | 03 August 2021, 00:49:54 UTC |
0a09bfb | Steven Johnson | 02 August 2021, 21:14:12 UTC | Fix for trunk LLVM (#6176) * Fix for trunk LLVM * More Fixes | 02 August 2021, 21:14:12 UTC |
e52d6ca | Alex Reinking | 31 July 2021, 04:43:06 UTC | Fix Xcode issue that requires at least one source file when building a library from objects. (#6175) * Fix Xcode issue that requires at least one source file when building a library from objects. Fixes #6167 * add newline to end of file | 31 July 2021, 04:43:06 UTC |
a7e8c43 | Dillon Sharlet | 29 July 2021, 15:53:11 UTC | Partial revert of 8f849ae6514e83f8bf94d05e452a467df352f74c (only reverting halide_remote.cpp) (#6173). | 29 July 2021, 15:53:11 UTC |
36f6b8c | Alex Reinking | 28 July 2021, 17:53:33 UTC | Use generic build command instead of make. Fixes #6163 (#6169) | 28 July 2021, 17:53:33 UTC |
2b8ec44 | Steven Johnson | 27 July 2021, 14:57:41 UTC | Remove deprecated realize() Python wrappers (#6162) The C++ versions were removed in #6122, but the Python equivalents were overlooked. | 27 July 2021, 14:57:41 UTC |
a5585cb | Alexander Root | 27 July 2021, 02:07:40 UTC | Add various bounds-related simplifier rules (#6160) * add simplifier rules | 27 July 2021, 02:07:40 UTC |
2ab9a56 | Shoaib Kamil | 24 July 2021, 13:55:32 UTC | De-predicate loads and stores in Metal/OpenCL/D3D12 backend (#6158) * Depredicate loads and stores in Metal backend * Fix typo. * Mark override, add additional using * float_t -> float * Update CMakeLists.txt * clang-format * Also scalarize in D3D12 and OpenCL * use const_true() helper | 24 July 2021, 13:55:32 UTC |
b68393c | Steven Johnson | 21 July 2021, 23:08:12 UTC | [hannk] Add a --csv flag to compare_vs_tflite (#6149) * [hannk] Add optional taskset support to the run_on_device scripts * [hannk] Add a --csv flag to compare_vs_tflite This lets us output results in CSV format for easy copy/paste into (eg) spreadsheets. | 21 July 2021, 23:08:12 UTC |
025a9b9 | Dillon Sharlet | 21 July 2021, 22:11:55 UTC | Handle depth_multiplier != 1 in a separate op (#6154) * Implement depth_multiplier != 1 in a separate op. * Fix build on GCC * Remove stale comment * clang-format * Add more comments to inv_depth_multiplier | 21 July 2021, 22:11:55 UTC |
9d7284b | Dillon Sharlet | 20 July 2021, 20:50:14 UTC | Move quantization to a helper function depending on the target (#6150) * Move quantization + relu to a helper function depending on the target. * clang-format * x86 has these too actually * Fix typo | 20 July 2021, 20:50:14 UTC |
5ca8cdf | Dillon Sharlet | 20 July 2021, 16:43:26 UTC | Generalize Conv2D to be a Conv of any dimensionality (#6146) * Generalize Conv2D to be a Conv of any dimensionality. * clang-format | 20 July 2021, 16:43:26 UTC |
5812f33 | Volodymyr Kysenko | 20 July 2021, 15:45:49 UTC | Configurable minimum size for alignment in align_loads (#6143) Co-authored-by: Steven Johnson <srj@google.com> | 20 July 2021, 15:45:49 UTC |
b457d3c | Steven Johnson | 20 July 2021, 02:16:25 UTC | Add support for int16 output in Conv2D (#6145) This allows us to convert all (currently supported) FC ops into Conv2D ops. Remove all the FC-specific Halide and Op code. | 20 July 2021, 02:16:25 UTC |
9d1e1e3 | Steven Johnson | 20 July 2021, 00:33:52 UTC | [hannk] Rewrite FC in terms of Conv2D (#6144) * [hannk] Rewrite FC in terms of Conv2D FullyConnected is very similar to Conv2D, so rather than maintaining multiple similar implementations, let's translate a FullyConnected node into a Conv2D node (with some Reshape nodes as necessary). Note that we keep the old FC logic for int16 outputs, as Conv2D doesn't support those yet; if this PR is landed, a followup PR will add that ability to Conv2D, and the existing FC support will be removed entirely. | 20 July 2021, 00:33:52 UTC |
bd7ebf5 | Steven Johnson | 19 July 2021, 19:27:44 UTC | Fix for top-of-tree LLVM (#6142) * Fix for top-of-tree LLVM | 19 July 2021, 19:27:44 UTC |
557c8e4 | Dillon Sharlet | 16 July 2021, 17:56:05 UTC | Fix Hexagon vrmpy with 16-bit results (#4248) (#6137) * Fix #4248 * clang-format | 16 July 2021, 17:56:05 UTC |
769b855 | Dillon Sharlet | 15 July 2021, 16:22:28 UTC | Add optimization for corner case in conv (#6139) * Add silly optimization for weird cases. * Use transpose | 15 July 2021, 16:22:28 UTC |
42e1d45 | Steven Johnson | 15 July 2021, 16:02:00 UTC | [hannk] Allow aliasing of Reshape tensors (#6138) * Allow aliasing of Reshape tensors Previously we didn't allow this because aliased tensors had to have the same rank, which is ~never the case for Reshape. Aliasing for Reshape is a huge win because it essentially becomes a no-op rather than a memcpy. Running against standard set of models shows no regression in differences vs. tflite. | 15 July 2021, 16:02:00 UTC |
19f2bc7 | Dillon Sharlet | 14 July 2021, 00:20:25 UTC | Reduce verbosity of compare_vs_tflite further (#6136) | 14 July 2021, 00:20:25 UTC |
802c22a | Andrew Adams | 13 July 2021, 23:13:02 UTC | Don't reinterpret cast when codegenning vector concat (#6125) It confuses the HVX LLVM backend, and shouldn't be necessary anyway. | 13 July 2021, 23:13:02 UTC |
77207a5 | Dillon Sharlet | 13 July 2021, 23:04:21 UTC | Optimize shallow depthwise convolutions (#6134) * Add TailStrategy::PredicateLoads and TailStrategy::PredicateStores * Different compilers * PredicateStores is faster than specialize + ShiftInwards * Update comments. * Allow PredicateStores for RVars * Fix test to avoid realize bounds query issues. * Add comments. * clang-format * predicate* is not pure * Fix documentation bugs * Don't allow PredicateStores for reductions. * Substitute more strongly around Provide * Change these back to pure for now to satisfy some logic in ScheduleFunctions * Fix use after free of pred. * Update comments. * Refactor implementation of predication * Visit predicates * Partition loops with predicated loads/stores. * Clean up ApplySplit * Fix inappropriate predicated vectorization of VectorReduce * De-dup GuardWithIf and Predicate * These also handle scalar predicated loads/stores. * Print provide predicates * Don't allow predicated non-innermost splits. * Remove debugging code * Forgot to add new file * Add test to CMake build * Fix bug in simplification of extract_element * Fix issue with mixing uses of guarded expressions inside and outside calls. * Don't lift impure exprs. * clang-format * clang-format again * Add "shallow" version of depthwise for small numbers of channels. * Better name for input_stride_x * Fix performance regression in deep case. * Update performance * Missed rename * Enable tiling of shallow case. * Require x be a dummy dim for shallow depthwise * Small cleanup to avoid ternary * clang-format * Can't use shallow depthwise when stride_x != 1 | 13 July 2021, 23:04:21 UTC |
a762c34 | Dillon Sharlet | 13 July 2021, 21:54:11 UTC | Add TailStrategy::PredicateLoads and TailStrategy::PredicateStores (#6126) * Add TailStrategy::PredicateLoads and TailStrategy::PredicateStores * Different compilers * PredicateStores is faster than specialize + ShiftInwards * Update comments. * Allow PredicateStores for RVars * Fix test to avoid realize bounds query issues. * Add comments. * clang-format * predicate* is not pure * Fix documentation bugs * Don't allow PredicateStores for reductions. * Substitute more strongly around Provide * Change these back to pure for now to satisfy some logic in ScheduleFunctions * Fix use after free of pred. * Update comments. * Refactor implementation of predication * Visit predicates * Partition loops with predicated loads/stores. * Clean up ApplySplit * Fix inappropriate predicated vectorization of VectorReduce * De-dup GuardWithIf and Predicate * These also handle scalar predicated loads/stores. * Print provide predicates * Don't allow predicated non-innermost splits. * Remove debugging code * Forgot to add new file * Add test to CMake build * Fix bug in simplification of extract_element * Fix issue with mixing uses of guarded expressions inside and outside calls. * Don't lift impure exprs. * clang-format * clang-format again | 13 July 2021, 21:54:11 UTC |
867b6c8 | Steven Johnson | 13 July 2021, 20:01:21 UTC | [hannk] Make compare_vs_tflite with --verbose 0 less noisy (#6135) Minor fixes to eliminate noise. | 13 July 2021, 20:01:21 UTC |