https://github.com/halide/Halide

Revision Author Date Message Commit Date
9f049d4 Update Util.cpp 21 October 2021, 16:51:27 UTC
ae0d0d5 Merge branch 'master' into srj/iwyu 21 October 2021, 16:29:27 UTC
ecf69b0 Add support for CUDA capability 8.6 (#6334) * Add support for CUDA capability 8.6 * add assertion to guard LLVM version * fallback to sm80 if LLVM < 13.0 21 October 2021, 04:16:55 UTC
27f975f Add ability to pass a user context in JIT mode (#6313) * Change type of first arg to all JITHandlers and expose struct to users * Make it possible to pass a custom JITUserContext per realize call * More comments * Fix type in python bindings * Fix type in python bindings * Fix more types in python bindings * Add user_context-accepting variants of other realize-like functions * Revert tests back to the way they are on master but with comments explaining why they are the way they are * Add example of passing a custom context to copy_to_host * Revert test to be closer to master. It was that way for a reason * The first arg to get_library_symbol isn't actually a user_context * Add copy_to_device example too * Fix python * Make bad_buf even worse * Comment clarifications 20 October 2021, 22:32:08 UTC
c3641b6 [hannk] augment L2NormOp to allow specifying axis (#6335) 20 October 2021, 00:16:04 UTC
d80bb23 Add a new unsigned division method (#6322) * Add a new unsigned division method It uses averages rounding up instead of averages rounding down, to reduce instruction count on x86. Division by 7 before: vpmulhuw .LCPI0_1(%rip), %ymm0, %ymm1 vpsubw %ymm1, %ymm0, %ymm0 vpsrlw $1, %ymm0, %ymm0 vpaddw %ymm1, %ymm0, %ymm0 vpsrlw $2, %ymm0, %ymm0 Division by 7 after: vpmulhuw .LCPI0_1(%rip), %ymm0, %ymm1 vpavgw %ymm0, %ymm1, %ymm0 vpsrlw $2, %ymm0, %ymm0 * Remove debugging code * Add comment elaborating on why this is a good idea 19 October 2021, 16:43:07 UTC
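The two instruction sequences in d80bb23 can be mirrored in scalar C++. This is only a sketch for 16-bit lanes: the commit does not show the multiplier loaded from `.LCPI0_1`, so the constants 9363 and 9362 below are assumptions derived by hand (from 2^19 / 7), and the helper names are hypothetical rather than Halide's.

```cpp
#include <cstdint>

// High 16 bits of a 16x16-bit unsigned product (per-lane effect of vpmulhuw).
inline std::uint16_t mulhi_u16(std::uint16_t a, std::uint16_t b) {
    return static_cast<std::uint16_t>((static_cast<std::uint32_t>(a) * b) >> 16);
}

// Unsigned average rounding up, (a + b + 1) >> 1, without overflow
// (per-lane effect of vpavgw).
inline std::uint16_t avg_round_up_u16(std::uint16_t a, std::uint16_t b) {
    return static_cast<std::uint16_t>((static_cast<std::uint32_t>(a) + b + 1) >> 1);
}

// "Before" shape: multiply-high, then an average rounding down built from
// subtract / shift / add, then a final shift.
// Assumed multiplier: 9363 = ceil(2^19 / 7) - 2^16.
inline std::uint16_t div7_avg_round_down(std::uint16_t x) {
    const std::uint16_t t = mulhi_u16(x, 9363);
    // t <= x, so x - t cannot go negative and the sum fits in 16 bits.
    const std::uint16_t avg = static_cast<std::uint16_t>(((x - t) >> 1) + t);
    return static_cast<std::uint16_t>(avg >> 2);
}

// "After" shape: multiply-high, one average rounding up, then a final shift.
// Assumed multiplier: 9362 = floor(2^19 / 7) - 2^16 (rounded down, because the
// rounding-up average contributes the extra +1).
inline std::uint16_t div7_avg_round_up(std::uint16_t x) {
    const std::uint16_t t = mulhi_u16(x, 9362);
    return static_cast<std::uint16_t>(avg_round_up_u16(x, t) >> 2);
}
```

Both functions agree with `x / 7` for every 16-bit input under these assumed constants; the second needs one fewer arithmetic step, which is the instruction-count saving the commit describes.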
deeb6bc Rewrite double/triple narrowing from float on ARM (#6305) * Rewrite double/triple narrowing from float on ARM 19 October 2021, 16:40:16 UTC
7613f9d [hannk] Improve GatherOp (#6328) We (mostly) implemented GatherOp for TFLite's Gather op, but missed some things: - There's a batch_dim param for Gather that we were ignoring. I added code to fill it in, but we punt for values != 0, because I haven't yet found a test case that handles it. Should be easy to fill in when we do. - TFLite's GatherNd op (and NNAPI's GATHER op) allow for the indices arg to be multidimensional; I rewrote the code to handle this and it's passing the acceptance tests for NNAPI's cases. (It doesn't yet handle the GatherNd op because, again, I haven't found a good test case. Should be simple to do when we do.) 18 October 2021, 16:55:35 UTC
8d098de [hannk] Restructure BinaryOp to allow adding more temporary types (#6326) Change is a no-op as written, but I'd like to land it so this change doesn't get lost -- it's handy for debugging pipelines that happen to use op/type variants we don't yet support (eg arithmetic on floats), which can unblock the ability to run more tests (albeit not efficiently). 18 October 2021, 16:52:49 UTC
cd8146d [hannk] Fix override annotation in hannk (#6315) Minor hygiene: add explicit override annotations and enable the compiler warnings. (I was about to tweak some of the virtual functions and this has been bothering me for a while.) 18 October 2021, 16:47:48 UTC
923025a [hannk] Fix assert in dconv (#6320) 14 October 2021, 20:27:00 UTC
071f5f7 [hannk] Improve Op::dump() (#6314) * [hannk] Improve Op::dump() Rewrite the Op::dump() methods to be more verbose, so that we can determine all the details of the tensors used, and the hierarchy of OpGroups; also add a post-transform dump when verbosity >= 2. (I'm using this to track down a subtle bug, but landing this separate from other fixes seems appropriate) * Fixes 13 October 2021, 18:23:34 UTC
63cfd9d Substitute in all widening lets prior to find_intrinsics (#6307) * Look through lets in find_intrinsics If an Expr like: narrow((widen(x) + y + 1)/2) gets lifted into a let, the simplifier will then substitute things in like so: let foo = widen(x) + y in narrow(foo + 1)/2, potentially breaking a pattern. This is a general problem for patterns that widen, do some math, and then narrow. They will always get cut at the widening operation, so this PR just substitutes in all widening operations. This can't cause combinatorial blow-up, because each substitution has a wider type than the values that it depends on, so the chains can be at most 2-3 lets deep. * Make substituting in widening lets a prepass instead * Move find_intrinsics a little earlier in lowering * Handle impure subexpressions by leaving them behind at the original let site * FindIntrinsics must be after the last simplification pass 12 October 2021, 20:52:24 UTC
a351021 Demosaic should be done unsigned (#6308) So that we can use pavgw instructions and the like. Speeds it up slightly on x86 (5% or so) 12 October 2021, 19:20:33 UTC
3931213 [hannk] SpaceDepthOp isn't limited to u8 Tensors (#6311) The code as written should work on all Tensor types; we just need to require the input and output types match. 12 October 2021, 18:57:02 UTC
d4e45bd [hannk] Fix > and >= op implementations (#6312) a>b should be b<=a (not b <a) a>=b should be b<a (not b<=a) 12 October 2021, 18:56:43 UTC
89b36b4 [hannk] Fix faulty 'shallow' logic in dconv2d (#6309) * [hannk] Fix faulty 'shallow' logic in dconv2d 12 October 2021, 18:00:09 UTC
e058532 store_in(MemoryType::Stack) should use alloca if the size is small (#6289) * Test using a real alloca call instead of the pseudostack * Improve test and remove debugging prints * Fix test * Switch to heap based on cumulative size rather than current size and add a test case that illustrates why this matters. * Fix test that requires actual heap allocations * Make test actually test more than one trip through the loop * Fix alignment of stack allocation * Branching is cheaper than alloca(0) * Tweak test pass condition * Move shared constant to a single location * Namespace shuffling * Fix comment location 11 October 2021, 21:56:51 UTC
1e40a71 Fix for top-of-tree LLVM (#6306) * Fix for top-of-tree LLVM * drive-by fix for other bad LLVM_VERSION checks 11 October 2021, 20:48:35 UTC
2a2c4b0 At some point llvm re-added pavgw intrinsics (#6302) * At some point llvm re-added pavgw intrinsics This is a good thing, because these do not reliably trigger from the pattern in runtime/x86.ll * Delete more dead code 10 October 2021, 21:06:57 UTC
2bfa567 Add ClampUnsafeAccesses pass. (#6294) * Add ClampUnsafeAccesses pass. Fixes #6131 Inject clamps around func calls h(...) when all the following conditions hold: 1. The call flows into an indexing context, such as: `f(x) = g(h(x))` or `let y = h(x) in f(x) = g(y)` 2. The FuncValueBounds of h are smaller than those of its type 3. h's allocation bounds might be wider than its compute bounds Condition (3) is not yet implemented see #6297. 08 October 2021, 18:52:29 UTC
c6529ed Modernize loops, part 4/final (#6296) * Modernize loops, part 4/final Final part getting code ready for clang-tidy's modernize-loop check, plus enabling the check * Update Module.cpp 08 October 2021, 01:21:42 UTC
0b297f2 Modernize loops, part 3 (#6295) * Modernize loops, part 3 Part 3 of getting code ready for clang-tidy's modernize-loop check * Update Func.cpp * Address review comments 07 October 2021, 20:04:57 UTC
e27db6f Don't set environment for RISCV Linux as apparently it is not used (#6282) Should not change anything. Per issue: https://github.com/halide/Halide/issues/6281 07 October 2021, 16:26:52 UTC
ed87acb Modernize loops, part 2 (#6293) Part 2 of getting code ready for clang-tidy's modernize-loop check 07 October 2021, 00:50:51 UTC
9169734 [hannk] Add specialization for broadcast of input 0 (#6291) * [hannk] Add specialization for broadcast of input 0 Alternate fix for https://github.com/halide/Halide/pull/6290 that is Halide-only. * Update elementwise_generator.cpp * Oops, do Mul as well 06 October 2021, 20:51:06 UTC
71c47b3 Modernize loops, part 1 (#6292) Part 1 of getting code ready for clang-tidy's modernize-loop check: src/autoschedulers and src/runtime 06 October 2021, 20:37:26 UTC
81b34e2 Remove unbound variable in documentation (#6287) In the example for RDom::where, the simplified case contains a free occurrence of `r.x`, which should be replaced with `10` since we are in the case `r.x == 10`. 05 October 2021, 16:32:59 UTC
da7c66e Make parking_control (etc) use vtables (#6275) * Make parking_control (etc) use vtables This class hierarchy is clearly best modeled with virtual methods (rather than fn ptrs), but was not; we *think* this was due to COMDAT issues that have been resolved by other means. I refactored this to use virtual methods instead (and removed the unused unpark_all function); it seems to work locally. * Add -fno-rtti to runtime compile flags (needed to allow vtables in runtime code) * make all overrides 'final' * Make virtual methods protected * Make structs final too * pacify clang-tidy 04 October 2021, 16:33:15 UTC
2495bcc Remove hopefully dead code. (#6280) 02 October 2021, 17:38:40 UTC
81ad45e compiler stack usage improvements (#6239) * Reduce compiler stack usage, and grant more control over stack usage I found some code in the wild that needs 9mb of stack to lower. It's a pain to even diagnose the problem definitively, because it requires plumbing platform-specific linker flags to grant more stack. This commit: - Reduces peak stack usage of similar code in the repo (the FFT) - Increases the stack size for lowering and codegen to 32mb on all platforms, using stack switching techniques. We started doing this on Windows a while ago and it hasn't bitten us, so let's try on more platforms. - Gives user control over the amount of stack used for lowering and codegen. It shouldn't be necessary except when diagnosing problems like this in future. Using the control I was able to determine that the correctness tests all pass with 500k of stack, and the apps all pass with 1MB, so 32MB ought to be enough for anybody. I found a never-checked-in test for the mux helper which uses 10MB of stack and really shouldn't need to, so I added that (and opened an issue) as an example of how to grant more stack when necessary, even though 10MB is less than our default now. Also fixed an incorrect comment on the Block node. * Fixes for macos * Add test to cmake * Fix type of temporary * Reduce number of exprs in the mux * Fix quadratic memory usage in new test * Better comment * Variable name fix * Try giving windows a little more stack * Clarify why we want a live Stmt in scope * Review comments * Check some return values * tickle buildbots * Fixes for arm macos * Remove stray character * clang-tidy had some reasonable concerns * Comment fix * Maybe windows needs yet more stack Co-authored-by: Steven Johnson <srj@google.com> 01 October 2021, 21:41:29 UTC
4b9f728 Remove more obsolete MachO/COMDAT workarounds (#6274) * Remove more obsolete MachO/COMDAT workarounds (Followup to #6272) * Update metal.cpp * A few more fixes 30 September 2021, 17:16:12 UTC
ef387ad Minor cleanups in thread_pool_common.h (#6276) Minor hygiene noticed when doing other patches: - prefer `constexpr int` over `#define`, since we can now use C++17 in runtime code - remove redundant def of MAX_THREADS - use `do .. while (0)` idiom for functional macros 30 September 2021, 17:03:07 UTC
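The two idioms named in ef387ad can be illustrated as follows. This is a generic sketch, not code from thread_pool_common.h: the value 256 and the names `CLAMP_THREADS` and `clamp_thread_count` are hypothetical; only `MAX_THREADS` is a name taken from the commit message.

```cpp
// A typed, scoped constant in place of `#define MAX_THREADS 256`.
// (256 is an illustrative value, not necessarily the runtime's.)
constexpr int MAX_THREADS = 256;

// Hypothetical statement-like macro. Wrapping the body in do { ... } while (0)
// makes the expansion a single statement that needs a trailing semicolon.
#define CLAMP_THREADS(n)             \
    do {                             \
        if ((n) > MAX_THREADS) {     \
            (n) = MAX_THREADS;       \
        }                            \
        if ((n) < 1) {               \
            (n) = 1;                 \
        }                            \
    } while (0)

inline int clamp_thread_count(int requested, bool enabled) {
    int n = requested;
    // With a bare { ... } body instead of do/while, the semicolon after the
    // macro call would end the if statement and leave the `else` dangling.
    if (enabled)
        CLAMP_THREADS(n);
    else
        n = 1;
    return n;
}
```

The `constexpr` form also gives the constant a type and a scope, so it is visible to the debugger and cannot be silently redefined by another header.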
a8d7013 Remove the runtime/ssp module (#6277) * Remove the runtime/ssp module It doesn't get included via *any* path in the runtime linker, and removing it doesn't seem to affect any tests. (I haven't looked at the revision history to see when it was added and/or when inclusion of it was removed.) * Update LLVM_Runtime_Linker.cpp * Update LLVM_Runtime_Linker.cpp 30 September 2021, 02:09:12 UTC
e092c01 Fix alignment issues in synchronization_common.h (#6272) * Fix alignment issues in synchronization_common.h To work around old COMDAT issues, we allocated the table as a char array and cast it to what we want; unfortunately this doesn't guarantee the right alignment for the table and in some environments (eg wasm) we can get unaligned-access failures. We could fix this by forcing the right alignment, but since we fixed COMDAT issues in another way a while back (adding smarts to LLVM_Runtime_Linker), let's just remove the hack and declare it normally. Also added some drive-by changes to ensure that the hashtable size and HASH_TABLE_BITS were safe (this happened to be the case before but wasn't enforced), and also to init all the fields in hash_bucket. (Q: do we really need `check_hash()` to exist? With the mods in place above, is it possible for addr_hash() to return a bad index?) * Always use HASH_TABLE_BITS in addr_hash() * Only use check_hash() in DEBUG_RUNTIME builds 29 September 2021, 23:08:38 UTC
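The alignment problem described in e092c01 can be sketched like this. The names `hash_bucket`, `HASH_TABLE_BITS`, and `addr_hash` come from the commit message, but the field layout, the table size, the hash constant, and the `bucket_for` helper are assumptions made for illustration and are not the runtime's actual definitions.

```cpp
#include <cstddef>
#include <cstdint>

constexpr int HASH_TABLE_BITS = 10;  // illustrative value
constexpr std::size_t HASH_TABLE_SIZE = std::size_t(1) << HASH_TABLE_BITS;

// Hypothetical bucket layout; every field has a default initializer.
struct hash_bucket {
    std::uintptr_t mutex = 0;
    void *head = nullptr;
    void *tail = nullptr;
};

// Fragile pattern (shown only as a comment): a char array guarantees an
// alignment of 1, so casting it to hash_bucket* may yield unaligned accesses.
//   static char table_storage[HASH_TABLE_SIZE * sizeof(hash_bucket)];
//   auto *buckets = reinterpret_cast<hash_bucket *>(table_storage);

// Declaring the table with its real type lets the compiler pick the alignment.
struct hash_table {
    hash_bucket buckets[HASH_TABLE_SIZE];
};
static hash_table table;

static_assert(alignof(hash_table) >= alignof(hash_bucket),
              "table must be at least as aligned as its buckets");

// Keeping only the top HASH_TABLE_BITS bits of a 64-bit mix ties the index
// range to the table size, so the result is always a valid index.
inline std::size_t addr_hash(std::uintptr_t addr) {
    const std::uint64_t mixed =
        static_cast<std::uint64_t>(addr) * 0x9E3779B97F4A7C15ull;
    return static_cast<std::size_t>(mixed >> (64 - HASH_TABLE_BITS));
}

inline hash_bucket &bucket_for(std::uintptr_t addr) {
    return table.buckets[addr_hash(addr)];
}
```

Because the index is derived from the same `HASH_TABLE_BITS` constant that sizes the array, an out-of-range index cannot occur in this sketch, which is the property the commit's question about `check_hash()` is probing.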
4307645 [Hexagon] Remove qurt_init_fini (#6271) Including qurt_init_fini generates the below error: dlopenbuf failed: undefined symbol #140 __DTOR_LIST__ Including qurt_init_fini was needed when the pipeline was loaded using mmap. This is not needed now. Co-authored-by: Ankit Aggarwal <aankit@quicinc.com> 29 September 2021, 16:56:34 UTC
e836ea6 CMake: install docs into halide subdirectory of doc dir (#6267) 28 September 2021, 08:36:38 UTC
8fbc788 [hannk] Add Make target to rebuild just the Halide-generated code. (#6265) * [hannk] Add Make target to rebuild just the Halide-generated code. Also, drive-by comment fix about disabling Fortran (!) when building for Android. * More changes 27 September 2021, 17:12:25 UTC
ebb9f19 Usage of C++ `<thread>` header requires linking to threading library (#6257) * Usage of C++ `<thread>` header requires linking to threading library `Generator.cpp` and `ThreadPool.h` both `#include <thread>`, but don't link to the threading implementation. This fixes build for me on debian sid, which is failing otherwise with: ``` $ ninja [ 0% 2/1480][ 0% 0:00:00 + 0:33:06] Linking CXX executable src/autoschedulers/adams2019/get_host_target FAILED: src/autoschedulers/adams2019/get_host_target : && /usr/local/bin/clang++ -pipe -O3 -DNDEBUG src/autoschedulers/adams2019/CMakeFiles/get_host_target.dir/get_host_target.cpp.o -o src/autoschedulers/adams2019/get_host_target -Wl,-rpath,/repositories/halide/build/src: src/libHalide.so.13.0.0 && : ld: error: /usr/lib/gcc/x86_64-linux-gnu/11/libstdc++.so: undefined reference to pthread_create [--no-allow-shlib-undefined] clang: error: linker command failed with exit code 1 (use -v to see invocation) [ 0% 5/1480][ 0% 0:00:00 + 0:12:47] Linking CXX executable src/autoschedulers/adams2019/test_apps_autoscheduler FAILED: src/autoschedulers/adams2019/test_apps_autoscheduler : && /usr/local/bin/clang++ -pipe -O3 -DNDEBUG src/autoschedulers/adams2019/CMakeFiles/test_apps_autoscheduler.dir/test.cpp.o -o src/autoschedulers/adams2019/test_apps_autoscheduler -Wl,-rpath,/repositories/halide/build/src src/libHalide.so.13.0.0 -ldl && : ld: error: /usr/lib/gcc/x86_64-linux-gnu/11/libstdc++.so: undefined reference to pthread_create [--no-allow-shlib-undefined] clang: error: linker command failed with exit code 1 (use -v to see invocation) [ 0% 7/1480][ 0% 0:00:00 + 0:08:53] Linking CXX executable src/autoschedulers/adams2019/test_function_dag FAILED: src/autoschedulers/adams2019/test_function_dag : && /usr/local/bin/clang++ -pipe -O3 -DNDEBUG src/autoschedulers/adams2019/CMakeFiles/test_function_dag.dir/test_function_dag.cpp.o src/autoschedulers/adams2019/CMakeFiles/test_function_dag.dir/FunctionDAG.cpp.o 
src/autoschedulers/adams2019/CMakeFiles/test_function_dag.dir/ASLog.cpp.o -o src/autoschedulers/adams2019/test_function_dag -Wl,-rpath,/repositories/halide/build/src src/libHalide.so.13.0.0 && : ld: error: /usr/lib/gcc/x86_64-linux-gnu/11/libstdc++.so: undefined reference to pthread_create [--no-allow-shlib-undefined] clang: error: linker command failed with exit code 1 (use -v to see invocation) [ 0% 9/1480][ 0% 0:00:00 + 0:06:41] Linking CXX executable src/autoschedulers/li2018/gradient_autoscheduler_test_cpp FAILED: src/autoschedulers/li2018/gradient_autoscheduler_test_cpp : && /usr/local/bin/clang++ -pipe -O3 -DNDEBUG src/autoschedulers/li2018/CMakeFiles/gradient_autoscheduler_test_cpp.dir/test.cpp.o -o src/autoschedulers/li2018/gradient_autoscheduler_test_cpp -Wl,-rpath,/repositories/halide/build/src src/libHalide.so.13.0.0 && : ld: error: /usr/lib/gcc/x86_64-linux-gnu/11/libstdc++.so: undefined reference to pthread_create [--no-allow-shlib-undefined] clang: error: linker command failed with exit code 1 (use -v to see invocation) [ 2% 35/1480][ 0% 0:00:00 + 0:05:09] Generating included_schedule_file.runtime.o ninja: build stopped: subcommand failed. ``` * Dummy commit to retrigger bots 24 September 2021, 16:13:58 UTC
256d57a Update halide_iwyu_mapping.imp 20 September 2021, 22:52:05 UTC
87087b0 Update halide_iwyu_mapping.imp 20 September 2021, 21:59:48 UTC
c80473c clang-tidy 20 September 2021, 21:37:05 UTC
7ee377b fixes 20 September 2021, 21:28:16 UTC
5ac40b1 Update halide_iwyu_mapping.imp 20 September 2021, 20:42:43 UTC
ff2b2dc Update Util.cpp 20 September 2021, 20:39:43 UTC
9a6efc7 Update halide_iwyu_mapping.imp 20 September 2021, 20:34:31 UTC
7ab0e07 Create helper script to run IWYU on Halide 20 September 2021, 20:21:43 UTC
11ec1dc Remove remaining bits of LLVM10 support (#6245) Update various parts of the source that still assume LLVM10 is supported. (We probably need to update the buildbots as well before this lands.) 17 September 2021, 21:14:21 UTC
95198a7 Remove torch/extension dependency and fix PyTorch example, adding the correct (GPU) autoscheduler (#6234) * fix static op warning * add backward * removes "torch/extension.h" dependency * suppress grad check warnings * simplifies CUDA PyTorch bindings, remove HL_PT_CUDA define * fix gradient for scalar parameter * fix style issues * Update CodeGen_PyTorch.cpp 15 September 2021, 21:47:31 UTC
2d0ac72 Modify hexagon_remote/Makefile to allow defining C++ #defines on the command line (#6243) * Modify hexagon_remote/Makefile to allow defining C++ #defines on the command line * Update Makefile 14 September 2021, 23:41:33 UTC
fdcf140 Improve the code quality of scalarized code (#6218) * Improve the quality of scalarized predicated loads. * Add assert * clang-format * We can't assume this assert won't be hit. * Fix implementation of single lane shuffles in deinterleave. 14 September 2021, 20:27:54 UTC
10e4e8e Fall through to LLVM for unknown shuffles on Hexagon (#6237) * Fall through to LLVM for unknown shuffles on Hexagon. * Don't fall back to LLVM, use vdelta instead. * Fix vdelta patterns 13 September 2021, 18:27:59 UTC
d80e5d8 Join generator watchdog timer if exception thrown (#6240) Make sure to join the generator watchdog timer when an exception is thrown. Otherwise the error message gets swallowed entirely and you just get "terminate called without an active exception" Also rejiggered the watchdog to do a timed wait on a condition variable instead of polling every 100ms. 10 September 2021, 19:31:17 UTC
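The watchdog behaviour described in d80e5d8 (a timed wait on a condition variable instead of polling, with the thread joined even when an exception unwinds the stack) might look roughly like this. The `Watchdog` class and its callback are hypothetical and do not reproduce Halide's Generator code.

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical RAII watchdog: runs on_timeout() if it is still alive after
// `timeout`; the destructor wakes the thread early and always joins it.
class Watchdog {
public:
    Watchdog(std::chrono::milliseconds timeout, std::function<void()> on_timeout)
        : thread_([this, timeout, cb = std::move(on_timeout)] {
              std::unique_lock<std::mutex> lock(mutex_);
              // One timed wait instead of a polling loop. wait_for returns
              // false only if the timeout expired while done_ was still false.
              if (!cv_.wait_for(lock, timeout, [this] { return done_; })) {
                  cb();
              }
          }) {
    }

    // Destructors run during stack unwinding, so the thread is joined even
    // when the guarded work throws. Destroying a joinable std::thread would
    // call std::terminate and hide the original error message.
    ~Watchdog() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        cv_.notify_all();
        thread_.join();
    }

    Watchdog(const Watchdog &) = delete;
    Watchdog &operator=(const Watchdog &) = delete;

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool done_ = false;
    std::thread thread_;  // declared last so the other members exist first
};
```

A caller would create the `Watchdog` on the stack before the long-running work; leaving the scope, normally or via an exception, cancels and joins the timer.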
b78b205 Upgrade clang-format and clang-tidy to LLVM-12 (#6233) 02 September 2021, 17:58:52 UTC
24d6bd6 Hoist unrolled prefetches to top of the block (#6230) * Hoist unrolled prefetches to top of the block When a loop with prefetch is unrolled, the prefetch instructions getting scattered through the loop can cause LLVM codegen issues in some cases (see https://bugs.llvm.org/show_bug.cgi?id=51172). As a partial mitigation for that issue, this PR adds a pass to hoist all prefetch instructions to the top of their loop. This is still a bit experimental; it definitely addresses the codegen issues we see, but makes the use of prefetch potentially less effective (since the hoisted prefetch may be too far from the eventual use to be effective). * appease clang-tidy * Avoid quadratic behavior * Use template instead of std::function * Require prefetch offset to be pure 29 August 2021, 20:02:34 UTC
085e11e Rename inner version of bounds_of_expr_in_scope (#6232) It's not in the explicit namespace that it's requested in (Halide::Internal), so turning on that debugging code results in compile failures. I just gave it a different name to disambiguate. 27 August 2021, 21:00:14 UTC
c860cab Add modernize-make-shared and modernize-make-unique to .clang-tidy and fix warnings (#6222) * Add modernize-make-shared and modernize-make-unique to .clang-tidy and fix warnings * std::initializer_list instead of std::vector * Update Pipeline.h 25 August 2021, 17:36:14 UTC
f43f016 More prefetch fixes (#6226) * More prefetch fixes - Arguments to Call::prefetch() must be scalars, not vectors - Add more testcases to correctness_prefetch Addresses more of #6219 (but still not the title issue, i.e. ignoring offset) * Fix horrific bug * Have CodeGen_C emit the same arguments for __builtin_prefetch() as the runtime module * Minor cleanup * Explicitly pass target thru * Fix correctness_prefetch for host-hvx * Add comments 24 August 2021, 22:25:57 UTC
d507b9a Fix bug in prefetch() (#6225) In #6155, we incorrectly assume that we can qualify the 'from' prefetch var by just adding 'prefix'; this isn't true if (e.g.) there are any splits involved. Instead, we need to walk through the active loops to find a suitable match. In addition, if no match is found, we now fail with an error (rather than quietly doing something undefined), as the 'from' var is required to be from an active loop. (Addresses some-but-not-all of #6219) 23 August 2021, 21:46:26 UTC
30040cd Prefetch cleanup (#6220) * Use std::move where appropriate * Prefetch cleanup This is (mostly) a cleanup pass to make the flow of Prefetch injection & lowering more obvious to the reader of the code (via commenting and minor code restructuring). Notable exception: the HVX backend processing of Call::prefetch (and relevant runtime code) was refactored to make it (IMHO) less janky. (Also some drive-by insertions of std::move where appropriate) 23 August 2021, 16:31:16 UTC
7c437e4 fix #6207 (#6214) Co-authored-by: Steven Johnson <srj@google.com> 22 August 2021, 20:27:39 UTC
7aafbb9 Fix wasm simd issues (#6217) - f64x2.convert_low_i32x4_s/u are now generating properly at top-of-tree, so re-enable them - f64x2.promote_low_f32x4 is temporarily broken for larger vector widths, so disable it for now (issue is reported and fix is underway) Also, driveby change to .gitignore. 20 August 2021, 22:53:06 UTC
ed5e1e1 Use C++17 structured binding instead of std::tie (#6213) * Use C++17 structured binding instead of std::tie * appease clang-tidy 20 August 2021, 22:25:50 UTC
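For readers unfamiliar with the change in ed5e1e1, the difference between `std::tie` and a C++17 structured binding can be shown on a small example. The `div_mod` function and both callers are hypothetical and unrelated to Halide's sources.

```cpp
#include <tuple>
#include <utility>

// Hypothetical helper returning two values at once.
inline std::pair<int, int> div_mod(int a, int b) {
    return {a / b, a % b};
}

// Pre-C++17 style: the variables must be declared first, then assigned.
inline int recombine_with_tie(int a, int b) {
    int q, r;
    std::tie(q, r) = div_mod(a, b);
    return q * b + r;
}

// C++17 structured binding: declaration and unpacking in a single step,
// and the bound names can be const.
inline int recombine_with_binding(int a, int b) {
    const auto [q, r] = div_mod(a, b);
    return q * b + r;
}
```

Both functions return the original `a` for any nonzero `b`; the structured-binding version avoids the uninitialized declarations.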
7079ff2 Fix for upcoming LLVM API change (#6212) 20 August 2021, 19:11:41 UTC
06e8865 Fix issues with predicated interleaving stores on Hexagon (#6211) * Fix issues with predicated interleaving stores on Hexagon. * Fix buffer API usage issue. * Add default device API support for Hexagon. * More DeviceAPI support 20 August 2021, 02:11:04 UTC
c61a930 Fix unroll failures from adams2019 when the Expr depends on estimates (#6200) * track depends_on_estimate in BoundsInfo - fix bounds_are_constant 19 August 2021, 17:06:04 UTC
d653a73 Add IRMutator::mutate_exprs() (#6203) * Add IRMutator::mutate_exprs() There's a common pattern in many IRMutators that is "mutate a vector<Expr> and optionally let me know if anything is different". (Note, this uses C++17 structured-binding syntax, which we previously weren't using in Halide. Objections?) This adds a shared utility method (well, two, thanks to VariadicVisitor) and plugs in the usage in all the places that seemed obvious. I doubt this moves the needle on speed in either direction, but makes for smaller code. * Silence warnings * Update Inline.cpp * Update ParallelRVar.cpp * Update SplitTuples.cpp * Update StorageFlattening.cpp * Revisions 17 August 2021, 21:15:38 UTC
d811a3f More augmentation of debugging code (#6185) * More augmentation of debugging code This expands on #6182 by adding tracking for BoxesTouched, and integrating the nesting levels with the previous code. This allows a more complete vision of what's happening during bounds calculation. * Minor fixes * Unexpose indent 16 August 2021, 20:34:58 UTC
72284a2 unsafe_promise_clamped() should be pure (#6199) As discussed in https://github.com/halide/Halide/pull/6189, this intrinsic should probably be Pure. 13 August 2021, 19:12:07 UTC
a081660 Add information to comment on `align_loads`. (#6196) * Add information to comment. * Wording improvement. 12 August 2021, 21:06:38 UTC
3b7e1ba [hannk] Remove alignment requirements for shallow DepthwiseConv ops (#6198) * [hannk] Remove alignment requirements for shallow DepthwiseConv ops * Update depthwise_conv_generator.cpp 12 August 2021, 20:55:14 UTC
6229afa Upgrade apps/hannk to TFLite 2.6 (#6197) * Upgrade apps/hannk to TFLite 2.6 * Remove scalpel left in patient 12 August 2021, 20:15:28 UTC
69075b4 Internal::promise_clamped() should be pure (Fixes #6186) (#6189) * ApplySplit should use pure promise_clamped() (Fixes #6186) * Make all promise_clamped calls pure * pure_promise_clamped -> promise_clamped 12 August 2021, 19:51:15 UTC
2394250 [Hexagon] Do not pattern match inside if_then_else block (#6194) * Do not pattern match inside if_then_else block Resolves the compilation error below while generating hannk::upsample_channels_uint8 from hannk/depthwise_conv.generator: Unknown intrinsic dynamic_shuffle The problem occurs when we pattern match hvx intrinsics inside if_then_else nodes and try to scalarize them later. In the patch we prevent matching these intrinsics inside if_then_else blocks. * Do not match for only vector types * pattern match for scalars and scalar-broadcasts Co-authored-by: Ankit Aggarwal <aankit@quicinc.com> 12 August 2021, 17:06:00 UTC
43b412b Update WABT version to latest release (1.0.24) (#6193) 12 August 2021, 17:02:52 UTC
b7fa882 Fix unused-variable warning-as-error (#6192) The latest Emscripten compilers will complain about this. 11 August 2021, 21:28:10 UTC
67802cf Add memmove to WasmExecutor callbacks (#6191) Some not-yet-landed variants of the wasm toolchain+runtime environments need this. 11 August 2021, 18:53:36 UTC
e8b5837 Add a watchdog timer to Generator (#6184) * Add a watchdog timer to Generator In degenerate conditions (eg, bugs in Halide or LLVM, or pathological user code), running a Generator can take arbitrarily long times (we recently found some buildbots that had Generators that had been running for several days). This adds a simple background thread to generator_main() to ensure that compilations don't take unreasonable lengths of time. It defaults to 15 minutes of wall-clock time, but can be customized by the -t flag. * Update Generator.cpp 11 August 2021, 17:33:13 UTC
fb44637 Augment debugging code (#6182) * Augment debugging code I upgraded some debugging code in AddImageChecks and Bounds while tracking down a bug, and I think the upgrades are worth keeping for future use. * clang-format * Minor changes per comments * clang-format 10 August 2021, 22:26:10 UTC
d249fa0 Update tutorial todos (#6161) This is based on our discussion in the dev meeting. Feel free to suggest changes. 06 August 2021, 19:24:14 UTC
451cfa8 Add argv and metadata support to C++ backend (Issue #2071) (#6179) * Add argv and metadata support to C++ backend (Issue #2071) * legalize_name-> c_print_name * Fix user_context handling 06 August 2021, 15:58:33 UTC
2e229f5 Rewrite Rules Evaluation Project - Merging Relevant Synthesized Rewrite Rules (#6174) Conducted experiments to analyze the performance effects of adding 4000+ synthesized rewrite rules to Halide. Narrowed down the rules to 11 rewrite rules whose associative & commutative variants are added in this PR. With these rewrite rules, Halide achieves >10% peak memory reductions in 192 cases in apps including camera_pipe, harris, nl_means, and stencil_chain, which is similar to the results (with all 4000+ rules) from this paper - https://dl.acm.org/doi/pdf/10.1145/3428234 04 August 2021, 04:54:30 UTC
8b26454 Add more fine-grained prefetch() directive (Issue #3735) (#6155) Add more fine-grained prefetch() directive (Issue #3735) 03 August 2021, 20:06:53 UTC
4f8629c Fix broken wasm-simd extmul instructions due to changes from https://reviews.llvm.org/D106724 (#6177) 03 August 2021, 00:49:54 UTC
0a09bfb Fix for trunk LLVM (#6176) * Fix for trunk LLVM * More Fixes 02 August 2021, 21:14:12 UTC
e52d6ca Fix Xcode issue that requires at least one source file when building a library from objects. (#6175) * Fix Xcode issue that requires at least one source file when building a library from objects. Fixes #6167 * add newline to end of file 31 July 2021, 04:43:06 UTC
a7e8c43 Partial revert of 8f849ae6514e83f8bf94d05e452a467df352f74c (only reverting halide_remote.cpp) (#6173) 29 July 2021, 15:53:11 UTC
36f6b8c Use generic build command instead of make. Fixes #6163 (#6169) 28 July 2021, 17:53:33 UTC
2b8ec44 Remove deprecated realize() Python wrappers (#6162) The C++ versions were removed in #6122, but the Python equivalents were overlooked. 27 July 2021, 14:57:41 UTC
a5585cb Add various bounds-related simplifier rules (#6160) * add simplifier rules 27 July 2021, 02:07:40 UTC
2ab9a56 De-predicate loads and stores in Metal/OpenCL/D3D12 backend (#6158) * Depredicate loads and stores in Metal backend * Fix typo. * Mark override, add additional using * float_t -> float * Update CMakeLists.txt * clang-format * Also scalarize in D3D12 and OpenCL * use const_true() helper 24 July 2021, 13:55:32 UTC
b68393c [hannk] Add a --csv flag to compare_vs_tflite (#6149) * [hannk] Add optional taskset support to the run_on_device scripts * [hannk] Add a --csv flag to compare_vs_tflite This lets us output results in CSV format for easy copy/paste into (eg) spreadsheets. 21 July 2021, 23:08:12 UTC
025a9b9 Handle depth_multiplier != 1 in a separate op (#6154) * Implement depth_multiplier != 1 in a separate op. * Fix build on GCC * Remove stale comment * clang-format * Add more comments to inv_depth_multiplier 21 July 2021, 22:11:55 UTC
9d7284b Move quantization to a helper function depending on the target (#6150) * Move quantization + relu to a helper function depending on the target. * clang-format * x86 has these too actually * Fix typo 20 July 2021, 20:50:14 UTC
5ca8cdf Generalize Conv2D to be a Conv of any dimensionality (#6146) * Generalize Conv2D to be a Conv of any dimensionality. * clang-format 20 July 2021, 16:43:26 UTC
5812f33 Configurable minimum size for alignment in align_loads (#6143) Co-authored-by: Steven Johnson <srj@google.com> 20 July 2021, 15:45:49 UTC
b457d3c Add support for int16 output in Conv2D (#6145) This allows us to convert all (currently supported) FC ops into Conv2D ops. Remove all the FC-specific Halide and Op code. 20 July 2021, 02:16:25 UTC
9d1e1e3 [hannk] Rewrite FC in terms of Conv2D (#6144) * [hannk] Rewrite FC in terms of Conv2D FullyConnected is very similar to Conv2D, so rather than maintaining multiple similar implementations, let's translate a FullyConnected node into a Conv2D node (with some Reshape nodes as necessary). Note that we keep the old FC logic for int16 outputs, as Conv2D doesn't support those yet; if this PR is landed, a followup PR will add that ability to Conv2D, and the existing FC support will be removed entirely. 20 July 2021, 00:33:52 UTC
bd7ebf5 Fix for top-of-tree LLVM (#6142) * Fix for top-of-tree LLVM 19 July 2021, 19:27:44 UTC
557c8e4 Fix Hexagon vrmpy with 16-bit results (#4248) (#6137) * Fix #4248 * clang-format 16 July 2021, 17:56:05 UTC