44f174c | Steven Johnson | 23 October 2020, 19:45:29 UTC | foo | 23 October 2020, 19:45:29 UTC |
bfe89ba | Steven Johnson | 23 October 2020, 18:32:13 UTC | Update CMakeLists.txt | 23 October 2020, 18:32:13 UTC |
32a72d1 | Steven Johnson | 23 October 2020, 18:27:46 UTC | wef | 23 October 2020, 18:27:46 UTC |
5d6f814 | Steven Johnson | 23 October 2020, 18:20:42 UTC | Update WasmExecutor.cpp | 23 October 2020, 18:20:42 UTC |
0aec45e | Steven Johnson | 23 October 2020, 18:14:28 UTC | Update LLVM_Runtime_Linker.cpp | 23 October 2020, 18:14:28 UTC |
6b3797d | Steven Johnson | 23 October 2020, 18:09:58 UTC | Enable some tests | 23 October 2020, 18:09:58 UTC |
080fd13 | Steven Johnson | 23 October 2020, 18:02:32 UTC | wip | 23 October 2020, 18:02:32 UTC |
7142fa7 | Mark Glines | 22 October 2020, 16:34:13 UTC | Add Stage.gpu_lanes to Python bindings. | 23 October 2020, 17:19:50 UTC |
fef108b | Steven Johnson | 23 October 2020, 16:56:28 UTC | Upgrade WABT version to 1.0.19 | 23 October 2020, 16:56:44 UTC |
2d1aee9 | Zalman Stern | 22 October 2020, 20:39:39 UTC | Add a test case for multiple argument memoize_tag to demonstrate usage. (#5393) * Add a test case for multiple argument memoize_tag usage to demonstrate how it works. * Fix formatting typo. | 22 October 2020, 20:39:39 UTC |
475c7a0 | Dillon Sharlet | 21 October 2020, 21:52:17 UTC | Use locally declared type. | 22 October 2020, 16:35:54 UTC |
4f072bc | Dillon Sharlet | 21 October 2020, 21:34:49 UTC | Fix bounds resulting in vector types. | 22 October 2020, 16:35:54 UTC |
75077ed | Steven Johnson | 21 October 2020, 22:33:14 UTC | Add missing quotes in run-clang-format.sh | 21 October 2020, 23:16:52 UTC |
0cc8c30 | Steven Johnson | 21 October 2020, 18:35:14 UTC | Tickle Buildbots | 21 October 2020, 22:34:15 UTC |
8d1784f | Steven Johnson | 20 October 2020, 21:31:44 UTC | Change NULL -> nullptr enable the modernize-use-nullptr check in clang-tidy and fix all complaints wer | 21 October 2020, 22:34:15 UTC |
31f1937 | Pranav Bhandarkar | 21 October 2020, 20:36:54 UTC | Merge pull request #5365 from halide/pdb_remove_hvx_v64 Issue #3925 : Remove hvx_64 | 21 October 2020, 20:36:54 UTC |
d94e7a7 | Steven Johnson | 21 October 2020, 16:44:25 UTC | Update CodeGen_Hexagon.cpp | 21 October 2020, 16:44:25 UTC |
e520503 | Steven Johnson | 21 October 2020, 16:33:44 UTC | Merge branch 'master' into pdb_remove_hvx_v64 | 21 October 2020, 16:33:44 UTC |
fc959e7 | Steven Johnson | 21 October 2020, 16:26:49 UTC | Merge pull request #5382 from halide/srj/readability Enable the useful readability-* checks in clang-tidy | 21 October 2020, 16:26:49 UTC |
ce2f41d | Steven Johnson | 21 October 2020, 16:25:54 UTC | Merge pull request #5384 from dragly/dragly/python-negate-operator Add `logical_not` function for Python | 21 October 2020, 16:25:54 UTC |
235abe4 | Steven Johnson | 21 October 2020, 01:03:30 UTC | Tickle Buildbots | 21 October 2020, 16:24:22 UTC |
acbc69a | Steven Johnson | 20 October 2020, 20:47:06 UTC | Enable modernize-use-equals-default/delete in clang-tidy | 21 October 2020, 16:24:22 UTC |
61792d8 | Svenn-Arne Dragly | 20 October 2020, 22:06:58 UTC | Add logical_not function for Python. This change introduces `logical_not` as a free function and member function that calls `operator!`. A new function is needed because there is no `operator!` in Python and the `not` keyword cannot be overloaded. Hence, there was previously no way to call the C++ `operator!` from Python. | 21 October 2020, 08:35:48 UTC |
e2820e2 | Steven Johnson | 20 October 2020, 21:11:46 UTC | Enable the useful readability-* checks in clang-tidy | 20 October 2020, 21:11:46 UTC |
b2c9769 | Pranav Bhandarkar | 20 October 2020, 20:57:09 UTC | Merge branch 'master' into pdb_remove_hvx_v64 | 20 October 2020, 20:57:09 UTC |
00f50a1 | Steven Johnson | 20 October 2020, 20:18:09 UTC | Merge pull request #5379 from halide/srj/mod2 Enable clang-tidy's modernize-use-default-member-init check | 20 October 2020, 20:18:09 UTC |
c2ed326 | Steven Johnson | 20 October 2020, 20:17:53 UTC | Enable clang-tidy's modernize-use-default-member-init check | 20 October 2020, 20:17:53 UTC |
c2c35b3 | Pranav Bhandarkar | 20 October 2020, 20:08:13 UTC | remove hvx_64 from Halide/Makefile | 20 October 2020, 20:08:13 UTC |
b5db7fd | Steven Johnson | 20 October 2020, 18:53:44 UTC | Merge pull request #5381 from halide/srj/perfchecks Enable interesting performance-* clang-tidy checks | 20 October 2020, 18:53:44 UTC |
83d52ab | Steven Johnson | 20 October 2020, 18:44:03 UTC | Enable interesting performance-* clang-tidy checks | 20 October 2020, 18:44:03 UTC |
8221d6c | Steven Johnson | 20 October 2020, 18:39:01 UTC | Merge pull request #5378 from halide/srj/misc Enable the interesting misc-* clang-tidy checks | 20 October 2020, 18:39:01 UTC |
8f3ecb4 | Steven Johnson | 20 October 2020, 18:38:45 UTC | Enable the interesting misc-* clang-tidy checks | 20 October 2020, 18:38:45 UTC |
1e8505e | Steven Johnson | 20 October 2020, 18:25:01 UTC | Merge pull request #5377 from halide/srj/modernize Enable clang-tidy's modernize-deprecated-headers check and apply fixes. | 20 October 2020, 18:25:01 UTC |
a3ef417 | Steven Johnson | 20 October 2020, 18:18:15 UTC | Enable clang-tidy's modernize-deprecated-headers check and apply fixes. | 20 October 2020, 18:18:15 UTC |
5e91d6f | Steven Johnson | 20 October 2020, 16:26:26 UTC | clang-format | 20 October 2020, 16:26:26 UTC |
7fdd42c | Steven Johnson | 20 October 2020, 16:26:14 UTC | clang-format | 20 October 2020, 16:26:14 UTC |
00ae979 | Steven Johnson | 20 October 2020, 16:16:20 UTC | Merge branch 'master' into pdb_remove_hvx_v64 | 20 October 2020, 16:16:20 UTC |
a2934d4 | Steven Johnson | 19 October 2020, 22:24:40 UTC | Update d3d12compute.cpp | 20 October 2020, 16:03:17 UTC |
f7e77e2 | Steven Johnson | 19 October 2020, 22:17:17 UTC | Extend clang-tidy checks to src/runtime (and fix resulting errors) | 20 October 2020, 16:03:17 UTC |
a9e3941 | Dillon Sharlet | 20 October 2020, 06:45:49 UTC | Merge pull request #5372 from halide/simplify-vectorreduce Add simplification rules for vectorreduce of broadcasts | 20 October 2020, 06:45:49 UTC |
0ca44db | Steven Johnson | 19 October 2020, 22:12:09 UTC | Merge pull request #5358 from halide/srj/tidy-all Extend clang-tidy checks into tools, utils, and python_bindings | 19 October 2020, 22:12:09 UTC |
85f143c | Pranav Bhandarkar | 19 October 2020, 21:27:54 UTC | Address review comments | 19 October 2020, 21:27:54 UTC |
14dd26a | Steven Johnson | 19 October 2020, 20:58:12 UTC | Extend clang-tidy checks into tools, utils, and python_bindings | 19 October 2020, 20:58:12 UTC |
af57921 | Steven Johnson | 19 October 2020, 20:21:55 UTC | Drop support for LLVM9 (#5121) Drop support for LLVM9 | 19 October 2020, 20:21:55 UTC |
7a68888 | Andrew Adams | 19 October 2020, 18:06:36 UTC | Makefile tweaks to work on ubuntu | 19 October 2020, 19:58:25 UTC |
99f01a8 | Steven Johnson | 19 October 2020, 18:00:06 UTC | Update Makefile | 19 October 2020, 19:58:25 UTC |
bc615b0 | Steven Johnson | 19 October 2020, 17:21:45 UTC | Update run-clang-format.sh | 19 October 2020, 19:58:25 UTC |
a9975c8 | Steven Johnson | 16 October 2020, 18:20:27 UTC | Update Makefile | 19 October 2020, 19:58:25 UTC |
dc5e171 | Steven Johnson | 16 October 2020, 18:15:38 UTC | Update Makefile | 19 October 2020, 19:58:25 UTC |
f3c47ae | Steven Johnson | 14 October 2020, 22:55:16 UTC | Fix LLVM_DIR value | 19 October 2020, 19:58:25 UTC |
6a8e292 | Steven Johnson | 12 October 2020, 18:51:05 UTC | Move clang-tidy logic into script | 19 October 2020, 19:58:25 UTC |
1a17dbe | Steven Johnson | 12 October 2020, 17:17:58 UTC | Move the clang-format logic into a shell script. This puts the truth for our clang-format logic into a shell script rather than the Makefile, in hopes of making it slightly easier for CMake users to use. | 19 October 2020, 19:58:25 UTC |
d8dac07 | Dillon Sharlet | 19 October 2020, 19:56:01 UTC | Merge pull request #5370 from halide/likely-if Make loop partitioning a bit more robust for if statements | 19 October 2020, 19:56:01 UTC |
d049a83 | Dillon Sharlet | 19 October 2020, 18:40:50 UTC | Merge branch 'master' of https://github.com/halide/Halide into simplify-vectorreduce | 19 October 2020, 18:40:50 UTC |
8e3262b | John Laxson | 19 October 2020, 17:04:13 UTC | OpenCL Texture Support (#5297) Add OpenCL Texture Support (https://github.com/halide/Halide/pull/5297) | 19 October 2020, 17:04:13 UTC |
e93f81a | Dillon Sharlet | 19 October 2020, 06:10:21 UTC | Use has_uncaptured_likely_tag instead. | 19 October 2020, 06:10:21 UTC |
e164866 | Dillon Sharlet | 19 October 2020, 06:03:49 UTC | Add simplifications for vectorreduce of broadcasts. | 19 October 2020, 06:03:49 UTC |
a1d0201 | Dillon Sharlet | 17 October 2020, 05:38:12 UTC | Fix likely for if when the likely is not the outermost expression. | 17 October 2020, 05:38:12 UTC |
2cde234 | Pranav Bhandarkar | 16 October 2020, 19:43:57 UTC | prefer using HVX over HVX_128 | 16 October 2020, 19:43:57 UTC |
5286a68 | Steven Johnson | 15 October 2020, 20:57:20 UTC | Fix wasm-related glitches in our timing/benchmarking code | 16 October 2020, 17:40:04 UTC |
16604ae | Pranav Bhandarkar | 16 October 2020, 00:07:35 UTC | Set vector_size to 128. Rule out vector sizes that made sense on HVX_64, now that HVX_128 is the only mode for HVX | 16 October 2020, 00:07:35 UTC |
8151b77 | Pranav Bhandarkar | 16 October 2020, 00:04:47 UTC | Check only for Target::HVX | 16 October 2020, 00:04:47 UTC |
04cd8dd | Pranav Bhandarkar | 15 October 2020, 23:17:17 UTC | Remove hvx_64 and hvx to python bindings | 15 October 2020, 23:17:17 UTC |
4d1a4bb | Pranav Bhandarkar | 15 October 2020, 21:42:15 UTC | Fix bad merge of test/correctness/mul_div_mod.cpp | 15 October 2020, 21:42:15 UTC |
13a4eb1 | Pranav Bhandarkar | 15 October 2020, 21:12:26 UTC | Merge branch 'master' into pdb_remove_hvx_v64 | 15 October 2020, 21:12:26 UTC |
09f9eda | Pranav Bhandarkar | 15 October 2020, 21:08:34 UTC | [camera_pipe] - In hvx_128 we need 4 threads to saturate hvx with work | 15 October 2020, 21:08:34 UTC |
9a96a46 | Pranav Bhandarkar | 15 October 2020, 21:03:53 UTC | Clean up some nonsensical code related to hvx in apps | 15 October 2020, 21:03:53 UTC |
609840c | Pranav Bhandarkar | 15 October 2020, 20:58:37 UTC | Look for Target::HVX too, everywhere that we look for Target::HVX_128 | 15 October 2020, 20:58:37 UTC |
3316566 | Pranav Bhandarkar | 15 October 2020, 20:38:51 UTC | Remove all definitions of hvx_64 | 15 October 2020, 20:38:51 UTC |
574b6bb | Andrew Adams | 15 October 2020, 18:48:38 UTC | Merge pull request #5359 from halide/abadams/less_scalarization_of_vectorized_atomics Handle more types of nested vectorized += without scalarizing | 15 October 2020, 18:48:38 UTC |
2554e60 | Pranav Bhandarkar | 15 October 2020, 18:18:05 UTC | fix intrinsic ids | 15 October 2020, 18:18:05 UTC |
98b9074 | Andrew Adams | 14 October 2020, 23:37:33 UTC | Merge remote-tracking branch 'origin/master' into abadams/less_scalarization_of_vectorized_atomics | 14 October 2020, 23:37:33 UTC |
a33b3fc | Pranav Bhandarkar | 14 October 2020, 22:56:41 UTC | remove use of hvx_64 from Target.cpp | 14 October 2020, 22:56:41 UTC |
18b704b | Pranav Bhandarkar | 14 October 2020, 22:54:03 UTC | Remove use of hvx_64 from apps | 14 October 2020, 22:54:03 UTC |
b121015 | Pranav Bhandarkar | 14 October 2020, 22:46:09 UTC | Remove use of hvx_64 from Halide/test | 14 October 2020, 22:46:09 UTC |
6371ef9 | Pranav Bhandarkar | 14 October 2020, 22:31:08 UTC | remove the uses of hvx_64 from Halide/src | 14 October 2020, 22:31:08 UTC |
048999c | Pranav Bhandarkar | 14 October 2020, 22:15:19 UTC | Remove MAKE_ID_PAIR and IdPair | 14 October 2020, 22:15:19 UTC |
2531d11 | Alexander Root | 14 October 2020, 17:24:04 UTC | bounds inference (mod) pessimistic on unsigned 0 interval (#5350) * move conditions to catch unsigned interval >= 1 case first * use is_int_or_uint() | 14 October 2020, 17:24:04 UTC |
f89060b | Andrew Adams | 14 October 2020, 17:07:39 UTC | Restrict last test-case to x86 | 14 October 2020, 17:07:39 UTC |
fb0f632 | Alex Reinking | 14 October 2020, 07:29:32 UTC | Add vcpkg instructions to README.md | 14 October 2020, 15:46:37 UTC |
9f13e60 | Steven Johnson | 13 October 2020, 17:46:42 UTC | C backend must use memcpy for load/store | 14 October 2020, 15:41:24 UTC |
3f06f37 | Steven Johnson | 09 October 2020, 18:38:25 UTC | Update CodeGen_C.cpp | 14 October 2020, 15:41:24 UTC |
2d2b651 | Steven Johnson | 09 October 2020, 18:26:18 UTC | Replace concat() with a union approach. | 14 October 2020, 15:41:24 UTC |
b8d889d | Steven Johnson | 09 October 2020, 17:56:34 UTC | Fix broken concat() | 14 October 2020, 15:41:24 UTC |
d161660 | Steven Johnson | 09 October 2020, 00:04:03 UTC | Update CodeGen_C.cpp | 14 October 2020, 15:41:24 UTC |
7bec6c8 | Steven Johnson | 09 October 2020, 00:03:00 UTC | Fix relops | 14 October 2020, 15:41:24 UTC |
fe88544 | Steven Johnson | 08 October 2020, 20:39:35 UTC | Update CodeGen_C.cpp | 14 October 2020, 15:41:24 UTC |
3b2d424 | Steven Johnson | 07 October 2020, 21:16:32 UTC | Update CodeGen_C.cpp | 14 October 2020, 15:41:24 UTC |
c63a80a | Steven Johnson | 07 October 2020, 19:14:15 UTC | Update CodeGen_C.cpp | 14 October 2020, 15:41:24 UTC |
6e76f2e | Steven Johnson | 07 October 2020, 19:10:24 UTC | Rewrite internal glue code for vectors in C++ backend. The existing approach (wrapping native vectors in a helper struct) caused too many unnecessary spills. Rewrote so that the underlying Native Vector type is the value being passed around, and improved some of the specializations (with checking of generated code on x64 under clang and gcc). Clang is actually quite good at recognizing patterns and generating appropriate code (at least for x64); gcc is much less so, at least as far as I can tell. | 14 October 2020, 15:41:24 UTC |
f97c4be | Steven Johnson | 14 October 2020, 15:35:58 UTC | Fix for trunk LLVM | 14 October 2020, 15:40:21 UTC |
1e841d6 | Dillon Sharlet | 14 October 2020, 04:27:47 UTC | Merge pull request #5229 from aankit-ca/aankit_long_div Add unsigned long division to CodeGen_Internal. | 14 October 2020, 04:27:47 UTC |
474b9e1 | Andrew Adams | 13 October 2020, 23:01:42 UTC | Use a binary reduction tree for outer iterations. Also, it's better to vectorize at the native width, for both the new and baseline schedules. New inner loop (which does the same number of multiply-adds, but in a 16x4 tile instead of an 8x8 tile): ``` vmovdqu64 (%rcx,%rdx,8), %zmm5 vmovq (%r14,%rdx,8), %xmm6 # xmm6 = mem[0],zero vpermw %zmm5, %zmm0, %zmm7 vpermw %zmm5, %zmm1, %zmm5 vpermd %zmm6, %zmm2, %zmm8 vpermd %zmm8, %zmm3, %zmm8 vpmaddwd %zmm8, %zmm7, %zmm7 vpbroadcastd %xmm6, %zmm6 vpmaddwd %zmm6, %zmm5, %zmm5 vpaddd %zmm5, %zmm4, %zmm4 vpaddd %zmm7, %zmm4, %zmm4 incq %rdx cmpq $32, %rdx ``` | 13 October 2020, 23:01:42 UTC |
21b7492 | Andrew Adams | 13 October 2020, 22:10:29 UTC | Fix print | 13 October 2020, 22:10:29 UTC |
f029301 | Andrew Adams | 13 October 2020, 21:00:39 UTC | Handle more types of nested vectorized += without scalarizing. With this change, in a nesting of vectorized vars in an associative/commutative reduction (e.g. +=), we can now have a reduction var outermost and a reduction var innermost and get good codegen. This is still not fully general - there can only be one pure var in the vectorized stack for it to work. In general is_interleaved_ramp should be an is_tensor_contraction pass that knows how to do clever codegen for those. For the following schedule: ``` prod.compute_at(result, x) .vectorize(x) .update() .split(r, ro, ri, 8) .split(ri, rio, rii, 2) .reorder(rii, x, rio, ro) .vectorize(x) .atomic() .vectorize(rio) .vectorize(rii); ``` We get the following IR: ``` let t2262 = (int32x32)vector_reduce(Add, (int32x64(shuffle((int16x15)p10_im_global_wrapper$0[ramp(0, 1, 15)], 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 12, 6, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 12, 12, 13, 13, 14))*int32x64(shuffle((int16x8)p11_im_global_wrapper$0[ramp(0, 1, 8)], 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 4, 5, 4, 5, 4, 5, 4, 5, 4, 5, 4, 5, 4, 5, 4, 5, 6, 7, 6, 7, 6, 7, 6, 7, 6, 7, 6, 7, 6, 7, 6, 7)))) f58[ramp(0, 1, 8)] = slice_vectors(t2262, 8, 1, 8) + (slice_vectors(t2262, 0, 1, 8) + (slice_vectors(t2262, 16, 1, 8) + (slice_vectors(t2262, 24, 1, 8) + f58[ramp(0, 1, 8)]))) ``` The vector_reduce is from rii, and the sum of slice_vectors is from rio. The generated asm (for avx512) is: ``` leal (%rbx,%rcx), %edx movslq %edx, %rdx subq %r14, %rdx vmovdqu (%r15,%rcx,2), %xmm6 vbroadcasti64x4 (%r12,%rdx,2), %zmm7 # zmm7 = mem[0,1,2,3,0,1,2,3] vpermw %zmm7, %zmm0, %zmm8 vpermw %zmm7, %zmm1, %zmm7 vpermd %zmm6, %zmm2, %zmm6 vpermd %zmm6, %zmm3, %zmm9 vpmaddwd %zmm9, %zmm8, %zmm8 vpermd %zmm6, %zmm4, %zmm6 vpmaddwd %zmm6, %zmm7, %zmm6 vextracti64x4 $1, %zmm6, %ymm7 vextracti64x4 $1, %zmm8, %ymm9 vpaddd %ymm5, %ymm8, %ymm5 vpaddd %ymm6, %ymm5, %ymm5 vpaddd %ymm5, %ymm9, %ymm5 vpaddd %ymm7, %ymm5, %ymm5 addq $8, %rcx cmpq $128, %rcx ``` The pmaddwds are from rii, and the vpaddds are from rio. This is 3.5x faster than the best schedule that only vectorizes the pure var, and about 2x faster than the best schedule that only vectorizes an unsplit reduction variable. | 13 October 2020, 21:00:39 UTC |
47a5a44 | Andrew Adams | 12 October 2020, 23:32:57 UTC | min and max in the algorithm were confusing loop partitioning, because if any likely tag at all existed on one side of the min/max, even if captured, the other side wasn't getting mutated. This should only happen for uncaptured likelies, where simplifications in the unlikely path are irrelevant. | 13 October 2020, 16:03:51 UTC |
e8430ea | xndcn | 13 October 2020, 12:03:41 UTC | Append missing newline "\n" for camera_pipe usage. | 13 October 2020, 16:00:06 UTC |
57cac82 | xndcn | 13 October 2020, 10:42:55 UTC | Remove redundant last ")" for HalideTraceViz usage | 13 October 2020, 16:00:06 UTC |
b83cad5 | Ankit Aggarwal | 13 October 2020, 09:15:18 UTC | Add calls to simplify | 13 October 2020, 09:15:18 UTC |
a5d91d5 | Dillon Sharlet | 13 October 2020, 03:40:45 UTC | Merge pull request #5349 from halide/abadams/nested_vectorization_compile_time_regression_fix Fix compile-time regression relating to nested vectorization change | 13 October 2020, 03:40:45 UTC |