Revision history - refs/heads/pdb_remove_hvx_v64 - origin: https://github.com/halide/Halide

Revision	Author	Date	Message	Commit Date
d94e7a7	Steven Johnson	21 October 2020, 16:44:25 UTC	Update CodeGen_Hexagon.cpp	21 October 2020, 16:44:25 UTC
e520503	Steven Johnson	21 October 2020, 16:33:44 UTC	Merge branch 'master' into pdb_remove_hvx_v64	21 October 2020, 16:33:44 UTC
fc959e7	Steven Johnson	21 October 2020, 16:26:49 UTC	Merge pull request #5382 from halide/srj/readability Enable the useful readability-* checks in clang-tidy	21 October 2020, 16:26:49 UTC
ce2f41d	Steven Johnson	21 October 2020, 16:25:54 UTC	Merge pull request #5384 from dragly/dragly/python-negate-operator Add `logical_not` function for Python	21 October 2020, 16:25:54 UTC
235abe4	Steven Johnson	21 October 2020, 01:03:30 UTC	Tickle Buildbots	21 October 2020, 16:24:22 UTC
acbc69a	Steven Johnson	20 October 2020, 20:47:06 UTC	Enable modernize-use-equals-default/delete in clang-tidy	21 October 2020, 16:24:22 UTC
61792d8	Svenn-Arne Dragly	20 October 2020, 22:06:58 UTC	Add logical_not function for Python This change introduces `logical_not` as a free function and member function that calls `operator!`. The reason why a new function is added is because there is no `operator!` in Python and the `not` keyword cannot be overloaded. Hence, there was currently no way to call the C++ `operator!` in Python.	21 October 2020, 08:35:48 UTC
e2820e2	Steven Johnson	20 October 2020, 21:11:46 UTC	Enable the useful readability-* checks in clang-tidy	20 October 2020, 21:11:46 UTC
b2c9769	Pranav Bhandarkar	20 October 2020, 20:57:09 UTC	Merge branch 'master' into pdb_remove_hvx_v64	20 October 2020, 20:57:09 UTC
00f50a1	Steven Johnson	20 October 2020, 20:18:09 UTC	Merge pull request #5379 from halide/srj/mod2 Enable clang-tidy's modernize-use-default-member-init check	20 October 2020, 20:18:09 UTC
c2ed326	Steven Johnson	20 October 2020, 20:17:53 UTC	Enable clang-tidy's modernize-use-default-member-init check	20 October 2020, 20:17:53 UTC
c2c35b3	Pranav Bhandarkar	20 October 2020, 20:08:13 UTC	remove hvx_64 from Halide/Makefile	20 October 2020, 20:08:13 UTC
b5db7fd	Steven Johnson	20 October 2020, 18:53:44 UTC	Merge pull request #5381 from halide/srj/perfchecks Enable interesting performance-* clang-tidy checks	20 October 2020, 18:53:44 UTC
83d52ab	Steven Johnson	20 October 2020, 18:44:03 UTC	Enable interesting performance-* clang-tidy checks	20 October 2020, 18:44:03 UTC
8221d6c	Steven Johnson	20 October 2020, 18:39:01 UTC	Merge pull request #5378 from halide/srj/misc Enable the interesting misc-* clang-tidy checks	20 October 2020, 18:39:01 UTC
8f3ecb4	Steven Johnson	20 October 2020, 18:38:45 UTC	Enable the interesting misc-* clang-tidy checks	20 October 2020, 18:38:45 UTC
1e8505e	Steven Johnson	20 October 2020, 18:25:01 UTC	Merge pull request #5377 from halide/srj/modernize Enable clang-tidy's modernize-deprecated-headers check and apply fixes.	20 October 2020, 18:25:01 UTC
a3ef417	Steven Johnson	20 October 2020, 18:18:15 UTC	Enable clang-tidy's modernize-deprecated-headers check and apply fixes.	20 October 2020, 18:18:15 UTC
5e91d6f	Steven Johnson	20 October 2020, 16:26:26 UTC	clang-format	20 October 2020, 16:26:26 UTC
7fdd42c	Steven Johnson	20 October 2020, 16:26:14 UTC	clang-format	20 October 2020, 16:26:14 UTC
00ae979	Steven Johnson	20 October 2020, 16:16:20 UTC	Merge branch 'master' into pdb_remove_hvx_v64	20 October 2020, 16:16:20 UTC
a2934d4	Steven Johnson	19 October 2020, 22:24:40 UTC	Update d3d12compute.cpp	20 October 2020, 16:03:17 UTC
f7e77e2	Steven Johnson	19 October 2020, 22:17:17 UTC	Extend clang-tidy checks to src/runtime (and fix resulting errors)	20 October 2020, 16:03:17 UTC
a9e3941	Dillon Sharlet	20 October 2020, 06:45:49 UTC	Merge pull request #5372 from halide/simplify-vectorreduce Add simplification rules for vectorreduce of broadcasts	20 October 2020, 06:45:49 UTC
0ca44db	Steven Johnson	19 October 2020, 22:12:09 UTC	Merge pull request #5358 from halide/srj/tidy-all Extend clang-tidy checks into tools, utils, and python_bindings	19 October 2020, 22:12:09 UTC
85f143c	Pranav Bhandarkar	19 October 2020, 21:27:54 UTC	Address review comments	19 October 2020, 21:27:54 UTC
14dd26a	Steven Johnson	19 October 2020, 20:58:12 UTC	Extend clang-tidy checks into tools, utils, and python_bindings	19 October 2020, 20:58:12 UTC
af57921	Steven Johnson	19 October 2020, 20:21:55 UTC	Drop support for LLVM9 (#5121) Drop support for LLVM9	19 October 2020, 20:21:55 UTC
7a68888	Andrew Adams	19 October 2020, 18:06:36 UTC	Makefile tweaks to work on ubuntu	19 October 2020, 19:58:25 UTC
99f01a8	Steven Johnson	19 October 2020, 18:00:06 UTC	Update Makefile	19 October 2020, 19:58:25 UTC
bc615b0	Steven Johnson	19 October 2020, 17:21:45 UTC	Update run-clang-format.sh	19 October 2020, 19:58:25 UTC
a9975c8	Steven Johnson	16 October 2020, 18:20:27 UTC	Update Makefile	19 October 2020, 19:58:25 UTC
dc5e171	Steven Johnson	16 October 2020, 18:15:38 UTC	Update Makefile	19 October 2020, 19:58:25 UTC
f3c47ae	Steven Johnson	14 October 2020, 22:55:16 UTC	Fix LLVM_DIR value	19 October 2020, 19:58:25 UTC
6a8e292	Steven Johnson	12 October 2020, 18:51:05 UTC	Move clang-tidy logic into script	19 October 2020, 19:58:25 UTC
1a17dbe	Steven Johnson	12 October 2020, 17:17:58 UTC	Move the clang-format logic into a shell script This puts the truth for our clang-format logic into a shell script rather than the Makefile, in hopes of making it slightly easier for CMake users to use.	19 October 2020, 19:58:25 UTC
d8dac07	Dillon Sharlet	19 October 2020, 19:56:01 UTC	Merge pull request #5370 from halide/likely-if Make loop partitioning a bit more robust for if statements	19 October 2020, 19:56:01 UTC
d049a83	Dillon Sharlet	19 October 2020, 18:40:50 UTC	Merge branch 'master' of https://github.com/halide/Halide into simplify-vectorreduce	19 October 2020, 18:40:50 UTC
8e3262b	John Laxson	19 October 2020, 17:04:13 UTC	OpenCL Texture Support (#5297) Add OpenCL Texture Support (https://github.com/halide/Halide/pull/5297)	19 October 2020, 17:04:13 UTC
e93f81a	Dillon Sharlet	19 October 2020, 06:10:21 UTC	Use has_uncaptured_likely_tag instead.	19 October 2020, 06:10:21 UTC
e164866	Dillon Sharlet	19 October 2020, 06:03:49 UTC	Add simplifications for vectorreduce of broadcasts.	19 October 2020, 06:03:49 UTC
a1d0201	Dillon Sharlet	17 October 2020, 05:38:12 UTC	Fix likely for if when the likely is not the outermost expression.	17 October 2020, 05:38:12 UTC
2cde234	Pranav Bhandarkar	16 October 2020, 19:43:57 UTC	prefer using HVX over HVX_128	16 October 2020, 19:43:57 UTC
5286a68	Steven Johnson	15 October 2020, 20:57:20 UTC	Fix wasm-related glitches in our timing/benchmarking code	16 October 2020, 17:40:04 UTC
16604ae	Pranav Bhandarkar	16 October 2020, 00:07:35 UTC	Set vector_size to 128. rule out vector sizes that made sense on HVX_64 now that HVX_128 is the only mode for HVX	16 October 2020, 00:07:35 UTC
8151b77	Pranav Bhandarkar	16 October 2020, 00:04:47 UTC	Check only for Target::HVX	16 October 2020, 00:04:47 UTC
04cd8dd	Pranav Bhandarkar	15 October 2020, 23:17:17 UTC	Remove hvx_64 and hvx to python bindings	15 October 2020, 23:17:17 UTC
4d1a4bb	Pranav Bhandarkar	15 October 2020, 21:42:15 UTC	Fix bad merge of test/correctness/mul_div_mod.cpp	15 October 2020, 21:42:15 UTC
13a4eb1	Pranav Bhandarkar	15 October 2020, 21:12:26 UTC	Merge branch 'master' into pdb_remove_hvx_v64	15 October 2020, 21:12:26 UTC
09f9eda	Pranav Bhandarkar	15 October 2020, 21:08:34 UTC	[camera_pipe] - In hvx_128 we need 4 threads to saturate hvx with work	15 October 2020, 21:08:34 UTC
9a96a46	Pranav Bhandarkar	15 October 2020, 21:03:53 UTC	Clean up some nonsensical code related to hvx in apps	15 October 2020, 21:03:53 UTC
609840c	Pranav Bhandarkar	15 October 2020, 20:58:37 UTC	Look for Target::HVX too, everywhere that we look for Target::HVX_128	15 October 2020, 20:58:37 UTC
3316566	Pranav Bhandarkar	15 October 2020, 20:38:51 UTC	Remove all definitions of hvx_64	15 October 2020, 20:38:51 UTC
574b6bb	Andrew Adams	15 October 2020, 18:48:38 UTC	Merge pull request #5359 from halide/abadams/less_scalarization_of_vectorized_atomics Handle more types of nested vectorized += without scalarizing	15 October 2020, 18:48:38 UTC
2554e60	Pranav Bhandarkar	15 October 2020, 18:18:05 UTC	fix intrinsic ids	15 October 2020, 18:18:05 UTC
98b9074	Andrew Adams	14 October 2020, 23:37:33 UTC	Merge remote-tracking branch 'origin/master' into abadams/less_scalarization_of_vectorized_atomics	14 October 2020, 23:37:33 UTC
a33b3fc	Pranav Bhandarkar	14 October 2020, 22:56:41 UTC	remove use of hvx_64 from Target.cpp	14 October 2020, 22:56:41 UTC
18b704b	Pranav Bhandarkar	14 October 2020, 22:54:03 UTC	Remove use of hvx_64 from apps	14 October 2020, 22:54:03 UTC
b121015	Pranav Bhandarkar	14 October 2020, 22:46:09 UTC	Remove use of hvx_64 from Halide/test	14 October 2020, 22:46:09 UTC
6371ef9	Pranav Bhandarkar	14 October 2020, 22:31:08 UTC	remove the uses of hvx_64 from Halide/src	14 October 2020, 22:31:08 UTC
048999c	Pranav Bhandarkar	14 October 2020, 22:15:19 UTC	Remove MAKE_ID_PAIR and IdPair	14 October 2020, 22:15:19 UTC
2531d11	Alexander Root	14 October 2020, 17:24:04 UTC	bounds inference (mod) pessimistic on unsigned 0 interval (#5350) * move conditions to catch unsigned interval >= 1 case first * use is_int_or_uint()	14 October 2020, 17:24:04 UTC
f89060b	Andrew Adams	14 October 2020, 17:07:39 UTC	Restrict last test-case to x86	14 October 2020, 17:07:39 UTC
fb0f632	Alex Reinking	14 October 2020, 07:29:32 UTC	Add vcpkg instructions to README.md	14 October 2020, 15:46:37 UTC
9f13e60	Steven Johnson	13 October 2020, 17:46:42 UTC	C backend must use memcpy for load/store	14 October 2020, 15:41:24 UTC
3f06f37	Steven Johnson	09 October 2020, 18:38:25 UTC	Update CodeGen_C.cpp	14 October 2020, 15:41:24 UTC
2d2b651	Steven Johnson	09 October 2020, 18:26:18 UTC	Replace concat() with a union approach.	14 October 2020, 15:41:24 UTC
b8d889d	Steven Johnson	09 October 2020, 17:56:34 UTC	Fix broken concat()	14 October 2020, 15:41:24 UTC
d161660	Steven Johnson	09 October 2020, 00:04:03 UTC	Update CodeGen_C.cpp	14 October 2020, 15:41:24 UTC
7bec6c8	Steven Johnson	09 October 2020, 00:03:00 UTC	Fix relops	14 October 2020, 15:41:24 UTC
fe88544	Steven Johnson	08 October 2020, 20:39:35 UTC	Update CodeGen_C.cpp	14 October 2020, 15:41:24 UTC
3b2d424	Steven Johnson	07 October 2020, 21:16:32 UTC	Update CodeGen_C.cpp	14 October 2020, 15:41:24 UTC
c63a80a	Steven Johnson	07 October 2020, 19:14:15 UTC	Update CodeGen_C.cpp	14 October 2020, 15:41:24 UTC
6e76f2e	Steven Johnson	07 October 2020, 19:10:24 UTC	Rewrite internal glue code for vectors in C++ backend. The existing approach (wrapping native vectors in a helper struct) caused too many unnecessary spills. Rewrote so that the underlying Native Vector type is the value being passed around, and improved some of the specializations (with checking of generated code on x64 under clang and gcc). Clang is actually quite good at recognizing patterns and generating appropriate code (at least for x64); gcc is much less so, at least as far as I can tell.	14 October 2020, 15:41:24 UTC
f97c4be	Steven Johnson	14 October 2020, 15:35:58 UTC	Fix for trunk LLVM	14 October 2020, 15:40:21 UTC
1e841d6	Dillon Sharlet	14 October 2020, 04:27:47 UTC	Merge pull request #5229 from aankit-ca/aankit_long_div Add unsigned long division to CodeGen_Internal.	14 October 2020, 04:27:47 UTC
474b9e1	Andrew Adams	13 October 2020, 23:01:42 UTC	Use a binary reduction tree for outer iterations Also it's better to vectorize at the native width, for both the new and baseline schedules. New inner loop (which does the same number of multiply-adds, but in a 16x4 tile instead of an 8x8 tile): ``` vmovdqu64 (%rcx,%rdx,8), %zmm5 vmovq (%r14,%rdx,8), %xmm6 # xmm6 = mem[0],zero vpermw %zmm5, %zmm0, %zmm7 vpermw %zmm5, %zmm1, %zmm5 vpermd %zmm6, %zmm2, %zmm8 vpermd %zmm8, %zmm3, %zmm8 vpmaddwd %zmm8, %zmm7, %zmm7 vpbroadcastd %xmm6, %zmm6 vpmaddwd %zmm6, %zmm5, %zmm5 vpaddd %zmm5, %zmm4, %zmm4 vpaddd %zmm7, %zmm4, %zmm4 incq %rdx cmpq $32, %rdx ```	13 October 2020, 23:01:42 UTC
21b7492	Andrew Adams	13 October 2020, 22:10:29 UTC	Fix print	13 October 2020, 22:10:29 UTC
f029301	Andrew Adams	13 October 2020, 21:00:39 UTC	Handle more types of nested vectorized += without scalarizing With this change, in a nesting of vectorized vars in an associative/commutative reduction (e.g. +=), we can now have a reduction var outermost and a reduction var innermost and get good codegen. This is still not fully general - there can only be one pure var in the vectorized stack for it to work. In general is_interleaved_ramp should be an is_tensor_contraction pass that knows how to do clever codegen for those. For the following schedule: ``` prod.compute_at(result, x) .vectorize(x) .update() .split(r, ro, ri, 8) .split(ri, rio, rii, 2) .reorder(rii, x, rio, ro) .vectorize(x) .atomic() .vectorize(rio) .vectorize(rii); ``` We get the following IR: ``` let t2262 = (int32x32)vector_reduce(Add, (int32x64(shuffle((int16x15)p10_im_global_wrapper$0[ramp(0, 1, 15)], 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 12, 6, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 12, 12, 13, 13, 14))*int32x64(shuffle((int16x8)p11_im_global_wrapper$0[ramp(0, 1, 8)], 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 4, 5, 4, 5, 4, 5, 4, 5, 4, 5, 4, 5, 4, 5, 4, 5, 6, 7, 6, 7, 6, 7, 6, 7, 6, 7, 6, 7, 6, 7, 6, 7)))) f58[ramp(0, 1, 8)] = slice_vectors(t2262, 8, 1, 8) + (slice_vectors(t2262, 0, 1, 8) + (slice_vectors(t2262, 16, 1, 8) + (slice_vectors(t2262, 24, 1, 8) + f58[ramp(0, 1, 8)]))) ``` The vector_reduce is from rii, and the sum of slice_vectors is from rio. The generated asm (for avx512) is: ``` leal (%rbx,%rcx), %edx movslq %edx, %rdx subq %r14, %rdx vmovdqu (%r15,%rcx,2), %xmm6 vbroadcasti64x4 (%r12,%rdx,2), %zmm7 # zmm7 = mem[0,1,2,3,0,1,2,3] vpermw %zmm7, %zmm0, %zmm8 vpermw %zmm7, %zmm1, %zmm7 vpermd %zmm6, %zmm2, %zmm6 vpermd %zmm6, %zmm3, %zmm9 vpmaddwd %zmm9, %zmm8, %zmm8 vpermd %zmm6, %zmm4, %zmm6 vpmaddwd %zmm6, %zmm7, %zmm6 vextracti64x4 $1, %zmm6, %ymm7 vextracti64x4 $1, %zmm8, %ymm9 vpaddd %ymm5, %ymm8, %ymm5 vpaddd %ymm6, %ymm5, %ymm5 vpaddd %ymm5, %ymm9, %ymm5 vpaddd %ymm7, %ymm5, %ymm5 addq $8, %rcx cmpq $128, %rcx ``` The pmaddwds are from rii, and the vpaddds are from rio. This is 3.5x faster than the best schedule that only vectorizes the pure var, and about 2x faster than the best schedule that only vectorizes an unsplit reduction variable.	13 October 2020, 21:00:39 UTC
47a5a44	Andrew Adams	12 October 2020, 23:32:57 UTC	min and max in the algorithm was confusing loop partitioning Because if any likely tag at all existed on a side of the min/max, even if captured, the other side wasn't getting mutated. This should only happen for uncaptured likelies, where simplifications in the unlikely path are irrelevant.	13 October 2020, 16:03:51 UTC
e8430ea	xndcn	13 October 2020, 12:03:41 UTC	Append missing newline "\n" for camera_pipe usage.	13 October 2020, 16:00:06 UTC
57cac82	xndcn	13 October 2020, 10:42:55 UTC	Remove redundant last ")" for HalideTraceViz usage	13 October 2020, 16:00:06 UTC
b83cad5	Ankit Aggarwal	13 October 2020, 09:15:18 UTC	Add calls to simplify	13 October 2020, 09:15:18 UTC
a5d91d5	Dillon Sharlet	13 October 2020, 03:40:45 UTC	Merge pull request #5349 from halide/abadams/nested_vectorization_compile_time_regression_fix Fix compile-time regression relating to nested vectorization change	13 October 2020, 03:40:45 UTC
396cf0c	Shoaib Kamil	13 October 2020, 00:04:58 UTC	Remove erroneous sentence in comment about GCD (#5354) Grand Central Dispatch isn't used anymore; this comment is outdated.	13 October 2020, 00:04:58 UTC
13ae382	Andrew Adams	12 October 2020, 23:02:02 UTC	Fix comment for halide_error_code_device_dirty_with_no_device_support (#5352)	12 October 2020, 23:02:02 UTC
facb69d	Andrew Adams	12 October 2020, 20:16:43 UTC	Fix for unbounded lanes	12 October 2020, 20:16:43 UTC
50c61b9	Andrew Adams	12 October 2020, 20:04:10 UTC	Fix compile-time regression relating to nested vectorization change The min_lane expression could grow very large, and required simplifying once per lane	12 October 2020, 20:04:10 UTC
96772ae	Dillon Sharlet	12 October 2020, 18:09:10 UTC	Stronger simplification of fused ramps (#5343) * Improve simplifier for fused ramp expressions. * Add test checking that fused ramps simplify away. * clang-format * Use custom lowering pass instead of file system to check for mod. Co-authored-by: Dillon Sharlet <dsharlet@gmail.com>	12 October 2020, 18:09:10 UTC
cc34401	Andrew Adams	10 October 2020, 00:45:52 UTC	Better simplification and codegen for nested vectorization (#5325) * Improvements to nested vectorization simplification and codegen * Add test for nested vectorization perf * Add detection for transpose-shuffles * more ramp-of-ramp simplification * Better pmaddwd recognition for VectorReduce in x86 backend * Add pmaddwd for avx512	10 October 2020, 00:45:52 UTC
fa54197	Steven Johnson	08 October 2020, 21:23:52 UTC	Remove the ADD_[U]INT64_T_SUFFIX macros Modify codegen for C-like backends to just emit integer constants with the correct suffix for the backend, rather than wrapping in a giant macro; the macro approach worked but lordy was it painful to read.	09 October 2020, 16:24:11 UTC
a1c9d89	Steven Johnson	08 October 2020, 17:39:19 UTC	Fix for trunk LLVM	08 October 2020, 22:50:01 UTC
3db6e2c	Ankit Aggarwal	08 October 2020, 21:10:40 UTC	Add braces around while	08 October 2020, 21:10:40 UTC
a189fd4	Andrew Adams	07 October 2020, 18:36:52 UTC	Merge pull request #5335 from jlaxson/gpu-test-workgroup-size Reduce GPU Tile Size in Tests	07 October 2020, 18:36:52 UTC
018d70e	Ankit Aggarwal	07 October 2020, 06:37:14 UTC	Modify mod_round_to_zero to use remainder from long_div	07 October 2020, 06:38:48 UTC
9d48b63	Ankit Aggarwal	06 October 2020, 19:25:00 UTC	Run clang-format	06 October 2020, 19:25:00 UTC
f99018c	Ankit Aggarwal	06 October 2020, 19:15:18 UTC	Add long div/mod to CodeGen_Internal.	06 October 2020, 19:15:18 UTC
b1fd538	Steven Johnson	05 October 2020, 22:41:32 UTC	Make C++ backend requirement for C++11 explicit Some C++11 features had crept into our C++ backend codegen; make this explicit and check for the correct version at the top of the generated file. (Then remove the stray checks for C++11 version elsewhere.)	06 October 2020, 16:56:23 UTC
d012df7	Andrew Adams	06 October 2020, 16:35:49 UTC	Merge pull request #5331 from rootjalex/master fix bounds inference bug for bounded interval / unbounded interval	06 October 2020, 16:35:49 UTC
2d15069	Andrew Adams	06 October 2020, 16:34:55 UTC	Merge pull request #5334 from halide/abadams/braces_around_statements Require braces around if/while bodies.	06 October 2020, 16:34:55 UTC
