Revision Message Commit Date
32ab4a9 Merge branch 'master' of https://github.com/halide/Halide into interpret_nn 16 October 2020, 21:17:22 UTC
2cde234 prefer using HVX over HVX_128 16 October 2020, 19:43:57 UTC
ba278fe Merge branch 'master' into interpret_nn 16 October 2020, 18:24:43 UTC
6f52482 Add TODO on optimizing sum_input 16 October 2020, 17:57:45 UTC
5286a68 Fix wasm-related glitches in our timing/benchmarking code 16 October 2020, 17:40:04 UTC
6986c8f Fix boundary condition and optimize convolution. 16 October 2020, 16:58:59 UTC
a9be7f0 Add max pool operator. 16 October 2020, 07:26:20 UTC
b9e1ee5 Fix split of AveragePool. 16 October 2020, 06:33:39 UTC
2aa8573 Alphabetize ops. 16 October 2020, 06:06:02 UTC
ce6da60 Rename Add4D -> Add 16 October 2020, 06:02:55 UTC
ce8ca6d Basic optimization and cleanup. 16 October 2020, 06:02:34 UTC
586a1b0 A few more minor cleanups. 16 October 2020, 04:03:15 UTC
28d0979 Minor cleanups. 16 October 2020, 03:51:24 UTC
424b810 Fix bugs in convolution test. 16 October 2020, 02:43:33 UTC
4e72324 Remove unnecessary hexagon complexity. 16 October 2020, 02:24:37 UTC
d2319de Merge branch 'simplify-nested-broadcasts' into interpret_nn 16 October 2020, 01:58:47 UTC
44eadb6 Add simplification of broadcasted ramp comparisons, and vector reductions of broadcasts. 16 October 2020, 01:55:57 UTC
f9364f0 Add convolution_test 16 October 2020, 00:57:38 UTC
16604ae Set vector_size to 128. rule out vector sizes that made sense on HVX_64 now that HVX_128 is the only mode for HVX 16 October 2020, 00:07:35 UTC
8151b77 Check only for Target::HVX 16 October 2020, 00:04:47 UTC
04cd8dd Remove hvx_64 and hvx to python bindings 15 October 2020, 23:17:17 UTC
9299f07 Enable mobilenet_v2_1.0_224_quant.tflite to run. 15 October 2020, 22:10:21 UTC
4d1a4bb Fix bad merge of test/correctness/mul_div_mod.cpp 15 October 2020, 21:42:15 UTC
13a4eb1 Merge branch 'master' into pdb_remove_hvx_v64 15 October 2020, 21:12:26 UTC
09f9eda [camera_pipe] - In hvx_128 we need 4 threads to saturate hvx with work 15 October 2020, 21:08:34 UTC
9a96a46 Clean up some nonsensical code related to hvx in apps 15 October 2020, 21:03:53 UTC
609840c Look for Target::HVX too, everywhere that we look for Target::HVX_128 15 October 2020, 20:58:37 UTC
3316566 Remove all definitions of hvx_64 15 October 2020, 20:38:51 UTC
805dbd4 Update Makefile 15 October 2020, 19:37:20 UTC
9b594ab Merge branch 'interpret_nn' of https://github.com/halide/Halide into interpret_nn 15 October 2020, 19:35:13 UTC
ff1b237 Add Makefile, fix warnings 15 October 2020, 19:34:52 UTC
4a4653c Some comment and naming cleanup. 15 October 2020, 19:14:18 UTC
574b6bb Merge pull request #5359 from halide/abadams/less_scalarization_of_vectorized_atomics Handle more types of nested vectorized += without scalarizing 15 October 2020, 18:48:38 UTC
2554e60 fix intrinsic ids 15 October 2020, 18:18:05 UTC
d113aa6 Initial push 15 October 2020, 01:46:07 UTC
98b9074 Merge remote-tracking branch 'origin/master' into abadams/less_scalarization_of_vectorized_atomics 14 October 2020, 23:37:33 UTC
a33b3fc remove use of hvx_64 from Target.cpp 14 October 2020, 22:56:41 UTC
18b704b Remove use of hvx_64 from apps 14 October 2020, 22:54:03 UTC
b121015 Remove use of hvx_64 from Halide/test 14 October 2020, 22:46:09 UTC
6371ef9 remove the uses of hvx_64 from Halide/src 14 October 2020, 22:31:08 UTC
048999c Remove MAKE_ID_PAIR and IdPair 14 October 2020, 22:15:19 UTC
2531d11 bounds inference (mod) pessimistic on unsigned 0 interval (#5350)
* move conditions to catch unsigned interval >= 1 case first
* use is_int_or_uint()
14 October 2020, 17:24:04 UTC
f89060b Restrict last test-case to x86 14 October 2020, 17:07:39 UTC
fb0f632 Add vcpkg instructions to README.md 14 October 2020, 15:46:37 UTC
9f13e60 C backend must use memcpy for load/store 14 October 2020, 15:41:24 UTC
3f06f37 Update CodeGen_C.cpp 14 October 2020, 15:41:24 UTC
2d2b651 Replace concat() with a union approach. 14 October 2020, 15:41:24 UTC
b8d889d Fix broken concat() 14 October 2020, 15:41:24 UTC
d161660 Update CodeGen_C.cpp 14 October 2020, 15:41:24 UTC
7bec6c8 Fix relops 14 October 2020, 15:41:24 UTC
fe88544 Update CodeGen_C.cpp 14 October 2020, 15:41:24 UTC
3b2d424 Update CodeGen_C.cpp 14 October 2020, 15:41:24 UTC
c63a80a Update CodeGen_C.cpp 14 October 2020, 15:41:24 UTC
6e76f2e Rewrite internal glue code for vectors in C++ backend. The existing approach (wrapping native vectors in a helper struct) caused too many unnecessary spills. Rewrote so that the underlying Native Vector type is the value being passed around, and improved some of the specializations (with checking of generated code on x64 under clang and gcc). Clang is actually quite good at recognizing patterns and generating appropriate code (at least for x64); gcc is much less so, at least as far as I can tell. 14 October 2020, 15:41:24 UTC
f97c4be Fix for trunk LLVM 14 October 2020, 15:40:21 UTC
1e841d6 Merge pull request #5229 from aankit-ca/aankit_long_div Add unsigned long division to CodeGen_Internal. 14 October 2020, 04:27:47 UTC
474b9e1 Use a binary reduction tree for outer iterations

Also it's better to vectorize at the native width, for both the new and baseline schedules. New inner loop (which does the same number of multiply-adds, but in a 16x4 tile instead of an 8x8 tile):

```
vmovdqu64 (%rcx,%rdx,8), %zmm5
vmovq (%r14,%rdx,8), %xmm6 # xmm6 = mem[0],zero
vpermw %zmm5, %zmm0, %zmm7
vpermw %zmm5, %zmm1, %zmm5
vpermd %zmm6, %zmm2, %zmm8
vpermd %zmm8, %zmm3, %zmm8
vpmaddwd %zmm8, %zmm7, %zmm7
vpbroadcastd %xmm6, %zmm6
vpmaddwd %zmm6, %zmm5, %zmm5
vpaddd %zmm5, %zmm4, %zmm4
vpaddd %zmm7, %zmm4, %zmm4
incq %rdx
cmpq $32, %rdx
```

13 October 2020, 23:01:42 UTC
21b7492 Fix print 13 October 2020, 22:10:29 UTC
f029301 Handle more types of nested vectorized += without scalarizing

With this change, in a nesting of vectorized vars in an associative/commutative reduction (e.g. +=), we can now have a reduction var outermost and a reduction var innermost and get good codegen. This is still not fully general - there can only be one pure var in the vectorized stack for it to work. In general is_interleaved_ramp should be an is_tensor_contraction pass that knows how to do clever codegen for those.

For the following schedule:

```
prod.compute_at(result, x)
    .vectorize(x)
    .update()
    .split(r, ro, ri, 8)
    .split(ri, rio, rii, 2)
    .reorder(rii, x, rio, ro)
    .vectorize(x)
    .atomic()
    .vectorize(rio)
    .vectorize(rii);
```

We get the following IR:

```
let t2262 = (int32x32)vector_reduce(Add, (int32x64(shuffle((int16x15)p10_im_global_wrapper$0[ramp(0, 1, 15)], 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 12, 6, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 12, 12, 13, 13, 14))*int32x64(shuffle((int16x8)p11_im_global_wrapper$0[ramp(0, 1, 8)], 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 4, 5, 4, 5, 4, 5, 4, 5, 4, 5, 4, 5, 4, 5, 4, 5, 6, 7, 6, 7, 6, 7, 6, 7, 6, 7, 6, 7, 6, 7, 6, 7))))
f58[ramp(0, 1, 8)] = slice_vectors(t2262, 8, 1, 8) + (slice_vectors(t2262, 0, 1, 8) + (slice_vectors(t2262, 16, 1, 8) + (slice_vectors(t2262, 24, 1, 8) + f58[ramp(0, 1, 8)])))
```

The vector_reduce is from rii, and the sum of slice_vectors is from rio.

The generated asm (for avx512) is:

```
leal (%rbx,%rcx), %edx
movslq %edx, %rdx
subq %r14, %rdx
vmovdqu (%r15,%rcx,2), %xmm6
vbroadcasti64x4 (%r12,%rdx,2), %zmm7 # zmm7 = mem[0,1,2,3,0,1,2,3]
vpermw %zmm7, %zmm0, %zmm8
vpermw %zmm7, %zmm1, %zmm7
vpermd %zmm6, %zmm2, %zmm6
vpermd %zmm6, %zmm3, %zmm9
vpmaddwd %zmm9, %zmm8, %zmm8
vpermd %zmm6, %zmm4, %zmm6
vpmaddwd %zmm6, %zmm7, %zmm6
vextracti64x4 $1, %zmm6, %ymm7
vextracti64x4 $1, %zmm8, %ymm9
vpaddd %ymm5, %ymm8, %ymm5
vpaddd %ymm6, %ymm5, %ymm5
vpaddd %ymm5, %ymm9, %ymm5
vpaddd %ymm7, %ymm5, %ymm5
addq $8, %rcx
cmpq $128, %rcx
```

The pmaddwds are from rii, and the vpaddds are from rio. This is 3.5x faster than the best schedule that only vectorizes the pure var, and about 2x faster than the best schedule that only vectorizes an unsplit reduction variable.

13 October 2020, 21:00:39 UTC
47a5a44 min and max in the algorithm was confusing loop partitioning Because if any likely tag at all existed on a side of the min/max, even if captured, the other side wasn't getting mutated. This should only happen for uncaptured likelies, where simplifications in the unlikely path are irrelevant. 13 October 2020, 16:03:51 UTC
e8430ea Append missing newline "\n" for camera_pipe usage. 13 October 2020, 16:00:06 UTC
57cac82 Remove redundant last ")" for HalideTraceViz usage 13 October 2020, 16:00:06 UTC
b83cad5 Add calls to simplify 13 October 2020, 09:15:18 UTC
a5d91d5 Merge pull request #5349 from halide/abadams/nested_vectorization_compile_time_regression_fix Fix compile-time regression relating to nested vectorization change 13 October 2020, 03:40:45 UTC
396cf0c Remove erroneous sentence in comment about GCD (#5354) Grand Central Dispatch isn't used anymore; this comment is outdated. 13 October 2020, 00:04:58 UTC
13ae382 Fix comment for halide_error_code_device_dirty_with_no_device_support (#5352) 12 October 2020, 23:02:02 UTC
facb69d Fix for unbounded lanes 12 October 2020, 20:16:43 UTC
50c61b9 Fix compile-time regression relating to nested vectorization change The min_lane expression could grow very large, and required simplifying once per lane 12 October 2020, 20:04:10 UTC
96772ae Stronger simplification of fused ramps (#5343)
* Improve simplifier for fused ramp expressions.
* Add test checking that fused ramps simplify away.
* clang-format
* Use custom lowering pass instead of file system to check for mod.
Co-authored-by: Dillon Sharlet <dsharlet@gmail.com>
12 October 2020, 18:09:10 UTC
cc34401 Better simplification and codegen for nested vectorization (#5325)
* Improvements to nested vectorization simplification and codegen
* Add test for nested vectorization perf
* Add detection for transpose-shuffles
* more ramp-of-ramp simplification
* Better pmaddwd recognition for VectorReduce in x86 backend
* Add pmaddwd for avx512
10 October 2020, 00:45:52 UTC
fa54197 Remove the ADD_[U]INT64_T_SUFFIX macros Modify codegen for C-like backends to just emit integer constants with the correct suffix for the backend, rather than wrapping in a giant macro; the macro approach worked but lordy was it painful to read. 09 October 2020, 16:24:11 UTC
a1c9d89 Fix for trunk LLVM 08 October 2020, 22:50:01 UTC
3db6e2c Add braces around while 08 October 2020, 21:10:40 UTC
a189fd4 Merge pull request #5335 from jlaxson/gpu-test-workgroup-size Reduce GPU Tile Size in Tests 07 October 2020, 18:36:52 UTC
018d70e Modify mod_round_to_zero to use remainder from long_div 07 October 2020, 06:38:48 UTC
9d48b63 Run clang-format 06 October 2020, 19:25:00 UTC
f99018c Add long div/mod to CodeGen_Internal. 06 October 2020, 19:15:18 UTC
b1fd538 Make C++ backend requirement for C++11 explicit Some C++11 features had crept into our C++ backend codegen; make this explicit and check for the correct version at the top of the generated file. (Then remove the stray checks for C++11 version elsewhere.) 06 October 2020, 16:56:23 UTC
d012df7 Merge pull request #5331 from rootjalex/master fix bounds inference bug for bounded interval / unbounded interval 06 October 2020, 16:35:49 UTC
2d15069 Merge pull request #5334 from halide/abadams/braces_around_statements Require braces around if/while bodies. 06 October 2020, 16:34:55 UTC
8c12c67 Reduce GPU Tile Size 06 October 2020, 02:00:54 UTC
ecad269 Use switch statement instead of if sequence 05 October 2020, 23:56:18 UTC
325490b Require braces around if/while bodies. 05 October 2020, 23:53:09 UTC
eb279fc update comment on bounds bug 05 October 2020, 22:18:10 UTC
408a0e2 fix bounds inference bug for bounded interval / unbounded interval 05 October 2020, 21:45:48 UTC
68f66fe Fix 32-bit Windows vcvars command (#5330) `x64_x86` = build using 64-bit compiler, targeting 32-bit x86 (https://docs.microsoft.com/en-us/cpp/build/building-on-the-command-line?view=vs-2019). We had it backwards before. 05 October 2020, 17:55:33 UTC
86fa657 registerPassBuilderCallbacks is only available in LLVM 12+ 04 October 2020, 18:16:12 UTC
2d3ebec Formatting. 04 October 2020, 18:16:12 UTC
ea583f3 Add TargetMachine::registerPassBuilderCallbacks Add call to TargetMachine::registerPassBuilderCallbacks to allow targets to add passes to the pass pipeline using the New Pass Manager 04 October 2020, 18:16:12 UTC
fa9da79 Merge pull request #5324 from halide/srj/inject_buffer InjectBufferCopiesForInputsAndOutputs should check for unexpected Call nodes. 03 October 2020, 17:16:31 UTC
3d19071 Merge pull request #5315 from halide/srj/abort Kill halide_abort() 03 October 2020, 04:18:10 UTC
fa4d11b InjectBufferCopiesForInputsAndOutputs should have an assertion for Call nodes 03 October 2020, 00:33:29 UTC
908e626 Kill halide_abort() This was added long ago as an attempt to work around issues with the Windows Debug runtime (in which calling `abort()` would produce an "Abort, Retry, Ignore" dialog). We no longer do debug builds of any sort on our buildbots, so let's lose all this mess to simplify our world a bit. 02 October 2020, 22:14:00 UTC
a1206b5 Merge pull request #5322 from halide/srj/runtime-warn Make sure the runtime compiler settings in CMake match those in Make 02 October 2020, 21:11:49 UTC
bacd284 Make sure the runtime compiler settings in CMake match those in Make Mainly, we weren't setting any of the warning flags, so CMake builds were more forgiving than Make. 02 October 2020, 21:11:28 UTC
14b567e Merge pull request #5312 from halide/vksnk/align_loads Don't try to align loads if alignment is not divisible by the size of the load 02 October 2020, 18:49:18 UTC
ac56ec9 Merge pull request #5308 from halide/build/shared-llvm-fix Fix linking to shared LLVM. 02 October 2020, 16:56:02 UTC
ff5c2ad Merge branch 'master' into vksnk/align_loads 02 October 2020, 01:07:11 UTC
08ab4a5 Merge pull request #5321 from halide/docs/readme-homebrew Add package manager info to README.md 01 October 2020, 22:35:18 UTC
2768ee3 Add package manager info to README.md 01 October 2020, 21:52:23 UTC