9fd37eb | Dillon Sharlet | 19 October 2020, 06:08:13 UTC | Addresses TODO. | 19 October 2020, 06:08:13 UTC |
3960cfe | Dillon Sharlet | 19 October 2020, 02:36:49 UTC | Rewrite VectorReduce(Shuffle(x, ...)) -> Shuffle(VectorReduce(x), ...) | 19 October 2020, 02:36:49 UTC |
5286a68 | Steven Johnson | 15 October 2020, 20:57:20 UTC | Fix wasm-related glitches in our timing/benchmarking code | 16 October 2020, 17:40:04 UTC |
574b6bb | Andrew Adams | 15 October 2020, 18:48:38 UTC | Merge pull request #5359 from halide/abadams/less_scalarization_of_vectorized_atomics Handle more types of nested vectorized += without scalarizing | 15 October 2020, 18:48:38 UTC |
98b9074 | Andrew Adams | 14 October 2020, 23:37:33 UTC | Merge remote-tracking branch 'origin/master' into abadams/less_scalarization_of_vectorized_atomics | 14 October 2020, 23:37:33 UTC |
2531d11 | Alexander Root | 14 October 2020, 17:24:04 UTC | bounds inference (mod) pessimistic on unsigned 0 interval (#5350) * move conditions to catch unsigned interval >= 1 case first * use is_int_or_uint() | 14 October 2020, 17:24:04 UTC |
f89060b | Andrew Adams | 14 October 2020, 17:07:39 UTC | Restrict last test-case to x86 | 14 October 2020, 17:07:39 UTC |
fb0f632 | Alex Reinking | 14 October 2020, 07:29:32 UTC | Add vcpkg instructions to README.md | 14 October 2020, 15:46:37 UTC |
9f13e60 | Steven Johnson | 13 October 2020, 17:46:42 UTC | C backend must use memcpy for load/store | 14 October 2020, 15:41:24 UTC |
3f06f37 | Steven Johnson | 09 October 2020, 18:38:25 UTC | Update CodeGen_C.cpp | 14 October 2020, 15:41:24 UTC |
2d2b651 | Steven Johnson | 09 October 2020, 18:26:18 UTC | Replace concat() with a union approach. | 14 October 2020, 15:41:24 UTC |
b8d889d | Steven Johnson | 09 October 2020, 17:56:34 UTC | Fix broken concat() | 14 October 2020, 15:41:24 UTC |
d161660 | Steven Johnson | 09 October 2020, 00:04:03 UTC | Update CodeGen_C.cpp | 14 October 2020, 15:41:24 UTC |
7bec6c8 | Steven Johnson | 09 October 2020, 00:03:00 UTC | Fix relops | 14 October 2020, 15:41:24 UTC |
fe88544 | Steven Johnson | 08 October 2020, 20:39:35 UTC | Update CodeGen_C.cpp | 14 October 2020, 15:41:24 UTC |
3b2d424 | Steven Johnson | 07 October 2020, 21:16:32 UTC | Update CodeGen_C.cpp | 14 October 2020, 15:41:24 UTC |
c63a80a | Steven Johnson | 07 October 2020, 19:14:15 UTC | Update CodeGen_C.cpp | 14 October 2020, 15:41:24 UTC |
6e76f2e | Steven Johnson | 07 October 2020, 19:10:24 UTC | Rewrite internal glue code for vectors in C++ backend. The existing approach (wrapping native vectors in a helper struct) caused too many unnecessary spills. Rewrote so that the underlying Native Vector type is the value being passed around, and improved some of the specializations (with checking of generated code on x64 under clang and gcc). Clang is actually quite good at recognizing patterns and generating appropriate code (at least for x64); gcc is much less so, at least as far as I can tell. | 14 October 2020, 15:41:24 UTC |
f97c4be | Steven Johnson | 14 October 2020, 15:35:58 UTC | Fix for trunk LLVM | 14 October 2020, 15:40:21 UTC |
1e841d6 | Dillon Sharlet | 14 October 2020, 04:27:47 UTC | Merge pull request #5229 from aankit-ca/aankit_long_div Add unsigned long division to CodeGen_Internal. | 14 October 2020, 04:27:47 UTC |
474b9e1 | Andrew Adams | 13 October 2020, 23:01:42 UTC | Use a binary reduction tree for outer iterations Also it's better to vectorize at the native width, for both the new and baseline schedules. New inner loop (which does the same number of multiply-adds, but in a 16x4 tile instead of an 8x8 tile): ``` vmovdqu64 (%rcx,%rdx,8), %zmm5 vmovq (%r14,%rdx,8), %xmm6 # xmm6 = mem[0],zero vpermw %zmm5, %zmm0, %zmm7 vpermw %zmm5, %zmm1, %zmm5 vpermd %zmm6, %zmm2, %zmm8 vpermd %zmm8, %zmm3, %zmm8 vpmaddwd %zmm8, %zmm7, %zmm7 vpbroadcastd %xmm6, %zmm6 vpmaddwd %zmm6, %zmm5, %zmm5 vpaddd %zmm5, %zmm4, %zmm4 vpaddd %zmm7, %zmm4, %zmm4 incq %rdx cmpq $32, %rdx ``` | 13 October 2020, 23:01:42 UTC |
21b7492 | Andrew Adams | 13 October 2020, 22:10:29 UTC | Fix print | 13 October 2020, 22:10:29 UTC |
f029301 | Andrew Adams | 13 October 2020, 21:00:39 UTC | Handle more types of nested vectorized += without scalarizing With this change, in a nesting of vectorized vars in an associative/commutative reduction (e.g. +=), we can now have a reduction var outermost and a reduction var innermost and get good codegen. This is still not fully general - there can only be one pure var in the vectorized stack for it to work. In general is_interleaved_ramp should be an is_tensor_contraction pass that knows how to do clever codegen for those. For the following schedule: ``` prod.compute_at(result, x) .vectorize(x) .update() .split(r, ro, ri, 8) .split(ri, rio, rii, 2) .reorder(rii, x, rio, ro) .vectorize(x) .atomic() .vectorize(rio) .vectorize(rii); ``` We get the following IR: ``` let t2262 = (int32x32)vector_reduce(Add, (int32x64(shuffle((int16x15)p10_im_global_wrapper$0[ramp(0, 1, 15)], 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 12, 6, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 12, 12, 13, 13, 14))*int32x64(shuffle((int16x8)p11_im_global_wrapper$0[ramp(0, 1, 8)], 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 4, 5, 4, 5, 4, 5, 4, 5, 4, 5, 4, 5, 4, 5, 4, 5, 6, 7, 6, 7, 6, 7, 6, 7, 6, 7, 6, 7, 6, 7, 6, 7)))) f58[ramp(0, 1, 8)] = slice_vectors(t2262, 8, 1, 8) + (slice_vectors(t2262, 0, 1, 8) + (slice_vectors(t2262, 16, 1, 8) + (slice_vectors(t2262, 24, 1, 8) + f58[ramp(0, 1, 8)]))) ``` The vector_reduce is from rii, and the sum of slice_vectors is from rio. 
The generated asm (for avx512) is: ``` leal (%rbx,%rcx), %edx movslq %edx, %rdx subq %r14, %rdx vmovdqu (%r15,%rcx,2), %xmm6 vbroadcasti64x4 (%r12,%rdx,2), %zmm7 # zmm7 = mem[0,1,2,3,0,1,2,3] vpermw %zmm7, %zmm0, %zmm8 vpermw %zmm7, %zmm1, %zmm7 vpermd %zmm6, %zmm2, %zmm6 vpermd %zmm6, %zmm3, %zmm9 vpmaddwd %zmm9, %zmm8, %zmm8 vpermd %zmm6, %zmm4, %zmm6 vpmaddwd %zmm6, %zmm7, %zmm6 vextracti64x4 $1, %zmm6, %ymm7 vextracti64x4 $1, %zmm8, %ymm9 vpaddd %ymm5, %ymm8, %ymm5 vpaddd %ymm6, %ymm5, %ymm5 vpaddd %ymm5, %ymm9, %ymm5 vpaddd %ymm7, %ymm5, %ymm5 addq $8, %rcx cmpq $128, %rcx ``` The pmaddwds are from rii, and the vpaddds are from rio. This is 3.5x faster than the best schedule that only vectorizes the pure var, and about 2x faster than the best schedule that only vectorizes an unsplit reduction variable. | 13 October 2020, 21:00:39 UTC |
47a5a44 | Andrew Adams | 12 October 2020, 23:32:57 UTC | min and max in the algorithm was confusing loop partitioning Because if any likely tag at all existed on a side of the min/max, even if captured, the other side wasn't getting mutated. This should only happen for uncaptured likelies, where simplifications in the unlikely path are irrelevant. | 13 October 2020, 16:03:51 UTC |
e8430ea | xndcn | 13 October 2020, 12:03:41 UTC | Append missing newline "\n" for camera_pipe usage. | 13 October 2020, 16:00:06 UTC |
57cac82 | xndcn | 13 October 2020, 10:42:55 UTC | Remove redundant last ")" for HalideTraceViz usage | 13 October 2020, 16:00:06 UTC |
b83cad5 | Ankit Aggarwal | 13 October 2020, 09:15:18 UTC | Add calls to simplify | 13 October 2020, 09:15:18 UTC |
a5d91d5 | Dillon Sharlet | 13 October 2020, 03:40:45 UTC | Merge pull request #5349 from halide/abadams/nested_vectorization_compile_time_regression_fix Fix compile-time regression relating to nested vectorization change | 13 October 2020, 03:40:45 UTC |
396cf0c | Shoaib Kamil | 13 October 2020, 00:04:58 UTC | Remove erroneous sentence in comment about GCD (#5354) Grand Central Dispatch isn't used anymore; this comment is outdated. | 13 October 2020, 00:04:58 UTC |
13ae382 | Andrew Adams | 12 October 2020, 23:02:02 UTC | Fix comment for halide_error_code_device_dirty_with_no_device_support (#5352) | 12 October 2020, 23:02:02 UTC |
facb69d | Andrew Adams | 12 October 2020, 20:16:43 UTC | Fix for unbounded lanes | 12 October 2020, 20:16:43 UTC |
50c61b9 | Andrew Adams | 12 October 2020, 20:04:10 UTC | Fix compile-time regression relating to nested vectorization change The min_lane expression could grow very large, and required simplifying once per lane | 12 October 2020, 20:04:10 UTC |
96772ae | Dillon Sharlet | 12 October 2020, 18:09:10 UTC | Stronger simplification of fused ramps (#5343) * Improve simplifier for fused ramp expressions. * Add test checking that fused ramps simplify away. * clang-format * Use custom lowering pass instead of file system to check for mod. Co-authored-by: Dillon Sharlet <dsharlet@gmail.com> | 12 October 2020, 18:09:10 UTC |
cc34401 | Andrew Adams | 10 October 2020, 00:45:52 UTC | Better simplification and codegen for nested vectorization (#5325) * Improvements to nested vectorization simplification and codegen * Add test for nested vectorization perf * Add detection for transpose-shuffles * more ramp-of-ramp simplification * Better pmaddwd recognition for VectorReduce in x86 backend * Add pmaddwd for avx512 | 10 October 2020, 00:45:52 UTC |
fa54197 | Steven Johnson | 08 October 2020, 21:23:52 UTC | Remove the ADD_[U]INT64_T_SUFFIX macros Modify codegen for C-like backends to just emit integer constants with the correct suffix for the backend, rather than wrapping in a giant macro; the macro approach worked but lordy was it painful to read. | 09 October 2020, 16:24:11 UTC |
a1c9d89 | Steven Johnson | 08 October 2020, 17:39:19 UTC | Fix for trunk LLVM | 08 October 2020, 22:50:01 UTC |
3db6e2c | Ankit Aggarwal | 08 October 2020, 21:10:40 UTC | Add braces around while | 08 October 2020, 21:10:40 UTC |
a189fd4 | Andrew Adams | 07 October 2020, 18:36:52 UTC | Merge pull request #5335 from jlaxson/gpu-test-workgroup-size Reduce GPU Tile Size in Tests | 07 October 2020, 18:36:52 UTC |
018d70e | Ankit Aggarwal | 07 October 2020, 06:37:14 UTC | Modify mod_round_to_zero to use remainder from long_div | 07 October 2020, 06:38:48 UTC |
9d48b63 | Ankit Aggarwal | 06 October 2020, 19:25:00 UTC | Run clang-format | 06 October 2020, 19:25:00 UTC |
f99018c | Ankit Aggarwal | 06 October 2020, 19:15:18 UTC | Add long div/mod to CodeGen_Internal. | 06 October 2020, 19:15:18 UTC |
b1fd538 | Steven Johnson | 05 October 2020, 22:41:32 UTC | Make C++ backend requirement for C++11 explicit Some C++11 features had crept into our C++ backend codegen; make this explicit and check for the correct version at the top of the generated file. (Then remove the stray checks for C++11 version elsewhere.) | 06 October 2020, 16:56:23 UTC |
d012df7 | Andrew Adams | 06 October 2020, 16:35:49 UTC | Merge pull request #5331 from rootjalex/master fix bounds inference bug for bounded interval / unbounded interval | 06 October 2020, 16:35:49 UTC |
2d15069 | Andrew Adams | 06 October 2020, 16:34:55 UTC | Merge pull request #5334 from halide/abadams/braces_around_statements Require braces around if/while bodies. | 06 October 2020, 16:34:55 UTC |
8c12c67 | John Laxson | 06 October 2020, 02:00:54 UTC | Reduce GPU Tile Size | 06 October 2020, 02:00:54 UTC |
ecad269 | Andrew Adams | 05 October 2020, 23:56:18 UTC | Use switch statement instead of if sequence | 05 October 2020, 23:56:18 UTC |
325490b | Andrew Adams | 05 October 2020, 23:53:09 UTC | Require braces around if/while bodies. | 05 October 2020, 23:53:09 UTC |
eb279fc | Alexander Root | 05 October 2020, 22:18:10 UTC | update comment on bounds bug | 05 October 2020, 22:18:10 UTC |
408a0e2 | Alexander Root | 05 October 2020, 21:45:48 UTC | fix bounds inference bug for bounded interval / unbounded interval | 05 October 2020, 21:45:48 UTC |
68f66fe | Shoaib Kamil | 05 October 2020, 17:55:33 UTC | Fix 32-bit Windows vcvars command (#5330) `x64_x86` = build using 64-bit compiler, targeting 32-bit x86 (https://docs.microsoft.com/en-us/cpp/build/building-on-the-command-line?view=vs-2019). We had it backwards before. | 05 October 2020, 17:55:33 UTC |
86fa657 | Steven Johnson | 02 October 2020, 22:02:38 UTC | registerPassBuilderCallbacks is only available in LLVM 12+ | 04 October 2020, 18:16:12 UTC |
2d3ebec | Ankit Aggarwal | 01 October 2020, 00:38:14 UTC | Formatting. | 04 October 2020, 18:16:12 UTC |
ea583f3 | Ankit Aggarwal | 26 August 2020, 18:20:45 UTC | Add TargetMachine::registerPassBuilderCallbacks Add call to TargetMachine::registerPassBuilderCallbacks to allow targets to add passes to the pass pipeline using the New Pass Manager | 04 October 2020, 18:16:12 UTC |
fa9da79 | Steven Johnson | 03 October 2020, 17:16:31 UTC | Merge pull request #5324 from halide/srj/inject_buffer InjectBufferCopiesForInputsAndOutputs should check for unexpected Call nodes. | 03 October 2020, 17:16:31 UTC |
3d19071 | Steven Johnson | 03 October 2020, 04:18:10 UTC | Merge pull request #5315 from halide/srj/abort Kill halide_abort() | 03 October 2020, 04:18:10 UTC |
fa4d11b | Steven Johnson | 03 October 2020, 00:33:29 UTC | InjectBufferCopiesForInputsAndOutputs should have an assertion for Call nodes | 03 October 2020, 00:33:29 UTC |
908e626 | Steven Johnson | 02 October 2020, 22:14:00 UTC | Kill halide_abort() This was added long ago as an attempt to work around issues with the Windows Debug runtime (in which calling `abort()` would produce an "Abort, Retry, Ignore" dialog). We no longer do debug builds of any sort on our buildbots, so let's lose all this mess to simplify our world a bit. | 02 October 2020, 22:14:00 UTC |
a1206b5 | Steven Johnson | 02 October 2020, 21:11:49 UTC | Merge pull request #5322 from halide/srj/runtime-warn Make sure the runtime compiler settings in CMake match those in Make | 02 October 2020, 21:11:49 UTC |
bacd284 | Steven Johnson | 02 October 2020, 18:47:48 UTC | Make sure the runtime compiler settings in CMake match those in Make Mainly, we weren't setting any of the warning flags, so CMake builds were more forgiving than Make. | 02 October 2020, 21:11:28 UTC |
14b567e | Steven Johnson | 02 October 2020, 18:49:18 UTC | Merge pull request #5312 from halide/vksnk/align_loads Don't try to align loads if alignment is not divisible by the size of the load | 02 October 2020, 18:49:18 UTC |
ac56ec9 | Steven Johnson | 02 October 2020, 16:56:02 UTC | Merge pull request #5308 from halide/build/shared-llvm-fix Fix linking to shared LLVM. | 02 October 2020, 16:56:02 UTC |
ff5c2ad | Steven Johnson | 02 October 2020, 01:07:11 UTC | Merge branch 'master' into vksnk/align_loads | 02 October 2020, 01:07:11 UTC |
08ab4a5 | Steven Johnson | 01 October 2020, 22:35:18 UTC | Merge pull request #5321 from halide/docs/readme-homebrew Add package manager info to README.md | 01 October 2020, 22:35:18 UTC |
2768ee3 | Alex Reinking | 01 October 2020, 21:52:23 UTC | Add package manager info to README.md | 01 October 2020, 21:52:23 UTC |
70e98d2 | Steven Johnson | 01 October 2020, 21:21:11 UTC | Merge branch 'master' into build/shared-llvm-fix | 01 October 2020, 21:21:11 UTC |
17e1ec6 | Steven Johnson | 01 October 2020, 21:20:57 UTC | Merge branch 'master' into vksnk/align_loads | 01 October 2020, 21:20:57 UTC |
3b47c0e | Marcos Slomp | 01 October 2020, 21:19:11 UTC | [d3d12] allocation cache + bugfixes (#5298) * refactoring to remove wait/sync points from kernel dispatch * debugging and bugfixes * refactoring wait/sync procedures * refactoring buffer signal checkpoints * improved time tracing * improved device selection and additional trace scoping features * addressing clang format issues * more clang format complaints... ¯\_(ツ)_/¯ * more clang format... * clang format... * nullptr -> NULL (0) * addressing code review comments * scoping the kernel argument setup code * addressing code review comments * clang format... * buffer allocation cache * rearranging wait/sync points in the allocation cache strategy * cleanup and refactoring * bugfix: must reset descriptor binder state when recycling it for a new command frame * releasing cached resources on device shutdown * reworking device crop release since allocation cache has been implemented * improved trace info, comments and asserts * refactoring device creation * tracking checkpoints in device<->device transfers * better debug dump report scope * removing old code * clang format * unused variable * Fix python_correctness_boundary_conditions * atomic clarity * refactoring of allocation cache (with local toggle) * Tickle Buildbots * Modify load/store codegen to support load/store from/to shared mem * adding trace-level support * Tickle buildbot * addressing code review * refactoring trace errors/warnings and context halting condition * clang format * build fix * resetting d3dd12_frame struct fields after release * reverting accidental changes (corrupted git index during stage) * adding remarks with regards to device creation quirks in d3d12 * fixing build (windows cross-compilation on linux) * removing double semi-colon (clang-format) * Tickle the buildbots * Fix signed/unsigned mismatch in d3d12compute.cpp * Remove space in cast Co-authored-by: Marcos Slomp <slomp@adobe.com> Co-authored-by: Shoaib Kamil <kamil@adobe.com> Co-authored-by: Steven Johnson <srj@google.com> Co-authored-by: Shoaib Kamil <shoaibkamil@gmail.com> | 01 October 2020, 21:19:11 UTC |
0f8e9cc | Steven Johnson | 01 October 2020, 21:00:55 UTC | Tickle the buildbots | 01 October 2020, 21:00:55 UTC |
cb9589d | Steven Johnson | 01 October 2020, 21:00:25 UTC | Tickle the buildbots | 01 October 2020, 21:00:25 UTC |
3431d52 | Steven Johnson | 01 October 2020, 16:53:33 UTC | Merge branch 'master' into build/shared-llvm-fix | 01 October 2020, 16:53:33 UTC |
87466ad | Steven Johnson | 01 October 2020, 16:53:03 UTC | Merge branch 'master' into vksnk/align_loads | 01 October 2020, 16:53:03 UTC |
a957053 | Steven Johnson | 01 October 2020, 16:43:58 UTC | Merge pull request #5317 from halide/srj/runtime-errors Make halide_assert() use do...while(0) idiom | 01 October 2020, 16:43:58 UTC |
7da643b | Steven Johnson | 01 October 2020, 00:23:50 UTC | Make halide_assert() use do...while(0) idiom This is the canonical form for statement-like macros in C. Added some missing semicolons that were detected by this (and fixed formatting). | 01 October 2020, 00:23:50 UTC |
f501810 | Steven Johnson | 30 September 2020, 23:07:36 UTC | Merge branch 'master' into build/shared-llvm-fix | 30 September 2020, 23:07:36 UTC |
ab3a541 | Steven Johnson | 30 September 2020, 23:06:56 UTC | Merge branch 'master' into vksnk/align_loads | 30 September 2020, 23:06:56 UTC |
6fc5bc8 | Steven Johnson | 30 September 2020, 23:05:10 UTC | Merge pull request #5307 from halide/shoaibkamil/host_supports_target_device Fix host_supports_target_device() | 30 September 2020, 23:05:10 UTC |
f7b5382 | Shoaib Kamil | 30 September 2020, 17:58:06 UTC | Update DeviceInterface.cpp | 30 September 2020, 17:58:06 UTC |
2c6e683 | Volodymyr Kysenko | 30 September 2020, 17:47:31 UTC | Don't try to align loads if alignment is not divisible by the size of the load | 30 September 2020, 17:49:48 UTC |
69be4d4 | Alex Reinking | 30 September 2020, 16:39:20 UTC | Fix linking to shared LLVM. Fixes #5304. | 30 September 2020, 17:17:47 UTC |
c7935de | Shoaib Kamil | 30 September 2020, 13:35:53 UTC | Fix host_supports_target_device() | 30 September 2020, 13:35:53 UTC |
dc89424 | Steven Johnson | 30 September 2020, 01:22:20 UTC | Merge pull request #5303 from halide/srj/win-32 Fix runtime build rules in Makefile | 30 September 2020, 01:22:20 UTC |
d1ea4c3 | Steven Johnson | 29 September 2020, 23:43:04 UTC | Merge pull request #5301 from halide/srj/init-index Avoid possibly-uninitialized use of RVar::_index | 29 September 2020, 23:43:04 UTC |
69aeb9e | Steven Johnson | 29 September 2020, 19:12:53 UTC | Fix runtime build rules in Makefile: - one of the Windows-specific runtime files had a 32-vs-64 glitch - CMake now uses `-fno-threadsafe-statics` (instead of `-std=gnu++98`) to disable thread-safe static initialization; as a result, this allowed C++11 code requirements to creep in (via d3d12compute.cpp), but we didn't notice because the Makefile wasn't properly building that file due to the 32-vs-64 glitch. Fixed by updating the Makefile to use this flag instead (which was an overdue fix anyway). | 29 September 2020, 19:44:46 UTC |
97cefb9 | Steven Johnson | 29 September 2020, 18:11:26 UTC | Avoid possibly-uninitialized use of RVar::_index One of the armbots warned that this field could be used uninitialized; I can't replicate anywhere else, but indeed, the string-only ctor of RVar left this uninitialized. Defaulted it to -1 and added an explicit check in _var(). (Yes, the call to `at()` will fail when out of range, but explicit checking is better IMHO.) | 29 September 2020, 18:37:35 UTC |
f6c607b | Alex Reinking | 25 September 2020, 23:53:16 UTC | Replace large code model build option with target feature. (#5216) | 25 September 2020, 23:53:16 UTC |
a239951 | Steven Johnson | 24 September 2020, 18:55:20 UTC | Merge pull request #5294 from halide/srj/dupnames Check for duplicated Parameter/Buffer names in InferArguments (Issue #5292) | 24 September 2020, 18:55:20 UTC |
c110bec | Steven Johnson | 23 September 2020, 00:54:30 UTC | Check for duplicated Parameter/Buffer names in InferArguments | 24 September 2020, 18:54:59 UTC |
a4e4052 | Marcos Slomp | 24 September 2020, 17:12:45 UTC | [d3d12] recycling "frame" resources and removing superfluous sync points (#5293) * refactoring to remove wait/sync points from kernel dispatch * debugging and bugfixes * refactoring wait/sync procedures * refactoring buffer signal checkpoints * improved time tracing * improved device selection and additional trace scoping features * addressing clang format issues * more clang format complaints... ¯\_(ツ)_/¯ * more clang format... * clang format... * nullptr -> NULL (0) * addressing code review comments * scoping the kernel argument setup code * addressing code review comments * clang format... Co-authored-by: Marcos Slomp <slomp@adobe.com> | 24 September 2020, 17:12:45 UTC |
ee2cb21 | Steven Johnson | 23 September 2020, 21:47:31 UTC | Merge pull request #5283 from halide/wabt-bundle Bundle wabt objects into libHalide | 23 September 2020, 21:47:31 UTC |
44817ce | Andrew Adams | 23 September 2020, 20:15:35 UTC | Merge pull request #5295 from halide/abadams/fix_generate_output_snippets Rename LINES to INTERESTING_LINES | 23 September 2020, 20:15:35 UTC |
d638d81 | Andrew Adams | 23 September 2020, 19:34:55 UTC | Rename LINES to INTERESTING_LINES Some terminals treat LINES as a special var, breaking this script | 23 September 2020, 19:34:55 UTC |
948d3b8 | Ankit Aggarwal | 23 September 2020, 02:31:50 UTC | Add unsigned long division to CodeGen_Internal. Code refactoring. | 23 September 2020, 02:31:50 UTC |
21e3f96 | Steven Johnson | 22 September 2020, 16:40:28 UTC | Merge pull request #5290 from NewProggie/patch-1 Fix typo in lesson 21 | 22 September 2020, 16:40:28 UTC |
671530a | Kai Wolf | 22 September 2020, 07:32:06 UTC | Fix typo in lesson 21 | 22 September 2020, 07:32:06 UTC |
f5a764f | Volodymyr Kysenko | 21 September 2020, 16:52:03 UTC | Merge pull request #4873 from halide/vksnk/vector-ramp Support for multi-dim vectorization | 21 September 2020, 16:52:03 UTC |
7178b83 | Alex Reinking | 21 September 2020, 02:22:03 UTC | Re-enable CUDAVectorize tests. Fixes #4554. (#5286) | 21 September 2020, 02:22:03 UTC |
f256e8f | Volodymyr Kysenko | 19 September 2020, 20:24:58 UTC | Address review comments | 19 September 2020, 20:24:58 UTC |
ab54b55 | Alex Reinking | 18 September 2020, 23:36:49 UTC | bundle wabt objects into libHalide | 18 September 2020, 23:40:03 UTC |
26ebef3 | Steven Johnson | 17 September 2020, 22:23:39 UTC | Merge pull request #5279 from halide/srj-tidy Appease clang-tidy | 17 September 2020, 22:23:39 UTC |
b6da613 | Steven Johnson | 17 September 2020, 22:17:44 UTC | Merge pull request #5275 from halide/srj/simplify-if-then-else Simplify Call::if_then_else | 17 September 2020, 22:17:44 UTC |
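
Commit 7da643b above switches halide_assert() to the do...while(0) idiom for statement-like macros. The sketch below illustrates why that form is preferred. The `CHECK` macro, the `failures` counter, and `classify()` are hypothetical names chosen for illustration and are not Halide's actual runtime code.

```cpp
#include <cstdio>

// Hypothetical failure counter, used only so the example is observable.
static int failures = 0;

// A brace-only macro such as
//   #define CHECK_BRACES(cond) { if (!(cond)) { ++failures; } }
// expands to a compound statement. Writing `CHECK_BRACES(x);` inside an
// unbraced if/else then leaves a stray empty statement before `else`,
// which fails to compile.

// The do { ... } while (0) form expands to a single statement that
// needs a trailing semicolon, so it behaves like a function call.
#define CHECK(cond)                                                \
    do {                                                           \
        if (!(cond)) {                                             \
            ++failures;                                            \
            std::fprintf(stderr, "check failed: %s\n", #cond);     \
        }                                                          \
    } while (0)

// Returns the number of failed checks recorded for x.
int classify(int x) {
    failures = 0;
    if (x >= 0)
        CHECK(x < 100);   // the ';' ends the do...while, so `else` still pairs with `if`
    else
        CHECK(x > -100);
    return failures;
}
```

With this form, omitting the semicolon after `CHECK(...)` is a compile error, which is presumably how the missing semicolons mentioned in the commit message were detected.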