664dc49 | Andrew Adams | 05 September 2020, 21:51:19 UTC | Merge pull request #5246 from halide/dsharletg-patch-1 Update comments on depthwise convolution schedule. | 05 September 2020, 21:51:19 UTC |
1ebd926 | Dillon Sharlet | 05 September 2020, 19:35:09 UTC | Update comments on depthwise convolution schedule. | 05 September 2020, 19:35:09 UTC |
6dc230f | Volodymyr Kysenko | 04 September 2020, 18:54:21 UTC | Merge pull request #5242 from halide/vksnk/func-error More detailed error messages for *_storage functions of Func | 04 September 2020, 18:54:21 UTC |
6f87124 | Volodymyr Kysenko | 03 September 2020, 20:16:44 UTC | Formatting | 03 September 2020, 20:16:44 UTC |
c4bd534 | Volodymyr Kysenko | 03 September 2020, 19:51:26 UTC | More detailed error messages for *_storage functions of Func | 03 September 2020, 19:51:26 UTC |
e2fbb5d | Steven Johnson | 03 September 2020, 17:48:43 UTC | Merge pull request #5196 from aankit-ca/aankit_lut_lb [Hexagon] Fix for LUT32 correctness. | 03 September 2020, 17:48:43 UTC |
0e78e37 | Andrew Adams | 03 September 2020, 17:40:35 UTC | Merge pull request #5231 from halide/abadams/dont_lift_strict_float Teach LICM to not view strict_float as work | 03 September 2020, 17:40:35 UTC |
0bcd416 | Andrew Adams | 03 September 2020, 17:40:20 UTC | Merge pull request #5240 from halide/abadams/fully_fused_depthwise_separable_conv Reschedule depthwise separable convolution again | 03 September 2020, 17:40:20 UTC |
4170427 | Andrew Adams | 02 September 2020, 22:32:24 UTC | Remove dead split | 02 September 2020, 22:32:24 UTC |
a054a91 | Andrew Adams | 02 September 2020, 22:31:11 UTC | Reschedule depthwise separable conv again I figured out how to do full fusion of the depthwise stage into the pointwise stage. Got it down to about 65us | 02 September 2020, 22:31:11 UTC |
eb85b27 | Andrew Adams | 02 September 2020, 22:28:55 UTC | Fix tensorflow benchmarking methodology | 02 September 2020, 22:28:55 UTC |
3756878 | Steven Johnson | 02 September 2020, 21:27:18 UTC | Merge branch 'master' into abadams/dont_lift_strict_float | 02 September 2020, 21:27:18 UTC |
ab6ae3e | Steven Johnson | 02 September 2020, 20:02:42 UTC | Merge pull request #5217 from halide/slomp/d3d12-abi-patch ABI fix for D3D12 | 02 September 2020, 20:02:42 UTC |
12e5a4c | Andrew Adams | 02 September 2020, 19:35:23 UTC | Merge pull request #5233 from halide/abadams/fix_llvm_trunk Fix for llvm trunk | 02 September 2020, 19:35:23 UTC |
c1acea8 | Steven Johnson | 01 September 2020, 22:45:51 UTC | Merge branch 'master' into slomp/d3d12-abi-patch | 01 September 2020, 22:45:51 UTC |
3338d78 | Andrew Adams | 01 September 2020, 17:33:34 UTC | Add more free intrinsics | 01 September 2020, 17:33:34 UTC |
306c81f | Andrew Adams | 01 September 2020, 17:29:22 UTC | Better error handling | 01 September 2020, 17:29:22 UTC |
ab43e23 | Andrew Adams | 01 September 2020, 17:26:31 UTC | Fix for llvm trunk get_vector_num_elements has changed again due to SVE-related stuff | 01 September 2020, 17:26:31 UTC |
887a149 | Andrew Adams | 01 September 2020, 06:35:55 UTC | Teach LICM to not view strict_float as work Fixes #5230 | 01 September 2020, 06:35:55 UTC |
f2b2cba | Alex Reinking | 31 August 2020, 22:31:13 UTC | Thoroughly document CMake build (#5215) Turns README_cmake.md into a comprehensive guide to the three main usage stories of the Halide CMake build: 1. Compiling or packaging Halide from source. 2. Building Halide programs using the official CMake package. 3. Contributing to Halide and updating the build files. | 31 August 2020, 22:31:13 UTC |
3312604 | Steven Johnson | 31 August 2020, 22:14:45 UTC | Merge branch 'master' into slomp/d3d12-abi-patch | 31 August 2020, 22:14:45 UTC |
44cc138 | Steven Johnson | 31 August 2020, 22:14:15 UTC | Merge pull request #5219 from halide/abadams/redo_output_assignment_error_messages Redo the error messages when assigning a Func to an Output to be more explicit | 31 August 2020, 22:14:15 UTC |
163a870 | Steven Johnson | 31 August 2020, 18:29:39 UTC | Merge branch 'master' into abadams/redo_output_assignment_error_messages | 31 August 2020, 18:29:39 UTC |
3a52d9f | Steven Johnson | 31 August 2020, 18:29:18 UTC | Merge branch 'master' into slomp/d3d12-abi-patch | 31 August 2020, 18:29:18 UTC |
ddd6c57 | Andrew Adams | 31 August 2020, 18:12:57 UTC | Merge pull request #5214 from halide/abadams/precompute_shared_mem_size Exhaustively compute max allocation size on host for non-monotonic shared memory sizes | 31 August 2020, 18:12:57 UTC |
90f07ad | Steven Johnson | 31 August 2020, 17:05:06 UTC | Merge branch 'master' into abadams/redo_output_assignment_error_messages | 31 August 2020, 17:05:06 UTC |
c097aac | Alex Reinking | 31 August 2020, 06:16:39 UTC | Replace ANSI Win32 API calls with UTF-16; convert UTF-8 at boundary. Fixes #5223 (#5227) | 31 August 2020, 06:16:39 UTC |
c3fecd5 | Andrew Adams | 28 August 2020, 15:36:37 UTC | Merge remote-tracking branch 'origin/master' into slomp/d3d12-abi-patch | 28 August 2020, 15:36:37 UTC |
d046701 | Alex Reinking | 28 August 2020, 07:50:04 UTC | Cmake/metatargets (#5218) * Small CMake fixes discovered while documenting. 1. add_halide_library should always produce a global target 2. HALIDE_ -> Halide_ for large codemodel 3. Remove redundant flags / useless options. 4. "YES" vs "ON" convention & grammar fix 5. Fix "cmake" -> "host" meta-target promotion * Rework `cmake` meta-target to not promote. Warn when used as default. | 28 August 2020, 07:50:04 UTC |
aff1639 | Marcos Slomp | 27 August 2020, 20:53:29 UTC | Merge branch 'master' into slomp/d3d12-abi-patch | 27 August 2020, 20:53:29 UTC |
c78ccbe | Alex Reinking | 27 August 2020, 20:44:39 UTC | Merge pull request #5222 from halide/cmake/ninja-deps Fix incremental builds with Ninja | 27 August 2020, 20:44:39 UTC |
d324e8d | Alex Reinking | 27 August 2020, 09:50:49 UTC | Fix incremental builds with Ninja | 27 August 2020, 09:50:49 UTC |
65cfa48 | Alex Reinking | 27 August 2020, 07:54:26 UTC | Merge pull request #5220 from halide/runtime/exclude-abort fix lesson_15 test by excluding posix/windows_abort | 27 August 2020, 07:54:26 UTC |
9d7c65c | Dillon Sharlet | 27 August 2020, 06:04:58 UTC | Merge pull request #5191 from halide/remove-pipeline-context Remove PipelineContext | 27 August 2020, 06:04:58 UTC |
9903d2b | Andrew Adams | 27 August 2020, 02:14:59 UTC | Add comment explaining why we don't do dynamic tracking when no upper bound too | 27 August 2020, 02:14:59 UTC |
1b50c6c | Andrew Adams | 27 August 2020, 02:05:37 UTC | Fix search-replace run amok | 27 August 2020, 02:05:37 UTC |
82021fc | Alex Reinking | 27 August 2020, 01:56:59 UTC | fix lesson_15 test by excluding posix/windows_abort | 27 August 2020, 02:01:20 UTC |
52e15eb | Andrew Adams | 26 August 2020, 22:39:01 UTC | Redo the error messages when assigning a Func to an Output to be more explicit I got some feedback that these were confusing | 26 August 2020, 22:39:01 UTC |
da1d605 | Marcos Slomp | 26 August 2020, 21:28:33 UTC | addressing clang-format | 26 August 2020, 21:28:33 UTC |
afc8df1 | Marcos Slomp | 26 August 2020, 21:24:03 UTC | addressing code review comments | 26 August 2020, 21:24:03 UTC |
6079327 | Steven Johnson | 26 August 2020, 17:33:03 UTC | Merge pull request #5212 from Infinoid/stringify-expr-rdom-rvar Teach the python bindings how to stringify Expr, RDom, RVar. | 26 August 2020, 17:33:03 UTC |
e6cdb69 | Marcos Slomp | 26 August 2020, 17:18:00 UTC | Merge remote-tracking branch 'remotes/origin/master' into halide-builder/origin/slomp/d3d12-abi-patch | 26 August 2020, 17:18:00 UTC |
9e5cff9 | Andrew Adams | 26 August 2020, 17:16:47 UTC | Merge pull request #5211 from halide/abadams/depthwise_separable_conv Depthwise separable convolution | 26 August 2020, 17:16:47 UTC |
173b762 | Andrew Adams | 26 August 2020, 17:16:09 UTC | Merge pull request #5205 from halide/abadams/more_cuda_generations Add target flags for volta, turing, ansel. | 26 August 2020, 17:16:09 UTC |
affe01b | Andrew Adams | 25 August 2020, 21:16:15 UTC | Merge pull request #5208 from halide/abadams/avoid_name_mangling_in_cross_module_dependencies Avoid C++ name mangling issues when calling between runtime modules | 25 August 2020, 21:16:15 UTC |
13f388d | Andrew Adams | 25 August 2020, 21:15:53 UTC | Merge remote-tracking branch 'origin/master' into abadams/avoid_name_mangling_in_cross_module_dependencies | 25 August 2020, 21:15:53 UTC |
4fe90e6 | Andrew Adams | 25 August 2020, 21:15:10 UTC | Merge remote-tracking branch 'origin/master' into abadams/more_cuda_generations | 25 August 2020, 21:15:10 UTC |
a79172a | Andrew Adams | 25 August 2020, 19:56:49 UTC | Add another test case | 25 August 2020, 19:56:49 UTC |
b101595 | Andrew Adams | 25 August 2020, 19:52:18 UTC | Exhaustively compute max on host for non-monotonic shared memory sizes GPU kernel launches must use the same amount of shared memory per block, and this has to be computed ahead of time on the host. The expression that gives the size of the allocations compute_at blocks are inside the kernel though, and are a function of bounds inference. We therefore have to take the max of these sizes over all blocks. This is extremely prone to interval arithmetic being overconservative, because these are extents computed from a max minus a min, and the max and min are both frequently correlated with the block variable. This causes a lot of otherwise fine schedules to fail at runtime with CUDA_ERROR_INVALID_VALUE. This PR detects cases where interval arithmetic is going to be overconservative using is_monotonic, and hoists the computation of shared memory size to an explicit loop over blocks on the CPU, taking the max shared allocation size exhaustively. This implies some work on the CPU, but: 1) a loop over blocks is typically at least 32x fewer iterations than the loop over pixels; 2) this work can overlap with the previous kernel launch on the GPU still running; 3) the alternative is crashing. This feature has proved to make GPU schedules much more robust in the gpu autoscheduler branch, so I think we should promote it to master. It's a bit wild though, because this is the first instance I can think of where we inject a new unscheduled loop for some bounds inference purpose. | 25 August 2020, 19:52:18 UTC |
64fcd56 | Andrew Adams | 25 August 2020, 19:20:51 UTC | Rename some variables | 25 August 2020, 19:20:51 UTC |
59fb46d | Andrew Adams | 25 August 2020, 19:16:06 UTC | Re-enable GPU schedule | 25 August 2020, 19:16:06 UTC |
feb4bd6 | Andrew Adams | 25 August 2020, 17:30:15 UTC | Merge remote-tracking branch 'origin/master' into abadams/depthwise_separable_conv | 25 August 2020, 17:30:15 UTC |
092bb85 | Andrew Adams | 25 August 2020, 17:30:07 UTC | Add missing overload to python bindings | 25 August 2020, 17:30:07 UTC |
f291829 | Mark Glines | 25 August 2020, 13:40:39 UTC | Teach the python bindings how to stringify Expr, RDom, RVar. | 25 August 2020, 13:40:39 UTC |
5ca9c9c | Andrew Adams | 25 August 2020, 00:28:50 UTC | More comments on the CPU schedule | 25 August 2020, 00:28:50 UTC |
679638f | Andrew Adams | 25 August 2020, 00:24:34 UTC | Add a depthwise separable conv layer app And a tensorflow reference. We're quite a bit faster than tensorflow. Also added an extra Func::tile overload that seemed missing. | 25 August 2020, 00:24:34 UTC |
7e06416 | Marcos Slomp | 24 August 2020, 19:00:36 UTC | Merge remote-tracking branch 'remotes/origin/abadams/avoid_name_mangling_in_cross_module_dependencies' into halide-builder/origin/slomp/d3d12-abi-patch | 24 August 2020, 19:00:36 UTC |
bbf2d95 | Andrew Adams | 24 August 2020, 18:16:29 UTC | Reorder feature enum to put cuda capabilities together | 24 August 2020, 18:16:29 UTC |
f816911 | Andrew Adams | 24 August 2020, 18:10:42 UTC | Expose get_cuda_capability_lower_bound helper | 24 August 2020, 18:10:42 UTC |
8aa2308 | dsharletg | 24 August 2020, 17:58:30 UTC | Merge branch 'remove-pipeline-context' of https://github.com/halide/Halide into remove-pipeline-context | 24 August 2020, 17:58:30 UTC |
adaa769 | dsharletg | 24 August 2020, 17:58:00 UTC | Merge branch 'master' of https://github.com/halide/Halide into remove-pipeline-context | 24 August 2020, 17:58:00 UTC |
211a4ef | Dillon Sharlet | 24 August 2020, 17:37:45 UTC | Merge pull request #5192 from halide/remove-mmap2 Remove mmap_dlopen from Hexagon runtime | 24 August 2020, 17:37:45 UTC |
9b7ff66 | Dillon Sharlet | 24 August 2020, 17:30:47 UTC | Merge pull request #5206 from halide/abadams/check_reorder_dups Check for duplicate vars in calls to reorder/reorder_storage | 24 August 2020, 17:30:47 UTC |
5ed5c86 | Alex Reinking | 24 August 2020, 08:12:16 UTC | Merge pull request #5209 from halide/bugfix/tutorial-test Fix expected files list for lesson 15 test. | 24 August 2020, 08:12:16 UTC |
9e98e13 | Alex Reinking | 24 August 2020, 02:17:33 UTC | Fix expected files list for lesson 15 test. | 24 August 2020, 02:17:33 UTC |
95adfce | Andrew Adams | 24 August 2020, 01:03:08 UTC | Avoid C++ name mangling issues when calling between runtime modules Different runtime modules are initially compiled to llvm bitcode with different target triples (to support stdcall stuff on windows without requiring a precompiled windows version of every single runtime module). These triples are unified before the modules are linked, but before that happens we need to avoid anything target-triple-sensitive done by clang when compiling to llvm assembly (e.g. structs that need padding bytes). One such issue is calling c++ functions across runtime modules. The name mangling could work out differently on both sides of the call. This PR removes all instances of this (at least the ones that go via runtime_internal.h) The functions declared in Halide::Runtime::Internal didn't seem any more internal than the extern "C" functions declared immediately above it, so I promoted those functions (e.g. halide_abort()) to there. | 24 August 2020, 01:03:08 UTC |
8cee0da | Andrew Adams | 23 August 2020, 21:39:07 UTC | Check for duplicate vars in calls to reorder/reorder_storage | 23 August 2020, 21:39:07 UTC |
5dea096 | Andrew Adams | 23 August 2020, 21:20:51 UTC | Always copy-to-host before conversion | 23 August 2020, 21:20:51 UTC |
d9ab698 | Andrew Adams | 23 August 2020, 21:20:43 UTC | reschedule hist | 23 August 2020, 21:20:43 UTC |
d112f1c | Andrew Adams | 23 August 2020, 19:38:58 UTC | Slight schedule fix for stencil chain | 23 August 2020, 19:38:58 UTC |
9d42b3a | Andrew Adams | 23 August 2020, 19:38:20 UTC | Better schedule for harris | 23 August 2020, 19:38:20 UTC |
7281218 | Andrew Adams | 23 August 2020, 19:37:06 UTC | Add support for cuda generations volta, turing, and ansel This doesn't actually change much of anything, as the driver's ptx jit compiler generates code for the appropriate arch already. It does seem to result in different codegen in one or two cases though. | 23 August 2020, 19:37:06 UTC |
73c81c4 | Alex Reinking | 21 August 2020, 21:37:24 UTC | Merge pull request #5186 from halide/cmake/generator-objs Teach add_halide_library about cross compilation | 21 August 2020, 21:37:24 UTC |
df8a55f | Alex Reinking | 21 August 2020, 13:05:52 UTC | Improvements to add_halide_library. Generator tests build clean-up. * When not cross-compiling, add_halide_library creates STATIC, not IMPORTED libraries. * Otherwise, creates IMPORTED libraries * Add `cmake` pseudo-target to ensure compatibility with active CMake toolchain. * Clean up generator tests build and add missing tests. * Extend CMake file lists presubmit check to look for a mention of every file in the folder. | 21 August 2020, 13:05:52 UTC |
a60e0aa | Alex Reinking | 21 August 2020, 12:54:46 UTC | Export Halide_HOST_TARGET in distribution packages. | 21 August 2020, 12:54:46 UTC |
6f3f13a | Alex Reinking | 21 August 2020, 12:51:24 UTC | Use VERBATIM on custom commands. Normalize argument order. | 21 August 2020, 12:51:24 UTC |
d0dd71d | Alex Reinking | 21 August 2020, 12:47:01 UTC | Rename "HALIDE_" CMake variables to "Halide_" | 21 August 2020, 12:47:01 UTC |
bcdd14d | Steven Johnson | 20 August 2020, 22:40:49 UTC | Merge branch 'master' into remove-pipeline-context | 20 August 2020, 22:40:49 UTC |
eff14a1 | Steven Johnson | 20 August 2020, 22:40:39 UTC | Merge branch 'master' into remove-mmap2 | 20 August 2020, 22:40:39 UTC |
25f8231 | Steven Johnson | 20 August 2020, 22:37:30 UTC | Allow compile_to_multitarget() to emit object files (#5183) * Allow compile_to_multitarget() to emit object files (Issue #5169) * Update Module.cpp * Update Module.cpp * Smarten compile_to_multitarget * c_source should be single, not multi * Fix apps/linear_algebra * Revert "Fix apps/linear_algebra" This reverts commit 01c15b40c86893ae30820dec33e7057f85b15bc6. * Update Module.cpp * Fixes * Don't substitute _ for - Co-authored-by: Alex Reinking <alex.reinking@gmail.com> | 20 August 2020, 22:37:30 UTC |
38df2b7 | Steven Johnson | 19 August 2020, 18:30:28 UTC | Merge branch 'master' into remove-mmap2 | 19 August 2020, 18:30:28 UTC |
65f1422 | Steven Johnson | 19 August 2020, 18:30:18 UTC | Merge branch 'master' into remove-pipeline-context | 19 August 2020, 18:30:18 UTC |
d1f34da | Steven Johnson | 19 August 2020, 18:29:04 UTC | fix for trunk llvm, try #2 (#5198) Previous fix broke LLVM 11 (I was too eager to land, sorry) | 19 August 2020, 18:29:04 UTC |
640214d | Steven Johnson | 19 August 2020, 18:13:10 UTC | Fix for trunk LLVM (#5197) | 19 August 2020, 18:13:10 UTC |
e63b996 | Ankit Aggarwal | 19 August 2020, 17:45:25 UTC | [Hexagon] Fix for LUT32 correctness. Lower bound for lut needs to be > -1 instead of >= -1. The bug was introduced in commit b7dd787c4532d759334f0a2348c993af5e21f152. | 19 August 2020, 17:56:10 UTC |
cd5ebc1 | dsharletg | 18 August 2020, 04:26:06 UTC | Remove mmap_dlopen. | 18 August 2020, 04:26:06 UTC |
8db3c9c | dsharletg | 18 August 2020, 03:19:37 UTC | Remove PipelineContext. | 18 August 2020, 03:19:37 UTC |
ef37487 | Volodymyr Kysenko | 17 August 2020, 18:02:27 UTC | Merge pull request #5185 from halide/vksnk/compute_with_store_at Fix #5178: Fix the case when functions from the fused group have different store_levels | 17 August 2020, 18:02:27 UTC |
a0d3237 | Andrew Adams | 17 August 2020, 17:14:40 UTC | Merge pull request #5187 from halide/abadams/reschedule_bgu Reschedule BGU to fix performance regression | 17 August 2020, 17:14:40 UTC |
9669817 | Andrew Adams | 16 August 2020, 20:54:08 UTC | Reschedule BGU to fix performance regression BGU on CUDA had regressed from its stated performance due to the atomic floating point adds being compiled to CAS loops due to complex indexing expressions diverging on the LHS and RHS of the +=. Inlining less stuff into the += operations makes it succeed again, and the schedule was improved with a few other tweaks. Longer-term we need a first-class way to represent += so that we're not sensitive to this sort of divergence. | 16 August 2020, 20:54:08 UTC |
e280037 | Volodymyr Kysenko | 15 August 2020, 02:35:42 UTC | Handle the case when the same function is built multiple times | 15 August 2020, 02:35:42 UTC |
051d674 | Volodymyr Kysenko | 15 August 2020, 00:42:44 UTC | make format | 15 August 2020, 00:42:44 UTC |
277b5db | Volodymyr Kysenko | 15 August 2020, 00:41:50 UTC | Fix the case when functions from the fused group have different store_levels | 15 August 2020, 00:41:50 UTC |
1234fad | Marcos Slomp | 14 August 2020, 22:09:15 UTC | adjusting Makefile | 14 August 2020, 22:09:15 UTC |
010f9b7 | Marcos Slomp | 14 August 2020, 21:33:36 UTC | ensure that WCHAR is always 2 bytes, since wchar_t could vary (Windows assumes 2 bytes, clang likes 4 bytes) | 14 August 2020, 21:51:23 UTC |
9f55e10 | Andrew Adams | 14 August 2020, 21:38:39 UTC | Merge pull request #5182 from halide/abadams/reschedule_stencil_chain Add memory staging to stencil chain | 14 August 2020, 21:38:39 UTC |
3008fa5 | Marcos Slomp | 14 August 2020, 21:32:12 UTC | purging the old d3d12 abi assembly stubs, and wrapping the d3d12compute runtime module on a windows/x86 specific module | 14 August 2020, 21:32:12 UTC |
3177019 | Steven Johnson | 14 August 2020, 21:05:40 UTC | Don't allow Target strings without complete arch-bits-os (#5181) * Don't allow Target strings without complete arch-bits-os We previously accepted 'incomplete' Target strings (filling in host attributes for arch-bits-os if unspecified); we thought this would be a convenience, but in practice, this is usually indicative of an error or typo. This changes to make the Target(string) ctor assert-fail if the resulting target has an unspecified arch-bits-os. * Update target.py * Update Target.cpp * Update Target.cpp | 14 August 2020, 21:05:40 UTC |
7365fc4 | Marcos Slomp | 14 August 2020, 21:04:23 UTC | improved trace logging | 14 August 2020, 21:04:23 UTC |
b7cf1a1 | Andrew Adams | 14 August 2020, 17:43:07 UTC | Merge branch 'abadams/reschedule_stencil_chain' of https://github.com/halide/Halide into abadams/reschedule_stencil_chain | 14 August 2020, 17:43:07 UTC |