swh:1:snp:70f530b74f5be73cfb71c212c9e3317ce44c1ebc

Revision Message Commit Date
0bcd416 Merge pull request #5240 from halide/abadams/fully_fused_depthwise_separable_conv Reschedule depthwise separable convolution again 03 September 2020, 17:40:20 UTC
4170427 Remove dead split 02 September 2020, 22:32:24 UTC
a054a91 Reschedule depthwise separable conv again I figured out how to do full fusion of the depthwise stage into the pointwise stage. Got it down to about 65us 02 September 2020, 22:31:11 UTC
eb85b27 Fix tensorflow benchmarking methodology 02 September 2020, 22:28:55 UTC
c1916ae Merge branch 'master' into abadams/fix_scatter_intrinsic_usage 02 September 2020, 21:27:33 UTC
3756878 Merge branch 'master' into abadams/dont_lift_strict_float 02 September 2020, 21:27:18 UTC
ab6ae3e Merge pull request #5217 from halide/slomp/d3d12-abi-patch ABI fix for D3D12 02 September 2020, 20:02:42 UTC
12e5a4c Merge pull request #5233 from halide/abadams/fix_llvm_trunk Fix for llvm trunk 02 September 2020, 19:35:23 UTC
8ba64e7 Add support for Func::async in Python bindings The Python bindings were missing a wrapper for `Func::async`. This change adds a wrapper called `Func.async_` to avoid clashes with the Python `async` keyword. 02 September 2020, 12:32:03 UTC
c1acea8 Merge branch 'master' into slomp/d3d12-abi-patch 01 September 2020, 22:45:51 UTC
b7a88f0 Clean up scatter/gather hvx intrinsics I was very surprised to see scatter/gather intrinsics, because that would be a substantial IR design change - currently we don't use intrinsics for anything that mixes vector lanes (i.e. intrinsics should vectorize trivially, and we should be able to slice up wide vectors into narrow vectors trivially when using intrinsics). These are only used in the HVX backend though, so I guess within that restricted scope it's reasonable, because they're handled specially. I prefixed them with hvx_ to make this clearer and modified some code to avoid string comparisons. 01 September 2020, 18:22:25 UTC
3338d78 Add more free intrinsics 01 September 2020, 17:33:34 UTC
306c81f Better error handling 01 September 2020, 17:29:22 UTC
ab43e23 Fix for llvm trunk get_vector_num_elements has changed again due to SVE-related stuff 01 September 2020, 17:26:31 UTC
887a149 Teach LICM to not view strict_float as work Fixes #5230 01 September 2020, 06:35:55 UTC
f2b2cba Thoroughly document CMake build (#5215) Turns README_cmake.md into a comprehensive guide to the three main usage stories of the Halide CMake build: 1. Compiling or packaging Halide from source. 2. Building Halide programs using the official CMake package. 3. Contributing to Halide and updating the build files. 31 August 2020, 22:31:13 UTC
3312604 Merge branch 'master' into slomp/d3d12-abi-patch 31 August 2020, 22:14:45 UTC
44cc138 Merge pull request #5219 from halide/abadams/redo_output_assignment_error_messages Redo the error messages when assigning a Func to an Output to be more explicit 31 August 2020, 22:14:15 UTC
163a870 Merge branch 'master' into abadams/redo_output_assignment_error_messages 31 August 2020, 18:29:39 UTC
3a52d9f Merge branch 'master' into slomp/d3d12-abi-patch 31 August 2020, 18:29:18 UTC
ddd6c57 Merge pull request #5214 from halide/abadams/precompute_shared_mem_size Exhaustively compute max allocation size on host for non-monotonic shared memory sizes 31 August 2020, 18:12:57 UTC
90f07ad Merge branch 'master' into abadams/redo_output_assignment_error_messages 31 August 2020, 17:05:06 UTC
c097aac Replace ANSI Win32 API calls with UTF-16; convert UTF-8 at boundary. Fixes #5223 (#5227) 31 August 2020, 06:16:39 UTC
c3fecd5 Merge remote-tracking branch 'origin/master' into slomp/d3d12-abi-patch 28 August 2020, 15:36:37 UTC
d046701 Cmake/metatargets (#5218) * Small CMake fixes discovered while documenting. 1. add_halide_library should always produce a global target 2. HALIDE_ -> Halide_ for large codemodel 3. Remove redundant flags / useless options. 4. "YES" vs "ON" convention & grammar fix 5. Fix "cmake" -> "host" meta-target promotion * Rework `cmake` meta-target to not promote. Warn when used as default. 28 August 2020, 07:50:04 UTC
aff1639 Merge branch 'master' into slomp/d3d12-abi-patch 27 August 2020, 20:53:29 UTC
c78ccbe Merge pull request #5222 from halide/cmake/ninja-deps Fix incremental builds with Ninja 27 August 2020, 20:44:39 UTC
d324e8d Fix incremental builds with Ninja 27 August 2020, 09:50:49 UTC
65cfa48 Merge pull request #5220 from halide/runtime/exclude-abort fix lesson_15 test by excluding posix/windows_abort 27 August 2020, 07:54:26 UTC
9d7c65c Merge pull request #5191 from halide/remove-pipeline-context Remove PipelineContext 27 August 2020, 06:04:58 UTC
9903d2b Add comment explaining why we don't do dynamic tracking when no upper bound too 27 August 2020, 02:14:59 UTC
1b50c6c Fix search-replace run amok 27 August 2020, 02:05:37 UTC
82021fc fix lesson_15 test by excluding posix/windows_abort 27 August 2020, 02:01:20 UTC
52e15eb Redo the error messages when assigning a Func to an Output to be more explicit I got some feedback that these were confusing 26 August 2020, 22:39:01 UTC
da1d605 addressing clang-format 26 August 2020, 21:28:33 UTC
afc8df1 addressing code review comments 26 August 2020, 21:24:03 UTC
6079327 Merge pull request #5212 from Infinoid/stringify-expr-rdom-rvar Teach the python bindings how to stringify Expr, RDom, RVar. 26 August 2020, 17:33:03 UTC
e6cdb69 Merge remote-tracking branch 'remotes/origin/master' into halide-builder/origin/slomp/d3d12-abi-patch 26 August 2020, 17:18:00 UTC
9e5cff9 Merge pull request #5211 from halide/abadams/depthwise_separable_conv Depthwise separable convolution 26 August 2020, 17:16:47 UTC
173b762 Merge pull request #5205 from halide/abadams/more_cuda_generations Add target flags for volta, turing, ansel. 26 August 2020, 17:16:09 UTC
affe01b Merge pull request #5208 from halide/abadams/avoid_name_mangling_in_cross_module_dependencies Avoid C++ name mangling issues when calling between runtime modules 25 August 2020, 21:16:15 UTC
13f388d Merge remote-tracking branch 'origin/master' into abadams/avoid_name_mangling_in_cross_module_dependencies 25 August 2020, 21:15:53 UTC
4fe90e6 Merge remote-tracking branch 'origin/master' into abadams/more_cuda_generations 25 August 2020, 21:15:10 UTC
a79172a Add another test case 25 August 2020, 19:56:49 UTC
b101595 Exhaustively compute max on host for non-monotonic shared memory sizes GPU kernel launches must use the same amount of shared memory per block, and this has to be computed ahead of time on the host. The expressions that give the sizes of the allocations compute_at blocks are inside the kernel though, and are a function of bounds inference. We therefore have to take the max of these sizes over all blocks. This is extremely prone to interval arithmetic being overconservative, because these are extents computed from a max minus a min, and the max and min are both frequently correlated with the block variable. This causes a lot of otherwise fine schedules to fail at runtime with CUDA_ERROR_INVALID_VALUE. This PR detects cases where interval arithmetic is going to be overconservative using is_monotonic, and hoists the computation of shared memory size to an explicit loop over blocks on the CPU, taking the max shared allocation size exhaustively. This implies some work on the CPU, but: 1) a loop over blocks is typically at least 32x fewer iterations than the loop over pixels; 2) this work can overlap with the previous kernel launch on the GPU still running; 3) the alternative is crashing. This feature has proved to make GPU schedules much more robust in the gpu autoscheduler branch, so I think we should promote it to master. It's a bit wild though, because this is the first instance I can think of where we inject a new unscheduled loop for some bounds inference purpose. 25 August 2020, 19:52:18 UTC
64fcd56 Rename some variables 25 August 2020, 19:20:51 UTC
59fb46d Re-enable GPU schedule 25 August 2020, 19:16:06 UTC
feb4bd6 Merge remote-tracking branch 'origin/master' into abadams/depthwise_separable_conv 25 August 2020, 17:30:15 UTC
092bb85 Add missing overload to python bindings 25 August 2020, 17:30:07 UTC
f291829 Teach the python bindings how to stringify Expr, RDom, RVar. 25 August 2020, 13:40:39 UTC
5ca9c9c More comments on the CPU schedule 25 August 2020, 00:28:50 UTC
679638f Add a depthwise separable conv layer app And a tensorflow reference. We're quite a bit faster than tensorflow. Also added an extra Func::tile overload that seemed missing. 25 August 2020, 00:24:34 UTC
7e06416 Merge remote-tracking branch 'remotes/origin/abadams/avoid_name_mangling_in_cross_module_dependencies' into halide-builder/origin/slomp/d3d12-abi-patch 24 August 2020, 19:00:36 UTC
bbf2d95 Reorder feature enum to put cuda capabilities together 24 August 2020, 18:16:29 UTC
f816911 Expose get_cuda_capability_lower_bound helper 24 August 2020, 18:10:42 UTC
8aa2308 Merge branch 'remove-pipeline-context' of https://github.com/halide/Halide into remove-pipeline-context 24 August 2020, 17:58:30 UTC
adaa769 Merge branch 'master' of https://github.com/halide/Halide into remove-pipeline-context 24 August 2020, 17:58:00 UTC
211a4ef Merge pull request #5192 from halide/remove-mmap2 Remove mmap_dlopen from Hexagon runtime 24 August 2020, 17:37:45 UTC
9b7ff66 Merge pull request #5206 from halide/abadams/check_reorder_dups Check for duplicate vars in calls to reorder/reorder_storage 24 August 2020, 17:30:47 UTC
5ed5c86 Merge pull request #5209 from halide/bugfix/tutorial-test Fix expected files list for lesson 15 test. 24 August 2020, 08:12:16 UTC
9e98e13 Fix expected files list for lesson 15 test. 24 August 2020, 02:17:33 UTC
95adfce Avoid C++ name mangling issues when calling between runtime modules Different runtime modules are initially compiled to llvm bitcode with different target triples (to support stdcall stuff on windows without requiring a precompiled windows version of every single runtime module). These triples are unified before the modules are linked, but before that happens we need to avoid anything target-triple-sensitive done by clang when compiling to llvm assembly (e.g. structs that need padding bytes). One such issue is calling c++ functions across runtime modules. The name mangling could work out differently on both sides of the call. This PR removes all instances of this (at least the ones that go via runtime_internal.h) The functions declared in Halide::Runtime::Internal didn't seem any more internal than the extern "C" functions declared immediately above it, so I promoted those functions (e.g. halide_abort()) to there. 24 August 2020, 01:03:08 UTC
8cee0da Check for duplicate vars in calls to reorder/reorder_storage 23 August 2020, 21:39:07 UTC
5dea096 Always copy-to-host before conversion 23 August 2020, 21:20:51 UTC
d9ab698 reschedule hist 23 August 2020, 21:20:43 UTC
d112f1c Slight schedule fix for stencil chain 23 August 2020, 19:38:58 UTC
9d42b3a Better schedule for harris 23 August 2020, 19:38:20 UTC
7281218 Add support for cuda generations volta, turing, and ansel This doesn't actually change much of anything, as the driver's ptx jit compiler generates code for the appropriate arch already. It does seem to result in different codegen in one or two cases though. 23 August 2020, 19:37:06 UTC
73c81c4 Merge pull request #5186 from halide/cmake/generator-objs Teach add_halide_library about cross compilation 21 August 2020, 21:37:24 UTC
df8a55f Improvements to add_halide_library. Generator tests build clean-up. * When not cross-compiling, add_halide_library creates STATIC, not IMPORTED libraries. * Otherwise, creates IMPORTED libraries * Add `cmake` pseudo-target to ensure compatibility with active CMake toolchain. * Clean up generator tests build and add missing tests. * Extend CMake file lists presubmit check to look for a mention of every file in the folder. 21 August 2020, 13:05:52 UTC
a60e0aa Export Halide_HOST_TARGET in distribution packages. 21 August 2020, 12:54:46 UTC
6f3f13a Use VERBATIM on custom commands. Normalize argument order. 21 August 2020, 12:51:24 UTC
d0dd71d Rename "HALIDE_" CMake variables to "Halide_" 21 August 2020, 12:47:01 UTC
bcdd14d Merge branch 'master' into remove-pipeline-context 20 August 2020, 22:40:49 UTC
eff14a1 Merge branch 'master' into remove-mmap2 20 August 2020, 22:40:39 UTC
25f8231 Allow compile_to_multitarget() to emit object files (#5183) * Allow compile_to_multitarget() to emit object files (Issue #5169) * Update Module.cpp * Update Module.cpp * Smarten compile_to_multitarget * c_source should be single, not multi * Fix apps/linear_algebra * Revert "Fix apps/linear_algebra" This reverts commit 01c15b40c86893ae30820dec33e7057f85b15bc6. * Update Module.cpp * Fixes * Don't substitute _ for - Co-authored-by: Alex Reinking <alex.reinking@gmail.com> 20 August 2020, 22:37:30 UTC
38df2b7 Merge branch 'master' into remove-mmap2 19 August 2020, 18:30:28 UTC
65f1422 Merge branch 'master' into remove-pipeline-context 19 August 2020, 18:30:18 UTC
d1f34da fix for trunk llvm, try #2 (#5198) Previous fix broke LLVM 11 (I was too eager to land, sorry) 19 August 2020, 18:29:04 UTC
640214d Fix for trunk LLVM (#5197) 19 August 2020, 18:13:10 UTC
e63b996 [Hexagon] Fix for LUT32 correctness. Lower bound for lut needs to be > -1 instead of >= -1. The bug was introduced in commit b7dd787c4532d759334f0a2348c993af5e21f152. 19 August 2020, 17:56:10 UTC
cd5ebc1 Remove mmap_dlopen. 18 August 2020, 04:26:06 UTC
8db3c9c Remove PipelineContext. 18 August 2020, 03:19:37 UTC
ef37487 Merge pull request #5185 from halide/vksnk/compute_with_store_at Fix #5178: Fix the case when functions from the fused group have different store_levels 17 August 2020, 18:02:27 UTC
a0d3237 Merge pull request #5187 from halide/abadams/reschedule_bgu Reschedule BGU to fix performance regression 17 August 2020, 17:14:40 UTC
9669817 Reschedule BGU to fix performance regression BGU on CUDA had regressed from its stated performance due to the atomic floating point adds being compiled to CAS loops due to complex indexing expressions diverging on the LHS and RHS of the +=. Inlining less stuff into the += operations makes it succeed again, and the schedule was improved with a few other tweaks. Longer-term we need a first-class way to represent += so that we're not sensitive to this sort of divergence. 16 August 2020, 20:54:08 UTC
e280037 Handle the case when the same function is built multiple times 15 August 2020, 02:35:42 UTC
051d674 make format 15 August 2020, 00:42:44 UTC
277b5db Fix the case when functions from the fused group have different store_levels 15 August 2020, 00:41:50 UTC
1234fad adjusting Makefile 14 August 2020, 22:09:15 UTC
010f9b7 ensure that WCHAR is always 2 bytes, since wchar_t could vary (Windows assumes 2 bytes, clang likes 4 bytes) 14 August 2020, 21:51:23 UTC
9f55e10 Merge pull request #5182 from halide/abadams/reschedule_stencil_chain Add memory staging to stencil chain 14 August 2020, 21:38:39 UTC
3008fa5 purging the old d3d12 abi assembly stubs, and wrapping the d3d12compute runtime module on a windows/x86 specific module 14 August 2020, 21:32:12 UTC
3177019 Don't allow Target strings without complete arch-bits-os (#5181) * Don't allow Target strings without complete arch-bits-os We previously accepted 'incomplete' Target strings (filling in host attributes for arch-bits-os if unspecified); we thought this would be a convenience, but in practice, this is usually indicative of an error or typo. This changes to make the Target(string) ctor assert-fail if the resulting target has an unspecified arch-bits-os. * Update target.py * Update Target.cpp * Update Target.cpp 14 August 2020, 21:05:40 UTC
7365fc4 improved trace logging 14 August 2020, 21:04:23 UTC
b7cf1a1 Merge branch 'abadams/reschedule_stencil_chain' of https://github.com/halide/Halide into abadams/reschedule_stencil_chain 14 August 2020, 17:43:07 UTC
39c1a9a Explanatory comments for .in() usage 14 August 2020, 17:42:56 UTC
2a46538 Merge pull request #5184 from halide/abadams/fix_potential_gpu_deadlock Fix a source of GPU barrier deadlocks 14 August 2020, 16:06:12 UTC
cd9a0ae Merge branch 'master' into abadams/reschedule_stencil_chain 13 August 2020, 22:12:20 UTC
1d49c70 Merge pull request #5135 from halide/cpack Flesh out CPack packaging for releases. 13 August 2020, 20:47:47 UTC
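
Note on revision b101595: the message describes interval arithmetic overestimating a shared allocation size because the max and the min of the region are bounded independently, even though both depend on the same block variable. The following is a minimal sketch of that effect in plain Python, not Halide code. The image width, block count, and the `region_min`/`region_max` formulas are hypothetical values chosen only for illustration.

```python
# Hypothetical setup: 32 blocks cover a 256-wide image, and each block
# stages a small region whose bounds are clamped to the image edges.
WIDTH = 256
NUM_BLOCKS = 32


def region_min(b):
    """Smallest coordinate block b needs (assumed formula)."""
    return max(8 * b - 2, 0)


def region_max(b):
    """Largest coordinate block b needs (assumed formula)."""
    return min(8 * b + 9, WIDTH - 1)


def extent(b):
    """Number of elements block b actually has to allocate."""
    return region_max(b) - region_min(b) + 1


def interval_bound():
    """Bound the extent the way interval arithmetic would.

    The max and the min are bounded separately over all blocks, so the
    fact that both move together with b is lost.
    """
    blocks = range(NUM_BLOCKS)
    hi = max(region_max(b) for b in blocks)
    lo = min(region_min(b) for b in blocks)
    return hi - lo + 1


def exhaustive_bound():
    """Loop over every block on the host and take the true maximum."""
    return max(extent(b) for b in range(NUM_BLOCKS))


if __name__ == "__main__":
    print("interval bound:  ", interval_bound())
    print("exhaustive bound:", exhaustive_bound())
```

In this sketch the per-block extent is not monotonic in the block index: the clamping makes the first and last blocks smaller than the interior ones. The independent bound therefore spans the whole image, while the exhaustive loop recovers the small per-block size.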