https://github.com/halide/Halide

Revision Message Commit Date
4170427 Remove dead split 02 September 2020, 22:32:24 UTC
a054a91 Reschedule depthwise separable conv again I figured out how to do full fusion of the depthwise stage into the pointwise stage. Got it down to about 65us 02 September 2020, 22:31:11 UTC
eb85b27 Fix tensorflow benchmarking methodology 02 September 2020, 22:28:55 UTC
12e5a4c Merge pull request #5233 from halide/abadams/fix_llvm_trunk Fix for llvm trunk 02 September 2020, 19:35:23 UTC
306c81f Better error handling 01 September 2020, 17:29:22 UTC
ab43e23 Fix for llvm trunk get_vector_num_elements has changed again due to SVE-related stuff 01 September 2020, 17:26:31 UTC
f2b2cba Thoroughly document CMake build (#5215) Turns README_cmake.md into a comprehensive guide to the three main usage stories of the Halide CMake build: 1. Compiling or packaging Halide from source. 2. Building Halide programs using the official CMake package. 3. Contributing to Halide and updating the build files. 31 August 2020, 22:31:13 UTC
44cc138 Merge pull request #5219 from halide/abadams/redo_output_assignment_error_messages Redo the error messages when assigning a Func to an Output to be more explicit 31 August 2020, 22:14:15 UTC
163a870 Merge branch 'master' into abadams/redo_output_assignment_error_messages 31 August 2020, 18:29:39 UTC
ddd6c57 Merge pull request #5214 from halide/abadams/precompute_shared_mem_size Exhaustively compute max allocation size on host for non-monotonic shared memory sizes 31 August 2020, 18:12:57 UTC
90f07ad Merge branch 'master' into abadams/redo_output_assignment_error_messages 31 August 2020, 17:05:06 UTC
c097aac Replace ANSI Win32 API calls with UTF-16; convert UTF-8 at boundary. Fixes #5223 (#5227) 31 August 2020, 06:16:39 UTC
d046701 Cmake/metatargets (#5218) * Small CMake fixes discovered while documenting. 1. add_halide_library should always produce a global target 2. HALIDE_ -> Halide_ for large codemodel 3. Remove redundant flags / useless options. 4. "YES" vs "ON" convention & grammar fix 5. Fix "cmake" -> "host" meta-target promotion * Rework `cmake` meta-target to not promote. Warn when used as default. 28 August 2020, 07:50:04 UTC
c78ccbe Merge pull request #5222 from halide/cmake/ninja-deps Fix incremental builds with Ninja 27 August 2020, 20:44:39 UTC
d324e8d Fix incremental builds with Ninja 27 August 2020, 09:50:49 UTC
65cfa48 Merge pull request #5220 from halide/runtime/exclude-abort fix lesson_15 test by excluding posix/windows_abort 27 August 2020, 07:54:26 UTC
9d7c65c Merge pull request #5191 from halide/remove-pipeline-context Remove PipelineContext 27 August 2020, 06:04:58 UTC
9903d2b Add comment explaining why we don't do dynamic tracking when no upper bound too 27 August 2020, 02:14:59 UTC
1b50c6c Fix search-replace run amok 27 August 2020, 02:05:37 UTC
82021fc fix lesson_15 test by excluding posix/windows_abort 27 August 2020, 02:01:20 UTC
52e15eb Redo the error messages when assigning a Func to an Output to be more explicit I got some feedback that these were confusing 26 August 2020, 22:39:01 UTC
6079327 Merge pull request #5212 from Infinoid/stringify-expr-rdom-rvar Teach the python bindings how to stringify Expr, RDom, RVar. 26 August 2020, 17:33:03 UTC
9e5cff9 Merge pull request #5211 from halide/abadams/depthwise_separable_conv Depthwise separable convolution 26 August 2020, 17:16:47 UTC
173b762 Merge pull request #5205 from halide/abadams/more_cuda_generations Add target flags for volta, turing, ansel. 26 August 2020, 17:16:09 UTC
affe01b Merge pull request #5208 from halide/abadams/avoid_name_mangling_in_cross_module_dependencies Avoid C++ name mangling issues when calling between runtime modules 25 August 2020, 21:16:15 UTC
13f388d Merge remote-tracking branch 'origin/master' into abadams/avoid_name_mangling_in_cross_module_dependencies 25 August 2020, 21:15:53 UTC
4fe90e6 Merge remote-tracking branch 'origin/master' into abadams/more_cuda_generations 25 August 2020, 21:15:10 UTC
a79172a Add another test case 25 August 2020, 19:56:49 UTC
b101595 Exhaustively compute max on host for non-monotonic shared memory sizes GPU kernel launches must use the same amount of shared memory per block, and this has to be computed ahead of time on the host. The expression that gives the size of the allocations compute_at blocks are inside the kernel though, and are a function of bounds inference. We therefore have to take the max of these sizes over all blocks. This is extremely prone to interval arithmetic being overconservative, because these are extents computed from a max minus a min, and the max and min are both frequently correlated with the block variable. This causes a lot of otherwise fine schedules to fail at runtime with CUDA_ERROR_INVALID_VALUE. This PR detects cases where interval arithmetic is going to be overconservative using is_monotonic, and hoists the computation of shared memory size to an explicit loop over blocks on the CPU, taking the max shared allocation size exhaustively. This implies some work on the CPU, but 1) A loop over blocks is typically at least 32x fewer iterations than the loop over pixels 2) This work can overlap with the previous kernel launch on the GPU still running 3) The alternative is crashing This feature has proved to make GPU schedules much more robust in the gpu autoscheduler branch, so I think we should promote it to master. It's a bit wild though, because this is the first instance I can think of where we inject a new unscheduled loop for some bounds inference purpose. 25 August 2020, 19:52:18 UTC
64fcd56 Rename some variables 25 August 2020, 19:20:51 UTC
59fb46d Re-enable GPU schedule 25 August 2020, 19:16:06 UTC
feb4bd6 Merge remote-tracking branch 'origin/master' into abadams/depthwise_separable_conv 25 August 2020, 17:30:15 UTC
092bb85 Add missing overload to python bindings 25 August 2020, 17:30:07 UTC
f291829 Teach the python bindings how to stringify Expr, RDom, RVar. 25 August 2020, 13:40:39 UTC
5ca9c9c More comments on the CPU schedule 25 August 2020, 00:28:50 UTC
679638f Add a depthwise separable conv layer app And a tensorflow reference. We're quite a bit faster than tensorflow. Also added an extra Func::tile overload that seemed missing. 25 August 2020, 00:24:34 UTC
bbf2d95 Reorder feature enum to put cuda capabilities together 24 August 2020, 18:16:29 UTC
f816911 Expose get_cuda_capability_lower_bound helper 24 August 2020, 18:10:42 UTC
8aa2308 Merge branch 'remove-pipeline-context' of https://github.com/halide/Halide into remove-pipeline-context 24 August 2020, 17:58:30 UTC
adaa769 Merge branch 'master' of https://github.com/halide/Halide into remove-pipeline-context 24 August 2020, 17:58:00 UTC
211a4ef Merge pull request #5192 from halide/remove-mmap2 Remove mmap_dlopen from Hexagon runtime 24 August 2020, 17:37:45 UTC
9b7ff66 Merge pull request #5206 from halide/abadams/check_reorder_dups Check for duplicate vars in calls to reorder/reorder_storage 24 August 2020, 17:30:47 UTC
5ed5c86 Merge pull request #5209 from halide/bugfix/tutorial-test Fix expected files list for lesson 15 test. 24 August 2020, 08:12:16 UTC
9e98e13 Fix expected files list for lesson 15 test. 24 August 2020, 02:17:33 UTC
95adfce Avoid C++ name mangling issues when calling between runtime modules Different runtime modules are initially compiled to llvm bitcode with different target triples (to support stdcall stuff on windows without requiring a precompiled windows version of every single runtime module). These triples are unified before the modules are linked, but before that happens we need to avoid anything target-triple-sensitive done by clang when compiling to llvm assembly (e.g. structs that need padding bytes). One such issue is calling c++ functions across runtime modules. The name mangling could work out differently on both sides of the call. This PR removes all instances of this (at least the ones that go via runtime_internal.h) The functions declared in Halide::Runtime::Internal didn't seem any more internal than the extern "C" functions declared immediately above it, so I promoted those functions (e.g. halide_abort()) to there. 24 August 2020, 01:03:08 UTC
8cee0da Check for duplicate vars in calls to reorder/reorder_storage 23 August 2020, 21:39:07 UTC
5dea096 Always copy-to-host before conversion 23 August 2020, 21:20:51 UTC
d9ab698 reschedule hist 23 August 2020, 21:20:43 UTC
d112f1c Slight schedule fix for stencil chain 23 August 2020, 19:38:58 UTC
9d42b3a Better schedule for harris 23 August 2020, 19:38:20 UTC
7281218 Add support for cuda generations volta, turing, and ansel This doesn't actually change much of anything, as the driver's ptx jit compiler generates code for the appropriate arch already. It does seem to result in different codegen in one or two cases though. 23 August 2020, 19:37:06 UTC
73c81c4 Merge pull request #5186 from halide/cmake/generator-objs Teach add_halide_library about cross compilation 21 August 2020, 21:37:24 UTC
df8a55f Improvements to add_halide_library. Generator tests build clean-up. * When not cross-compiling, add_halide_library creates STATIC, not IMPORTED libraries. * Otherwise, creates IMPORTED libraries * Add `cmake` pseudo-target to ensure compatibility with active CMake toolchain. * Clean up generator tests build and add missing tests. * Extend CMake file lists presubmit check to look for a mention of every file in the folder. 21 August 2020, 13:05:52 UTC
a60e0aa Export Halide_HOST_TARGET in distribution packages. 21 August 2020, 12:54:46 UTC
6f3f13a Use VERBATIM on custom commands. Normalize argument order. 21 August 2020, 12:51:24 UTC
d0dd71d Rename "HALIDE_" CMake variables to "Halide_" 21 August 2020, 12:47:01 UTC
bcdd14d Merge branch 'master' into remove-pipeline-context 20 August 2020, 22:40:49 UTC
eff14a1 Merge branch 'master' into remove-mmap2 20 August 2020, 22:40:39 UTC
25f8231 Allow compile_to_multitarget() to emit object files (#5183) * Allow compile_to_multitarget() to emit object files (Issue #5169) * Update Module.cpp * Update Module.cpp * Smarten compile_to_multitarget * c_source should be single, not multi * Fix apps/linear_algebra * Revert "Fix apps/linear_algebra" This reverts commit 01c15b40c86893ae30820dec33e7057f85b15bc6. * Update Module.cpp * Fixes * Don't substitute _ for - Co-authored-by: Alex Reinking <alex.reinking@gmail.com> 20 August 2020, 22:37:30 UTC
38df2b7 Merge branch 'master' into remove-mmap2 19 August 2020, 18:30:28 UTC
65f1422 Merge branch 'master' into remove-pipeline-context 19 August 2020, 18:30:18 UTC
d1f34da fix for trunk llvm, try #2 (#5198) Previous fix broke LLVM 11 (I was too eager to land, sorry) 19 August 2020, 18:29:04 UTC
640214d Fix for trunk LLVM (#5197) 19 August 2020, 18:13:10 UTC
cd5ebc1 Remove mmap_dlopen. 18 August 2020, 04:26:06 UTC
8db3c9c Remove PipelineContext. 18 August 2020, 03:19:37 UTC
ef37487 Merge pull request #5185 from halide/vksnk/compute_with_store_at Fix #5178: Fix the case when functions from the fused group have different store_levels 17 August 2020, 18:02:27 UTC
a0d3237 Merge pull request #5187 from halide/abadams/reschedule_bgu Reschedule BGU to fix performance regression 17 August 2020, 17:14:40 UTC
9669817 Reschedule BGU to fix performance regression BGU on CUDA had regressed from its stated performance due to the atomic floating point adds being compiled to CAS loops due to complex indexing expressions diverging on the LHS and RHS of the +=. Inlining less stuff into the += operations makes it succeed again, and the schedule was improved with a few other tweaks. Longer-term we need a first-class way to represent += so that we're not sensitive to this sort of divergence. 16 August 2020, 20:54:08 UTC
e280037 Handle the case when the same function is built multiple times 15 August 2020, 02:35:42 UTC
051d674 make format 15 August 2020, 00:42:44 UTC
277b5db Fix the case when functions from the fused group have different store_levels 15 August 2020, 00:41:50 UTC
9f55e10 Merge pull request #5182 from halide/abadams/reschedule_stencil_chain Add memory staging to stencil chain 14 August 2020, 21:38:39 UTC
3177019 Don't allow Target strings without complete arch-bits-os (#5181) * Don't allow Target strings without complete arch-bits-os We previously accepted 'incomplete' Target strings (filling in host attributes for arch-bits-os if unspecified); we thought this would be a convenience, but in practice, this is usually indicative of an error or typo. This changes to make the Target(string) ctor assert-fail if the resulting target has an unspecified arch-bits-os. * Update target.py * Update Target.cpp * Update Target.cpp 14 August 2020, 21:05:40 UTC
b7cf1a1 Merge branch 'abadams/reschedule_stencil_chain' of https://github.com/halide/Halide into abadams/reschedule_stencil_chain 14 August 2020, 17:43:07 UTC
39c1a9a Explanatory comments for .in() usage 14 August 2020, 17:42:56 UTC
2a46538 Merge pull request #5184 from halide/abadams/fix_potential_gpu_deadlock Fix a source of GPU barrier deadlocks 14 August 2020, 16:06:12 UTC
cd9a0ae Merge branch 'master' into abadams/reschedule_stencil_chain 13 August 2020, 22:12:20 UTC
1d49c70 Merge pull request #5135 from halide/cpack Flesh out CPack packaging for releases. 13 August 2020, 20:47:47 UTC
e3606cc Fix GPU barrier deadlocks Partition loops shouldn't mess with serial loops containing thread barriers, potentially causing warp divergence and deadlock (seen in some obscure lens blur schedules). Also we were generating too many thread barriers in a branch where the base mutator class was accidentally always mutating something, so there's a change to FuseGPUThreadLoops to make it more bug-resistant. Without these additional barriers I have been unable to come up with a case where a barrier ends up somewhere that would deadlock, so no test. 13 August 2020, 18:41:13 UTC
98a116a Clean up is-jit-compiled checks in Pipeline (#5172) * Clean up is-jit-compiled checks in Pipeline Because WebAssembly is a special beast, the way it is 'jitted' is special, and the checks to avoid redundant jitting needed extra logic in compile_jit(). Unfortunately there was another place in Pipeline that also needed this special casing. This PR adds a `get_compiled_jit_target()` bottleneck to consolidate this. * defined() -> has_unknowns() 13 August 2020, 16:31:11 UTC
fa1abba Add infer_input_bounds(vector<int>) (#5174) * Add infer_input_bounds(vector<int>) Add a variant of infer_input_bounds() that takes an explicit vector of int, rather than the up-to-4-int version that is a holdover from the buffer_t days; deprecate the old version; convert all existing code to use the new one. Note that I'm using a new overload (with an initializer-list) as a way to subvert the mis-binding of `{}` and `{1}` to the deprecated function; this adds a trivial amount of overhead but (I think) allows us to ensure that converted code probably avoids the deprecated method. * Update Func.cpp 13 August 2020, 16:30:46 UTC
f6dcdde Merge pull request #5177 from halide/abadams/fix_stencil_chain_gpu_schedule Schedule last stage of stencil chain on GPU too 13 August 2020, 02:08:43 UTC
bc066f9 Add memory staging to stencil chain 12 August 2020, 22:45:47 UTC
f7528c2 Merge pull request #5176 from halide/srj-hvx-codegen Remove unnecessary call to halide.hexagon.pack.vh in CG_HVX 12 August 2020, 21:06:03 UTC
d1592d1 Merge pull request #5155 from halide/abadams/add_missing_boundary_condition_overload Add missing overload for boundary conditions on a buffer 12 August 2020, 17:08:02 UTC
2e2649f Merge remote-tracking branch 'origin/master' into abadams/add_missing_boundary_condition_overload 11 August 2020, 19:54:08 UTC
b8ad19f Schedule last stage of stencil chain on GPU too 11 August 2020, 19:12:57 UTC
528b46b Remove unnecessary call to halide.hexagon.pack.vh in CG_HVX In the degenerate case of shuffle_vector() calling vlut() to shuffle a vector that is wider than 256 elements, the code was incorrectly using halide.hexagon.pack.vh on a vector-of-bool; this used to be necessary, but hasn't been for a while, so clearly this code path wasn't being exercised. Remove the halide.hexagon.pack.vh and added a test case to exercise that path. Also, drive-by removal of #include "EliminateBoolVectors.h" from CGHVX since it is no longer used there. 11 August 2020, 16:59:31 UTC
26b5be4 Update fft app to use new boundary condition syntax 09 August 2020, 16:05:40 UTC
52da814 Merge pull request #5165 from halide/abadams/rungen_set_host_dirty Make sure to set_host_dirty in rungen 09 August 2020, 00:46:32 UTC
aa92f5c Make sure to set_host_dirty in rungen Otherwise synthetic inputs like 'random' end up being 'zero' 08 August 2020, 18:36:47 UTC
41e10e0 Merge branch 'abadams/add_missing_boundary_condition_overload' of https://github.com/halide/Halide into abadams/add_missing_boundary_condition_overload 07 August 2020, 19:07:06 UTC
46ae5ff Fix boundary condition in blur app 07 August 2020, 19:06:57 UTC
49d0476 Merge pull request #5162 from halide/srj-wasm-shell-version Update WASM_SHELL_VERSION 07 August 2020, 04:31:25 UTC
640c324 Merge pull request #5161 from halide/srj-infer-input-bounds Add a Target to the args of infer_input_bounds() 07 August 2020, 04:31:10 UTC
cfc125d Merge branch 'master' into cpack 07 August 2020, 01:10:26 UTC
4e2c25f Merge pull request #5163 from halide/srj-blur-fix Fix apps/blur on Hexagon 07 August 2020, 01:08:47 UTC
62814b1 Add Windows support for bundling LLVM. 07 August 2020, 00:54:55 UTC
8127ba9 Merge branch 'master' into abadams/add_missing_boundary_condition_overload 06 August 2020, 23:22:07 UTC
32c6fdb Merge pull request #5158 from halide/abadams/fix_nl_means_estimates Fix incorrect estimates for nl_means autoscheduler 06 August 2020, 23:21:30 UTC