https://github.com/halide/Halide

Revision Author Date Message Commit Date
f7d1a8f Merge branch 'master' into abadams/fix_cuda_mat_mul_assert 19 June 2020, 02:09:18 UTC
361a3a7 Merge branch 'abadams/fix_cuda_mat_mul_assert' of https://github.com/halide/Halide into abadams/fix_cuda_mat_mul_assert 19 June 2020, 02:09:09 UTC
c53c7e8 Merge pull request #5054 from halide/srj-cublas Skip cublas on Windows (Issue #5053) 19 June 2020, 02:07:35 UTC
07834a5 Skip cublas on Windows (Issue #5053) 18 June 2020, 22:25:34 UTC
9a88535 Add debugging code to cuda mat mul app to figure out why it's crashing on windows 17 June 2020, 17:30:12 UTC
d7c99db Merge pull request #5045 from halide/srj-llvmfixer Fix for trunk LLVM API changes 17 June 2020, 16:04:41 UTC
02552dd Merge pull request #5042 from halide/shoaibkamil/arm64_windows Add preliminary AOT Windows ARM64 support 17 June 2020, 14:24:09 UTC
23fa7d0 Update Makefile 17 June 2020, 05:13:34 UTC
705d6e4 Fix for trunk LLVM API changes 17 June 2020, 00:37:13 UTC
61d0060 clang-format 16 June 2020, 20:31:31 UTC
b761bfe Add issue 16 June 2020, 20:30:18 UTC
cc90832 Merge branch 'abadams/fix_cuda_mat_mul_assert' of https://github.com/halide/Halide into abadams/fix_cuda_mat_mul_assert 16 June 2020, 20:27:30 UTC
377eb9c Touch 16 June 2020, 20:27:22 UTC
383ce61 Actually remove NDEBUG for real this time 16 June 2020, 20:27:19 UTC
c8aed70 Temporarily add prints to help diagnose crash on buildbots 16 June 2020, 20:27:19 UTC
4abfa1a Require asserts 16 June 2020, 20:27:19 UTC
340246a Merge remote-tracking branch 'origin/master' into shoaibkamil/arm64_windows 16 June 2020, 18:29:03 UTC
4905851 Merge branch 'master' into abadams/fix_cuda_mat_mul_assert 16 June 2020, 17:22:10 UTC
c7098f8 Merge pull request #5035 from halide/abadams/improve_cuda_mat_mul It's worth cancelling correlated subexpressions in load/store indices 16 June 2020, 16:25:50 UTC
19ef844 Merge pull request #5037 from halide/abadams/openglcompute_loop_invariants Put buffers before other uniforms in gl uniform list 16 June 2020, 16:25:32 UTC
b9fa8bf Merge pull request #5038 from halide/srj-copyto Clarify debug logging in copy_to_device() 16 June 2020, 16:25:03 UTC
f147d7b Improve comment on simplify correlated differences 16 June 2020, 00:20:45 UTC
75aa213 It's worth cancelling correlated subexpressions in load/store indices In particular, this makes warp shuffles much more reliable, because any dependence of a load or store index on the block id is more likely to get cancelled out. This PR massively simplifies the generated code for cuda_mat_mul, and makes it about 30% faster (although it's still mysteriously 2x slower than cublas on my card). Also reduces the amount of IR in some other apps very slightly. Doesn't seem to affect compile times. 16 June 2020, 00:20:45 UTC
70b3b75 Clarify debug logging in copy_to_device() We currently always call copy_to_device() on buffers that need to be on device (with the understanding that it's a no-op if no copy is needed); if the `debug` feature is on, a naive reading might make someone think that needless copy-to-device operations are actually happening. This adds a bit of logging (debug mode only) to make it clearer whether the copy to device actually happened, or if it was skipped because host was not dirty. 15 June 2020, 23:11:26 UTC
13ad60f Actually remove NDEBUG for real this time 15 June 2020, 22:53:32 UTC
a6634b6 Add extra comment about why buffers come first 15 June 2020, 22:35:50 UTC
42f66da Put buffers before other uniforms in gl uniform list buffer ids are constrained to be smaller than arbitrary scalar uniforms, so they should go first in the closure. Also added a stress-test for lifting out lots of loop invariants, and disabled LICM completely for GLSL, because it uses magic names (.varying) for some things. 15 June 2020, 22:30:37 UTC
45e35d1 Not a function call 15 June 2020, 17:04:34 UTC
638ac11 Merge pull request #5033 from halide/srj-tsan-fix Fix broken TSAN code 15 June 2020, 16:49:06 UTC
edda3c2 Merge remote-tracking branch 'origin/master' into shoaibkamil/arm64_windows 15 June 2020, 16:26:59 UTC
64467ba Merge branch 'master' into shoaibkamil/arm64_windows 15 June 2020, 16:26:49 UTC
d7d1dac Merge pull request #5025 from halide/shoaibkamil/correct_memory_fences Make gpu_thread_barrier() semantics consistent 15 June 2020, 16:26:30 UTC
f0d9b6d Temporarily add prints to help diagnose crash on buildbots 13 June 2020, 17:26:51 UTC
2011720 Merge pull request #5032 from halide/abadams/atomic_vectorization_tweaks atomic vectorization tweaks 13 June 2020, 02:00:52 UTC
e0f9583 Require asserts 13 June 2020, 00:28:18 UTC
cacda0e Merge remote-tracking branch 'origin/abadams/fix_cuda_mat_mul_assert' into shoaibkamil/correct_memory_fences 12 June 2020, 21:05:45 UTC
55dac45 Fix inverted assert 12 June 2020, 20:20:32 UTC
79c3873 Fix broken TSAN code Update for a recent LLVM change was incorrect; it compiled but didn't actually work properly. (We should probably run sanitizers on the buildbots...) 12 June 2020, 19:25:58 UTC
1328084 Merge remote-tracking branch 'origin/master' into shoaibkamil/correct_memory_fences 12 June 2020, 19:04:50 UTC
bbe4acf Merge pull request #5030 from halide/abadams/licm_on_innermost_loop_bodies_too Don't lift constant integer offsets 12 June 2020, 17:50:41 UTC
0bc3070 Merge remote-tracking branch 'origin/master' into abadams/atomic_vectorization_tweaks 12 June 2020, 17:49:01 UTC
d9795ee Merge pull request #5031 from halide/srj-comdat Fix some "MachO doesn't support COMDAT" issues in runtime 12 June 2020, 17:46:59 UTC
a6ec01a Try to work around MSL compiler stupidity. 12 June 2020, 16:42:50 UTC
195bcbc Merge remote-tracking branch 'origin/master' into shoaibkamil/correct_memory_fences 12 June 2020, 16:19:49 UTC
887eacc Merge pull request #5029 from halide/abadams/fix_associativity Make it harder for the associativity test to get confused 12 June 2020, 05:14:28 UTC
7caecf2 Pass LLVM_VERSION to tests 11 June 2020, 22:43:35 UTC
471e882 More verbose error in CSE 11 June 2020, 22:43:35 UTC
67bbbd3 Add comment to AddAtomicMutex 11 June 2020, 22:43:35 UTC
8ca7602 Add 16-bit float to associative ops table 11 June 2020, 22:43:35 UTC
f681742 Simplify some code in deinterleave 11 June 2020, 22:43:35 UTC
d0584b3 Permit fusing pure and impure rvars 11 June 2020, 22:43:35 UTC
394536f Make lossless_cast more aggressive 11 June 2020, 22:43:24 UTC
cec8290 Fix some "MachO doesn't support COMDAT" issues in runtime Runtime code that will be instantiated for OSX/iOS needs to ensure that there are no plain 'inline' functions -- they must be either WEAK or __attribute__((always_inline)) -- otherwise, some compiler configurations can produce the error above. (Note that this also applies to member functions that are defined inline, even without an explicit 'inline' keyword). (Note also that the vagaries of C++ mean that declaring a ctor implies that a dtor will be auto-created; in some of these we must explicitly declare the dtor so that it too is always-inlined, even if it is empty...) 11 June 2020, 22:20:39 UTC
979d701 Don't lift constant integer offsets 11 June 2020, 21:44:58 UTC
d755381 Stop copying strings 11 June 2020, 20:59:01 UTC
1483793 Address review comments, make algorithm simpler. 11 June 2020, 20:48:45 UTC
4e55416 Make it harder for the associativity test to get confused 11 June 2020, 19:29:38 UTC
8cddb2e Merge pull request #5027 from halide/srj-absd Fix codegen for absd() in GLSLBase 11 June 2020, 17:14:55 UTC
3d19643 Merge pull request #5026 from halide/srj-glsl Combine visit(Cast) for GLSL and OpenGLCompute 11 June 2020, 17:14:42 UTC
665001d Partially address reviewer comments 11 June 2020, 15:59:37 UTC
3721fcb Merge pull request #5028 from halide/srj-appv Enable verbosity in apps builds 11 June 2020, 00:50:22 UTC
53e8ab6 Also add --output-on-failure 11 June 2020, 00:31:56 UTC
9e780af Enable verbosity in apps builds Hoping this will help us track down flaky Windows failures. 11 June 2020, 00:22:27 UTC
5c52122 Merge pull request #5023 from halide/abadams/more_simplifier_rules New simplifier rules necessary for the gpu autoscheduler 11 June 2020, 00:20:59 UTC
4a2216d Fix codegen for absd() in GLSLBase It was emitting as a float, which is *never* correct, since absd() is only used for int or uint types. (This happened to work before because GLSL was previously also incorrectly using float for uint in some cases.) Also did a drive-by removal of code in Codegen_C that recapitulated the logic from IROperator.cpp; maybe the type field of absd() was incorrect at some point in the past, but this calculation seems redundant and wrong now. 10 June 2020, 22:25:06 UTC
ab1c53e Combine visit(Cast) for GLSL and OpenGLCompute These are the only two overrides of `visit(Cast)` from GLSLBase and they both have identical implementations; combine them into one and move into GLSLBase to save code. 10 June 2020, 22:15:08 UTC
17f0176 clang-format 10 June 2020, 20:22:29 UTC
724cd28 clang-format 10 June 2020, 20:19:25 UTC
d7225ae Tweak spacing 10 June 2020, 20:18:07 UTC
5f0ce89 Merge remote-tracking branch 'origin/master' into shoaibkamil/correct_memory_fences 10 June 2020, 20:11:55 UTC
75fe44a Slight change in D3D12 logic. 10 June 2020, 20:09:09 UTC
9cec5a5 New simplifier rules necessary for the gpu autoscheduler 10 June 2020, 17:20:36 UTC
8b9081b Merge pull request #5021 from halide/abadams/fewer_print_parentheses Fewer print parentheses 10 June 2020, 16:39:23 UTC
27b478e Minor 10 June 2020, 16:34:40 UTC
ca420a4 Checkpoint 10 June 2020, 15:36:34 UTC
ee5f90e Merge pull request #5015 from acolinisi/PR--cmake-llvm-dynlib-2 cmake: llvm: fix linking against LLVM shared lib 10 June 2020, 06:49:36 UTC
3609c63 Merge pull request #5022 from halide/wording_fix Small wording improvements. 10 June 2020, 06:36:02 UTC
a9ab983 Small wording improvements. 10 June 2020, 06:20:01 UTC
3ca5331 Fix test 10 June 2020, 00:19:58 UTC
8d304b4 Merge pull request #5013 from halide/abadams/fix_atomics_gpu_mutex Fix #5012 10 June 2020, 00:17:50 UTC
cbcbcfc Don't print so many parentheses in IRPrinter 10 June 2020, 00:15:32 UTC
dd27bdf Make gpu_thread_barrier() take a mask describing the type of memory fence required 09 June 2020, 20:38:16 UTC
c668a7f Merge branch 'master' into abadams/fix_atomics_gpu_mutex 09 June 2020, 16:43:49 UTC
fb941b6 Merge pull request #5010 from halide/abadams/d3d12_workaround_llvm_bug Large stack printer objects break llvm debuginfo 09 June 2020, 16:43:05 UTC
769ac5b Merge branch 'master' into abadams/fix_atomics_gpu_mutex 09 June 2020, 06:00:39 UTC
10166ad cmake: llvm: fix linking against LLVM shared lib Fixes the build with -DLLVM_USE_SHARED_LLVM_LIBRARY=ON to actually use the dynamic build (libLLVM.so). Relevant for LLVM builds with -DLLVM_BUILD_LLVM_DYLIB=ON. Tested with LLVM v10.0.0. There are two issues that prevent a build against LLVM shared lib: 1. cmake failure: CMake Error at /usr/lib/llvm/10/lib64/cmake/llvm/LLVM-Config.cmake:145 (message): Target NVPTX is not in the set of libraries. Call Stack (most recent call first): /usr/lib/llvm/10/lib64/cmake/llvm/LLVM-Config.cmake:270 (llvm_expand_pseudo_components) dependencies/llvm/CMakeLists.txt:141 (llvm_map_components_to_libnames) 2. missing -lLLVM on the link line Regarding issue 2: the result of missing -lLLVM on the link line can be either of the following scenarios, both of which are wrong: (A) (in theory) a successful build but with all LLVM modules linked into libHalide.so, which is not what the user requested with LLVM_USE_SHARED_LLVM_LIBRARY=ON, (B) (observed) a broken libHalide.so, where linking of apps against it fails with undefined symbols errors related to target components (related to issue 1, see more below): /usr/lib/gcc/x86_64-pc-linux-gnu/10.1.0/../../../../x86_64-pc-linux-gnu/bin/ld: src/libHalide.so: undefined reference to `LLVMInitializeX86Target' Sidenote regarding (B), linking the app against -lLLVM makes the link succeed but fails at runtime, because then some static LLVM data is linked into the app binary twice, so components get loaded twice: : CommandLine Error: Option 'pm-max-devirt-iterations' registered more than once! LLVM ERROR: inconsistency in registered CommandLine options The above errors from (B) are symptoms that are not important; the fix is to just link libHalide.so against libLLVM.so.
This commit provides two alternatives for achieving it: (a) use llvm_config() (b) workaround by manually doing what llvm_config() does. I think the correct way is (a), however it cannot be used yet, because in the latest LLVM 10.0.0, llvm_config does not accept targets of type INTERFACE_LIBRARY (see attached patch for what it would take to add that support to LLVM). If/when LLVM starts supporting INTERFACE_LIBRARY, then (b) should be deleted in favor of (a). The problem with existing cmake code that tries to link against LLVM library(ies) was that it seemed to communicate the intent to LLVM-Config.cmake via LLVM_USE_SHARED var, but LLVM-Config doesn't use that var. Instead, the llvm_config method takes an argument option USE_SHARED. Regarding issue 1: appears to be caused by treating targets as components. Observed when LLVM is built into a shared library with: -DBUILD_SHARED_LIBS=OFF -DLLVM_BUILD_LLVM_DYLIB=ON -DLLVM_LINK_LLVM_DYLIB=ON -DLLVM_DISTRIBUTION_COMPONENTS="comp1;comp2;..." -DLLVM_TARGETS_TO_BUILD="" -DLLVM_EXPERIMENTAL_TARGETS_TO_BUILD="X86;NVPTX;...." Note that LLVM_DYLIB_COMPONENTS is NOT set explicitly, and I am not sure what exactly it defaults to, but the shared lib (libLLVM.so) does end up containing all components selected by LLVM_DISTRIBUTION_COMPONENTS and all selected targets. (Not that it matters, but this is the LLVM config used in the package recipe in the Gentoo distribution.) The selected LLVM components do get built as static archives (e.g. LLVMSupport.a, etc.) and do get linked into libLLVM.so and do get installed into the system; however, the static archives for target libraries (e.g. LLVMX86TargetInfo.a, etc.) get linked but do NOT get installed into the system. So, when Halide cmake script asks for the target libraries as components (in call to llvm_map_components_to_libnames), that call fails, presumably because the corresponding static archives for the target libraries do not exist in the system.
However, when linking against the LLVM shared library (as opposed to static archives), it is not necessary to specify the targets as components on the link line. The target components are linked into the shared object, along with the rest of the components, and specifying just the shared object is sufficient. Note that llvm_config implementation specifies both the shared object, followed by static archives for sake of fallback, so I kept that same logic. It would work equally well to only list LLVM without the static objects for the case with LLVM_USE_SHARED_LLVM_LIBRARY=ON. For reference, to enable approach (a), the patch to add support for targets of type INTERFACE_LIBRARY in llvm_config() to LLVM v10.0.0: --- a/cmake/modules/LLVM-Config.cmake 2020-06-08 19:36:44.804248189 -0000 +++ b/cmake/modules/LLVM-Config.cmake 2020-06-08 19:37:23.191439807 -0000 @@ -87,7 +87,13 @@ endif() endif() - target_link_libraries(${executable} PRIVATE LLVM) + get_target_property(t ${executable} TYPE) + if(t STREQUAL "INTERFACE_LIBRARY") + set(link_mode "INTERFACE") + else() + set(link_mode "PRIVATE") + endif() + target_link_libraries(${executable} ${link_mode} LLVM) endif() explicit_llvm_config(${executable} ${link_components}) @@ -99,7 +105,7 @@ llvm_map_components_to_libnames(LIBRARIES ${link_components}) get_target_property(t ${executable} TYPE) - if(t STREQUAL "STATIC_LIBRARY") + if(t STREQUAL "STATIC_LIBRARY" OR t STREQUAL "INTERFACE_LIBRARY") target_link_libraries(${executable} INTERFACE ${LIBRARIES}) elseif(t STREQUAL "EXECUTABLE" OR t STREQUAL "SHARED_LIBRARY" OR t STREQUAL "MODULE_LIBRARY") target_link_libraries(${executable} PRIVATE ${LIBRARIES}) 09 June 2020, 01:22:25 UTC
adeef5d Merge pull request #5014 from halide/abadams/fix_nvvm_shfl_up Fix args to nvvm.shfl.up 08 June 2020, 23:57:03 UTC
b7eaaf3 Merge branch 'abadams/d3d12_workaround_llvm_bug' of https://github.com/halide/Halide into abadams/d3d12_workaround_llvm_bug 08 June 2020, 21:39:49 UTC
1bfb698 Also fix OpenCL 08 June 2020, 21:39:42 UTC
a7caa22 Merge pull request #5003 from halide/shoaibkamil/fix_apps_d3d12compute Fix D3D12Compute apps tests 08 June 2020, 19:27:25 UTC
65a1bc2 Fix args to nvvm.shfl.up 08 June 2020, 19:18:02 UTC
0d6fe34 Use if-else instead of nested ternary 08 June 2020, 18:50:24 UTC
01b1a14 Fix #5012 08 June 2020, 18:45:58 UTC
2e5309a Delete StackPrinter and improve Printer instead 08 June 2020, 18:36:46 UTC
a79da6e Merge pull request #4990 from halide/cmake-pch-threads CMake PCH with threads 08 June 2020, 16:41:42 UTC
c275f3a clang-format 07 June 2020, 21:50:20 UTC
a0359cb Make thread-safe 07 June 2020, 21:44:18 UTC
52bf96d Large stack printer objects seem to cause problems with LLVM when debug info is on. Use alternative means. 07 June 2020, 21:38:34 UTC
941a576 Merge pull request #4997 from halide/abadams/better_split_tuples Lower tuples to separate atomic nodes where possible 06 June 2020, 17:34:15 UTC
b126ef7 Merge pull request #5007 from halide/abadams/dgemm_min_vector_size Fix linear algebra schedules for old x86 06 June 2020, 17:34:04 UTC