6b32fa2 | Andrew Adams | 24 June 2020, 01:42:10 UTC | Merge branch 'abadams/1v3_linear_comparison_cancellations' into abadams/diagnose_boundary_condition_failure | 24 June 2020, 01:42:10 UTC |
f95386b | Andrew Adams | 22 June 2020, 16:46:20 UTC | Test three hypotheses: 1) LLVM loop opts are messing things up; 2) the auto-benchmarking code is running amok; 3) we're rejitting every iteration | 24 June 2020, 00:51:13 UTC |
b44d1de | Andrew Adams | 20 June 2020, 23:40:32 UTC | Add debugging spew to help figure out why test is failing on buildbots | 24 June 2020, 00:51:13 UTC |
e3107d7 | Andrew Adams | 23 June 2020, 20:59:36 UTC | Add some missing 1 vs 3 linear comparison cancellations Somehow we were missing these. They're useful in canceling non-linear terms from both sides of a comparison. Pretty trivial, but formally verified anyway to protect us from typos. | 23 June 2020, 20:59:36 UTC |
69e320e | Shoaib Kamil | 23 June 2020, 18:49:04 UTC | Merge pull request #5061 from halide/shoaibkamil/metal_is_nan Add is_nan_f32 for metal | 23 June 2020, 18:49:04 UTC |
c0870ff | Steven Johnson | 23 June 2020, 18:19:37 UTC | Merge pull request #5065 from halide/abadams/better_error_message_when_no_distrib better error message if you try to build an app before libHalide | 23 June 2020, 18:19:37 UTC |
d716ea1 | Shoaib Kamil | 23 June 2020, 18:08:04 UTC | Merge remote-tracking branch 'origin/master' into shoaibkamil/metal_is_nan | 23 June 2020, 18:08:04 UTC |
3d45335 | Andrew Adams | 23 June 2020, 18:07:11 UTC | Remove debugging print | 23 June 2020, 18:07:11 UTC |
5ebe589 | Andrew Adams | 23 June 2020, 18:06:24 UTC | Merge pull request #5041 from halide/abadams/trim_no_ops_lift_loop_invariant_if_statements Add an explicit pass to lift loop invariant if statements | 23 June 2020, 18:06:24 UTC |
1bda178 | Shoaib Kamil | 23 June 2020, 17:45:31 UTC | Try to trigger buildbots | 23 June 2020, 17:45:31 UTC |
31546bd | Shoaib Kamil | 23 June 2020, 17:24:08 UTC | Add is_inf_f32/is_nan_f32/is_finite_f32 for D3D12Compute | 23 June 2020, 17:24:08 UTC |
7a156c4 | Shoaib Kamil | 23 June 2020, 14:00:10 UTC | clang-format | 23 June 2020, 14:00:10 UTC |
be6ed6e | Shoaib Kamil | 23 June 2020, 13:57:33 UTC | Add GPU version of test | 23 June 2020, 13:57:33 UTC |
2c83da3 | Andrew Adams | 22 June 2020, 21:09:46 UTC | Give a better error message if you try to build an app before building Halide Fixes #5060 | 22 June 2020, 21:09:46 UTC |
4bb7897 | Shoaib Kamil | 22 June 2020, 19:08:45 UTC | Add is_inf and is_finite as well | 22 June 2020, 19:08:45 UTC |
2748848 | Shoaib Kamil | 22 June 2020, 18:52:56 UTC | Add is_nan_f32 for metal. | 22 June 2020, 18:52:56 UTC |
8521896 | Andrew Adams | 21 June 2020, 03:32:16 UTC | Merge pull request #5049 from halide/abadams/fix_rval_reference_typo Fix #5046 | 21 June 2020, 03:32:16 UTC |
17eb851 | Steven Johnson | 20 June 2020, 22:43:01 UTC | Merge branch 'master' into abadams/trim_no_ops_lift_loop_invariant_if_statements | 20 June 2020, 22:43:01 UTC |
a59107f | Steven Johnson | 20 June 2020, 22:42:52 UTC | Merge branch 'abadams/trim_no_ops_lift_loop_invariant_if_statements' of https://github.com/halide/Halide into abadams/trim_no_ops_lift_loop_invariant_if_statements | 20 June 2020, 22:42:52 UTC |
9279fa5 | Steven Johnson | 20 June 2020, 22:42:28 UTC | Merge branch 'master' into abadams/fix_rval_reference_typo | 20 June 2020, 22:42:28 UTC |
24d7e97 | Steven Johnson | 20 June 2020, 22:41:07 UTC | Merge pull request #5057 from halide/srj-sig Minor JITExtern (& related) cleanups | 20 June 2020, 22:41:07 UTC |
23bc0dc | Steven Johnson | 19 June 2020, 23:47:01 UTC | Fixes | 19 June 2020, 23:47:01 UTC |
d70c6db | Andrew Adams | 19 June 2020, 23:42:30 UTC | Merge pull request #5036 from halide/abadams/store_in_register_with_no_lanes_loop Constant extents inferred pre-storage flattening | 19 June 2020, 23:42:30 UTC |
a98e04e | Andrew Adams | 15 June 2020, 22:03:01 UTC | Constant extents need to be inferred pre storage flattening Consider an allocation that has a dynamic extent, but needs to have a constant extent because it's stored in MemoryType::Register (e.g. see the test). We take an upper bound in these cases to get a constant allocation size. This PR changes things to take that upper bound *before* storage flattening instead of after. This way the individual per-dimension extents are all constant, instead of just their product. If you do it after storage flattening then you get dynamic strides within a constant-sized allocation, which is silly and not compatible with hoisting values into registers anyway (because access is at non-constant coords). Also fixed the assumption that MemoryType::Register on the GPU means that there must be a GPULanes loop. | 19 June 2020, 23:42:06 UTC |
47d8c30 | Steven Johnson | 19 June 2020, 23:28:19 UTC | Minor JITExtern (& related) cleanups - make all single-arg ctors explicit, and add one missing explicit usage - add an operator<< to ExternSignature to make debugging related issues easier | 19 June 2020, 23:28:19 UTC |
a31c39e | Andrew Adams | 16 June 2020, 20:42:36 UTC | Add an explicit pass to lift loop invariant if statements If statements can be injected by GuardWithIf, RDom predicates, specializations, and uses of undef. There are various situations where an if statement can end up further inside a loop nest than strictly necessary. This PR adds a pass to hoist them. This results in slightly better codegen for some conv layer schedules on GPU. Also reduced the expr count in lots_of_loop_invariants because it spends a long time inside LLVM | 19 June 2020, 23:17:08 UTC |
57e0b94 | Steven Johnson | 19 June 2020, 17:17:09 UTC | Touch | 19 June 2020, 17:17:09 UTC |
5f91893 | Steven Johnson | 19 June 2020, 17:16:33 UTC | Touch | 19 June 2020, 17:16:33 UTC |
54302b6 | Steven Johnson | 19 June 2020, 02:09:53 UTC | Merge branch 'master' into abadams/fix_rval_reference_typo | 19 June 2020, 02:09:53 UTC |
b668cd8 | Steven Johnson | 19 June 2020, 02:08:46 UTC | Merge branch 'master' into abadams/trim_no_ops_lift_loop_invariant_if_statements | 19 June 2020, 02:08:46 UTC |
ed40000 | Steven Johnson | 19 June 2020, 02:08:44 UTC | Merge branch 'abadams/trim_no_ops_lift_loop_invariant_if_statements' of https://github.com/halide/Halide into abadams/trim_no_ops_lift_loop_invariant_if_statements | 19 June 2020, 02:08:44 UTC |
c53c7e8 | Steven Johnson | 19 June 2020, 02:07:35 UTC | Merge pull request #5054 from halide/srj-cublas Skip cublas on Windows (Issue #5053) | 19 June 2020, 02:07:35 UTC |
07834a5 | Steven Johnson | 18 June 2020, 19:33:52 UTC | Skip cublas on Windows (Issue #5053) | 18 June 2020, 22:25:34 UTC |
5534e3f | Andrew Adams | 16 June 2020, 20:42:36 UTC | Add an explicit pass to lift loop invariant if statements If statements can be injected by GuardWithIf, RDom predicates, specializations, and uses of undef. There are various situations where an if statement can end up further inside a loop nest than strictly necessary. This PR adds a pass to hoist them. This results in slightly better codegen for some conv layer schedules on GPU. Also reduced the expr count in lots_of_loop_invariants because it spends a long time inside LLVM | 17 June 2020, 17:49:05 UTC |
a308308 | Andrew Adams | 17 June 2020, 17:32:20 UTC | Fix #5046 | 17 June 2020, 17:32:20 UTC |
4fc3606 | Steven Johnson | 17 June 2020, 16:05:12 UTC | Merge branch 'master' into abadams/trim_no_ops_lift_loop_invariant_if_statements | 17 June 2020, 16:05:12 UTC |
d7c99db | Steven Johnson | 17 June 2020, 16:04:41 UTC | Merge pull request #5045 from halide/srj-llvmfixer Fix for trunk LLVM API changes | 17 June 2020, 16:04:41 UTC |
02552dd | Shoaib Kamil | 17 June 2020, 14:24:09 UTC | Merge pull request #5042 from halide/shoaibkamil/arm64_windows Add preliminary AOT Windows ARM64 support | 17 June 2020, 14:24:09 UTC |
23fa7d0 | Steven Johnson | 17 June 2020, 05:13:34 UTC | Update Makefile | 17 June 2020, 05:13:34 UTC |
705d6e4 | Steven Johnson | 17 June 2020, 00:13:41 UTC | Fix for trunk LLVM API changes | 17 June 2020, 00:37:13 UTC |
347608d | Andrew Adams | 16 June 2020, 20:42:36 UTC | Add an explicit pass to lift loop invariant if statements If statements can be injected by GuardWithIf, RDom predicates, specializations, and uses of undef. There are various situations where an if statement can end up further inside a loop nest than strictly necessary. This PR adds a pass to hoist them. This results in slightly better codegen for some conv layer schedules on GPU. Also reduced the expr count in lots_of_loop_invariants because it spends a long time inside LLVM | 16 June 2020, 20:42:36 UTC |
61d0060 | Shoaib Kamil | 16 June 2020, 20:31:31 UTC | clang-format | 16 June 2020, 20:31:31 UTC |
b761bfe | Shoaib Kamil | 16 June 2020, 20:30:18 UTC | Add issue | 16 June 2020, 20:30:18 UTC |
340246a | Shoaib Kamil | 16 June 2020, 18:29:03 UTC | Merge remote-tracking branch 'origin/master' into shoaibkamil/arm64_windows | 16 June 2020, 18:29:03 UTC |
c7098f8 | Steven Johnson | 16 June 2020, 16:25:50 UTC | Merge pull request #5035 from halide/abadams/improve_cuda_mat_mul It's worth cancelling correlated subexpressions in load/store indices | 16 June 2020, 16:25:50 UTC |
19ef844 | Steven Johnson | 16 June 2020, 16:25:32 UTC | Merge pull request #5037 from halide/abadams/openglcompute_loop_invariants Put buffers before other uniforms in gl uniform list | 16 June 2020, 16:25:32 UTC |
b9fa8bf | Steven Johnson | 16 June 2020, 16:25:03 UTC | Merge pull request #5038 from halide/srj-copyto Clarify debug logging in copy_to_device() | 16 June 2020, 16:25:03 UTC |
f147d7b | Andrew Adams | 13 June 2020, 00:06:50 UTC | Improve comment on simplify correlated differences | 16 June 2020, 00:20:45 UTC |
75aa213 | Andrew Adams | 12 June 2020, 21:03:53 UTC | It's worth cancelling correlated subexpressions in load/store indices In particular, this makes warp shuffles much more reliable, because any dependence of a load or store index on the block id is more likely to get cancelled out. This PR massively simplifies the generated code for cuda_mat_mul, and makes it about 30% faster (although it's still mysteriously 2x slower than cublas on my card). Also reduces the amount of IR in some other apps very slightly. Doesn't seem to affect compile times. | 16 June 2020, 00:20:45 UTC |
70b3b75 | Steven Johnson | 15 June 2020, 23:11:26 UTC | Clarify debug logging in copy_to_device() We currently always call copy_to_device() on buffers that need to be on device (with the understanding that it's a no-op if no copy is needed); if the `debug` feature is on, a naive reading might make someone think that needless copy-to-device operations are actually happening. This adds a bit of logging (debug mode only) to make it clearer whether the copy to device actually happened, or if it was skipped because host was not dirty. | 15 June 2020, 23:11:26 UTC |
a6634b6 | Andrew Adams | 15 June 2020, 22:35:50 UTC | Add extra comment about why buffers come first | 15 June 2020, 22:35:50 UTC |
42f66da | Andrew Adams | 15 June 2020, 22:30:37 UTC | Put buffers before other uniforms in gl uniform list buffer ids are constrained to be smaller than arbitrary scalar uniforms, so they should go first in the closure. Also added a stress-test for lifting out lots of loop invariants, and disabled LICM completely for GLSL, because it uses magic names (.varying) for some things. | 15 June 2020, 22:30:37 UTC |
45e35d1 | Shoaib Kamil | 15 June 2020, 17:04:34 UTC | Not a function call | 15 June 2020, 17:04:34 UTC |
638ac11 | Steven Johnson | 15 June 2020, 16:49:06 UTC | Merge pull request #5033 from halide/srj-tsan-fix Fix broken TSAN code | 15 June 2020, 16:49:06 UTC |
edda3c2 | Shoaib Kamil | 15 June 2020, 16:26:59 UTC | Merge remote-tracking branch 'origin/master' into shoaibkamil/arm64_windows | 15 June 2020, 16:26:59 UTC |
64467ba | Shoaib Kamil | 15 June 2020, 16:26:49 UTC | Merge branch 'master' into shoaibkamil/arm64_windows | 15 June 2020, 16:26:49 UTC |
d7d1dac | Andrew Adams | 15 June 2020, 16:26:30 UTC | Merge pull request #5025 from halide/shoaibkamil/correct_memory_fences Make gpu_thread_barrier() semantics consistent | 15 June 2020, 16:26:30 UTC |
2011720 | Andrew Adams | 13 June 2020, 02:00:52 UTC | Merge pull request #5032 from halide/abadams/atomic_vectorization_tweaks atomic vectorization tweaks | 13 June 2020, 02:00:52 UTC |
cacda0e | Shoaib Kamil | 12 June 2020, 21:05:45 UTC | Merge remote-tracking branch 'origin/abadams/fix_cuda_mat_mul_assert' into shoaibkamil/correct_memory_fences | 12 June 2020, 21:05:45 UTC |
55dac45 | Andrew Adams | 12 June 2020, 20:20:32 UTC | Fix inverted assert | 12 June 2020, 20:20:32 UTC |
79c3873 | Steven Johnson | 12 June 2020, 19:25:58 UTC | Fix broken TSAN code Update for a recent LLVM change was incorrect; it compiled but didn't actually work properly. (We should probably run sanitizers on the buildbots...) | 12 June 2020, 19:25:58 UTC |
1328084 | Shoaib Kamil | 12 June 2020, 19:04:50 UTC | Merge remote-tracking branch 'origin/master' into shoaibkamil/correct_memory_fences | 12 June 2020, 19:04:50 UTC |
bbe4acf | Andrew Adams | 12 June 2020, 17:50:41 UTC | Merge pull request #5030 from halide/abadams/licm_on_innermost_loop_bodies_too Don't lift constant integer offsets | 12 June 2020, 17:50:41 UTC |
0bc3070 | Andrew Adams | 12 June 2020, 17:49:01 UTC | Merge remote-tracking branch 'origin/master' into abadams/atomic_vectorization_tweaks | 12 June 2020, 17:49:01 UTC |
d9795ee | Steven Johnson | 12 June 2020, 17:46:59 UTC | Merge pull request #5031 from halide/srj-comdat Fix some "MachO doesn't support COMDAT" issues in runtime | 12 June 2020, 17:46:59 UTC |
a6ec01a | Shoaib Kamil | 12 June 2020, 16:42:50 UTC | Try to work around MSL compiler stupidity. | 12 June 2020, 16:42:50 UTC |
195bcbc | Shoaib Kamil | 12 June 2020, 16:19:49 UTC | Merge remote-tracking branch 'origin/master' into shoaibkamil/correct_memory_fences | 12 June 2020, 16:19:49 UTC |
887eacc | Andrew Adams | 12 June 2020, 05:14:28 UTC | Merge pull request #5029 from halide/abadams/fix_associativity Make it harder for the associativity test to get confused | 12 June 2020, 05:14:28 UTC |
7caecf2 | Andrew Adams | 11 June 2020, 22:31:24 UTC | Pass LLVM_VERSION to tests | 11 June 2020, 22:43:35 UTC |
471e882 | Andrew Adams | 11 June 2020, 22:33:36 UTC | More verbose error in CSE | 11 June 2020, 22:43:35 UTC |
67bbbd3 | Andrew Adams | 11 June 2020, 22:33:48 UTC | Add comment to AddAtomicMutex | 11 June 2020, 22:43:35 UTC |
8ca7602 | Andrew Adams | 11 June 2020, 22:34:07 UTC | Add 16-bit float to associative ops table | 11 June 2020, 22:43:35 UTC |
f681742 | Andrew Adams | 11 June 2020, 22:34:38 UTC | Simplify some code in deinterleave | 11 June 2020, 22:43:35 UTC |
d0584b3 | Andrew Adams | 11 June 2020, 22:34:51 UTC | Permit fusing pure and impure rvars | 11 June 2020, 22:43:35 UTC |
394536f | Andrew Adams | 11 June 2020, 22:35:14 UTC | Make lossless_cast more aggressive | 11 June 2020, 22:43:24 UTC |
cec8290 | Steven Johnson | 11 June 2020, 22:20:39 UTC | Fix some "MachO doesn't support COMDAT" issues in runtime Runtime code that will be instantiated for OSX/iOS needs to ensure that there are no plain 'inline' functions -- they must be either WEAK or __attribute__((always_inline)) -- otherwise, some compiler configurations can produce the error above. (Note that this also applies to member functions that are defined inline, even without an explicit 'inline' keyword). (Note also that the vagaries of C++ mean that declaring a ctor implies that a dtor will be auto-created; in some of these we must explicitly declare the dtor so that it too is always-inlined, even if it is empty...) | 11 June 2020, 22:20:39 UTC |
979d701 | Andrew Adams | 11 June 2020, 21:44:58 UTC | Don't lift constant integer offsets | 11 June 2020, 21:44:58 UTC |
d755381 | Shoaib Kamil | 11 June 2020, 20:59:01 UTC | Stop copying strings | 11 June 2020, 20:59:01 UTC |
1483793 | Shoaib Kamil | 11 June 2020, 20:48:45 UTC | Address review comments, make algorithm simpler. | 11 June 2020, 20:48:45 UTC |
4e55416 | Andrew Adams | 11 June 2020, 19:29:38 UTC | Make it harder for the associativity test to get confused | 11 June 2020, 19:29:38 UTC |
8cddb2e | Steven Johnson | 11 June 2020, 17:14:55 UTC | Merge pull request #5027 from halide/srj-absd Fix codegen for absd() in GLSLBase | 11 June 2020, 17:14:55 UTC |
3d19643 | Steven Johnson | 11 June 2020, 17:14:42 UTC | Merge pull request #5026 from halide/srj-glsl Combine visit(Cast) for GLSL and OpenGLCompute | 11 June 2020, 17:14:42 UTC |
665001d | Shoaib Kamil | 11 June 2020, 15:59:37 UTC | Partially address reviewer comments | 11 June 2020, 15:59:37 UTC |
3721fcb | Steven Johnson | 11 June 2020, 00:50:22 UTC | Merge pull request #5028 from halide/srj-appv Enable verbosity in apps builds | 11 June 2020, 00:50:22 UTC |
53e8ab6 | Steven Johnson | 11 June 2020, 00:31:56 UTC | Also add --output-on-failure | 11 June 2020, 00:31:56 UTC |
9e780af | Steven Johnson | 11 June 2020, 00:22:27 UTC | Enable verbosity in apps builds Hoping this will help us track down flaky Windows failures. | 11 June 2020, 00:22:27 UTC |
5c52122 | Andrew Adams | 11 June 2020, 00:20:59 UTC | Merge pull request #5023 from halide/abadams/more_simplifier_rules New simplifier rules necessary for the gpu autoscheduler | 11 June 2020, 00:20:59 UTC |
4a2216d | Steven Johnson | 10 June 2020, 22:25:06 UTC | Fix codegen for absd() in GLSLBase It was emitting as a float, which is *never* correct, since absd() is only used for int or uint types. (This happened to work before because GLSL was previously also incorrectly using float for uint in some cases.) Also did a drive-by removal of code in Codegen_C that recapitulated the logic from IROperator.cpp; maybe the type field of absd() was incorrect at some point in the past, but this calculation seems redundant and wrong now. | 10 June 2020, 22:25:06 UTC |
ab1c53e | Steven Johnson | 10 June 2020, 22:15:08 UTC | Combine visit(Cast) for GLSL and OpenGLCompute These are the only two overrides of `visit(Cast)` from GLSLBase and they both have identical implementations; combine them into one and move into GLSLBase to save code. | 10 June 2020, 22:15:08 UTC |
17f0176 | Shoaib Kamil | 10 June 2020, 20:22:29 UTC | clang-format | 10 June 2020, 20:22:29 UTC |
724cd28 | Shoaib Kamil | 10 June 2020, 20:19:25 UTC | clang-format | 10 June 2020, 20:19:25 UTC |
d7225ae | Shoaib Kamil | 10 June 2020, 20:18:07 UTC | Tweak spacing | 10 June 2020, 20:18:07 UTC |
5f0ce89 | Shoaib Kamil | 10 June 2020, 20:11:55 UTC | Merge remote-tracking branch 'origin/master' into shoaibkamil/correct_memory_fences | 10 June 2020, 20:11:55 UTC |
75fe44a | Shoaib Kamil | 10 June 2020, 20:09:09 UTC | Slight change in D3D12 logic. | 10 June 2020, 20:09:09 UTC |
9cec5a5 | Andrew Adams | 10 June 2020, 17:20:36 UTC | New simplifier rules necessary for the gpu autoscheduler | 10 June 2020, 17:20:36 UTC |
8b9081b | Steven Johnson | 10 June 2020, 16:39:23 UTC | Merge pull request #5021 from halide/abadams/fewer_print_parentheses Fewer print parentheses | 10 June 2020, 16:39:23 UTC |
27b478e | Shoaib Kamil | 10 June 2020, 16:34:40 UTC | Minor | 10 June 2020, 16:34:40 UTC |
ca420a4 | Shoaib Kamil | 10 June 2020, 15:36:34 UTC | Checkpoint | 10 June 2020, 15:36:34 UTC |
ee5f90e | Alex Reinking | 10 June 2020, 06:49:36 UTC | Merge pull request #5015 from acolinisi/PR--cmake-llvm-dynlib-2 cmake: llvm: fix linking against LLVM shared lib | 10 June 2020, 06:49:36 UTC |
3609c63 | Andrew Adams | 10 June 2020, 06:36:02 UTC | Merge pull request #5022 from halide/wording_fix Small wording improvements. | 10 June 2020, 06:36:02 UTC |