HEAD | 4e0b313 | Rewrite IREquality to use a more compact stack instead of deep recursion (#8198) * Rewrite IREquality to use a more compact stack instead of deep recursion Deletes a bunch of code and speeds up lowering time of local laplacian with 20 pyramid levels by ~2.5% * clang-tidy * Fold in the version of equal in IRMatch.h/cpp * Add missing switch breaks * Add missing comments * Elaborate on why we treat NaNs as equal | 18 April 2024, 19:48:59 UTC |
refs/heads/Halide_unsharp | 61c1b40 | Merge pull request #3458 from white-pony/master Allocate hexagon runtime arguments buffers on the heap if there are too many arguments | 04 December 2018, 00:00:00 UTC |
refs/heads/abadams/aggressive_is_single_point | b15a648 | clang-tidy | 18 April 2024, 22:39:07 UTC |
refs/heads/abadams/align_strided_const_loads | ed529e0 | Align the base when doing strided loads from constant addresses When we codegen something like f[ramp(x + 1, 2, 16)], where f is an internal allocation, we subtract the 1, do the dense load f[ramp(x, 1, 32)] and then take the odd lanes of the result. The reason for this is that it's likely that there's an f[ramp(x, 2, 16)] nearby, and aligning down the x+1 to x means we can share the dense loads and just deinterleave. This PR does the same when there's no x, just an odd constant. This means that cases like f[ramp(64, 2, 16)] + f[ramp(65, 2, 16)] now generate much better assembly. In one case I have it speeds up an entire pipeline by 8%, because aligning the loads in this way causes them to all be promoted off the stack into registers. | 29 November 2020, 22:07:28 UTC |
refs/heads/abadams/alloca | 3fa94ab | Fix comment location | 07 October 2021, 23:31:27 UTC |
refs/heads/abadams/atomic_parallel_compiled_in | 407d308 | Compile leaf parallel loops using an internal atomic counter | 06 November 2020, 20:03:09 UTC |
refs/heads/abadams/atomic_vector_non_recursive | 22c2530 | Remove dead Vars | 13 February 2023, 19:27:53 UTC |
refs/heads/abadams/averaging_tree | bc10623 | Merge branch 'abadams/averaging_tree' of https://github.com/halide/Halide into abadams/averaging_tree | 26 April 2022, 17:38:39 UTC |
refs/heads/abadams/avoid_name_mangling_in_cross_module_dependencies | 13f388d | Merge remote-tracking branch 'origin/master' into abadams/avoid_name_mangling_in_cross_module_dependencies | 25 August 2020, 21:15:53 UTC |
refs/heads/abadams/better_absd | 86dfde4 | typo | 06 January 2022, 19:03:20 UTC |
refs/heads/abadams/better_codegen_for_non_const_ramps | 6721d40 | Better codegen for ramps with non-const stride | 20 November 2020, 22:20:41 UTC |
refs/heads/abadams/bgu_cholesky | 1707c0a | Address review comments | 01 February 2021, 18:54:47 UTC |
refs/heads/abadams/braces_around_statements | ecad269 | Use switch statement instead of if sequence | 05 October 2020, 23:56:18 UTC |
refs/heads/abadams/cache_tighten_producer_consumer_nodes | e848ee8 | Merge remote-tracking branch 'origin/main' into abadams/cache_tighten_producer_consumer_nodes | 21 February 2024, 18:54:24 UTC |
refs/heads/abadams/check_reorder_dups | 8cee0da | Check for duplicate vars in calls to reorder/reorder_storage | 23 August 2020, 21:39:07 UTC |
refs/heads/abadams/clarify_broadcast_shuffle | d13bfa8 | Revert accidental change | 18 March 2024, 16:00:29 UTC |
refs/heads/abadams/compositing_app | 8b5ca06 | Revert inclusion of cmath | 22 June 2023, 22:05:22 UTC |
refs/heads/abadams/cond_wait_spin | 2bee115 | Merge branch 'master' into abadams/cond_wait_spin | 01 February 2021, 17:59:31 UTC |
refs/heads/abadams/cse_in_unroll_split_tuples | d1c71d0 | Merge branch 'master' into abadams/cse_in_unroll_split_tuples | 15 December 2021, 00:59:07 UTC |
refs/heads/abadams/custom_cuda_context | 0b14ec0 | Comment clarifications | 15 October 2021, 20:59:48 UTC |
refs/heads/abadams/custom_cuda_context_2 | d3df50f | Clean up comments | 25 October 2021, 20:41:50 UTC |
refs/heads/abadams/custom_cuda_context_3 | d0cdc15 | Improve comments | 27 October 2021, 00:51:39 UTC |
refs/heads/abadams/d3d12abi | 75b4f0d | Rename d3d12 modules to windows_d3d12 to simplify build Also clobber invalid module flags from generic modules | 14 August 2020, 17:27:21 UTC |
refs/heads/abadams/deflake_mullapudi_reorder | cc6e06d | Increase test threshold for mullapudi histogram test It uses fine-grained parallelism, which has a very noisy runtime. | 03 April 2023, 23:43:52 UTC |
refs/heads/abadams/delete_prepare_for_early_exit | 5a9d2ee | Merge remote-tracking branch 'origin/main' into abadams/delete_prepare_for_early_exit | 11 November 2023, 17:14:52 UTC |
refs/heads/abadams/depthwise_separable_conv | 64fcd56 | Rename some variables | 25 August 2020, 19:20:51 UTC |
refs/heads/abadams/diagnose_boundary_condition_failure | 6b32fa2 | Merge branch 'abadams/1v3_linear_comparison_cancellations' into abadams/diagnose_boundary_condition_failure | 24 June 2020, 01:42:10 UTC |
refs/heads/abadams/disable_onnx_app_on_mac | f3b548f | Skip onnx app on mac | 01 September 2023, 20:43:19 UTC |
refs/heads/abadams/divide_using_pavgw | a12b3cb | Add comment elaborating on why this is a good idea | 15 October 2021, 20:00:33 UTC |
refs/heads/abadams/dont_link_to_cudart | 6004e5f | Don't link to cudart or opencl library. These are loaded dynamically when required. | 19 July 2023, 16:47:18 UTC |
refs/heads/abadams/dont_reinterpret_concat | 94d7f01 | Don't reinterpret cast when codegenning vector concat It confuses the HVX LLVM backend, and shouldn't be necessary anyway. | 02 July 2021, 17:23:21 UTC |
refs/heads/abadams/early_out | ec551ee | Appease clang-tidy | 13 June 2022, 18:38:59 UTC |
refs/heads/abadams/enable_f16c | f7776c8 | Merge remote-tracking branch 'origin/main' into abadams/enable_f16c | 06 September 2023, 17:06:20 UTC |
refs/heads/abadams/extract_concat_bits | 0457109 | Fix concat_bits call | 13 August 2022, 22:15:34 UTC |
refs/heads/abadams/fast_integer_divide_round_to_zero | f215365 | Pacify clang tidy | 30 November 2021, 22:02:00 UTC |
refs/heads/abadams/faster_runtime_integer_division | 2806116 | Cleaner initialization of tables | 23 November 2021, 18:44:57 UTC |
refs/heads/abadams/faster_substitute_facts | 07672fe | Merge remote-tracking branch 'origin/main' into abadams/faster_substitute_facts | 18 April 2024, 19:49:36 UTC |
refs/heads/abadams/faster_unroll | 5012aba | Fix computational complexity of unrolling large muxes | 03 February 2021, 20:49:10 UTC |
refs/heads/abadams/fix-arm-seg2 | 4f20718 | Merge remote-tracking branch 'origin/master' into abadams/fix-arm-seg2 | 05 March 2021, 23:29:40 UTC |
refs/heads/abadams/fix_4211 | 76b8cbc | Merge branch 'main' into abadams/fix_4211 | 15 June 2023, 00:48:33 UTC |
refs/heads/abadams/fix_5323 | be50f8a | Add --help flag to rungenmain, fixing #5323 | 26 October 2021, 19:47:18 UTC |
refs/heads/abadams/fix_5329 | 72224e1 | Add explicit cast to remove ambiguous operator== (Fixes #5329) | 05 April 2021, 17:06:25 UTC |
refs/heads/abadams/fix_5889 | 15abc45 | Use guarded versions of vars if they exist in bounds inference | 08 April 2021, 19:42:48 UTC |
refs/heads/abadams/fix_6984 | 88509a8 | fix typo | 02 March 2023, 19:33:55 UTC |
refs/heads/abadams/fix_7229 | 3c055c9 | Actually perform the requested operation | 12 December 2022, 22:10:41 UTC |
refs/heads/abadams/fix_7260 | 4e9f812 | Merge branch 'abadams/fix_7260' of https://github.com/halide/Halide into abadams/fix_7260 | 01 January 2023, 23:01:45 UTC |
refs/heads/abadams/fix_7365 | fe3fb36 | Overflow on casts is fine for ints < 32 bits | 20 February 2023, 17:33:09 UTC |
refs/heads/abadams/fix_7374 | 646e53c | Add test | 24 February 2023, 23:08:18 UTC |
refs/heads/abadams/fix_7504 | 57c484f | Add missing test | 12 April 2023, 23:00:56 UTC |
refs/heads/abadams/fix_7514 | dfe07b0 | Silence clang-tidy | 17 April 2023, 22:04:33 UTC |
refs/heads/abadams/fix_7531 | d10c6fd | Fix inverted may_subtile checks | 12 June 2023, 17:29:45 UTC |
refs/heads/abadams/fix_7584 | 7baedca | Fix operator/ on ModulusRemainder It wasn't reducing the remainder modulo the modulus, which confused trim_bounds_using_alignment in the simplifier. | 31 May 2023, 21:40:28 UTC |
refs/heads/abadams/fix_7584_v2 | 1006c4e | Fix operator/ on ModulusRemainder It wasn't reducing the remainder modulo the modulus, which confused trim_bounds_using_alignment in the simplifier. | 31 May 2023, 21:40:28 UTC |
refs/heads/abadams/fix_7742 | 3a79f46 | Remove accidental return | 04 August 2023, 21:05:01 UTC |
refs/heads/abadams/fix_7756 | 0973abd | Add success print | 26 September 2023, 20:47:16 UTC |
refs/heads/abadams/fix_7761 | 46fb1e3 | Add test | 25 September 2023, 21:47:22 UTC |
refs/heads/abadams/fix_7768 | 5db62a0 | Add test | 21 August 2023, 21:26:29 UTC |
refs/heads/abadams/fix_7786 | 011d42b | Don't inject undef() in the simplifier We shouldn't be using undef() in the simplifier. This replaces a load with a constant false predicate with a zero instead. I also added a guard around some dubious logic about out of bounds loads. out of bounds loads may be reachable if they have a false predicate, so I changed this simplification to only trigger if the load is unpredicated. | 21 August 2023, 20:52:00 UTC |
refs/heads/abadams/fix_7810 | fb06e94 | trigger buildbots | 29 November 2023, 22:47:14 UTC |
refs/heads/abadams/fix_7811 | ed1a5dd | Merge branch 'main' into abadams/fix_7811 | 28 November 2023, 15:22:03 UTC |
refs/heads/abadams/fix_7815 | dabc935 | Merge remote-tracking branch 'origin/main' into abadams/fix_7815 | 01 September 2023, 04:28:35 UTC |
refs/heads/abadams/fix_7867 | 153709b | trigger buildbots | 29 November 2023, 22:46:45 UTC |
refs/heads/abadams/fix_7871 | b2e3cc3 | Merge branch 'abadams/fix_riscv_vx_vi' into abadams/fix_7871 | 04 October 2023, 19:22:12 UTC |
refs/heads/abadams/fix_7872 | 47a209d | Merge remote-tracking branch 'origin/main' into abadams/fix_7872 | 05 October 2023, 16:16:26 UTC |
refs/heads/abadams/fix_7873 | b6132ef | Don't deduce unreachability from predicated out of bounds stores Fixes #7873 | 03 October 2023, 23:52:16 UTC |
refs/heads/abadams/fix_7888 | 022bcd5 | Don't try to construct illegal types | 11 October 2023, 19:14:31 UTC |
refs/heads/abadams/fix_7890 | 10687b5 | Fix rfactor adding too many pure loops When you rfactor an update definition, the new update definition must use all the pure vars of the Func, even though the one you're rfactoring may not have used them all. We also want to preserve any scheduling already done to the pure vars, so we want to preserve the dims list and splits list from the original definition. The code accounted for this by checking the dims list for any missing pure vars and adding them at the end (just before Var::outermost()), but this didn't account for the fact that they may no longer exist in the dims list due to splits that didn't reuse the outer name. In these circumstances we could end up with too many pure loops. E.g. if x has been split into xo and xi, then the code was adding a loop for x even though there were already loops for xo and xi, which of course produces garbage output. This PR instead just checks which pure vars are actually used in the update definition up front, and then uses that to tell which ones should be added. Fixes #7890 | 09 February 2024, 19:20:56 UTC |
refs/heads/abadams/fix_7891 | 5598c35 | Merge remote-tracking branch 'origin/main' into abadams/fix_7891 | 18 October 2023, 17:57:49 UTC |
refs/heads/abadams/fix_7892 | e26ce62 | Merge remote-tracking branch 'origin/main' into abadams/fix_7892 | 16 October 2023, 17:15:15 UTC |
refs/heads/abadams/fix_7893 | 476e1f7 | Merge remote-tracking branch 'origin/main' into abadams/fix_7893 | 16 October 2023, 17:15:34 UTC |
refs/heads/abadams/fix_7906 | 08afbbc | Stop interleaver from expanding the scope of letstmts In the following code: (let a = b in X); (let a = c in Y) If Stmt X successfully had stores interleaved, it was re-nesting it like so: let a = b in (X; let a = c in Y) This introduces a shadowed variable 'a', which is illegal at this stage of lowering. Fixes #7906 Also some drive-by fixes to earlier tests that had debugging code left in. | 19 October 2023, 17:12:31 UTC |
refs/heads/abadams/fix_7909 | b3507f9 | Merge branch 'main' into abadams/fix_7909 | 20 October 2023, 17:23:13 UTC |
refs/heads/abadams/fix_7968 | 0ad79da | Add missing print | 05 December 2023, 18:09:11 UTC |
refs/heads/abadams/fix_8038 | ae04001 | trigger buildbots | 26 January 2024, 01:50:10 UTC |
refs/heads/abadams/fix_8054 | fa88d14 | Fix type error in VectorizeLoops | 01 February 2024, 01:19:21 UTC |
refs/heads/abadams/fix_8170 | da4d491 | Merge branch 'main' into abadams/fix_8170 | 16 April 2024, 16:42:42 UTC |
refs/heads/abadams/fix_8184 | 8155454 | Don't print on parallel task entry/exit with -debug flag Fixes #8184 | 09 April 2024, 18:28:59 UTC |
refs/heads/abadams/fix_arm_fcvtmp | c7cb4c4 | Add support for fcvtm/p, make scalars go through pattern matching too | 12 March 2024, 19:44:58 UTC |
refs/heads/abadams/fix_autoschedule_feature_transposition | 0e361d4 | Fix transposed variable names | 29 July 2020, 18:19:45 UTC |
refs/heads/abadams/fix_cse_name_collisions | 83b07f1 | Merge remote-tracking branch 'origin/main' into abadams/fix_cse_name_collisions | 01 September 2023, 03:00:32 UTC |
refs/heads/abadams/fix_cuda_mat_mul_assert | f7d1a8f | Merge branch 'master' into abadams/fix_cuda_mat_mul_assert | 19 June 2020, 02:09:18 UTC |
refs/heads/abadams/fix_deinterleave_bug | 987f531 | Remove buggy deinterleave misfeature | 24 March 2021, 00:08:17 UTC |
refs/heads/abadams/fix_deinterleave_for_reinterpret | 1772c1f | Minimal approach to making Deinterleave correct for Reinterpret | 05 August 2022, 19:30:20 UTC |
refs/heads/abadams/fix_div_round_to_zero | 108dcea | Add missing print | 11 September 2022, 21:32:42 UTC |
refs/heads/abadams/fix_fft_compile_time_regression | 2a8ced8 | Merge branch 'master' into abadams/fix_fft_compile_time_regression | 01 December 2020, 18:46:33 UTC |
refs/heads/abadams/fix_generate_output_snippets | d638d81 | Rename LINES to INTERESTING_LINES Some terminals treat LINES as a special var, breaking this script | 23 September 2020, 19:34:55 UTC |
refs/heads/abadams/fix_if_nesting_condition | 84b0aee | clang-format | 19 November 2023, 01:13:54 UTC |
refs/heads/abadams/fix_leaks_in_memoize_test | 0e85be4 | Fix comment | 04 August 2023, 23:47:58 UTC |
refs/heads/abadams/fix_lgtm_warnings | db22a23 | Fix a few warnings from lgtm.com | 21 February 2021, 03:34:46 UTC |
refs/heads/abadams/fix_links_to_master | 166a748 | Fix some dead links to the 'master' branch | 20 October 2022, 16:56:02 UTC |
refs/heads/abadams/fix_load_of_broadcast | 0dc03ee | Handle loads of broadcasts in FlattenNestedRamps With sufficiently perverse schedules, it's possible to end up with a load of a broadcast index (rather than a broadcast of a scalar load). This made FlattenNestedRamps divide by zero. Unfortunately this happened in a complex production pipeline, so I'm not entirely sure how to reproduce it. For that pipeline, this change fixes it and produces correct output. | 06 March 2024, 19:17:59 UTC |
refs/heads/abadams/fix_lossless_cast_of_sub | 66c56f1 | Fix some UB | 01 April 2024, 20:35:01 UTC |
refs/heads/abadams/fix_onnx_app | 32d529a | Don't test onnx app in a 32-bit build | 11 July 2023, 00:57:37 UTC |
refs/heads/abadams/fix_pointless_lower_condition | 91d87d7 | Merge remote-tracking branch 'origin/main' into abadams/fix_pointless_lower_condition | 12 March 2024, 16:48:53 UTC |
refs/heads/abadams/fix_potential_gpu_deadlock | e3606cc | Fix GPU barrier deadlocks Partition loops shouldn't mess with serial loops containing thread barriers, potentially causing warp divergence and deadlock (seen in some obscure lens blur schedules). Also we were generating too many thread barriers in a branch where the base mutator class was accidentally always mutating something, so there's a change to FuseGPUThreadLoops to make it more bug-resistant. Without these additional barriers I have been unable to come up with a case where a barrier ends up somewhere that would deadlock, so no test. | 13 August 2020, 17:37:30 UTC |
refs/heads/abadams/fix_realize_condition_depends_on_tuple | 4a3df05 | Fix bug when realize condition depends on tuple call If the realization is tuple-valued, and the condition on the realization uses a tuple call (index != 0), then the condition wasn't getting resolved during the split_tuples pass. The cause was a missing mutate call. | 03 August 2022, 22:06:50 UTC |
refs/heads/abadams/fix_reduce_expr_modulo_of_vector | 0afb878 | Fix test | 12 February 2024, 22:26:36 UTC |
refs/heads/abadams/fix_riscv_vx_vi | 33fa8a6 | Fix for llvm trunk | 04 October 2023, 19:02:13 UTC |
refs/heads/abadams/fix_round | 5c063bb | Merge branch 'main' into abadams/fix_round | 26 September 2022, 18:49:16 UTC |
refs/heads/abadams/fix_stencil_chain_gpu_schedule | b8ad19f | Schedule last stage of stencil chain on GPU too | 11 August 2020, 19:12:57 UTC |