HEAD | a9ea9b5 | Fix for top-of-tree LLVM (#7194) | 02 December 2022, 00:17:48 UTC |
refs/heads/Halide_unsharp | 61c1b40 | Merge pull request #3458 from white-pony/master Allocate hexagon runtime arguments buffers on the heap if there are too many arguments | 04 December 2018, 00:00:00 UTC |
refs/heads/abadams/align_strided_const_loads | ed529e0 | Align the base when doing strided loads from constant addresses When we codegen something like f[ramp(x + 1, 2, 16)], where f is an internal allocation, we subtract the 1, do the dense load f[ramp(x, 1, 32)] and then take the odd lanes of the result. The reason for this is that it's likely that there's an f[ramp(x, 2, 16)] nearby, and aligning down the x+1 to x means we can share the dense loads and just deinterleave. This PR does the same when there's no x, just an odd constant. This means that cases like f[ramp(64, 2, 16)] + f[ramp(65, 2, 16)] now generate much better assembly. In one case I have it speeds up an entire pipeline by 8%, because aligning the loads in this way causes them to all be promoted off the stack into registers. | 29 November 2020, 22:07:28 UTC |
refs/heads/abadams/alloca | 3fa94ab | Fix comment location | 07 October 2021, 23:31:27 UTC |
refs/heads/abadams/atomic_parallel_compiled_in | 407d308 | Compile leaf parallel loops using an internal atomic counter | 06 November 2020, 20:03:09 UTC |
refs/heads/abadams/averaging_tree | bc10623 | Merge branch 'abadams/averaging_tree' of https://github.com/halide/Halide into abadams/averaging_tree | 26 April 2022, 17:38:39 UTC |
refs/heads/abadams/avoid_name_mangling_in_cross_module_dependencies | 13f388d | Merge remote-tracking branch 'origin/master' into abadams/avoid_name_mangling_in_cross_module_dependencies | 25 August 2020, 21:15:53 UTC |
refs/heads/abadams/better_absd | 86dfde4 | typo | 06 January 2022, 19:03:20 UTC |
refs/heads/abadams/better_codegen_for_non_const_ramps | 6721d40 | Better codegen for ramps with non-const stride | 20 November 2020, 22:20:41 UTC |
refs/heads/abadams/bgu_cholesky | 1707c0a | Address review comments | 01 February 2021, 18:54:47 UTC |
refs/heads/abadams/braces_around_statements | ecad269 | Use switch statement instead of if sequence | 05 October 2020, 23:56:18 UTC |
refs/heads/abadams/check_reorder_dups | 8cee0da | Check for duplicate vars in calls to reorder/reorder_storage | 23 August 2020, 21:39:07 UTC |
refs/heads/abadams/cond_wait_spin | 2bee115 | Merge branch 'master' into abadams/cond_wait_spin | 01 February 2021, 17:59:31 UTC |
refs/heads/abadams/cse_in_unroll_split_tuples | d1c71d0 | Merge branch 'master' into abadams/cse_in_unroll_split_tuples | 15 December 2021, 00:59:07 UTC |
refs/heads/abadams/custom_cuda_context | 0b14ec0 | Comment clarifications | 15 October 2021, 20:59:48 UTC |
refs/heads/abadams/custom_cuda_context_2 | d3df50f | Clean up comments | 25 October 2021, 20:41:50 UTC |
refs/heads/abadams/custom_cuda_context_3 | d0cdc15 | Improve comments | 27 October 2021, 00:51:39 UTC |
refs/heads/abadams/d3d12abi | 75b4f0d | Rename d3d12 modules to windows_d3d12 to simplify build Also clobber invalid module flags from generic modules | 14 August 2020, 17:27:21 UTC |
refs/heads/abadams/depthwise_separable_conv | 64fcd56 | Rename some variables | 25 August 2020, 19:20:51 UTC |
refs/heads/abadams/diagnose_boundary_condition_failure | 6b32fa2 | Merge branch 'abadams/1v3_linear_comparison_cancellations' into abadams/diagnose_boundary_condition_failure | 24 June 2020, 01:42:10 UTC |
refs/heads/abadams/divide_using_pavgw | a12b3cb | Add comment elaborating on why this is a good idea | 15 October 2021, 20:00:33 UTC |
refs/heads/abadams/dont_reinterpret_concat | 94d7f01 | Don't reinterpret cast when codegenning vector concat It confuses the HVX LLVM backend, and shouldn't be necessary anyway. | 02 July 2021, 17:23:21 UTC |
refs/heads/abadams/early_out | ec551ee | Appease clang-tidy | 13 June 2022, 18:38:59 UTC |
refs/heads/abadams/extract_concat_bits | 0457109 | Fix concat_bits call | 13 August 2022, 22:15:34 UTC |
refs/heads/abadams/fast_integer_divide_round_to_zero | f215365 | Pacify clang tidy | 30 November 2021, 22:02:00 UTC |
refs/heads/abadams/faster_runtime_integer_division | 2806116 | Cleaner initialization of tables | 23 November 2021, 18:44:57 UTC |
refs/heads/abadams/faster_unroll | 5012aba | Fix computational complexity of unrolling large muxes | 03 February 2021, 20:49:10 UTC |
refs/heads/abadams/fix-arm-seg2 | 4f20718 | Merge remote-tracking branch 'origin/master' into abadams/fix-arm-seg2 | 05 March 2021, 23:29:40 UTC |
refs/heads/abadams/fix_5323 | be50f8a | Add --help flag to rungenmain, fixing #5323 | 26 October 2021, 19:47:18 UTC |
refs/heads/abadams/fix_5329 | 72224e1 | Add explicit cast to remove ambiguous operator== (Fixes #5329) | 05 April 2021, 17:06:25 UTC |
refs/heads/abadams/fix_5889 | 15abc45 | Use guarded versions of vars if they exist in bounds inference | 08 April 2021, 19:42:48 UTC |
refs/heads/abadams/fix_autoschedule_feature_transposition | 0e361d4 | Fix transposed variable names | 29 July 2020, 18:19:45 UTC |
refs/heads/abadams/fix_cuda_mat_mul_assert | f7d1a8f | Merge branch 'master' into abadams/fix_cuda_mat_mul_assert | 19 June 2020, 02:09:18 UTC |
refs/heads/abadams/fix_deinterleave_bug | 987f531 | Remove buggy deinterleave misfeature | 24 March 2021, 00:08:17 UTC |
refs/heads/abadams/fix_deinterleave_for_reinterpret | 1772c1f | Minimal approach to making Deinterleave correct for Reinterpret | 05 August 2022, 19:30:20 UTC |
refs/heads/abadams/fix_div_round_to_zero | 108dcea | Add missing print | 11 September 2022, 21:32:42 UTC |
refs/heads/abadams/fix_fft_compile_time_regression | 2a8ced8 | Merge branch 'master' into abadams/fix_fft_compile_time_regression | 01 December 2020, 18:46:33 UTC |
refs/heads/abadams/fix_generate_output_snippets | d638d81 | Rename LINES to INTERESTING_LINES Some terminals treat LINES as a special var, breaking this script | 23 September 2020, 19:34:55 UTC |
refs/heads/abadams/fix_lgtm_warnings | db22a23 | Fix a few warnings from lgtm.com | 21 February 2021, 03:34:46 UTC |
refs/heads/abadams/fix_links_to_master | 166a748 | Fix some dead links to the 'master' branch | 20 October 2022, 16:56:02 UTC |
refs/heads/abadams/fix_potential_gpu_deadlock | e3606cc | Fix GPU barrier deadlocks Partition loops shouldn't mess with serial loops containing thread barriers, potentially causing warp divergence and deadlock (seen in some obscure lens blur schedules). Also we were generating too many thread barriers in a branch where the base mutator class was accidentally always mutating something, so there's a change to FuseGPUThreadLoops to make it more bug-resistant. Without these additional barriers I have been unable to come up with a case where a barrier ends up somewhere that would deadlock, so no test. | 13 August 2020, 17:37:30 UTC |
refs/heads/abadams/fix_realize_condition_depends_on_tuple | 4a3df05 | Fix bug when realize condition depends on tuple call If the realization is tuple-valued, and the condition on the realization uses a tuple call (index != 0), then the condition wasn't getting resolved during the split_tuples pass. The cause was a missing mutate call. | 03 August 2022, 22:06:50 UTC |
refs/heads/abadams/fix_round | 5c063bb | Merge branch 'main' into abadams/fix_round | 26 September 2022, 18:49:16 UTC |
refs/heads/abadams/fix_stencil_chain_gpu_schedule | b8ad19f | Schedule last stage of stencil chain on GPU too | 11 August 2020, 19:12:57 UTC |
refs/heads/abadams/fix_track_bounds_intervals | 5091725 | Rename inner version of bounds_of_expr_in_scope It's not in the explicit namespace that it's requested in (Halide::Internal), so turning on that debugging code results in compile failures. I just gave it a different name to disambiguate. | 27 August 2021, 17:44:40 UTC |
refs/heads/abadams/fix_tutorial_2 | 52ff477 | Remove incorrect not-multiple-of-16 claim | 20 January 2022, 16:41:42 UTC |
refs/heads/abadams/fully_fused_depthwise_separable_conv | 4170427 | Remove dead split | 02 September 2020, 22:32:24 UTC |
refs/heads/abadams/gaussian_blur_app | 8a92c26 | Use a vectorized sum scan for the pyramid version too | 08 September 2021, 00:40:04 UTC |
refs/heads/abadams/gpu_autoscheduler_parallel_random_probes | f8057f8 | Add ability to do parallel random probes in-process | 18 August 2020, 23:07:25 UTC |
refs/heads/abadams/interleave_nested_vector | d1deb58 | Don't deinterleave all the way down to scalars | 13 February 2021, 23:06:27 UTC |
refs/heads/abadams/ir_match_by_ref | 7a586aa | Remove assert that was blowing up simplifier stack frames | 03 February 2021, 03:52:24 UTC |
refs/heads/abadams/lerp_plus_cast | c54f4a4 | Don't produce out-of-range lerp values | 10 December 2021, 13:12:56 UTC |
refs/heads/abadams/lower_halving_sub | 429ab73 | Add explanation of signed case | 29 June 2022, 19:24:09 UTC |
refs/heads/abadams/lower_rounding_shift_right | ba47819 | Non-widening lowering of rounding shifts This version lowers it without needing to widen, which is a large win on x86 for 16 and 32-bit types (3.8x faster and 2.8x faster respectively). It's a very slight slowdown for 8-bit because x86 doesn't have 8-bit shift instructions. Also drive-by typo fix. | 03 May 2021, 23:51:26 UTC |
refs/heads/abadams/mac-arm-fixes | 92355ea | Revert unintended change in precision | 04 March 2021, 00:30:10 UTC |
refs/heads/abadams/mixed_sign_mul_shift_right | 1d07ebd | Add comment | 08 February 2022, 21:55:58 UTC |
refs/heads/abadams/mixed_width_mul_shift_right | 36b990d | Merge branch 'master' into abadams/mixed_width_mul_shift_right | 03 January 2022, 20:47:07 UTC |
refs/heads/abadams/multiple_scatter | 5b06a14 | Address review comments | 31 December 2020, 00:49:16 UTC |
refs/heads/abadams/mux_intrinsic | 913887f | Add comment about out of range mux index | 05 February 2021, 19:35:28 UTC |
refs/heads/abadams/nested_vectorization_compile_time_regression_fix | facb69d | Fix for unbounded lanes | 12 October 2020, 20:16:43 UTC |
refs/heads/abadams/nested_vectorization_tweaks | d7cf9bc | Merge branch 'master' into abadams/nested_vectorization_tweaks | 09 October 2020, 16:25:38 UTC |
refs/heads/abadams/precompute_shared_mem_size | 9903d2b | Add comment explaining why we don't do dynamic tracking when no upper bound too | 27 August 2020, 02:14:59 UTC |
refs/heads/abadams/psabdw | 38a77cb | Merge remote-tracking branch 'origin/main' into abadams/psabdw | 22 July 2022, 15:59:38 UTC |
refs/heads/abadams/random_pipelines | 1833a0b | Make training binary robust to bad pipeline ids | 16 October 2022, 22:05:51 UTC |
refs/heads/abadams/reenable_unscheduled_stage_warning | d82a456 | Add Stage::unscheduled() | 17 February 2022, 21:44:35 UTC |
refs/heads/abadams/reinterpret_vector | 7c70051 | clang-format | 19 December 2021, 19:40:49 UTC |
refs/heads/abadams/remove_bad_pruning | 21b3c85 | Relax overzealous pruning rule We don't allow schedules that fuse to the extent that we can no longer vectorize. This was implemented incorrectly though. The check assumed that something was going to be compute_at inside the innermost loop, and neglected the possibility that we were about to tile that loop. | 28 June 2021, 18:51:40 UTC |
refs/heads/abadams/remove_readnone_on_functions | 0181dd9 | Revert formatting changes | 07 November 2022, 22:03:38 UTC |
refs/heads/abadams/reschedule_bgu | 9669817 | Reschedule BGU to fix performance regression BGU on CUDA had regressed from its stated performance due to the atomic floating point adds being compiled to CAS loops due to complex indexing expressions diverging on the LHS and RHS of the +=. Inlining less stuff into the += operations makes it succeed again, and the schedule was improved with a few other tweaks. Longer-term we need a first-class way to represent += so that we're not sensitive to this sort of divergence. | 16 August 2020, 20:54:08 UTC |
refs/heads/abadams/rounding_shift_right_use_average | 357a12a | Address review comments | 13 December 2021, 16:37:12 UTC |
refs/heads/abadams/rungenmain_error | 43f94b3 | Add an error message if you forget to compile RunGenMain with a registration file | 17 July 2020, 20:38:21 UTC |
refs/heads/abadams/sampling_profiler_overhead_v2 | 588de72 | One line per member | 23 November 2021, 21:15:23 UTC |
refs/heads/abadams/simplify_correlated_pyramid | 718989c | Slightly more general | 12 March 2021, 21:47:41 UTC |
refs/heads/abadams/siotas_20 | 325daac | Misc fixes | 18 August 2021, 17:49:39 UTC |
refs/heads/abadams/sioutas_20 | 44817ce | Merge pull request #5295 from halide/abadams/fix_generate_output_snippets Rename LINES to INTERESTING_LINES | 23 September 2020, 20:15:35 UTC |
refs/heads/abadams/slide_over_split_loop | c413e32 | Merge branch 'dsharletg/sliding-window' into abadams/slide_over_split_loop | 23 February 2021, 22:26:49 UTC |
refs/heads/abadams/sorting_network_working_branch | 9358860 | codegen tweaks | 08 January 2021, 01:18:19 UTC |
refs/heads/abadams/switch_stmt | d01bf4e | Merge branch 'master' into abadams/switch_stmt | 21 January 2021, 22:12:36 UTC |
refs/heads/abadams/target_specific_lerp | 52c13b5 | Target is a struct | 19 November 2021, 18:53:35 UTC |
refs/heads/abadams/undo_pointless_widening | 02492ca | Push casts inside integer narrowing | 14 February 2022, 17:10:31 UTC |
refs/heads/abadams/unordered_blocks | 1d9f85b | Loops in between a store_at and a compute_at are ordered | 04 August 2020, 23:41:45 UTC |
refs/heads/abadams/unsigned_demosaic | 26f6457 | Merge remote-tracking branch 'origin/master' into abadams/unsigned_demosaic | 11 October 2021, 21:03:06 UTC |
refs/heads/abadams/use_arm_for_runtime_triple | 6dd63ac | How about wasm? | 22 April 2021, 22:33:33 UTC |
refs/heads/abadams/vector_reduce_hexagon_predicate | 37a0d77 | Use a VectorReduce not to determine if any lanes are true in Hexagon backend | 06 May 2021, 21:16:35 UTC |
refs/heads/abadams/vst_type_fix | a591fbc | Change 64-bit only | 12 April 2022, 23:38:46 UTC |
refs/heads/abadams/widening_let_bug | f94dfce | Just redo the comments | 11 November 2021, 00:19:55 UTC |
refs/heads/abadams/x86_avg | 99d3795 | Delete more dead code | 08 October 2021, 22:06:10 UTC |
refs/heads/adadams/profile_allocator | 2bf474d | Remove unnecessary asserts | 25 February 2021, 17:57:01 UTC |
refs/heads/add_image_checks_after_bounds_inference_plus_new_rules | 2cce30e | Delete rules that cause cycles Also move simplify_correlated_differences back to where it was, and add a handful of other rules that were in the branch. | 03 February 2020, 22:24:15 UTC |
refs/heads/add_outermost_to_extern | 31715f8 | Add outermost dim to the dim list when defining extern | 27 January 2017, 22:35:07 UTC |
refs/heads/add_vectorization_to_search_space | b497cf1 | Enable tests | 18 December 2018, 19:03:41 UTC |
refs/heads/align_loads_comment_fix | c62bcf6 | Wording improvement. | 12 August 2021, 18:06:55 UTC |
refs/heads/alina-strided-store | 9f9a64c | Merge remote-tracking branch 'origin/master' into alina-strided-store | 07 December 2017, 19:16:54 UTC |
refs/heads/another_buffer_copy_fix | 7701abe | Fix cases where halide_buffer_copy could copy to/from a host pointer that was NULL where the case was valid by copying from the device allocation. Add tests for these cases. Change name of do_multidimensional_copy in opencl and cuda runtimes to be unique to each runtime as the opencl runtime was calling the cuda do_multidimensional_copy despite both being in anonymous namespaces inside their respective files. Weak linking and C++ namespaces and our unusual runtime linking and probably at least one bug somewhere caused this to go badly. Required trying to use both cuda and opencl at the same time. | 27 August 2018, 09:04:43 UTC |
refs/heads/ataei-block_asserts-codegen | 46f432a | Remove commented experimental code | 25 January 2019, 00:20:13 UTC |
refs/heads/ataei-debug_info | 07821c6 | Print llvm -time-passes statistics when JIT or AOT compile an LLVM module | 17 January 2019, 00:27:41 UTC |
refs/heads/ataei-fix-pow | f5120c3 | Fix cuda nan_f32 value | 13 June 2019, 21:12:52 UTC |
refs/heads/ataei-gen_str_param | 8a4f1f4 | Merge branch 'master' into ataei-gen_str_param | 09 April 2019, 22:31:54 UTC |
refs/heads/ataei-implicit_lhs_vars | 04bd712 | Merge branch 'master' into ataei-implicit_lhs_vars | 05 March 2019, 23:04:43 UTC |
refs/heads/ataei-onnx | 8c9c8d4 | Update onnx_converter | 26 March 2019, 22:34:40 UTC |