https://github.com/halide/Halide
- HEAD
- refs/heads/Halide_unsharp
- refs/heads/abadams-patch-1
- refs/heads/abadams/aggressive_is_single_point
- refs/heads/abadams/aggressive_unify_duplicate_lets
- refs/heads/abadams/align_strided_const_loads
- refs/heads/abadams/alloca
- refs/heads/abadams/atomic_parallel_compiled_in
- refs/heads/abadams/atomic_vector_non_recursive
- refs/heads/abadams/averaging_tree
- refs/heads/abadams/avoid_name_mangling_in_cross_module_dependencies
- refs/heads/abadams/better_absd
- refs/heads/abadams/better_codegen_for_non_const_ramps
- refs/heads/abadams/bgu_cholesky
- refs/heads/abadams/braces_around_statements
- refs/heads/abadams/cache_tighten_producer_consumer_nodes
- refs/heads/abadams/check_reorder_dups
- refs/heads/abadams/clarify_broadcast_shuffle
- refs/heads/abadams/compositing_app
- refs/heads/abadams/cond_wait_spin
- refs/heads/abadams/constant_interval_simplifier
- refs/heads/abadams/cse_in_unroll_split_tuples
- refs/heads/abadams/custom_cuda_context
- refs/heads/abadams/custom_cuda_context_2
- refs/heads/abadams/custom_cuda_context_3
- refs/heads/abadams/d3d12abi
- refs/heads/abadams/deflake_mullapudi_reorder
- refs/heads/abadams/delete_division_rule_that_makes_large_constants
- refs/heads/abadams/delete_prepare_for_early_exit
- refs/heads/abadams/depthwise_separable_conv
- refs/heads/abadams/diagnose_boundary_condition_failure
- refs/heads/abadams/disable_onnx_app_on_mac
- refs/heads/abadams/divide_using_pavgw
- refs/heads/abadams/dont_link_to_cudart
- refs/heads/abadams/dont_reinterpret_concat
- refs/heads/abadams/early_out
- refs/heads/abadams/enable_f16c
- refs/heads/abadams/extract_concat_bits
- refs/heads/abadams/fast_integer_divide_round_to_zero
- refs/heads/abadams/faster_runtime_integer_division
- refs/heads/abadams/faster_substitute_facts
- refs/heads/abadams/faster_unroll
- refs/heads/abadams/faster_vars_used_in_simplify_let
- refs/heads/abadams/fix-arm-seg2
- refs/heads/abadams/fix_4211
- refs/heads/abadams/fix_5323
- refs/heads/abadams/fix_5329
- refs/heads/abadams/fix_5889
- refs/heads/abadams/fix_6984
- refs/heads/abadams/fix_7229
- refs/heads/abadams/fix_7260
- refs/heads/abadams/fix_7365
- refs/heads/abadams/fix_7374
- refs/heads/abadams/fix_7504
- refs/heads/abadams/fix_7514
- refs/heads/abadams/fix_7531
- refs/heads/abadams/fix_7584
- refs/heads/abadams/fix_7584_v2
- refs/heads/abadams/fix_7742
- refs/heads/abadams/fix_7756
- refs/heads/abadams/fix_7761
- refs/heads/abadams/fix_7768
- refs/heads/abadams/fix_7786
- refs/heads/abadams/fix_7810
- refs/heads/abadams/fix_7811
- refs/heads/abadams/fix_7815
- refs/heads/abadams/fix_7867
- refs/heads/abadams/fix_7871
- refs/heads/abadams/fix_7872
- refs/heads/abadams/fix_7873
- refs/heads/abadams/fix_7888
- refs/heads/abadams/fix_7890
- refs/heads/abadams/fix_7891
- refs/heads/abadams/fix_7892
- refs/heads/abadams/fix_7893
- refs/heads/abadams/fix_7906
- refs/heads/abadams/fix_7909
- refs/heads/abadams/fix_7968
- refs/heads/abadams/fix_8038
- refs/heads/abadams/fix_8054
- refs/heads/abadams/fix_8170
- refs/heads/abadams/fix_8184
- refs/heads/abadams/fix_8280
- refs/heads/abadams/fix_8282
- refs/heads/abadams/fix_arm_fcvtmp
- refs/heads/abadams/fix_associative_ops_saturating_add
- refs/heads/abadams/fix_autoschedule_feature_transposition
- refs/heads/abadams/fix_cse_name_collisions
- refs/heads/abadams/fix_cuda_mat_mul_assert
- refs/heads/abadams/fix_deinterleave_bug
- refs/heads/abadams/fix_deinterleave_for_reinterpret
- refs/heads/abadams/fix_div_round_to_zero
- refs/heads/abadams/fix_fft_compile_time_regression
- refs/heads/abadams/fix_generate_output_snippets
- refs/heads/abadams/fix_if_nesting_condition
- refs/heads/abadams/fix_leaks_in_memoize_test
- refs/heads/abadams/fix_lgtm_warnings
- refs/heads/abadams/fix_links_to_master
- refs/heads/abadams/fix_load_of_broadcast
- refs/heads/abadams/fix_lossless_cast_of_sub
- refs/heads/abadams/fix_onnx_app
- refs/heads/abadams/fix_pointless_lower_condition
- refs/heads/abadams/fix_potential_gpu_deadlock
- refs/heads/abadams/fix_realize_condition_depends_on_tuple
- refs/heads/abadams/fix_reduce_expr_modulo_of_vector
- refs/heads/abadams/fix_riscv_vx_vi
- refs/heads/abadams/fix_round
- refs/heads/abadams/fix_stencil_chain_gpu_schedule
- refs/heads/abadams/fix_track_bounds_intervals
- refs/heads/abadams/fix_tutorial_2
- refs/heads/abadams/fix_ub_in_lower_rounding_shift_right
- refs/heads/abadams/forward_partition_methods
- refs/heads/abadams/fully_fused_depthwise_separable_conv
- refs/heads/abadams/fuzz_sliding_window
- refs/heads/abadams/gaussian_blur_app
- refs/heads/abadams/generator_infinite_default_timeout
- refs/heads/abadams/gpu_autoscheduler_parallel_random_probes
- refs/heads/abadams/include_riscv_in_readme
- refs/heads/abadams/interleave_nested_vector
- refs/heads/abadams/ir_match_by_ref
- refs/heads/abadams/lerp_plus_cast
- refs/heads/abadams/local_laplacian_code_size
- refs/heads/abadams/lower_halving_sub
- refs/heads/abadams/lower_rounding_shift_right
- refs/heads/abadams/mac-arm-fixes
- refs/heads/abadams/make_fast_inverse_test_throughput_limited
- refs/heads/abadams/makefile_serialization_support
- refs/heads/abadams/mismatched_new_delete
- refs/heads/abadams/mixed_sign_mul_shift_right
- refs/heads/abadams/mixed_width_mul_shift_right
- refs/heads/abadams/multiple_scatter
- refs/heads/abadams/mux_intrinsic
- refs/heads/abadams/name_helpers
- refs/heads/abadams/narrow_predicates
- refs/heads/abadams/nested_vectorization_compile_time_regression_fix
- refs/heads/abadams/nested_vectorization_tweaks
- refs/heads/abadams/parallel_simd_op_check
- refs/heads/abadams/partially_backport_lossless_cast_fix
- refs/heads/abadams/per_instance_profiling
- refs/heads/abadams/precompute_shared_mem_size
- refs/heads/abadams/prefer_no_gather
- refs/heads/abadams/print_uncaught_exception
- refs/heads/abadams/promise_clamped_is_a_tag
- refs/heads/abadams/promote_fixed_point_intrinsics
- refs/heads/abadams/psabdw
- refs/heads/abadams/random_pipelines
- refs/heads/abadams/rationalize_gpu_for_loop_names
- refs/heads/abadams/reenable_unscheduled_stage_warning
- refs/heads/abadams/refactor_constant_interval
- refs/heads/abadams/reinterpret_vector
- refs/heads/abadams/remove_arch_os_for_shaders
- refs/heads/abadams/remove_bad_pruning
- refs/heads/abadams/remove_parameter_self_references
- refs/heads/abadams/remove_readnone_on_functions
- refs/heads/abadams/remove_use_of_python_config_in_onnx_makefile
- refs/heads/abadams/reschedule_bgu
- refs/heads/abadams/reschedule_bilateral_grid
- refs/heads/abadams/rewrite_atomic_pass
- refs/heads/abadams/rewrite_ir_equality
- refs/heads/abadams/rounding_shift_right_use_average
- refs/heads/abadams/rungenmain_error
- refs/heads/abadams/sampling_profiler_overhead_v2
- refs/heads/abadams/scope_improvements
- refs/heads/abadams/simpler_broadcasts
- refs/heads/abadams/simplify_correlated_pyramid
- refs/heads/abadams/simplify_sub_eval_in_lambda
- refs/heads/abadams/siotas_20
- refs/heads/abadams/sioutas_20
- refs/heads/abadams/slide_over_split_loop
- refs/heads/abadams/sorting_network_working_branch
- refs/heads/abadams/stable_topological_order
- refs/heads/abadams/string_view
- refs/heads/abadams/strip_asserts_last
- refs/heads/abadams/switch_stmt
- refs/heads/abadams/target_specific_lerp
- refs/heads/abadams/time_lowering_passes
- refs/heads/abadams/track_failedness_through_solver_lets
- refs/heads/abadams/turn_off_slp_vectorization_for_avx512
- refs/heads/abadams/tweak_unpack_buffers
- refs/heads/abadams/undo_pointless_widening
- refs/heads/abadams/unordered_blocks
- refs/heads/abadams/unsigned_demosaic
- refs/heads/abadams/update_makefile_for_llvm_19
- refs/heads/abadams/use_arm_for_runtime_triple
- refs/heads/abadams/use_pmaddubsw_for_downsample
- refs/heads/abadams/validate_gpu_schedules
- refs/heads/abadams/vector_reduce_hexagon_predicate
- refs/heads/abadams/vector_scan
- refs/heads/abadams/vst_type_fix
- refs/heads/abadams/widening_let_bug
- refs/heads/abadams/x86_avg
- refs/heads/abadams/zen4
- refs/heads/adadams/profile_allocator
- refs/heads/add_image_checks_after_bounds_inference_plus_new_rules
- refs/heads/add_outermost_to_extern
- refs/heads/add_vectorization_to_search_space
- refs/heads/aelphy/broadcast_q8
- refs/heads/aelphy/f16x16
- refs/heads/aelphy/feature_cadence_changes
- refs/heads/aelphy/float_extracts
- refs/heads/aelphy/i32_sat_widening_shift_right
- refs/heads/aelphy/sqrt_f16
- refs/heads/aelphy/vector_loads_f16_f32
- refs/heads/align_loads_comment_fix
- refs/heads/alina-strided-store
- refs/heads/another_buffer_copy_fix
- refs/heads/arm_sve_redux
- refs/heads/ataei-block_asserts-codegen
- refs/heads/ataei-debug_info
- refs/heads/ataei-fix-pow
- refs/heads/ataei-gen_str_param
- refs/heads/ataei-implicit_lhs_vars
- refs/heads/ataei-onnx
- refs/heads/ataei-onnx_converter_update
- refs/heads/ataei-onnx_pybind
- refs/heads/ataei-resnet50_benchmarks
- refs/heads/ataei-standalone_autoscheduler
- refs/heads/ataei_lots_of_inputs
- refs/heads/auto_sched_benchmarks
- refs/heads/auto_sched_estimates
- refs/heads/auto_sched_inline
- refs/heads/auto_sched_test_notparallel
- refs/heads/autoschedule_top_down
- refs/heads/autoschedule_with_convnet
- refs/heads/autoscheduler_scalar_imageparam_fix
- refs/heads/backports/10.x
- refs/heads/backports/11.x
- refs/heads/backports/12.x
- refs/heads/backports/13.x
- refs/heads/balance_expressions
- refs/heads/bazel
- refs/heads/benchmarks
- refs/heads/blaze
- refs/heads/bounds_buffer_lets_fix
- refs/heads/bounds_correct_vs_bounds_loaded_reduced
- refs/heads/buffer_device_api_target
- refs/heads/bug_device_free
- refs/heads/bug_inline_unbounded
- refs/heads/build-proposals
- refs/heads/build/fix-xcode-2
- refs/heads/build/manylinux-fixes
- refs/heads/circ_buffer
- refs/heads/cmake-no-runtime-debug-symbols
- refs/heads/cmake/asan
- refs/heads/cmake/deps-cleanup
- refs/heads/cmake/find-modules
- refs/heads/cmake/spirv
- refs/heads/cmake_wasm_features
- refs/heads/compute_at_guard_with_if_goes_on_stack
- refs/heads/compute_with_at
- refs/heads/compute_with_check
- refs/heads/compute_with_excessive_bounds
- refs/heads/compute_with_inlined
- refs/heads/compute_with_remove_is_right_level
- refs/heads/cpack/nuget
- refs/heads/ctest/wrappers
- refs/heads/cuda-constant
- refs/heads/d3d12-allocation-cache
- refs/heads/deferred_cse_after_inlining
- refs/heads/destructor_calls_deinit
- refs/heads/dg/deserialize_unmapped_objects
- refs/heads/dg/fix_vulkan_codegen_bool_conversion
- refs/heads/dg/fix_vulkan_gpu_vars
- refs/heads/dg/vulkan_conform_api
- refs/heads/dg/vulkan_load_lib
- refs/heads/dg/vulkan_region_allocator_fixes
- refs/heads/dgerstmann/fix-vulkan-memory-config-init
- refs/heads/disable_acquire_release_test_vulkan
- refs/heads/distinct_wrapper_names
- refs/heads/dkg/6863_asan_fixes
- refs/heads/dkg/vulkan
- refs/heads/dpalermo_dmabuf
- refs/heads/dpalermo_dmabuf_libion
- refs/heads/dpalermo_hexagon_remote_202003
- refs/heads/dpalermo_sdk4_2_0_2
- refs/heads/ds/buffer-get-pure
- refs/heads/ds/opt-tile-size
- refs/heads/ds/tail-none
- refs/heads/ds/while
- refs/heads/dsharletg/bitwise-intrinsics
- refs/heads/dsharletg/find-vector-reduce
- refs/heads/dsharletg/jit-optimization
- refs/heads/dsharletg/memcpy-copy_from
- refs/heads/dsharletg/pattern-headroom
- refs/heads/dsharletg/refactor-host-alignment
- refs/heads/dsharletg/runtime-size
- refs/heads/dsharletg/simplify-abs
- refs/heads/dsharletg/simplify-type-bounds
- refs/heads/dsharletg/specialize-bounds
- refs/heads/dsharletg/upsample-channels
- refs/heads/empty_prefetch
- refs/heads/emscripten_vector_fix
- refs/heads/export_all-wsmoses
- refs/heads/expr_auto_sched
- refs/heads/extern_bugs
- refs/heads/extern_host_alloc
- refs/heads/factor_parallel_codegen_hack
- refs/heads/fast_sync_tsan
- refs/heads/faster_integer_division
- refs/heads/feature/apps-external
- refs/heads/feature/cmake-presets
- refs/heads/feature/convert
- refs/heads/feature/f16_interleave
- refs/heads/feature/gather_load_q7
- refs/heads/feature/gather_load_undefined_ramp
- refs/heads/feature/llvm-codemodel
- refs/heads/feature/load_predicated
- refs/heads/feature/luma_regression
- refs/heads/feature/maintanence
- refs/heads/feature/reinterprets
- refs/heads/feature/tcm_bump_allocator
- refs/heads/feature/xtensa_fix_interleave_q8
- refs/heads/feature/xtensa_q8_tests
- refs/heads/find_intrinsics_issue
- refs/heads/find_intrinsics_widening_lets
- refs/heads/fix-7854
- refs/heads/fix-8257
- refs/heads/fix-floated-pure-stage
- refs/heads/fix-race-condition
- refs/heads/fix_hexagon_alignment
- refs/heads/fix_hvx_intrinsics
- refs/heads/fix_prefetch_test
- refs/heads/fix_windows_vs15_build
- refs/heads/fixed_length_vectors
- refs/heads/fixed_point_local_laplac
- refs/heads/gemmlowp
- refs/heads/generate
- refs/heads/gha/pip
- refs/heads/gpu_canon_fix
- refs/heads/halide_extended_exp
- refs/heads/halide_ir_flatbuffer
- refs/heads/hex_dma2_async
- refs/heads/hexagon_le_runtime
- refs/heads/hexagon_priority
- refs/heads/hexagon_setpriority
- refs/heads/hexagon_strided_pred_load
- refs/heads/hexagon_sysmon_markers
- refs/heads/imaging-synthesis
- refs/heads/includes_fix
- refs/heads/ios_fast_sync_fix
- refs/heads/issues/7979-arm-cpu-features
- refs/heads/jia-kai-fix-runtime-cuda-init
- refs/heads/kamil-openglcompute-infinity
- refs/heads/kamil/name_pthread_workers
- refs/heads/kp_bit_shift
- refs/heads/line_buffer
- refs/heads/loop_carry_not_working
- refs/heads/lower_on_huge_stack
- refs/heads/main
- refs/heads/memoize_with_extents
- refs/heads/metal_float16
- refs/heads/metaprogrammed_simplifier_mod
- refs/heads/mohamedadaly-vmlal
- refs/heads/more_powerful_sliding
- refs/heads/new_autoschedule_with_new_simplifier_arm_worker_branch
- refs/heads/new_autoscheduler
- refs/heads/new_simplifier_rule_testing
- refs/heads/newer_ion_ioctl
- refs/heads/no_bounds_query_when_bounds_used
- refs/heads/opengl_compute_buffer_types_fix
- refs/heads/openglcompute_reuse_shared_allocations
- refs/heads/optmize_reorder
- refs/heads/par_for_opt
- refs/heads/pdb/fix_7806
- refs/heads/pdb/hexagon_remote_cmake
- refs/heads/pdb_add_libcpp_makefile_inc
- refs/heads/pdb_eliminate_interleaves_test
- refs/heads/pdb_fix_clang_build
- refs/heads/pdb_fix_install_qc
- refs/heads/pdb_fix_loop_carry
- refs/heads/pdb_fix_simd_op_check_hvx
- refs/heads/pdb_mul_div_mod_multi_thread
- refs/heads/pdb_remove_hvx_v64
- refs/heads/perform_inline_with_order
- refs/heads/pr/2572
- refs/heads/pr/2676
- refs/heads/pr/2975
- refs/heads/pr/3017
- refs/heads/pr/3081
- refs/heads/pr/3387
- refs/heads/pr/3939
- refs/heads/pr/3960
- refs/heads/pr/4380
- refs/heads/pr/4414
- refs/heads/pr/5331
- refs/heads/pr/5438
- refs/heads/pr/5455
- refs/heads/pr/5758_2
- refs/heads/predicated_vector
- refs/heads/prefetch_specialize
- refs/heads/print_schedule
- refs/heads/profile_hardware_counters
- refs/heads/random-pipelines
- refs/heads/rdom_with_pure_vars
- refs/heads/readme-fix-gcd
- refs/heads/realization_order
- refs/heads/refactor_module
- refs/heads/register_promotion
- refs/heads/release/10.x
- refs/heads/release/11.x
- refs/heads/release/12.x
- refs/heads/release/13.x
- refs/heads/release/14.x
- refs/heads/release/15.x
- refs/heads/release/16.x
- refs/heads/release/17.x
- refs/heads/release/8.x
- refs/heads/remove_max_on_fuse_factor
- refs/heads/reorder_rvar
- refs/heads/reset_unique_counter
- refs/heads/revert-3612-ataei-speedup_compiletime
- refs/heads/revert-7009-rootjalex/distribute-w_shl
- refs/heads/revert-7601-compile_hexagon_remote
- refs/heads/riscv_update
- refs/heads/rl_simplifier_rules
- refs/heads/rootjalex/add_simpl_rules
- refs/heads/rootjalex/arm-optimize
- refs/heads/rootjalex/autoscheduler_mcts
- refs/heads/rootjalex/bounds-rewriter
- refs/heads/rootjalex/bounds_synthesis
- refs/heads/rootjalex/cbounds
- refs/heads/rootjalex/cbounds_predicated
- refs/heads/rootjalex/fix-reinterpret-cmp
- refs/heads/rootjalex/fix-sat-overflow
- refs/heads/rootjalex/fix_estimate_issue
- refs/heads/rootjalex/fix_failed_unrolls
- refs/heads/rootjalex/gsoc_codegen
- refs/heads/rootjalex/improve_cbounds_fixed
- refs/heads/rootjalex/improve_constant_bounds
- refs/heads/rootjalex/pitchfork-arm
- refs/heads/rootjalex/reinterpret-simplify
- refs/heads/rootjalex/rts
- refs/heads/rootjalex/super_simplify_bounds
- refs/heads/rootjalex/test_cbounds_fixed
- refs/heads/rootjalex/test_constant_bounds
- refs/heads/rootjalex/trs-codegen
- refs/heads/rootjalex/trs-codegen-cross
- refs/heads/rootjalex/trs-merge
- refs/heads/rootjalex/uint32-int32-cast
- refs/heads/rootjalex/x86-hadds
- refs/heads/rootjalex/x86-optimize
- refs/heads/rootjalex/x86-optimize-test
- refs/heads/rootjalex/x86-test
- refs/heads/rule_removal_experiments
- refs/heads/schedule-output-storage
- refs/heads/separate_bounds_query_entrypoint
- refs/heads/shallow
- refs/heads/shift_amount_type_change
- refs/heads/shoaibkamil/cmake-without-arm
- refs/heads/shoaibkamil/correct_memory_fences
- refs/heads/shoaibkamil/d3d-fixes
- refs/heads/shoaibkamil/deprecate_openglcompute
- refs/heads/shoaibkamil/json
- refs/heads/shoaibkamil/llvm_clone_tag
- refs/heads/shoaibkamil/minor-vcpkg-doc-change
- refs/heads/shoaibkamil/opengl_compute_tests
- refs/heads/shoaibkamil/performance_tests_as_generators
- refs/heads/shoaibkamil/rule_removal_experiments
- refs/heads/shoaibkamil/super_simplify_with_interpreter
- refs/heads/shoaibkamil/windows-arm-fix-attributes
- refs/heads/sim_shlib_addr_print
- refs/heads/simplify-nested-broadcasts
- refs/heads/simplify-vectorreduce-shuffles2
- refs/heads/simplify_mod
- refs/heads/sioutas_2020
- refs/heads/sioutas_2020_autoscheduler
- refs/heads/slomp/gpu-codegen-profiling
- refs/heads/slomp/msvc-static-analysis
- refs/heads/solve_div
- refs/heads/solve_div_simplifier_test
- refs/heads/sr/python-late-binding-defaults
- refs/heads/srj-aaa
- refs/heads/srj-alloc
- refs/heads/srj-alloca
- refs/heads/srj-appmake2
- refs/heads/srj-armv83a
- refs/heads/srj-aslog
- refs/heads/srj-assert
- refs/heads/srj-assoc
- refs/heads/srj-auto-multi
- refs/heads/srj-auto-multi2
- refs/heads/srj-auto_schedule_mat_mul
- refs/heads/srj-autosched
- refs/heads/srj-b2cpphide
- refs/heads/srj-barr
- refs/heads/srj-bits
- refs/heads/srj-blacklist
- refs/heads/srj-bounds
- refs/heads/srj-bufcalltype
- refs/heads/srj-bufcallwrap
- refs/heads/srj-bufcallwrap2
- refs/heads/srj-buffer
- refs/heads/srj-bv
- refs/heads/srj-classic-autotune
- refs/heads/srj-clean
- refs/heads/srj-constcall
- refs/heads/srj-crosscompile
- refs/heads/srj-ctlz
- refs/heads/srj-cvec-patch
- refs/heads/srj-dag
- refs/heads/srj-debug-to-file
- refs/heads/srj-deir
- refs/heads/srj-f16
- refs/heads/srj-fp16
- refs/heads/srj-fsch
- refs/heads/srj-fthru
- refs/heads/srj-g2
- refs/heads/srj-g3
- refs/heads/srj-gha-test-fixes
- refs/heads/srj-hidden
- refs/heads/srj-hide2
- refs/heads/srj-hvx
- refs/heads/srj-hvx-bug
- refs/heads/srj-hvx-codegen-bug
- refs/heads/srj-hvx-nocopy
- refs/heads/srj-hvxshift
- refs/heads/srj-iib
- refs/heads/srj-initshape
- refs/heads/srj-inv
- refs/heads/srj-ir
- refs/heads/srj-irmut2
- refs/heads/srj-iwyu
- refs/heads/srj-iwyu3
- refs/heads/srj-javascript_work_in_progress
- refs/heads/srj-lensblur
- refs/heads/srj-lessinc
- refs/heads/srj-llvm-loop-opt
- refs/heads/srj-mak
- refs/heads/srj-maxthreads
- refs/heads/srj-mod
- refs/heads/srj-msan
- refs/heads/srj-msan-call
- refs/heads/srj-muldivmod
- refs/heads/srj-mut
- refs/heads/srj-outputs-2
- refs/heads/srj-parse
- refs/heads/srj-pch
- refs/heads/srj-printfunc
- refs/heads/srj-pygp
- refs/heads/srj-revertbits
- refs/heads/srj-schedule-storage
- refs/heads/srj-shl-shr-2
- refs/heads/srj-sio
- refs/heads/srj-static-const
- refs/heads/srj-strided-store
- refs/heads/srj-tidyh
- refs/heads/srj-tiff
- refs/heads/srj-trace
- refs/heads/srj-tutorial
- refs/heads/srj-using
- refs/heads/srj-wasmfix
- refs/heads/srj-xor2
- refs/heads/srj/abstract-gen-without-get-output-func-KEEP
- refs/heads/srj/aligned-alloc
- refs/heads/srj/aligned-alloc-2
- refs/heads/srj/aligned-malloc-with-aligned-alloc
- refs/heads/srj/all-explicit-ctor
- refs/heads/srj/anderson-thread-info-ptr
- refs/heads/srj/aot-perf
- refs/heads/srj/argv-signatures
- refs/heads/srj/argv-types
- refs/heads/srj/async-test
- refs/heads/srj/b2cpp-const-data
- refs/heads/srj/better-xt-dispatch
- refs/heads/srj/bfloat1
- refs/heads/srj/bp
- refs/heads/srj/build_halide_h
- refs/heads/srj/c-bool
- refs/heads/srj/cache-clear
- refs/heads/srj/clang-fmt-ignore
- refs/heads/srj/clang-tidy
- refs/heads/srj/clear-c-cache
- refs/heads/srj/cmake-asan
- refs/heads/srj/cmake-asan2
- refs/heads/srj/cmake-jit-generators
- refs/heads/srj/configure-cmake
- refs/heads/srj/cpp-generator-v2-experiment-KEEP
- refs/heads/srj/crosscompile
- refs/heads/srj/csv
- refs/heads/srj/ctad
- refs/heads/srj/delete-introspection
- refs/heads/srj/depr
- refs/heads/srj/deprecation
- refs/heads/srj/device-copy
- refs/heads/srj/example
- refs/heads/srj/experiment
- refs/heads/srj/experiment-6967
- refs/heads/srj/exporting
- refs/heads/srj/expr_t
- refs/heads/srj/external-tensors
- refs/heads/srj/f16-convert
- refs/heads/srj/fix-pytorch
- refs/heads/srj/fixed-rollback
- refs/heads/srj/fopen-fix
- refs/heads/srj/forward
- refs/heads/srj/forward-name
- refs/heads/srj/gen-func
- refs/heads/srj/gen-func-2
- refs/heads/srj/gen-func-3
- refs/heads/srj/gen2-1
- refs/heads/srj/gen_closure
- refs/heads/srj/generator_aot_gpu_multi_context_threaded
- refs/heads/srj/globals
- refs/heads/srj/halide-buffer-crop
- refs/heads/srj/halide-malloc-alignment
- refs/heads/srj/halide-must-use
- refs/heads/srj/halide-runtime-must-use-result
- refs/heads/srj/hallmark
- refs/heads/srj/hang-repro
- refs/heads/srj/hannk
- refs/heads/srj/hannk-aliasing
- refs/heads/srj/hannk-error-checking
- refs/heads/srj/hannk-errors
- refs/heads/srj/hannk-inplace
- refs/heads/srj/hannk-mmap
- refs/heads/srj/hannk-tflite-27
- refs/heads/srj/hannk-verbosity
- refs/heads/srj/hdrs
- refs/heads/srj/html-becomes-viz
- refs/heads/srj/implicit-mult-widening
- refs/heads/srj/issue-7076
- refs/heads/srj/iwyu
- refs/heads/srj/iwyu-2
- refs/heads/srj/iwyu-6
- refs/heads/srj/libHANNK
- refs/heads/srj/llvm_type_of
- refs/heads/srj/lossless-test
- refs/heads/srj/maybe-unused
- refs/heads/srj/meanop
- refs/heads/srj/metadata-calling-convention
- refs/heads/srj/more-tidy
- refs/heads/srj/msan-dtf
- refs/heads/srj/multimeta
- refs/heads/srj/nanobind
- refs/heads/srj/new-rt-1
- refs/heads/srj/no-threadpool
- refs/heads/srj/no-timeout-thread
- refs/heads/srj/oglc-mutexed
- refs/heads/srj/param-map
- refs/heads/srj/pip-15.x
- refs/heads/srj/pip-cron
- refs/heads/srj/possible-uninited
- refs/heads/srj/pr-7566
- refs/heads/srj/printer-size
- refs/heads/srj/profiler-data-race
- refs/heads/srj/ptr-int-cast
- refs/heads/srj/pyapps
- refs/heads/srj/pyext-fix
- refs/heads/srj/pygen-class
- refs/heads/srj/pygen-deux
- refs/heads/srj/pygen-func
- refs/heads/srj/pygen-native-types
- refs/heads/srj/pyinstall
- refs/heads/srj/pypi-try
- refs/heads/srj/pystuff
- refs/heads/srj/python-buffer-unpack
- refs/heads/srj/python-tutorial
- refs/heads/srj/reshape
- refs/heads/srj/rt-error-smallify
- refs/heads/srj/rt-return-types
- refs/heads/srj/runtime-error-handling
- refs/heads/srj/sat-fixes-exp
- refs/heads/srj/sat-fixes-exp-2
- refs/heads/srj/shadow-field
- refs/heads/srj/snprintf
- refs/heads/srj/spirv-license
- refs/heads/srj/stat-buf-deprecations
- refs/heads/srj/static-buffer-generators
- refs/heads/srj/stmt-html
- refs/heads/srj/stringify
- refs/heads/srj/synth-gen-params
- refs/heads/srj/synth-params-python
- refs/heads/srj/test-arm_sve_redux
- refs/heads/srj/test-intrinsics-bounds
- refs/heads/srj/test8076
- refs/heads/srj/test8078
- refs/heads/srj/test8094
- refs/heads/srj/test8105a
- refs/heads/srj/test8115
- refs/heads/srj/test_tmpdir_fix
- refs/heads/srj/tidy
- refs/heads/srj/tidy-format-14
- refs/heads/srj/tidymore
- refs/heads/srj/tidymore2
- refs/heads/srj/tls
- refs/heads/srj/tls-3
- refs/heads/srj/tls-4
- refs/heads/srj/tls-ucon
- refs/heads/srj/tmp-unschedule-experiment
- refs/heads/srj/tot-fix
- refs/heads/srj/try-revert-sat
- refs/heads/srj/type-traits
- refs/heads/srj/typed-func
- refs/heads/srj/ucon-all-const
- refs/heads/srj/ucon-non-const
- refs/heads/srj/visit-warnings
- refs/heads/srj/wasm-atomic2
- refs/heads/srj/wasm-simd
- refs/heads/srj/wasm-stuff
- refs/heads/srj/wasm-threads
- refs/heads/srj/wasm-updates
- refs/heads/srj/wasm-work
- refs/heads/srj/wip
- refs/heads/srj/x-rounding
- refs/heads/srj/xbuf
- refs/heads/srj/xc+plus+size+tmp
- refs/heads/srj/xc-types
- refs/heads/srj/xt-uint-cast-test
- refs/heads/srj/xtensa-arch
- refs/heads/srj/xtensa-merge
- refs/heads/srj/xvc-experimetn
- refs/heads/srj/zlib-embed
- refs/heads/srjj/backport-vulkan-fixes
- refs/heads/standalone_autoscheduler
- refs/heads/standalone_autoscheduler_arm_worker
- refs/heads/standalone_autoscheduler_arm_worker_amazon
- refs/heads/standalone_autoscheduler_gpu
- refs/heads/standalone_autoscheduler_hexagon
- refs/heads/sticky_task_assignments
- refs/heads/store_with
- refs/heads/store_with_solver_for_super_simplify
- refs/heads/strict_float_cse_fix
- refs/heads/super_simplify
- refs/heads/super_simplify_v2
- refs/heads/super_simplify_v3
- refs/heads/transitive_wrapper
- refs/heads/trigger-release-v16
- refs/heads/tzumao-autodiff-boundarycond
- refs/heads/tzumao-gradient-autoscheduler-bug
- refs/heads/tzumao-predicate-store-load
- refs/heads/tzumao-python-buffer
- refs/heads/tzumao_autodiff_unbounded
- refs/heads/tzumao_improve_gradient_autoscheduler
- refs/heads/tzumao_issue_4297
- refs/heads/tzumao_licm_before_BI
- refs/heads/unbounded_bugs
- refs/heads/undo_async_copy_chain_black_list
- refs/heads/use_string_literals_for_blobs
- refs/heads/users/lukas/python-pip
- refs/heads/validate_sched_error_msg
- refs/heads/var_ir_fix
- refs/heads/vksnk/async-experiment
- refs/heads/vksnk/async-multiple-producers
- refs/heads/vksnk/async-order
- refs/heads/vksnk/better-loop-carry
- refs/heads/vksnk/better-message
- refs/heads/vksnk/bound-storage
- refs/heads/vksnk/bounds-widen-right
- refs/heads/vksnk/c-print-type
- refs/heads/vksnk/c-round
- refs/heads/vksnk/check-return-result
- refs/heads/vksnk/compute-with-bug
- refs/heads/vksnk/compute_with_async
- refs/heads/vksnk/dma-limit-channels
- refs/heads/vksnk/dma-min-max
- refs/heads/vksnk/expr-match-shuffle
- refs/heads/vksnk/extract-from-scalar
- refs/heads/vksnk/f16-load
- refs/heads/vksnk/fix-packvr
- refs/heads/vksnk/fix_halide_xtensa_narrow_with_rounding_shift_i16
- refs/heads/vksnk/fused-compute-with
- refs/heads/vksnk/hoist-storage-bug
- refs/heads/vksnk/lerp-intrinsics
- refs/heads/vksnk/lower-signed-shifts
- refs/heads/vksnk/missing-exception
- refs/heads/vksnk/non-widening-halves
- refs/heads/vksnk/optimize-shuffles
- refs/heads/vksnk/replace-all
- refs/heads/vksnk/restrict
- refs/heads/vksnk/roll-buffer
- refs/heads/vksnk/roundeven-arm
- refs/heads/vksnk/rvar-bounds
- refs/heads/vksnk/simplify-slice
- refs/heads/vksnk/skip-semaphores
- refs/heads/vksnk/storage-folding
- refs/heads/vksnk/strided-load-of-4_2
- refs/heads/vksnk/typed-scope
- refs/heads/vksnk/update-simd-driver
- refs/heads/vksnk/vectorize-bug
- refs/heads/vksnk/vectorize-scalarize
- refs/heads/vksnk/widening_absd
- refs/heads/vksnk/xtensa-codegen-fp16
- refs/heads/vksnk/xtensa-dma-improvements
- refs/heads/vksnk/xtensa-regroup-pass
- refs/heads/vksnk/xtensa/lift-allocs
- refs/heads/vulkan
- refs/heads/vulkan-diagnose-alloc-failures
- refs/heads/vulkan-phase0-adts
- refs/heads/vulkan-phase1-spirv
- refs/heads/vulkan-phase2-runtime
- refs/heads/vulkan2
- refs/heads/vulkan_fix_gpu_dynamic_shared_test
- refs/heads/vulkan_fix_subregion_memory_offsets
- refs/heads/webassembly-old
- refs/heads/winograd
- refs/heads/wording_fix
- refs/heads/xtensa-codegen
- refs/heads/xtensa-codegen-parallel
- refs/heads/xuanda/fix-serialize-bad-partition-always
- refs/remotes/origin/rootjalex/add_autosched_caching
- refs/tags/release_2018_02_15
- refs/tags/release_2019_08_27
- refs/tags/release_8.0.0
- refs/tags/v10.0.0
- refs/tags/v10.0.1
- refs/tags/v11.0.0
- refs/tags/v11.0.1
- refs/tags/v12.0.0
- refs/tags/v12.0.1
- refs/tags/v13.0.0
- refs/tags/v13.0.1
- refs/tags/v13.0.2
- refs/tags/v13.0.3
- refs/tags/v13.0.4
- refs/tags/v14.0.0
- refs/tags/v15.0.0
- refs/tags/v15.0.1
- refs/tags/v16.0.0
- refs/tags/v17.0.0
- refs/tags/v17.0.1
- refs/tags/v8.0.0
Take a new snapshot of a software origin
If the archived software origin currently browsed is not synchronized with its upstream version (for instance when new commits have been issued), you can explicitly request Software Heritage to take a new snapshot of it.
Use the form below to proceed. Once a request has been submitted and accepted, it will be processed as soon as possible. You can then check its processing state by visiting this dedicated page.Processing "take a new snapshot" request ...
Permalinks
To reference or cite the objects present in the Software Heritage archive, permalinks based on SoftWare Hash IDentifiers (SWHIDs) must be used.
Select below a type of object currently browsed in order to display its associated SWHID and permalink.
Revision | Author | Date | Message | Commit Date |
---|---|---|---|---|
3a24168 | Steven Johnson | 05 June 2024, 17:10:56 UTC | Merge branch 'abadams/fix_lossless_cast_of_sub' into srj/lossless-test | 05 June 2024, 17:10:56 UTC |
360add6 | Steven Johnson | 05 June 2024, 17:07:21 UTC | Merge branch 'main' into abadams/fix_lossless_cast_of_sub | 05 June 2024, 17:07:21 UTC |
69bfd80 | Steven Johnson | 05 June 2024, 16:35:38 UTC | Merge branch 'main' into xtensa-codegen | 05 June 2024, 16:35:38 UTC |
5b5b0c6 | Steven Johnson | 05 June 2024, 16:26:46 UTC | Merge branch 'main' into xtensa-codegen | 05 June 2024, 16:26:46 UTC |
74b9044 | Andrew Adams | 05 June 2024, 16:24:06 UTC | It's generally a bad idea for simplifier rules to multiply constants (#8234) Fixes #8227 but may break other things. Needs thorough testing. Also, there are more rules like this lurking. | 05 June 2024, 16:24:06 UTC |
c33dbfb | Andrew Adams | 04 June 2024, 23:01:15 UTC | Don't introduce out-of-range shifts in lossless_cast | 04 June 2024, 23:01:15 UTC |
46e866d | Martijn Courteaux | 04 June 2024, 16:32:54 UTC | Report useful error to user if the promise_clamp all fails to losslessly cast. (#8238) Co-authored-by: Steven Johnson <srj@google.com> | 04 June 2024, 16:32:54 UTC |
775bfbf | Jason Ansel | 04 June 2024, 16:31:30 UTC | Python binding support for int64 literals (#8254) This makes >32bit python integers get mapped to `hl.i64` implicitly. Fixes #8224 | 04 June 2024, 16:31:30 UTC |
9c75554 | Shoaib Kamil | 04 June 2024, 15:21:04 UTC | Fix Metal handling for float16 literals (#8260) * Fix Metal handling of float16 from bits, infinity, neg infinity, and nans * Disable test for OpenCL half for now * Formatting | 04 June 2024, 15:21:04 UTC |
7414ee6 | Andrew Adams | 03 June 2024, 20:37:04 UTC | Delete commented-out code | 03 June 2024, 20:37:04 UTC |
9570818 | Andrew Adams | 03 June 2024, 19:09:56 UTC | Fix mul_shift_right expansion | 03 June 2024, 19:09:56 UTC |
c8f7e8f | Andrew Adams | 03 June 2024, 18:39:47 UTC | Fix bugs in lossless_cast | 03 June 2024, 18:39:47 UTC |
ac5b13d | Andrew Adams | 03 June 2024, 18:39:39 UTC | Rework find_mpy_ops to handle more structures | 03 June 2024, 18:39:39 UTC |
1ef63f7 | Steven Johnson | 03 June 2024, 17:49:00 UTC | Merge branch 'main' into xtensa-codegen | 03 June 2024, 17:49:00 UTC |
bf28e00 | Andrew Adams | 02 June 2024, 21:46:23 UTC | Merge remote-tracking branch 'origin/main' into abadams/fix_lossless_cast_of_sub | 02 June 2024, 21:46:23 UTC |
7ca95d8 | Jason Ansel | 02 June 2024, 21:39:44 UTC | Expose BFloat in Python bindings (#8255) There are two parts to support for BFloat16 in Python: 1) Ability to define kernels and AOT compile them [fixed in this PR] 2) Ability to call kernels from Python This fixes part 1, which is what I need for my use case. Part 2 is blocked on bfloat16 support in Python buffer protocols. See #6849 for more details. | 02 June 2024, 21:39:44 UTC |
a0f1d23 | Andrew Adams | 02 June 2024, 21:36:57 UTC | Add constant interval test | 02 June 2024, 21:36:57 UTC |
7cf2951 | Jason Ansel | 02 June 2024, 21:34:36 UTC | Remove max size assert from Anderson2021 (#8253) Fixes #8252 | 02 June 2024, 21:34:36 UTC |
a9b8fbf | Andrew Adams | 02 June 2024, 21:33:45 UTC | Rework the simplifier to use ConstantInterval for bounds (#8222) * Update the simplifier to use ConstantInterval and track the bounds through more types * Move the simplify fuzzer back to a correctness test * Make debug_indent not static Otherwise it causes a race condition in any parallel tests * Track expr info on non-overflowing casts to int * Delete commented-out code * clang-tidy * Delete unused member * Fix cmakelists for the fuzzer removal * Handle contradictions more gracefully in learn_true The contradiction was arising from: if (extent > 0) { ... } else { for (x = 0; x < extent; x++) { In here we can assume extent > 0, but we also know from the if statement that extent <= 0 } } * Better comments * Address review comments * Fix failure to pop loop var info | 02 June 2024, 21:33:45 UTC |
35143d2 | Martijn Courteaux | 02 June 2024, 21:19:04 UTC | Mark host_dirty() and device_dirty() with no_discard. (#8248) Co-authored-by: Steven Johnson <srj@google.com> | 02 June 2024, 21:19:04 UTC |
711dc88 | Cheng Wang | 31 May 2024, 17:53:47 UTC | Add HVX_v68 target to support Hexagon HVX v68. (#8232) | 31 May 2024, 17:53:47 UTC |
3ea4747 | Misha Gutman | 30 May 2024, 17:27:38 UTC | [xtensa] added support for sqrt_f16 (#8247) | 30 May 2024, 17:27:38 UTC |
33d5ba9 | Andrew Adams | 24 May 2024, 19:56:03 UTC | Fix saturating add matching in associativity checking (#8220) * Fix saturating add matching in associativity checking The associative ops table defined saturating add as saturating_narrow(widen(x + y)), instead of saturating_narrow(widen(x) + y) | 24 May 2024, 19:56:03 UTC |
b5f5065 | Andrew Adams | 23 May 2024, 18:17:49 UTC | Add some EVAL_IN_LAMBDAs to Simplify_Sub.cpp (#8230) Massively reduces compile time and peak cl.exe memory consumption on windows (from 9.5gb down to 2.3gb). Simplify_LT.cpp has these same EVAL_IN_LAMBDAs, which is probably why it hasn't been causing build problems. | 23 May 2024, 18:17:49 UTC |
8a316d1 | Misha Gutman | 23 May 2024, 16:23:09 UTC | [xtensa] Added vector load for two vectors for f16 and f32 (#8226) | 23 May 2024, 16:23:09 UTC |
17d4351 | Steven Johnson | 20 May 2024, 16:37:14 UTC | Merge branch 'main' into xtensa-codegen | 20 May 2024, 16:37:14 UTC |
e9f8b04 | Steven Johnson | 15 May 2024, 21:43:17 UTC | Fix for top-of-tree LLVM (#8223) * Fix for top-of-tree LLVM * Update LLVM_Runtime_Linker.cpp | 15 May 2024, 21:43:17 UTC |
16d77e9 | Andrew Adams | 15 May 2024, 17:43:34 UTC | Fix give-up case in ModulusRemainder (#8221) A default-constructed ModulusRemainder means no information, which is what we want here. ModulusRemainder{0, 1} means the constant one! | 15 May 2024, 17:43:34 UTC |
211bafa | Alexander Root | 14 May 2024, 20:15:57 UTC | Fix Reinterpret cmp in IREquality (#8217) fix Reinterpret cmp | 14 May 2024, 20:15:57 UTC |
d61390a | Misha Gutman | 10 May 2024, 17:55:47 UTC | [xtensa] Fixed index conversion for gather_load with undefined ramp (#8215) Fixed index conversion for gather_load with undefined ramp | 10 May 2024, 17:55:47 UTC |
dfaf6ad | Steven Johnson | 30 April 2024, 15:08:26 UTC | Insert apparently-missing `break;` in IREquality.cpp (#8211) * Insert apparently-missing `break;` in IREquality.cpp * Enable -Wimplicit-fallthrough * Also add -Wimplicit-fallthrough to runtime builds * Add missing break to runtime/webgpu.cpp * Also add flag to Makefile --------- Co-authored-by: Andrew Adams <andrew.b.adams@gmail.com> | 30 April 2024, 15:08:26 UTC |
8141197 | Alexander Root | 30 April 2024, 13:38:30 UTC | [x86 & HVX & WASM] Use bounds inference for saturating_narrow instruction selection (#7805) * x86 bounds inference for saturating_narrow * bounds inference for HVX too * use can_represent(ConstantInterval) + clang-format * use bounds inference for WASM IS too + add tests * add tracking issue for scoped constant bounds * add TODO about lossless_cast usage --------- Co-authored-by: Steven Johnson <srj@google.com> | 30 April 2024, 13:38:30 UTC |
d55d82b | Steven Johnson | 29 April 2024, 16:38:30 UTC | Update debug_to_file API to remove type_code (#8183) * Add .npy support to halide_image_io The .npy format is NumPy's native format for storing multidimensional arrays (aka tensors/buffers). Being able to load/save in this format makes it (potentially) a lot easier to interchange data with the Python ecosystem, as well as providing a file format that support floating-point data more robustly than any of the others that we current support. This adds load/save support for a useful subset: - We support the int/uint/float types common in Halide (except for f16/bf16 for now) - We don't support reading or writing files that are in `fortran_order` - We don't support any object/struct/etc files, only numeric primitives - We only support loading files that are in the host's endianness (typically little-endian) Note that at present this doesn't support f16 / bf16 formats, but that could likely be added with minimal difficulty. The tricky bit of this is that the reading code has to parse a (limited) Python dict in text form. Please review that part carefully. TODO: we could probably add this as an option for `debug_to_file()` without too much pain in a followup PR. * clang-tidy * clang-tidy * Address review comments * Allow for "keys" as well as 'keys' * Add .npy support to debug_to_file() Built on top of https://github.com/halide/Halide/pull/8175, this adds .npy as an option. This is actually pretty great because it's easy to do something like ``` ss = numpy.load("my_file.npy") print(ss) ``` in Python and get nicely-formatted output, which can sometimes be a lot easier for debugging that inserting lots of print() statements (see https://github.com/halide/Halide/issues/8176) Did a drive-by change to the correctness test to use this format instead of .mat. * Add float16 support * Add support for Float16 images in npy * Assume little-endian * Remove redundant halide_error() * naming convention * naming convention * Test both mat and npy * Don't call halide_error() * Use old-school parser * clang-tidy * Update debug_to_file API to remove type_code * Clean up into single table * Update CodeGen_LLVM.cpp * Fix tmp codes * Update InjectHostDevBufferCopies.cpp * Update InjectHostDevBufferCopies.cpp * trigger buildbots | 29 April 2024, 16:38:30 UTC |
3101277 | Steven Johnson | 29 April 2024, 15:21:33 UTC | Merge branch 'main' into xtensa-codegen | 29 April 2024, 15:21:33 UTC |
8202163 | Andrew Adams | 28 April 2024, 21:39:41 UTC | More aggressively unify duplicate lets (#8204) * Make unify_duplicate_lets more aggressive The simplifier can also clean up most of these, but it's harder for it because it has to consider that other mutations may have taken place. Beefing this up has no impact on lowering times for most apps, but something pathological was going on for local_laplacian. At 20 pyramid levels, this speeds up lowering by 1.3x. At 50 pyramid levels it's 2.3x. At 100 pyramid levels it's 4.1x. It also slightly reduces binary size. * Clarify comment; Avoid double-lookup into the scope Looking up with an Expr key and deep equality is expensive, so this was bad. * Add a std::move | 28 April 2024, 21:39:41 UTC |
64caf31 | Andrew Adams | 28 April 2024, 21:38:54 UTC | Faster vars used tracking in simplify let visitor (#8205) * Speed up the vars_used visitor in the simplifier let visitor This visitor shows up as the main cost of lowering in very large pipelines. This visitor is for tracking which lets are actually used for real inside the body of a let block (as opposed to the tracking we do when mutating, which is approximate, because we could construct and Expr that uses a Var and then discard it in a later mutation). The old implementation made a map of all variables referenced, and then checked each let name against that map one by one. If there are a small number of lets outside a huge Stmt, this is bad, because the data structure has to hold a number of names proportional to the stmt size instead of proportional to the number of lets. This new implementation instead makes a hash set of the let names, and than traverses the Stmt, removing names from the set as they are encountered. This is a big speed-up. We then make the speed-up larger by about the same factor again doing the following: 1) Only add names to the map that might be used based on the recursive mutate call. These are very very likely to be used, because we saw them at least once, and mutations that remove *all* uses of a Var are rare. 2) The visitor should early out when the map becomes empty. The let variables are often all used immediately, so this is frequent. Speeds up lowering of local laplacian by 1.44x, 2.6x, and 4.8x respectively for 20, 50, and 100 pyramid levels. Speeds up lowering of resnet50 by 1.04x. Speeds up lowering of lens blur by 1.06x * Exploit the ref count of the replacement Expr * Fix is_sole_reference logic in Simplify_Let.cpp * Reduce hash map size | 28 April 2024, 21:38:54 UTC |
302aa1c | Andrew Adams | 25 April 2024, 18:58:23 UTC | Refactor ConstantInterval (#8179) * Make ConstantInterval more of a first-class thing and use it in Monotonic.cpp * Restore bound_correlated_differences calls * Elaborate on TODO * Handle some TODOs Also explicit ignore lossless_cast bugs that will be fixed in #8155 * Fix constant interval mod, clean up constant interval saturating cast * Improve comment * Avoid unsigned overflow * Fix the most obvious bug in lossless_cast, to make the fuzzer pass more * Skip over pipelines that fail the lossless_cast check * Drop iteration count on lossless_cast test * Add test to CMakeLists.txt * Avoid UB in constant_interval test (signed integer overflow of the scalars) * Restore accidentally-deleted line from CMakeLists.txt * Print on success * Handle Lets in constant_integer_bounds Also, plumb the cache through the recursive calls * Delete duplicate operator<< * Just always cast the bounds back to the range of the op type * Address review comments * Redo operator<< for ConstantIntervals * Improve comment; disable buggy code for now | 25 April 2024, 18:58:23 UTC |
e39497b | Andrew Adams | 21 April 2024, 03:43:38 UTC | Make Interval::is_single_point check for deep equality (#8202) * Make is_single_point compare min and max by deep equality Interval::is_single_point() used to only compare expressions by shallow equality to see if they are the same Expr object. However, bounds_of_expr_in_scope is really improved if it uses deep equality instead, so it has a prepass that goes over the provided scope, calls equal(min, max) on everything, and fixes up anything where deep equality is true but shallow equality. This prepass costs O(n) for n things in scope, regardless of how complex the expression being analyzed is. So if you ask for the bounds of '4' say in a context where there are lots of things in the scope, it's absurdly slow. We were doing this! BoxTouched calls bounds_of_expr_in_scope lots of times on small index Exprs within the same very large scope. It's better to just make Interval::is_single_point() check deep equality. This speeds up local laplacian lowering by 1.1x, and resnet50 lowering by 1.5x. There were also places where intervals that were a single point were diverging due to carelessly written code. E.g. the interval [40*8, 40*8], where both of those 40*8s are the same Mul node, was being simplified like this: interval.min = simplify(interval.min); interval.max = simplify(interval.max); Not only does this do double the simplification work it should, but it also caused something that was a single point to diverge into not being a single point, because the repeated constant-folding creates a new Expr. With the new is_single_point this matters a lot less, but even so, I centralized simplification of intervals into a single helper that doesn't do the pointless double-simplification for single points. Some of these shallowly-unequal but deeply-equal Intervals were being created in bounds inference itself after the prepass, which may have been generating suboptimal bounds. This change should fix that in addition to the compile-time benefits. Also added a simplify call in SkipStages because I noticed when it processed specializations it was creating things like (condition) || (!condition). | 21 April 2024, 03:43:38 UTC |
31c52ab | Andrew Adams | 19 April 2024, 19:59:34 UTC | Faster substitute_facts (#8200) * Fix computational complexity of substitute_facts It was O(n) for n facts. This makes it O(log(n)) This was particularly bad for pipelines with lots of inputs or outputs, because those pipelines have lots of asserts, which make for lots of facts to substitute in. Speeds up lowering of local laplacian with 20 pyramid levels (which has only one input and one output) by 1.09x Speeds up lowering of the adams 2019 cost model training pipeline (lots of weight inputs and lots outputs due to derivatives) by 1.5x Speeds up resnet50 (tons of weight inputs) lowering by 7.3x! * Add missing switch breaks * Add missing comments * Elaborate on why we treat NaNs as equal | 19 April 2024, 19:59:34 UTC |
dd1d0e8 | aankit-quic | 19 April 2024, 17:33:44 UTC | [HEXAGON] Keep support for hexagon_remote/Makefile (#8186) Update hexagon_remote/Makefile | 19 April 2024, 17:33:44 UTC |
4e0b313 | Andrew Adams | 18 April 2024, 19:48:59 UTC | Rewrite IREquality to use a more compact stack instead of deep recursion (#8198) * Rewrite IREquality to use a more compact stack instead of deep recursion Deletes a bunch of code and speeds up lowering time of local laplacian with 20 pyramid levels by ~2.5% * clang-tidy * Fold in the version of equal in IRMatch.h/cpp * Add missing switch breaks * Add missing comments * Elaborate on why we treat NaNs as equal | 18 April 2024, 19:48:59 UTC |
7994e70 | Andrew Adams | 16 April 2024, 21:27:43 UTC | Fix corner case in if_then_else simplification (#8189) Co-authored-by: Steven Johnson <srj@google.com> | 16 April 2024, 21:27:43 UTC |
3e712ba | Volodymyr Kysenko | 12 April 2024, 17:29:06 UTC | Disable fused mul-add for f16 while investigating | 12 April 2024, 17:29:06 UTC |
f247636 | Volodymyr Kysenko | 12 April 2024, 17:25:08 UTC | Merge branch 'main' into xtensa-codegen | 12 April 2024, 17:25:08 UTC |
f4c7831 | Andrew Adams | 11 April 2024, 22:07:20 UTC | Don't print on parallel task entry/exit with -debug flag (#8185) Fixes #8184 | 11 April 2024, 22:07:20 UTC |
dc83707 | Steven Johnson | 11 April 2024, 18:04:42 UTC | Add .npy support to debug_to_file() (#8177) * Add .npy support to halide_image_io The .npy format is NumPy's native format for storing multidimensional arrays (aka tensors/buffers). Being able to load/save in this format makes it (potentially) a lot easier to interchange data with the Python ecosystem, as well as providing a file format that support floating-point data more robustly than any of the others that we current support. This adds load/save support for a useful subset: - We support the int/uint/float types common in Halide (except for f16/bf16 for now) - We don't support reading or writing files that are in `fortran_order` - We don't support any object/struct/etc files, only numeric primitives - We only support loading files that are in the host's endianness (typically little-endian) Note that at present this doesn't support f16 / bf16 formats, but that could likely be added with minimal difficulty. The tricky bit of this is that the reading code has to parse a (limited) Python dict in text form. Please review that part carefully. TODO: we could probably add this as an option for `debug_to_file()` without too much pain in a followup PR. * clang-tidy * clang-tidy * Address review comments * Allow for "keys" as well as 'keys' * Add .npy support to debug_to_file() Built on top of https://github.com/halide/Halide/pull/8175, this adds .npy as an option. This is actually pretty great because it's easy to do something like ``` ss = numpy.load("my_file.npy") print(ss) ``` in Python and get nicely-formatted output, which can sometimes be a lot easier for debugging that inserting lots of print() statements (see https://github.com/halide/Halide/issues/8176) Did a drive-by change to the correctness test to use this format instead of .mat. * Add float16 support * Add support for Float16 images in npy * Assume little-endian * Remove redundant halide_error() * naming convention * naming convention * Test both mat and npy * Don't call halide_error() * Use old-school parser * clang-tidy | 11 April 2024, 18:04:42 UTC |
8f3f6cf | Fabian Schuetze | 11 April 2024, 16:58:36 UTC | Update Hexagon Install Instructions (#8182) update Hexagon install instructions | 11 April 2024, 16:58:36 UTC |
e3d3c8c | Martijn Courteaux | 08 April 2024, 15:29:33 UTC | Fix unused variable. (#8180) | 08 April 2024, 15:29:33 UTC |
35f0c29 | Steven Johnson | 06 April 2024, 15:17:25 UTC | Add .npy support to halide_image_io (#8175) * Add .npy support to halide_image_io The .npy format is NumPy's native format for storing multidimensional arrays (aka tensors/buffers). Being able to load/save in this format makes it (potentially) a lot easier to interchange data with the Python ecosystem, as well as providing a file format that support floating-point data more robustly than any of the others that we current support. This adds load/save support for a useful subset: - We support the int/uint/float types common in Halide (except for f16/bf16 for now) - We don't support reading or writing files that are in `fortran_order` - We don't support any object/struct/etc files, only numeric primitives - We only support loading files that are in the host's endianness (typically little-endian) Note that at present this doesn't support f16 / bf16 formats, but that could likely be added with minimal difficulty. The tricky bit of this is that the reading code has to parse a (limited) Python dict in text form. Please review that part carefully. TODO: we could probably add this as an option for `debug_to_file()` without too much pain in a followup PR. * clang-tidy * clang-tidy * Address review comments * Allow for "keys" as well as 'keys' * Add float16 support * Use old-school parser * clang-tidy | 06 April 2024, 15:17:25 UTC |
f42a3d7 | Steven Johnson | 05 April 2024, 16:43:20 UTC | Merge branch 'main' into xtensa-codegen | 05 April 2024, 16:43:20 UTC |
14ae082 | Andrew Adams | 05 April 2024, 16:39:07 UTC | Clarify the meaning of Shuffle::is_broadcast() (#8158) * Fix horrifying bug in lossless_cast of a subtract * A 'broadcast' shuffle is more complex than it seems I was poking at the Shuffle node, and checking its usage, and it seems that despite the comment, Shuffles that return true for is_broadcast are not the same as a Broadcast node. Instead of repeating the input vector some number of times, it repeats a shuffle of the input vector. This means IRPrinter was incorrect. None of the other usages were bad. This PR makes this clearer in the comment, and fixes IRPrinter. * Revert accidental change | 05 April 2024, 16:39:07 UTC |
a462044 | Alexander Root | 05 April 2024, 16:38:46 UTC | Tighten bounds of abs() (#8168) * Tighten bounds of abs() * make abs bounds tight for non-int32 too * make int32 min expression match non-int32 min expression | 05 April 2024, 16:38:46 UTC |
ddab1cf | Andrew Adams | 05 April 2024, 16:30:59 UTC | Fix bug in lossless_negate | 05 April 2024, 16:30:59 UTC |
7d99357 | Steven Johnson | 05 April 2024, 16:07:05 UTC | Add conversion code for Float16 that was missed in #8174 (#8178) * Add conversion code for Float16 that was missed in #8174 * Don't sniff for _Float16 when building ASAN * Update HalideRuntime.h | 05 April 2024, 16:07:05 UTC |
3b8a532 | Steven Johnson | 04 April 2024, 17:19:13 UTC | Add some missing _Float16 support (#8174) (Changes extracted from https://github.com/halide/Halide/pull/8169, which may or may not land in its current form) Some missing support for _Float16 that will likely be handy: - Allow _Float16 to be detected for Clang 15 (since my local XCode Clang 15 definitely supports it) - Expr(_Float16) - HALIDE_DECLARE_EXTERN_SIMPLE_TYPE(_Float16); - Add _Float16 to the convert matrix in halide_image_io.h | 04 April 2024, 17:19:13 UTC |
a4158c0 | Andrew Adams | 03 April 2024, 19:28:25 UTC | fix ub in lower rounding shift right (#8173) * Avoid out-of-range shifts in lower_rounding_shift_left/right Consider `lower_rounding_shift_right(a, (uint8)0)` The term b - 1 becomes 255, and now you have an out-of-range shift, which causes the simplifier to inject a signed_integer_overflow intrinsic, and compilation to fail. This is a little annoying because if b == 0, b_positive is a zero mask, so the result isn't used anyway (this is also why this change is legal). In llvm, it's a poison value, not UB, so masking it off works. If the simplifier were smarter, it might just drop the signed_integer_overflow intrinsic on detecting that it was being bitwise-and-ed with zero. But the safest thing to do is not overflow. saturating_add/sub are typically as cheap as add/sub. 99.9% of the time b is some positive constant anyway, so it's going to get constant-folded. * Add test | 03 April 2024, 19:28:25 UTC |
1737a52 | Andrew Adams | 02 April 2024, 19:34:37 UTC | Add shifts to the lossless cast fuzzer This required a more careful signed-integer-overflow detection routine | 02 April 2024, 19:34:37 UTC |
c652667 | Andrew Adams | 02 April 2024, 19:11:42 UTC | Avoid UB in lowering of rounding_shift_right/left | 02 April 2024, 19:11:42 UTC |
6bcc66a | Andrew Adams | 02 April 2024, 19:11:26 UTC | Fix bad test when lowering mul_shift_right b_shift + b_shift < missing_q | 02 April 2024, 19:11:26 UTC |
c6065ff | Andrew Adams | 02 April 2024, 19:10:41 UTC | Delete bad rewrite rules | 02 April 2024, 19:10:41 UTC |
0fb8d38 | Andrew Adams | 02 April 2024, 19:10:14 UTC | Stronger assert in Simplify_Div | 02 April 2024, 19:10:14 UTC |
66c56f1 | Andrew Adams | 01 April 2024, 20:35:01 UTC | Fix some UB | 01 April 2024, 20:35:01 UTC |
ecfae44 | Andrew Adams | 01 April 2024, 20:32:27 UTC | It's too late to change the semantics of fixed point intrinsics | 01 April 2024, 20:32:27 UTC |
16a706d | Andrew Adams | 01 April 2024, 20:30:50 UTC | Remove bad TODO I can't think of a single case that could cause this | 01 April 2024, 20:30:50 UTC |
0856319 | Andrew Adams | 01 April 2024, 20:13:03 UTC | clear_bounds_info -> clear_expr_info | 01 April 2024, 20:13:03 UTC |
4a293b1 | Andrew Adams | 01 April 2024, 20:08:33 UTC | Add missing comment | 01 April 2024, 20:08:33 UTC |
854122f | Andrew Adams | 01 April 2024, 20:01:16 UTC | Remove redundant helpers | 01 April 2024, 20:01:16 UTC |
413b4a6 | Andrew Adams | 01 April 2024, 19:59:33 UTC | Account for more aggressive simplification in fuse test | 01 April 2024, 19:59:33 UTC |
b053ec6 | Andrew Adams | 01 April 2024, 19:54:51 UTC | Add missing files | 01 April 2024, 19:54:51 UTC |
26efb7c | Andrew Adams | 01 April 2024, 19:28:57 UTC | Misc cleanups and test improvements | 01 April 2024, 19:28:57 UTC |
2f14881 | Andrew Adams | 01 April 2024, 18:09:06 UTC | Add a simplifier rule which is apparently now necessary | 01 April 2024, 18:09:06 UTC |
cffadd8 | Andrew Adams | 01 April 2024, 18:08:29 UTC | Fix ConstantInterval multiplication | 01 April 2024, 18:08:29 UTC |
cff71e1 | Steven Johnson | 29 March 2024, 17:11:43 UTC | Merge branch 'main' into xtensa-codegen | 29 March 2024, 17:11:43 UTC |
f308a8c | Andrew Adams | 28 March 2024, 18:09:48 UTC | Add cache for constant bounds queries | 28 March 2024, 18:09:48 UTC |
6434210 | Andrew Adams | 28 March 2024, 18:08:45 UTC | Fix * operator. Add min/max/mod | 28 March 2024, 18:08:45 UTC |
7f4bb38 | Andrew Adams | 25 March 2024, 22:29:43 UTC | Handle bounds of narrower types in the simplifier too | 25 March 2024, 22:29:43 UTC |
bee38ce | Andrew Adams | 25 March 2024, 21:37:54 UTC | Make the simplifier use ConstantInterval | 25 March 2024, 21:37:54 UTC |
67855a5 | Andrew Adams | 25 March 2024, 15:44:24 UTC | Move new classes to new files Also fix up Monotonic.cpp | 25 March 2024, 15:44:24 UTC |
214f0fd | Andrew Adams | 22 March 2024, 18:00:17 UTC | Using constant_integer_bounds to strengthen FindIntrinsics In particular, we can do better instruction selection for pmulhrsw | 22 March 2024, 18:00:17 UTC |
e0f9f8e | Andrew Adams | 21 March 2024, 17:27:52 UTC | Fix ARM and HVX instruction selection Also added more TODOs | 21 March 2024, 17:27:52 UTC |
8864e8a | Roman Lebedev | 18 March 2024, 23:09:09 UTC | Python bindings: `add_python_test()`: do set `HL_JIT_TARGET` too (#8156) This one took quite a bit of digging. I wanted to enable opencl tests on debian package, and `boundary_conditions.py`+`division.py` were failing when run with `HL_TARGET=host OCL_ICD_VENDORS=no-opencl-please.missing` env variables with `clGetPlatformIDs failed`, which made no sense to me. Empty `HL_JIT_TARGET` results in `opencl` being detected, unsurprisingly. | 18 March 2024, 23:09:09 UTC |
9c33c94 | Andrew Adams | 18 March 2024, 22:43:41 UTC | Use constant integer intervals to analyze safety for lossless_cast TODO: - Dedup the constant integer code with the same code in the simplifier. - Move constant interval arithmetic operations out of the class. - Make the ConstantInterval part of the return type of lossless_cast (and turn it into an inner helper) so that it isn't constantly recomputed. | 18 March 2024, 22:43:41 UTC |
f60781f | Steven Johnson | 15 March 2024, 22:07:53 UTC | Merge branch 'main' into xtensa-codegen | 15 March 2024, 22:07:53 UTC |
a132246 | Andrew Adams | 15 March 2024, 21:04:44 UTC | Fix two compute_with bugs. (#8152) * Fix two compute_with bugs. This PR fixes a bug in compute_with, and another bug I found while fixing it (we could really use a compute_with fuzzer). The first bug is that you can get into situations where the bounds of a producer func will refer directly to the loop variable of a consumer func, where the consumer is in a compute_with fused group. In main, that loop variable may not be defined because fused loop names have been rewritten to include the token ".fused.". This PR adds let stmts to define it just inside the fused loop body. The second bug is that not all parent loops in compute_with fused groups were having their bounds expanded to cover the region to be computed of all children, because the logic for deciding which loops to expand only considered the non-specialized pure definition. So e.g. compute_with applied to an update stage would fail to compute values of the child Func where they do not overlap with the parent Func. This PR visits all definitions of the parent Func of the fused group, instead of just the unspecialized pure definition of the parent Func. Fixes #8149 * clang-tidy | 15 March 2024, 21:04:44 UTC |
76a7dd4 | Zalman Stern | 15 March 2024, 20:01:51 UTC | Support for ARM SVE2. (#8051) * Checkpoint SVE2 restart. * Remove dead code. Add new test. * Update cmake for new file. * Checkpoint progress on SVE2. * Checkpoint ARM SVE2 support. Passes correctness_simd_op_check_sve2 test at 128 and 256 bits. * Remove an opportunity for RISC V codegen to change due to SVE2 support. * Ensure SVE intrinsics get vscale vectors and non-SVE ones get fixed vectors. Use proper prefix for neon intrinsics. Comment cleanups. * Checkpoint SVE2 work. Generally passes test, though using both NEON and SVE2 with simd_op_check_sve2 fails as both posibilities need to be allowed for 128-bit or smaller operations. * Remove an unfavored implementation possibility. * Fix opcode recognition in test to handle some cases that show up. Change name of test class to avoid confusion. * Formatting fixes. Replace internal_error with nop return for CodeGen_LLVM::match_vector_type_scalable called on scalar. * Formatting fix. * Limit SVE2 test to LLVM 19. Remove dead code. * Fix a degenerate case asking for zero sized vectors via a HAlide type with lanes of zero, which is not correct. * Fix confusion about Neon64/Neon128 and make it clear this is just the width multiplier applied to intrinsics. * REmove extraneous commented out line. * Address some review feedback. Mostly comment fixes. * Fix missed conflict resolution. * Fix some TODOs in SVE code. Move utility function to Util.h and common code the other obvious use. * Formatting. * Add missed refactor change. * Add issue to TODO comment. * Remove TODOs that don't seem necessary. * Add issue for TODO. * Add issue for TODO. * Remove dubious looking FP to int code that was ifdef'ed out. Doesn't look like a TODO is needed anymore. * Add issues for TODOs. * Update simd_op_check_sve2.cpp * Make a deep copy of each piece of test IR so that we can parallelize * Fix two clang-tidy warnings * Remove try/catch block from simd-op-check-sve2 * Don't try to run SVE2 code if vector_bits doesn't match host. * Add support for fcvtm/p, make scalars go through pattern matching too (#8151) * Don't do arm neon instruction selection on scalars This revealed a bug. FindIntrinsics was not enabled for scalars anyway, so it was semi-pointless. --------- Co-authored-by: Zalman Stern <zalman@macbook-pro.lan> Co-authored-by: Steven Johnson <srj@google.com> Co-authored-by: Andrew Adams <andrew.b.adams@gmail.com> | 15 March 2024, 20:01:51 UTC |
7d80f8b | Andrew Adams | 14 March 2024, 21:37:38 UTC | Fix horrifying bug in lossless_cast of a subtract | 14 March 2024, 21:37:38 UTC |
f841a27 | Volodymyr Kysenko | 14 March 2024, 19:53:17 UTC | Bound allocation extents for hoist_storage using loop variables one-by-one (#8154) * Bound allocation extents using loop variable one-by-one * Use emplace_back | 14 March 2024, 19:53:17 UTC |
83616f2 | Andrew Adams | 13 March 2024, 00:00:49 UTC | Fix three nits (#8137) 1) has_gpu_feature already includes Vulkan, so there's no need to check for it. 2) Use emplace(...) instead of insert(make_pair(...)) 3) Fixed a place that should be using a ScopedValue | 13 March 2024, 00:00:49 UTC |
4988ab5 | Martijn Courteaux | 12 March 2024, 23:58:14 UTC | Feature: mark a Func as no_profiling, to prevent injection of profiling. (2nd implementation) (#8143) * Small feature to allow you to specify that a (typically small inner loop) Func should not be profiled. * Simplified the tuple name handling. * Optimize tuple name normalization in Profiling.cpp * Clang-format * Feedback on Function already being a pointer. Bump the Patch version of the serialization. | 12 March 2024, 23:58:14 UTC |
bf0d611 | Andrew Adams | 12 March 2024, 16:49:26 UTC | Rewrite the pass that adds mutexes for atomic nodes (#8105) * Avoid redundant scope lookups This pattern has been bugging me for a long time: ``` if (scope.contains(key)) { Foo f = scope.get(key); } ``` This redundantly looks up the key in the scope twice. I've finally gotten around to fixing it. I've introduced a find method that either returns a const pointer to the value, if it exists, or null. It also searches any containing scopes, which are held by const pointer, so the method has to return a const pointer. ``` if (const Foo *f = scope.find(key)) { } ``` For cases where you want to get and then mutate, I added shallow_find, which doesn't search enclosing scopes, but returns a mutable pointer. We were also doing redundant scope lookups in ScopedBinding. We stored the key in the helper object, and then did a pop on that key in the ScopedBinding destructor. This commit changes Scope so that Scope::push returns an opaque token that you can pass to Scope::pop to have it remove that element without doing a fresh lookup. ScopedBinding now uses this. Under the hood it's just an iterator on the underlying map (map iterators are not invalidated on inserting or removing other stuff). The net effect is to speed up local laplacian lowering by about 5% I also considered making it look more like an stl class, and having find return an iterator, but it doesn't really work. The iterator it returns might point to an entry in an enclosing scope, in which case you can't compare it to the .end() method of the scope you have. Scopes are different enough from maps that the interface really needs to be distinct. * Pacify clang-tidy * Rewrite the pass that injects mutexes to support atomics For O(n) nested allocate nodes, this pass was quadratic in n, even if there was no use of atomics. This commit rewrites it to use a linear-time algorithm, and skips it entirely after the first validation pass if there aren't any atomic nodes. It also needlessly used IRGraphMutators, which slowed things down, didn't handle LargeBuffers (could overflow in the allocation), incorrectly thought every producer/consumer node was associated with an output buffer, and didn't print the realization name when printing the atomic node (the body of an atomic node is only atomic w.r.t. a specific realization). I noticed all this because it stuck out in a profile. For resnet 50, the rewrite that changed to a linear algorithm took this stage from 185ms down to 6.7ms, and then skipping it entirely when it doesn't find any atomic nodes added 1.5 for the single IRVisitor check. For local laplacian with 100 pyramid levels (which contains many nested allocate nodes due to a large number of skip connections), the times are 5846 ms -> 16 ms -> 4.6 ms This is built on top of #8103 * Fix unintentional mutation of interval in scope --------- Co-authored-by: Steven Johnson <srj@google.com> | 12 March 2024, 16:49:26 UTC |
3c2d809 | Andrew Adams | 12 March 2024, 00:05:44 UTC | Use python itself to get the extension suffix, not python-config (#8148) * Use python itself to get the extension suffix, not python-config * Add a comment | 12 March 2024, 00:05:44 UTC |
30f29fc | Steven Johnson | 08 March 2024, 21:20:30 UTC | Merge branch 'main' into xtensa-codegen | 08 March 2024, 21:20:30 UTC |
009fe7a | Andrew Adams | 08 March 2024, 16:50:20 UTC | Handle loads of broadcasts in FlattenNestedRamps (#8139) With sufficiently perverse schedules, it's possible to end up with a load of a broadcast index (rather than a broadcast of a scalar load). This made FlattenNestedRamps divide by zero. Unfortunately this happened in a complex production pipeline, so I'm not entirely sure how to reproduce it. For that pipeline, this change fixes it and produces correct output. | 08 March 2024, 16:50:20 UTC |
8cc4f02 | Steven Johnson | 08 March 2024, 02:13:56 UTC | Fix for top-of-tree LLVM (#8145) | 08 March 2024, 02:13:56 UTC |
22868a4 | Prasoon Mishra | 06 March 2024, 21:40:00 UTC | Add sobel in hexagon benchmarks app for CMake builds (#8127) * Add sobel in hexagon_benchmarks app for CMake builds Resolved compilation errors caused by the eliminate interleave pass, which changed the instruction from halide.hexagon.pack_satub.vuh to halide.hexagon.trunc_satub.vuh. The latter is only available in v65 or later. This commit ensures compatibility with v65 and later versions. * Minor fix to address the issue. --------- Co-authored-by: Steven Johnson <srj@google.com> | 06 March 2024, 21:40:00 UTC |
754e6ec | Derek Gerstmann | 06 March 2024, 19:46:23 UTC | [vulkan] Add conform API methods to memory allocator to fix block allocations (#8130) * Add conform API methods to block and region allocator classes Override conform requests for Vulkan memory allocator Cleanup memory requirement constraints for Vulkan Add conform test cases to block_allocator runtime test. * Clang format/tidy pas * Fix unsigned int comparisons * Clang format pass * Fix other unsigned int comparisons * Fix mismatched template types for max() * Fix whitespace for clang format --------- Co-authored-by: Derek Gerstmann <dgerstmann@adobe.com> | 06 March 2024, 19:46:23 UTC |
aad94de | Volodymyr Kysenko | 05 March 2024, 22:43:44 UTC | Disable halide_xtensa_mul_add_f32 temporarily | 05 March 2024, 22:43:44 UTC |
b6449b3 | Steven Johnson | 05 March 2024, 17:54:38 UTC | Merge branch 'main' into xtensa-codegen | 05 March 2024, 17:54:38 UTC |
10e07e6 | Zalman Stern | 05 March 2024, 17:53:29 UTC | Add class template type deduction guides to avoid CTAD warning. (#8135) * Add class template type dedeuction guides to avoid CTAD warning. * Formatting. | 05 March 2024, 17:53:29 UTC |
05ae15a | Andrew Adams | 05 March 2024, 17:50:19 UTC | Make gpu thread and block for loop names opaque (#8133) This is one of our largest remaining type of magic name. These were explicitly constructed in lots of places and then explicitly checked for with ends_with in lots of places. This PR makes the names opaque. Only CanonicalizeGPUVars.cpp knows what they are, and they don't have to be a single fixed thing as long as they're consistent within a process. Also reduced the number of GPU dimensions to three more uniformly. We were already asserting this, but there was lots of dead code in lowering passes after gpu loop validation that allowed for four. Also fixed a bug I found in is_block_uniform. It didn't consider that the dependence on a gpu thread variable in a load index could be because a let variable encountered depends on a gpu thread variable. | 05 March 2024, 17:50:19 UTC |