https://github.com/halide/Halide
- HEAD
- refs/heads/Halide_unsharp
- refs/heads/abadams-patch-1
- refs/heads/abadams/aggressive_is_single_point
- refs/heads/abadams/aggressive_unify_duplicate_lets
- refs/heads/abadams/align_strided_const_loads
- refs/heads/abadams/alloca
- refs/heads/abadams/atomic_parallel_compiled_in
- refs/heads/abadams/atomic_vector_non_recursive
- refs/heads/abadams/averaging_tree
- refs/heads/abadams/avoid_name_mangling_in_cross_module_dependencies
- refs/heads/abadams/better_absd
- refs/heads/abadams/better_codegen_for_non_const_ramps
- refs/heads/abadams/bgu_cholesky
- refs/heads/abadams/braces_around_statements
- refs/heads/abadams/cache_tighten_producer_consumer_nodes
- refs/heads/abadams/check_reorder_dups
- refs/heads/abadams/clarify_broadcast_shuffle
- refs/heads/abadams/compositing_app
- refs/heads/abadams/cond_wait_spin
- refs/heads/abadams/constant_interval_simplifier
- refs/heads/abadams/cse_in_unroll_split_tuples
- refs/heads/abadams/custom_cuda_context
- refs/heads/abadams/custom_cuda_context_2
- refs/heads/abadams/custom_cuda_context_3
- refs/heads/abadams/d3d12abi
- refs/heads/abadams/deflake_mullapudi_reorder
- refs/heads/abadams/delete_division_rule_that_makes_large_constants
- refs/heads/abadams/delete_prepare_for_early_exit
- refs/heads/abadams/depthwise_separable_conv
- refs/heads/abadams/diagnose_boundary_condition_failure
- refs/heads/abadams/disable_onnx_app_on_mac
- refs/heads/abadams/divide_using_pavgw
- refs/heads/abadams/dont_link_to_cudart
- refs/heads/abadams/dont_reinterpret_concat
- refs/heads/abadams/early_out
- refs/heads/abadams/enable_f16c
- refs/heads/abadams/extract_concat_bits
- refs/heads/abadams/fast_integer_divide_round_to_zero
- refs/heads/abadams/faster_runtime_integer_division
- refs/heads/abadams/faster_substitute_facts
- refs/heads/abadams/faster_unroll
- refs/heads/abadams/faster_vars_used_in_simplify_let
- refs/heads/abadams/fix-arm-seg2
- refs/heads/abadams/fix_4211
- refs/heads/abadams/fix_5323
- refs/heads/abadams/fix_5329
- refs/heads/abadams/fix_5889
- refs/heads/abadams/fix_6984
- refs/heads/abadams/fix_7229
- refs/heads/abadams/fix_7260
- refs/heads/abadams/fix_7365
- refs/heads/abadams/fix_7374
- refs/heads/abadams/fix_7504
- refs/heads/abadams/fix_7514
- refs/heads/abadams/fix_7531
- refs/heads/abadams/fix_7584
- refs/heads/abadams/fix_7584_v2
- refs/heads/abadams/fix_7742
- refs/heads/abadams/fix_7756
- refs/heads/abadams/fix_7761
- refs/heads/abadams/fix_7768
- refs/heads/abadams/fix_7786
- refs/heads/abadams/fix_7810
- refs/heads/abadams/fix_7811
- refs/heads/abadams/fix_7815
- refs/heads/abadams/fix_7867
- refs/heads/abadams/fix_7871
- refs/heads/abadams/fix_7872
- refs/heads/abadams/fix_7873
- refs/heads/abadams/fix_7888
- refs/heads/abadams/fix_7890
- refs/heads/abadams/fix_7891
- refs/heads/abadams/fix_7892
- refs/heads/abadams/fix_7893
- refs/heads/abadams/fix_7906
- refs/heads/abadams/fix_7909
- refs/heads/abadams/fix_7968
- refs/heads/abadams/fix_8038
- refs/heads/abadams/fix_8054
- refs/heads/abadams/fix_8170
- refs/heads/abadams/fix_8184
- refs/heads/abadams/fix_8280
- refs/heads/abadams/fix_8309
- refs/heads/abadams/fix_8312
- refs/heads/abadams/fix_arm_fcvtmp
- refs/heads/abadams/fix_associative_ops_saturating_add
- refs/heads/abadams/fix_autoschedule_feature_transposition
- refs/heads/abadams/fix_cse_name_collisions
- refs/heads/abadams/fix_cuda_mat_mul_assert
- refs/heads/abadams/fix_deinterleave_bug
- refs/heads/abadams/fix_deinterleave_for_reinterpret
- refs/heads/abadams/fix_div_round_to_zero
- refs/heads/abadams/fix_fft_compile_time_regression
- refs/heads/abadams/fix_generate_output_snippets
- refs/heads/abadams/fix_if_nesting_condition
- refs/heads/abadams/fix_leaks_in_memoize_test
- refs/heads/abadams/fix_lgtm_warnings
- refs/heads/abadams/fix_links_to_master
- refs/heads/abadams/fix_load_of_broadcast
- refs/heads/abadams/fix_onnx_app
- refs/heads/abadams/fix_pointless_lower_condition
- refs/heads/abadams/fix_potential_gpu_deadlock
- refs/heads/abadams/fix_realize_condition_depends_on_tuple
- refs/heads/abadams/fix_reduce_expr_modulo_of_vector
- refs/heads/abadams/fix_riscv_vx_vi
- refs/heads/abadams/fix_round
- refs/heads/abadams/fix_stencil_chain_gpu_schedule
- refs/heads/abadams/fix_track_bounds_intervals
- refs/heads/abadams/fix_tutorial_2
- refs/heads/abadams/fix_ub_in_lower_rounding_shift_right
- refs/heads/abadams/forward_partition_methods
- refs/heads/abadams/fully_fused_depthwise_separable_conv
- refs/heads/abadams/future_clang_tidy_fixes
- refs/heads/abadams/fuzz_sliding_window
- refs/heads/abadams/gaussian_blur_app
- refs/heads/abadams/generator_infinite_default_timeout
- refs/heads/abadams/gpu_autoscheduler_parallel_random_probes
- refs/heads/abadams/include_riscv_in_readme
- refs/heads/abadams/interleave_nested_vector
- refs/heads/abadams/ir_match_by_ref
- refs/heads/abadams/lerp_plus_cast
- refs/heads/abadams/local_laplacian_code_size
- refs/heads/abadams/lower_halving_sub
- refs/heads/abadams/lower_rounding_shift_right
- refs/heads/abadams/mac-arm-fixes
- refs/heads/abadams/make_fast_inverse_test_throughput_limited
- refs/heads/abadams/makefile_serialization_support
- refs/heads/abadams/mismatched_new_delete
- refs/heads/abadams/mixed_sign_mul_shift_right
- refs/heads/abadams/mixed_width_mul_shift_right
- refs/heads/abadams/multiple_scatter
- refs/heads/abadams/mux_intrinsic
- refs/heads/abadams/name_helpers
- refs/heads/abadams/narrow_predicates
- refs/heads/abadams/nested_vectorization_compile_time_regression_fix
- refs/heads/abadams/nested_vectorization_tweaks
- refs/heads/abadams/parallel_simd_op_check
- refs/heads/abadams/partially_backport_lossless_cast_fix
- refs/heads/abadams/precompute_shared_mem_size
- refs/heads/abadams/prefer_no_gather
- refs/heads/abadams/print_uncaught_exception
- refs/heads/abadams/promise_clamped_is_a_tag
- refs/heads/abadams/promote_fixed_point_intrinsics
- refs/heads/abadams/psabdw
- refs/heads/abadams/random_pipelines
- refs/heads/abadams/rationalize_gpu_for_loop_names
- refs/heads/abadams/reenable_unscheduled_stage_warning
- refs/heads/abadams/refactor_constant_interval
- refs/heads/abadams/reinterpret_vector
- refs/heads/abadams/remove_arch_os_for_shaders
- refs/heads/abadams/remove_bad_pruning
- refs/heads/abadams/remove_llvm_zstd
- refs/heads/abadams/remove_parameter_self_references
- refs/heads/abadams/remove_readnone_on_functions
- refs/heads/abadams/remove_use_of_python_config_in_onnx_makefile
- refs/heads/abadams/reschedule_bgu
- refs/heads/abadams/reschedule_bilateral_grid
- refs/heads/abadams/rewrite_atomic_pass
- refs/heads/abadams/rewrite_ir_equality
- refs/heads/abadams/rounding_shift_right_use_average
- refs/heads/abadams/rungenmain_error
- refs/heads/abadams/sampling_profiler_overhead_v2
- refs/heads/abadams/scope_improvements
- refs/heads/abadams/simpler_broadcasts
- refs/heads/abadams/simplify_correlated_pyramid
- refs/heads/abadams/simplify_sub_eval_in_lambda
- refs/heads/abadams/siotas_20
- refs/heads/abadams/sioutas_20
- refs/heads/abadams/slide_over_split_loop
- refs/heads/abadams/sorting_network_working_branch
- refs/heads/abadams/stable_topological_order
- refs/heads/abadams/string_view
- refs/heads/abadams/strip_asserts_last
- refs/heads/abadams/switch_stmt
- refs/heads/abadams/target_specific_lerp
- refs/heads/abadams/time_lowering_passes
- refs/heads/abadams/track_failedness_through_solver_lets
- refs/heads/abadams/turn_off_slp_vectorization_for_avx512
- refs/heads/abadams/tweak_unpack_buffers
- refs/heads/abadams/undo_pointless_widening
- refs/heads/abadams/unordered_blocks
- refs/heads/abadams/unsigned_demosaic
- refs/heads/abadams/update_makefile_for_llvm_19
- refs/heads/abadams/use_arm_for_runtime_triple
- refs/heads/abadams/use_pmaddubsw_for_downsample
- refs/heads/abadams/validate_gpu_schedules
- refs/heads/abadams/vector_reduce_hexagon_predicate
- refs/heads/abadams/vector_scan
- refs/heads/abadams/vst_type_fix
- refs/heads/abadams/widening_let_bug
- refs/heads/abadams/x86_avg
- refs/heads/abadams/zen4
- refs/heads/adadams/profile_allocator
- refs/heads/add_image_checks_after_bounds_inference_plus_new_rules
- refs/heads/add_outermost_to_extern
- refs/heads/add_vectorization_to_search_space
- refs/heads/aelphy/broadcast_q8
- refs/heads/aelphy/f16x16
- refs/heads/aelphy/feature_cadence_changes
- refs/heads/aelphy/float_extracts
- refs/heads/aelphy/i32_sat_widening_shift_right
- refs/heads/aelphy/sqrt_f16
- refs/heads/aelphy/vector_loads_f16_f32
- refs/heads/alexreinking/pip-metadata
- refs/heads/align_loads_comment_fix
- refs/heads/alina-strided-store
- refs/heads/another_buffer_copy_fix
- refs/heads/arm_sve_redux
- refs/heads/ataei-block_asserts-codegen
- refs/heads/ataei-debug_info
- refs/heads/ataei-fix-pow
- refs/heads/ataei-gen_str_param
- refs/heads/ataei-implicit_lhs_vars
- refs/heads/ataei-onnx
- refs/heads/ataei-onnx_converter_update
- refs/heads/ataei-onnx_pybind
- refs/heads/ataei-resnet50_benchmarks
- refs/heads/ataei-standalone_autoscheduler
- refs/heads/ataei_lots_of_inputs
- refs/heads/auto_sched_benchmarks
- refs/heads/auto_sched_estimates
- refs/heads/auto_sched_inline
- refs/heads/auto_sched_test_notparallel
- refs/heads/autoschedule_top_down
- refs/heads/autoschedule_with_convnet
- refs/heads/autoscheduler_scalar_imageparam_fix
- refs/heads/backports/10.x
- refs/heads/backports/11.x
- refs/heads/backports/12.x
- refs/heads/backports/13.x
- refs/heads/balance_expressions
- refs/heads/bazel
- refs/heads/benchmarks
- refs/heads/blaze
- refs/heads/bounds_buffer_lets_fix
- refs/heads/bounds_correct_vs_bounds_loaded_reduced
- refs/heads/buffer_device_api_target
- refs/heads/bug_device_free
- refs/heads/bug_inline_unbounded
- refs/heads/build/bundle-static
- refs/heads/build/pip-packaging
- refs/heads/circ_buffer
- refs/heads/cmake-no-runtime-debug-symbols
- refs/heads/cmake_wasm_features
- refs/heads/compute_at_guard_with_if_goes_on_stack
- refs/heads/compute_with_at
- refs/heads/compute_with_check
- refs/heads/compute_with_excessive_bounds
- refs/heads/compute_with_inlined
- refs/heads/compute_with_remove_is_right_level
- refs/heads/cpack/nuget
- refs/heads/ctest/wrappers
- refs/heads/cuda-constant
- refs/heads/d3d12-allocation-cache
- refs/heads/deferred_cse_after_inlining
- refs/heads/destructor_calls_deinit
- refs/heads/dg/deserialize_unmapped_objects
- refs/heads/dg/fix_vulkan_codegen_bool_conversion
- refs/heads/dg/fix_vulkan_gpu_vars
- refs/heads/dg/vulkan_conform_api
- refs/heads/dg/vulkan_load_lib
- refs/heads/dg/vulkan_region_allocator_fixes
- refs/heads/dgerstmann/fix-vulkan-memory-config-init
- refs/heads/disable_acquire_release_test_vulkan
- refs/heads/distinct_wrapper_names
- refs/heads/dkg/6863_asan_fixes
- refs/heads/dkg/vulkan
- refs/heads/dpalermo_dmabuf
- refs/heads/dpalermo_dmabuf_libion
- refs/heads/dpalermo_hexagon_remote_202003
- refs/heads/dpalermo_sdk4_2_0_2
- refs/heads/ds/buffer-get-pure
- refs/heads/ds/opt-tile-size
- refs/heads/ds/tail-none
- refs/heads/ds/while
- refs/heads/dsharletg/bitwise-intrinsics
- refs/heads/dsharletg/find-vector-reduce
- refs/heads/dsharletg/jit-optimization
- refs/heads/dsharletg/memcpy-copy_from
- refs/heads/dsharletg/pattern-headroom
- refs/heads/dsharletg/refactor-host-alignment
- refs/heads/dsharletg/runtime-size
- refs/heads/dsharletg/simplify-abs
- refs/heads/dsharletg/simplify-type-bounds
- refs/heads/dsharletg/specialize-bounds
- refs/heads/dsharletg/upsample-channels
- refs/heads/empty_prefetch
- refs/heads/emscripten_vector_fix
- refs/heads/export_all-wsmoses
- refs/heads/expr_auto_sched
- refs/heads/extern_bugs
- refs/heads/extern_host_alloc
- refs/heads/factor_parallel_codegen_hack
- refs/heads/fast_sync_tsan
- refs/heads/faster_integer_division
- refs/heads/feature/apps-external
- refs/heads/feature/cmake-presets
- refs/heads/feature/convert
- refs/heads/feature/f16_interleave
- refs/heads/feature/gather_load_q7
- refs/heads/feature/gather_load_undefined_ramp
- refs/heads/feature/llvm-codemodel
- refs/heads/feature/load_predicated
- refs/heads/feature/luma_regression
- refs/heads/feature/maintanence
- refs/heads/feature/reinterprets
- refs/heads/feature/tcm_bump_allocator
- refs/heads/feature/xtensa_fix_interleave_q8
- refs/heads/feature/xtensa_q8_tests
- refs/heads/find_intrinsics_issue
- refs/heads/find_intrinsics_widening_lets
- refs/heads/fix-7854
- refs/heads/fix-floated-pure-stage
- refs/heads/fix-race-condition
- refs/heads/fix_hexagon_alignment
- refs/heads/fix_hvx_intrinsics
- refs/heads/fix_prefetch_test
- refs/heads/fix_windows_vs15_build
- refs/heads/fixed_length_vectors
- refs/heads/fixed_point_local_laplac
- refs/heads/gemmlowp
- refs/heads/generate
- refs/heads/gha/pip
- refs/heads/gpu_canon_fix
- refs/heads/halide_extended_exp
- refs/heads/halide_ir_flatbuffer
- refs/heads/hex_dma2_async
- refs/heads/hexagon_le_runtime
- refs/heads/hexagon_priority
- refs/heads/hexagon_setpriority
- refs/heads/hexagon_strided_pred_load
- refs/heads/hexagon_sysmon_markers
- refs/heads/imaging-synthesis
- refs/heads/includes_fix
- refs/heads/ios_fast_sync_fix
- refs/heads/jia-kai-fix-runtime-cuda-init
- refs/heads/kamil-openglcompute-infinity
- refs/heads/kamil/name_pthread_workers
- refs/heads/kp_bit_shift
- refs/heads/line_buffer
- refs/heads/loop_carry_not_working
- refs/heads/lower_on_huge_stack
- refs/heads/main
- refs/heads/memoize_with_extents
- refs/heads/metal_float16
- refs/heads/metaprogrammed_simplifier_mod
- refs/heads/mohamedadaly-vmlal
- refs/heads/more_powerful_sliding
- refs/heads/new_autoschedule_with_new_simplifier_arm_worker_branch
- refs/heads/new_autoscheduler
- refs/heads/new_simplifier_rule_testing
- refs/heads/newer_ion_ioctl
- refs/heads/no_bounds_query_when_bounds_used
- refs/heads/opengl_compute_buffer_types_fix
- refs/heads/openglcompute_reuse_shared_allocations
- refs/heads/optmize_reorder
- refs/heads/par_for_opt
- refs/heads/pdb/fix_7806
- refs/heads/pdb/hexagon_remote_cmake
- refs/heads/pdb_add_libcpp_makefile_inc
- refs/heads/pdb_eliminate_interleaves_test
- refs/heads/pdb_fix_clang_build
- refs/heads/pdb_fix_install_qc
- refs/heads/pdb_fix_loop_carry
- refs/heads/pdb_fix_simd_op_check_hvx
- refs/heads/pdb_mul_div_mod_multi_thread
- refs/heads/pdb_remove_hvx_v64
- refs/heads/perform_inline_with_order
- refs/heads/performance_linters
- refs/heads/pr/2572
- refs/heads/pr/2676
- refs/heads/pr/2975
- refs/heads/pr/3017
- refs/heads/pr/3081
- refs/heads/pr/3387
- refs/heads/pr/3939
- refs/heads/pr/3960
- refs/heads/pr/4380
- refs/heads/pr/4414
- refs/heads/pr/5331
- refs/heads/pr/5438
- refs/heads/pr/5455
- refs/heads/pr/5758_2
- refs/heads/predicated_vector
- refs/heads/prefetch_specialize
- refs/heads/print_schedule
- refs/heads/profile_hardware_counters
- refs/heads/random-pipelines
- refs/heads/rdom_with_pure_vars
- refs/heads/readme-fix-gcd
- refs/heads/realization_order
- refs/heads/refactor_module
- refs/heads/register_promotion
- refs/heads/release/10.x
- refs/heads/release/11.x
- refs/heads/release/12.x
- refs/heads/release/13.x
- refs/heads/release/14.x
- refs/heads/release/15.x
- refs/heads/release/16.x
- refs/heads/release/17.x
- refs/heads/release/18.x
- refs/heads/release/8.x
- refs/heads/remove_max_on_fuse_factor
- refs/heads/reorder_rvar
- refs/heads/reset_unique_counter
- refs/heads/revert-3612-ataei-speedup_compiletime
- refs/heads/revert-7009-rootjalex/distribute-w_shl
- refs/heads/revert-7601-compile_hexagon_remote
- refs/heads/riscv_update
- refs/heads/rl_simplifier_rules
- refs/heads/rootjalex/add_simpl_rules
- refs/heads/rootjalex/arm-optimize
- refs/heads/rootjalex/autoscheduler_mcts
- refs/heads/rootjalex/bounds-rewriter
- refs/heads/rootjalex/bounds_synthesis
- refs/heads/rootjalex/cbounds
- refs/heads/rootjalex/cbounds_predicated
- refs/heads/rootjalex/fix-reinterpret-cmp
- refs/heads/rootjalex/fix-sat-overflow
- refs/heads/rootjalex/fix_estimate_issue
- refs/heads/rootjalex/fix_failed_unrolls
- refs/heads/rootjalex/gsoc_codegen
- refs/heads/rootjalex/improve_cbounds_fixed
- refs/heads/rootjalex/improve_constant_bounds
- refs/heads/rootjalex/pitchfork-arm
- refs/heads/rootjalex/reinterpret-simplify
- refs/heads/rootjalex/rts
- refs/heads/rootjalex/super_simplify_bounds
- refs/heads/rootjalex/test_cbounds_fixed
- refs/heads/rootjalex/test_constant_bounds
- refs/heads/rootjalex/trs-codegen
- refs/heads/rootjalex/trs-codegen-cross
- refs/heads/rootjalex/trs-merge
- refs/heads/rootjalex/uint32-int32-cast
- refs/heads/rootjalex/x86-hadds
- refs/heads/rootjalex/x86-optimize
- refs/heads/rootjalex/x86-optimize-test
- refs/heads/rootjalex/x86-test
- refs/heads/rule_removal_experiments
- refs/heads/schedule-output-storage
- refs/heads/separate_bounds_query_entrypoint
- refs/heads/shallow
- refs/heads/shift_amount_type_change
- refs/heads/shoaibkamil/cmake-without-arm
- refs/heads/shoaibkamil/correct_memory_fences
- refs/heads/shoaibkamil/d3d-fixes
- refs/heads/shoaibkamil/deprecate_openglcompute
- refs/heads/shoaibkamil/json
- refs/heads/shoaibkamil/llvm_clone_tag
- refs/heads/shoaibkamil/minor-vcpkg-doc-change
- refs/heads/shoaibkamil/opengl_compute_tests
- refs/heads/shoaibkamil/performance_tests_as_generators
- refs/heads/shoaibkamil/rule_removal_experiments
- refs/heads/shoaibkamil/super_simplify_with_interpreter
- refs/heads/shoaibkamil/windows-arm-fix-attributes
- refs/heads/sim_shlib_addr_print
- refs/heads/simplify-nested-broadcasts
- refs/heads/simplify-vectorreduce-shuffles2
- refs/heads/simplify_mod
- refs/heads/sioutas_2020
- refs/heads/sioutas_2020_autoscheduler
- refs/heads/slomp/gpu-codegen-profiling
- refs/heads/slomp/msvc-static-analysis
- refs/heads/solve_div
- refs/heads/solve_div_simplifier_test
- refs/heads/sr/python-late-binding-defaults
- refs/heads/srj-aaa
- refs/heads/srj-alloc
- refs/heads/srj-alloca
- refs/heads/srj-appmake2
- refs/heads/srj-aslog
- refs/heads/srj-assert
- refs/heads/srj-assoc
- refs/heads/srj-auto-multi
- refs/heads/srj-auto-multi2
- refs/heads/srj-auto_schedule_mat_mul
- refs/heads/srj-autosched
- refs/heads/srj-b2cpphide
- refs/heads/srj-barr
- refs/heads/srj-bits
- refs/heads/srj-blacklist
- refs/heads/srj-bounds
- refs/heads/srj-bufcalltype
- refs/heads/srj-bufcallwrap
- refs/heads/srj-bufcallwrap2
- refs/heads/srj-buffer
- refs/heads/srj-bv
- refs/heads/srj-classic-autotune
- refs/heads/srj-clean
- refs/heads/srj-constcall
- refs/heads/srj-crosscompile
- refs/heads/srj-ctlz
- refs/heads/srj-cvec-patch
- refs/heads/srj-dag
- refs/heads/srj-debug-to-file
- refs/heads/srj-deir
- refs/heads/srj-f16
- refs/heads/srj-fp16
- refs/heads/srj-fsch
- refs/heads/srj-fthru
- refs/heads/srj-g2
- refs/heads/srj-g3
- refs/heads/srj-gha-test-fixes
- refs/heads/srj-hidden
- refs/heads/srj-hide2
- refs/heads/srj-hvx
- refs/heads/srj-hvx-bug
- refs/heads/srj-hvx-codegen-bug
- refs/heads/srj-hvx-nocopy
- refs/heads/srj-hvxshift
- refs/heads/srj-iib
- refs/heads/srj-initshape
- refs/heads/srj-inv
- refs/heads/srj-ir
- refs/heads/srj-irmut2
- refs/heads/srj-iwyu
- refs/heads/srj-iwyu3
- refs/heads/srj-javascript_work_in_progress
- refs/heads/srj-lensblur
- refs/heads/srj-lessinc
- refs/heads/srj-llvm-loop-opt
- refs/heads/srj-mak
- refs/heads/srj-maxthreads
- refs/heads/srj-mod
- refs/heads/srj-msan
- refs/heads/srj-msan-call
- refs/heads/srj-muldivmod
- refs/heads/srj-mut
- refs/heads/srj-outputs-2
- refs/heads/srj-parse
- refs/heads/srj-pch
- refs/heads/srj-printfunc
- refs/heads/srj-pygp
- refs/heads/srj-revertbits
- refs/heads/srj-schedule-storage
- refs/heads/srj-shl-shr-2
- refs/heads/srj-sio
- refs/heads/srj-static-const
- refs/heads/srj-strided-store
- refs/heads/srj-tidyh
- refs/heads/srj-tiff
- refs/heads/srj-trace
- refs/heads/srj-tutorial
- refs/heads/srj-using
- refs/heads/srj-wasmfix
- refs/heads/srj-xor2
- refs/heads/srj/abstract-gen-without-get-output-func-KEEP
- refs/heads/srj/aligned-alloc
- refs/heads/srj/aligned-alloc-2
- refs/heads/srj/aligned-malloc-with-aligned-alloc
- refs/heads/srj/all-explicit-ctor
- refs/heads/srj/anderson-thread-info-ptr
- refs/heads/srj/aot-perf
- refs/heads/srj/argv-signatures
- refs/heads/srj/argv-types
- refs/heads/srj/arm64e
- refs/heads/srj/async-test
- refs/heads/srj/atomic-32
- refs/heads/srj/b2cpp-const-data
- refs/heads/srj/better-xt-dispatch
- refs/heads/srj/bfloat1
- refs/heads/srj/bp
- refs/heads/srj/build_halide_h
- refs/heads/srj/c-bool
- refs/heads/srj/cache-clear
- refs/heads/srj/clang-fmt-ignore
- refs/heads/srj/clang-tidy
- refs/heads/srj/clear-c-cache
- refs/heads/srj/cmake-asan
- refs/heads/srj/cmake-asan2
- refs/heads/srj/cmake-jit-generators
- refs/heads/srj/configure-cmake
- refs/heads/srj/cpp-generator-v2-experiment-KEEP
- refs/heads/srj/crosscompile
- refs/heads/srj/csv
- refs/heads/srj/ctad
- refs/heads/srj/depr
- refs/heads/srj/deprecation
- refs/heads/srj/device-copy
- refs/heads/srj/example
- refs/heads/srj/experiment
- refs/heads/srj/experiment-6967
- refs/heads/srj/exporting
- refs/heads/srj/expr_t
- refs/heads/srj/external-tensors
- refs/heads/srj/f16-convert
- refs/heads/srj/fix-pytorch
- refs/heads/srj/fixed-rollback
- refs/heads/srj/fopen-fix
- refs/heads/srj/forward-name
- refs/heads/srj/gen-func
- refs/heads/srj/gen-func-2
- refs/heads/srj/gen-func-3
- refs/heads/srj/gen2-1
- refs/heads/srj/gen_closure
- refs/heads/srj/generator_aot_gpu_multi_context_threaded
- refs/heads/srj/globals
- refs/heads/srj/halide-buffer-crop
- refs/heads/srj/halide-malloc-alignment
- refs/heads/srj/halide-must-use
- refs/heads/srj/halide-runtime-must-use-result
- refs/heads/srj/hallmark
- refs/heads/srj/hang-repro
- refs/heads/srj/hannk
- refs/heads/srj/hannk-aliasing
- refs/heads/srj/hannk-error-checking
- refs/heads/srj/hannk-errors
- refs/heads/srj/hannk-inplace
- refs/heads/srj/hannk-mmap
- refs/heads/srj/hannk-tflite-27
- refs/heads/srj/hannk-verbosity
- refs/heads/srj/hdrs
- refs/heads/srj/html-becomes-viz
- refs/heads/srj/implicit-mult-widening
- refs/heads/srj/issue-7076
- refs/heads/srj/iwyu
- refs/heads/srj/iwyu-2
- refs/heads/srj/iwyu-6
- refs/heads/srj/libHANNK
- refs/heads/srj/linter
- refs/heads/srj/llvm_type_of
- refs/heads/srj/lossless-test
- refs/heads/srj/maybe-unused
- refs/heads/srj/meanop
- refs/heads/srj/metadata-calling-convention
- refs/heads/srj/more-tidy
- refs/heads/srj/msan-dtf
- refs/heads/srj/multimeta
- refs/heads/srj/nanobind
- refs/heads/srj/new-rt-1
- refs/heads/srj/no-threadpool
- refs/heads/srj/no-timeout-thread
- refs/heads/srj/oglc-mutexed
- refs/heads/srj/param-map
- refs/heads/srj/pip-15.x
- refs/heads/srj/pip-cron
- refs/heads/srj/possible-uninited
- refs/heads/srj/pr-7566
- refs/heads/srj/printer-size
- refs/heads/srj/profiler-data-race
- refs/heads/srj/ptr-int-cast
- refs/heads/srj/py-float32
- refs/heads/srj/pyapps
- refs/heads/srj/pyext-fix
- refs/heads/srj/pygen-class
- refs/heads/srj/pygen-deux
- refs/heads/srj/pygen-func
- refs/heads/srj/pygen-native-types
- refs/heads/srj/pyinstall
- refs/heads/srj/pystuff
- refs/heads/srj/python-buffer-unpack
- refs/heads/srj/python-tutorial
- refs/heads/srj/reshape
- refs/heads/srj/rt-error-smallify
- refs/heads/srj/rt-return-types
- refs/heads/srj/runtime-error-handling
- refs/heads/srj/sat-fixes-exp
- refs/heads/srj/sat-fixes-exp-2
- refs/heads/srj/shadow-field
- refs/heads/srj/snprintf
- refs/heads/srj/spirv-license
- refs/heads/srj/stat-buf-deprecations
- refs/heads/srj/static-buffer-generators
- refs/heads/srj/stmt-html
- refs/heads/srj/stringify
- refs/heads/srj/synth-gen-params
- refs/heads/srj/synth-params-python
- refs/heads/srj/test-arm_sve_redux
- refs/heads/srj/test-intrinsics-bounds
- refs/heads/srj/test8076
- refs/heads/srj/test8078
- refs/heads/srj/test8094
- refs/heads/srj/test8105a
- refs/heads/srj/test8115
- refs/heads/srj/test_tmpdir_fix
- refs/heads/srj/tidy
- refs/heads/srj/tidy-format-14
- refs/heads/srj/tidymore
- refs/heads/srj/tidymore2
- refs/heads/srj/tls
- refs/heads/srj/tls-3
- refs/heads/srj/tls-4
- refs/heads/srj/tls-ucon
- refs/heads/srj/tmp-unschedule-experiment
- refs/heads/srj/tot-fix
- refs/heads/srj/try-revert-sat
- refs/heads/srj/type-traits
- refs/heads/srj/typed-func
- refs/heads/srj/ucon-all-const
- refs/heads/srj/ucon-non-const
- refs/heads/srj/version
- refs/heads/srj/visit-warnings
- refs/heads/srj/wasm-atomic2
- refs/heads/srj/wasm-simd
- refs/heads/srj/wasm-stuff
- refs/heads/srj/wasm-threads
- refs/heads/srj/wasm-updates
- refs/heads/srj/wasm-work
- refs/heads/srj/wip
- refs/heads/srj/x-rounding
- refs/heads/srj/xbuf
- refs/heads/srj/xc+plus+size+tmp
- refs/heads/srj/xc-types
- refs/heads/srj/xt-uint-cast-test
- refs/heads/srj/xtensa-arch
- refs/heads/srj/xtensa-merge
- refs/heads/srj/xvc-experimetn
- refs/heads/srj/zlib-embed
- refs/heads/standalone_autoscheduler
- refs/heads/standalone_autoscheduler_arm_worker
- refs/heads/standalone_autoscheduler_arm_worker_amazon
- refs/heads/standalone_autoscheduler_gpu
- refs/heads/standalone_autoscheduler_hexagon
- refs/heads/sticky_task_assignments
- refs/heads/store_with
- refs/heads/store_with_solver_for_super_simplify
- refs/heads/strict_float_cse_fix
- refs/heads/super_simplify
- refs/heads/super_simplify_v2
- refs/heads/super_simplify_v3
- refs/heads/transitive_wrapper
- refs/heads/trigger-release-v16
- refs/heads/tzumao-autodiff-boundarycond
- refs/heads/tzumao-gradient-autoscheduler-bug
- refs/heads/tzumao-predicate-store-load
- refs/heads/tzumao-python-buffer
- refs/heads/tzumao_autodiff_unbounded
- refs/heads/tzumao_improve_gradient_autoscheduler
- refs/heads/tzumao_issue_4297
- refs/heads/tzumao_licm_before_BI
- refs/heads/unbounded_bugs
- refs/heads/undo_async_copy_chain_black_list
- refs/heads/use_string_literals_for_blobs
- refs/heads/users/lukas/python-pip
- refs/heads/validate_sched_error_msg
- refs/heads/var_ir_fix
- refs/heads/vksnk/async-experiment
- refs/heads/vksnk/async-multiple-producers
- refs/heads/vksnk/async-order
- refs/heads/vksnk/better-loop-carry
- refs/heads/vksnk/better-message
- refs/heads/vksnk/bound-storage
- refs/heads/vksnk/bounds-widen-right
- refs/heads/vksnk/c-print-type
- refs/heads/vksnk/c-round
- refs/heads/vksnk/check-return-result
- refs/heads/vksnk/compute-with-bug
- refs/heads/vksnk/compute_with_async
- refs/heads/vksnk/dma-limit-channels
- refs/heads/vksnk/dma-min-max
- refs/heads/vksnk/expr-match-shuffle
- refs/heads/vksnk/extract-from-scalar
- refs/heads/vksnk/f16-load
- refs/heads/vksnk/fix-packvr
- refs/heads/vksnk/fix_halide_xtensa_narrow_with_rounding_shift_i16
- refs/heads/vksnk/fused-compute-with
- refs/heads/vksnk/hoist-storage-bug
- refs/heads/vksnk/lerp-intrinsics
- refs/heads/vksnk/lower-signed-shifts
- refs/heads/vksnk/missing-exception
- refs/heads/vksnk/non-widening-halves
- refs/heads/vksnk/optimize-shuffles
- refs/heads/vksnk/replace-all
- refs/heads/vksnk/restrict
- refs/heads/vksnk/roll-buffer
- refs/heads/vksnk/roundeven-arm
- refs/heads/vksnk/rvar-bounds
- refs/heads/vksnk/simplify-slice
- refs/heads/vksnk/skip-semaphores
- refs/heads/vksnk/storage-folding
- refs/heads/vksnk/strided-load-of-4_2
- refs/heads/vksnk/typed-scope
- refs/heads/vksnk/update-simd-driver
- refs/heads/vksnk/vectorize-bug
- refs/heads/vksnk/vectorize-scalarize
- refs/heads/vksnk/widening_absd
- refs/heads/vksnk/xtensa-codegen-fp16
- refs/heads/vksnk/xtensa-dma-improvements
- refs/heads/vksnk/xtensa-regroup-pass
- refs/heads/vksnk/xtensa/lift-allocs
- refs/heads/vulkan
- refs/heads/vulkan-diagnose-alloc-failures
- refs/heads/vulkan-phase0-adts
- refs/heads/vulkan-phase1-spirv
- refs/heads/vulkan-phase2-runtime
- refs/heads/vulkan2
- refs/heads/vulkan_fix_gpu_dynamic_shared_test
- refs/heads/vulkan_fix_subregion_memory_offsets
- refs/heads/webassembly-old
- refs/heads/winograd
- refs/heads/wording_fix
- refs/heads/xtensa-codegen
- refs/heads/xtensa-codegen-parallel
- refs/heads/xuanda/fix-serialize-bad-partition-always
- refs/remotes/origin/rootjalex/add_autosched_caching
- refs/tags/release_2018_02_15
- refs/tags/release_2019_08_27
- refs/tags/release_8.0.0
- refs/tags/v10.0.0
- refs/tags/v10.0.1
- refs/tags/v11.0.0
- refs/tags/v11.0.1
- refs/tags/v12.0.0
- refs/tags/v12.0.1
- refs/tags/v13.0.0
- refs/tags/v13.0.1
- refs/tags/v13.0.2
- refs/tags/v13.0.3
- refs/tags/v13.0.4
- refs/tags/v14.0.0
- refs/tags/v15.0.0
- refs/tags/v15.0.1
- refs/tags/v16.0.0
- refs/tags/v17.0.0
- refs/tags/v17.0.1
- refs/tags/v17.0.2
- refs/tags/v18.0.0
- refs/tags/v8.0.0
- c8dcb4ca69ca6159ab27d7dd5db89c2523ab6a65
Revision | Author | Date | Message | Commit Date |
---|---|---|---|---|
c8dcb4c | Steven Johnson | 17 September 2024, 15:37:41 UTC | Fix for top-of-tree LLVM (#8421) * Fix for top-of-tree LLVM * Update simd_op_check_sve2.cpp | 17 September 2024, 15:37:41 UTC |
4d368bf | Andrew Adams | 15 September 2024, 14:33:28 UTC | Reschedule the matrix multiply performance app (#8418) Someone was using this as a reference expert schedule, but it was stale and a bit simplistic for large matrices. I rescheduled it to get a better fraction of peak. This also now demonstrates how to use rfactor to block an sgemm over the k axis. | 15 September 2024, 14:33:28 UTC |
6fb13b7 | Andrew Adams | 15 September 2024, 14:32:24 UTC | Add missing backslash (#8419) | 15 September 2024, 14:32:24 UTC |
a65221b | Alex Reinking | 10 September 2024, 02:54:29 UTC | Include our Markdown documentation in the Doxygen site. (#8417) A few quirks in the Markdown parser were worked around here. The most notable is that the sequence `]:` causes Doxygen to interpret a would-be link as a trailing reference even if it is not at the start of a line. Duplicating the single bracket reference is a portable workaround, i.e. [winget] ~> [winget][winget] It also doesn't stop interpreting `@` directives inside inline code, so it warns about our use of the `@` as a decorator symbol inside Python.md. | 10 September 2024, 02:54:29 UTC |
3e6e7e0 | Alex Reinking | 09 September 2024, 23:08:26 UTC | Link to PyPI from Doxygen index.html (#8415) | 09 September 2024, 23:08:26 UTC |
07fecc9 | Alex Reinking | 09 September 2024, 23:08:09 UTC | Make run-clang-tidy.sh work on macOS (#8416) | 09 September 2024, 23:08:09 UTC |
f658eec | Alex Reinking | 07 September 2024, 17:03:44 UTC | Fix classifier spelling (#8413) PyPI rejected this because of a spacing issue. | 07 September 2024, 17:03:44 UTC |
37300e3 | Alex Reinking | 06 September 2024, 19:33:54 UTC | Merge pull request #8412 * Update pip package metadata * Link to the CMake package docs from Doxygen * Fix invalid Doxygen annotation in Serialization.h | 06 September 2024, 19:33:54 UTC |
63609cc | Alex Reinking | 06 September 2024, 18:19:27 UTC | Document how to find Halide from a pip installation (#8411) | 06 September 2024, 18:19:27 UTC |
3a34741 | Alex Reinking | 05 September 2024, 21:17:03 UTC | Big documentation update (#8410) | 05 September 2024, 21:17:03 UTC |
95ebd01 | Alex Reinking | 04 September 2024, 03:43:07 UTC | Pip packaging at last! (#8405) | 04 September 2024, 03:43:07 UTC |
b3c8c8b | Alex Reinking | 04 September 2024, 00:53:03 UTC | Support CMAKE_OSX_ARCHITECTURES (#8390) | 04 September 2024, 00:53:03 UTC |
97eeaf0 | Andrew Adams | 02 September 2024, 04:32:50 UTC | Update README.md (#8404) The instructions for which llvm to acquire were stale | 02 September 2024, 04:32:50 UTC |
b87f2b1 | Alex Reinking | 29 August 2024, 16:54:41 UTC | Fix _Float16 detection on ARM64 GCC<13 (#8401) GCC 12 only supports _Float16 on x86. Support for ARM was added in GCC 13. This causes a build failure in the manylinux_2_28 images. | 29 August 2024, 16:54:41 UTC |
45518ac | Steven Johnson | 23 August 2024, 17:00:23 UTC | Fix incorrect std::array sizes in Target.cpp (#8396) | 23 August 2024, 17:00:23 UTC |
9864bd4 | Alex Reinking | 16 August 2024, 20:51:21 UTC | Fix bundling error on buildbots (#8392) LLVM as it is built on the buildbots depends on `-lrt`, which is not a target. Filter out non-target dependencies from consideration. | 16 August 2024, 20:51:21 UTC |
4f30d2b | Andrew Adams | 16 August 2024, 18:41:55 UTC | Partially apply clang-tidy fixes we don't enforce yet (#8376) * Partially apply clang-tidy fixes we don't use yet - Put a bunch of stuff into anonymous namespaces - Delete some redundant casts (e.g. casting an int to int) - Add some const refs to avoid copies - Remove meaningless inline qualifiers on in-class definitions and constexpr functions - Remove return-with-value from functions returning void - Delete a little dead code - Use std::min/max where appropriate - Don't use a variable after std::forwarding it. It may have been moved from. - Use std::string::empty instead of comparing length to zero * Undo unintentional formatting change * Restore some necessary casts * Add NOLINT to silence older clang-tidy | 16 August 2024, 18:41:55 UTC |
818f42d | Martijn Courteaux | 13 August 2024, 22:03:31 UTC | Fix for the removed DataLayout constructor. (#8391) * Fix for the removed DataLayout constructor. * Update CodeGen_LLVM.cpp * Update CodeGen_LLVM.cpp * Update CodeGen_LLVM.cpp --------- Co-authored-by: Steven Johnson <srj@google.com> | 13 August 2024, 22:03:31 UTC |
6dcdfb5 | Alex Reinking | 12 August 2024, 02:25:00 UTC | Support using vcpkg to build dependencies on all platforms (#8387) This PR adds support for using vcpkg to acquire and build Halide's dependencies on all platforms. It adds a top-level `vcpkg.json` file that explains the relationship between Halide's features and its dependencies. These features include the various LLVM `target-`s (which merely imply a dependency on the corresponding LLVM backend), `serialization` (flatbuffers), the `python-bindings` (pybind11), the `wasm-executor` (wabt), and a few meta-features: * `jit`: enables LLVM targets corresponding to the host system * `target-all`: enables all LLVM targets * `tests`: depends on everything needed for the tests and apps * `developer`: includes all other features All of these are optional (since x86 and WebAssembly are forced), but `jit` and `serialization` are on by default. vcpkg is intended to be an eventual replacement for FetchContent, at least on the buildbots. It will accelerate builds beyond ccache by directly restoring binary caches for our dependencies. Unlike FetchContent, it does not pollute our build with third-party CMake code. Indeed, our build has no idea at all when vcpkg is in use. The primary drawback is that vcpkg installation happens during (or ahead of) configuration time, so there is some initial wait. ## Try it! I have provided many CMake presets to ease adoption. As long as you have `VCPKG_ROOT` set to a fresh clone of `vcpkg`, they should work. They come in two flavors: * `vcpkg`: this acquires dependencies from the main vcpkg registry, but applies our own overlay, which disables building Python 3 (really!) and LLVM. The system is searched for these as usual. * `vcpkg-full`: this disables the Halide overlay and attempts to build ALL dependencies. All these presets enable the `developer` feature in `VCPKG_MANIFEST_FEATURES`, which can be overridden in the usual way. 
Here are the commands you should use to try it locally: * On Linux or Windows: `cmake --preset release-vcpkg` * On macOS: `cmake --preset macOS-vcpkg` * To use Visual Studio: `cmake --preset win32`. Here, `vcpkg` is implied and `-vcpkg-full` can be added to build LLVM. | 12 August 2024, 02:25:00 UTC |
6dc2b3e | Alex Reinking | 12 August 2024, 02:24:25 UTC | Rewrite bundle_static to be much more efficient. (#8386) The `bundle_static` function now detects the private static dependencies on the given target (in our case, always Halide) and uses the platform librarian tool to merge static dependencies into a static library. It picks which tool to use by checking, in order: * When targeting Windows, it looks for `lib.exe`. * When targeting macOS, it checks if `libtool` is the Apple libtool. * Whether `ar` is GNU ar and if so, generates an MRI script. * Otherwise, a `FATAL_ERROR` is issued. To mark a static library for bundling, we link privately and use the `$<BUILD_LOCAL_INTERFACE:...>` generator expression. This prevents it from being exported, too. The generator expression that implements this logic is quite complex. It involves meta-programming generator expressions during evaluation and then evaluating them. Even so, this saves a considerable amount of time unpacking LLVM into a temporary directory and adding the objects to the link line (the previous approach). | 12 August 2024, 02:24:25 UTC |
3cdeb53 | Alex Reinking | 10 August 2024, 02:55:07 UTC | Scan generated export files to determine dependencies. (#8385) This commit contains a module for declaring that an export file might depend on another CMake package that was found by find_package. Such dependencies are collected in a project-wide property (rather than a variable) along with a snippet of code that reconstructs the original call. Then, after we have installed an export file via install(EXPORT), we can call a helper to add install rules that will read the file as-generated by CMake to check whether any of these packages could be required. CMake does not like to expose this information, in part because generator expressions make computing the eventual link set undecidable. Even so, for our purposes if Pkg:: appears in our link-libraries list, then we need to find_package(Pkg). This module implements that heuristic. So why is this hard? It's because checking whether a dependency is actually included is very complicated. A library will appear if: 1. It is SHARED or MODULE 2. It is linked privately to a STATIC target - These appear as $<LINK_ONLY:${dep}> 3. It is STATIC and linked publicly to a SHARED target 4. It is INTERFACE or ALIAS and linked publicly 5. It is included transitively via (4) and meets (1), (2), or (3) 6. I am not sure this set of rules is exhaustive. There is an experimental feature in CMake 3.30 that will some day replace this module. | 10 August 2024, 02:55:07 UTC |
7b53a88 | Alex Reinking | 09 August 2024, 22:30:51 UTC | Introduce HalideFeatures system for optional components (#8384) Previously, our `option()` declarations were scattered and not well documented. They certainly weren't self-documenting. Some of them depended on other options and used various ways to handle conflicts. Sometimes inconsistencies were handled with fatal errors, other times by silently overriding an option. With this PR, we introduce a new `Halide_feature` function that is designed to handle interdependent options and default initialization in a much more regular way. It behaves very much like option in its first three parameters: Halide_feature(CMAKE_FLAG "documentation string" DEFAULT_VALUE) Only now `DEFAULT_VALUE` can be more intelligent than simply `ON` or `OFF`. It can also be `TOP_LEVEL`, which is `ON` iff `CMAKE_PROJECT_TOP_LEVEL` is true. It can also be `AUTO` which is `ON` iff the `DEPENDS` clause is defined and true. For example, Halide_feature(WITH_TEST_RUNTIME "Build runtime tests" AUTO DEPENDS NOT MSVC) If a feature is set to `ON` but its `DEPENDS` clause is false, a warning will be issued and the feature will be forced `OFF` in the cache. 
Furthermore, these features register their documentation strings with the built-in `FeatureSummary` system so now instead of a stream of easy-to-miss messages, the configuration ends with a summary of what is enabled and disabled: -- The following features have been enabled: * Halide_ENABLE_EXCEPTIONS, Enable exceptions in Halide * Halide_ENABLE_RTTI, Enable RTTI in Halide * WITH_AUTOSCHEDULERS, Build the Halide autoschedulers * WITH_PACKAGING, Halide's CMake package install rules * WITH_PYTHON_BINDINGS, Halide's native Python module (not the whole pip package) * WITH_SERIALIZATION, Include experimental Serialization/Deserialization code * WITH_TESTS, Halide's unit test suite * WITH_TUTORIALS, Halide's tutorial code * WITH_UTILS, Optional utility programs for Halide, including HalideTraceViz * WITH_TEST_AUTO_SCHEDULE, Build autoscheduler tests * WITH_TEST_CORRECTNESS, Build correctness tests * WITH_TEST_ERROR, Build error tests * WITH_TEST_WARNING, Build warning tests * WITH_TEST_PERFORMANCE, Build performance tests * WITH_TEST_GENERATOR, Build generator tests * WITH_TEST_RUNTIME, Build runtime tests -- The following features have been disabled: * WITH_DOCS, Halide's Doxygen documentation * WITH_TEST_FUZZ, Build fuzz tests A feature may be marked as `ADVANCED`, which excludes it from the feature summary unless the log level is set to verbose. It also marks it as advanced in the cache, which hides it from the default view in the CMake GUI and the curses-TUI. Finally, features are computed early in the build so that subdirectories see a consistent view. Some generator tests that were broken under static Halide (meaning no autoschedulers) are now properly skipped by directly checking `WITH_AUTOSCHEDULERS`. | 09 August 2024, 22:30:51 UTC |
ff538b1 | Alex Reinking | 09 August 2024, 19:10:20 UTC | Reflow src/CMakeLists.txt in logical groups (#8383) * style: move core features closer to library definition * style: move target export script to its own section * style: group LLVM and GPU backends together | 09 August 2024, 19:10:20 UTC |
ba08522 | Alex Reinking | 09 August 2024, 16:58:42 UTC | Remove vestigial AMDGPU backend (#8382) The backend was started in 2018 but never completed. Removing the stale references reduces confusion. | 09 August 2024, 16:58:42 UTC |
0058528 | Alex Reinking | 09 August 2024, 16:25:32 UTC | Rework LLVM into Find module and enact new component policy. (#8379) Our usage of LLVM now requires at least the X86 and WebAssembly backends. We also now unconditionally enable all backends supported by the LLVM we found. | 09 August 2024, 16:25:32 UTC |
8643007 | Alex Reinking | 09 August 2024, 16:23:15 UTC | Fix Numpy 2.0 compatibility bug in lesson 10 (#8381) Numpy 2.0 no longer performs narrowing conversions automatically. We manually mask here instead. Fixes #8380 | 09 August 2024, 16:23:15 UTC |
6f650c6 | Roman Lebedev | 09 August 2024, 16:18:19 UTC | Two more build fixes (#8371) * Integration test: do forward C/CXX compiler to the inner CMake invocation * `_Float16`: on i386, needs gcc14 + SSE2 It is not known by GCC13: https://ci.debian.net/packages/h/halide/testing/i386/50047733/ and fails with ``` /usr/bin/g++ -DHALIDE_ENABLE_RTTI -DHALIDE_VERSION_MAJOR=18 -DHALIDE_VERSION_MINOR=0 -DHALIDE_VERSION_PATCH=0 -DHALIDE_WITH_EXCEPTIONS -isystem /usr/include/halide18 -O3 -DNDEBUG -MD -MT CMakeFiles/main.dir/main.cpp.o -MF CMakeFiles/main.dir/main.cpp.o.d -o CMakeFiles/main.dir/main.cpp.o -c /tmp/autopkgtest.pviDWM/build.Sjp/src/test/integration/jit/main.cpp In file included from /tmp/autopkgtest.pviDWM/build.Sjp/src/test/integration/jit/main.cpp:1: /usr/include/halide18/Halide.h: In member function ‘Halide::float16_t::operator _Float16() const’: /usr/include/halide18/Halide.h:3054:40: error: SSE register return with SSE2 disabled 3054 | explicit operator _Float16() const { | ^ /usr/include/halide18/Halide.h:3057:16: error: SSE register return with SSE2 disabled 3057 | return result; | ^~~~~~ /usr/include/halide18/Halide.h: In constructor ‘Halide::Expr::Expr(_Float16)’: /usr/include/halide18/Halide.h:4679:64: error: invalid conversion from type ‘_Float16’ without option ‘-msse2’ 4679 | : IRHandle(Internal::FloatImm::make(Float(16), (double)x)) { | ^ ninja: build stopped: subcommand failed. ``` with GCC14. | 09 August 2024, 16:18:19 UTC |
56f14c8 | Alex Reinking | 09 August 2024, 01:40:20 UTC | Replace FetchContent with a custom dependency provider (#8378) The build no longer uses FetchContent, instead using find_package always and everywhere. When Halide is the top-level project, it will (by default) inject a dependency provider that overrides the wabt, flatbuffers, and pybind11 packages with FetchContent. Users can opt out by setting Halide_USE_FETCHCONTENT=NO. This also bumps the required wabt version to the latest release (1.0.36). This version includes a patch I submitted that fixes the CMake package when wabt is built with OpenSSL rather than picosha2. Here are relevant links to the docs: * https://cmake.org/cmake/help/latest/guide/using-dependencies/index.html#dependency-providers * https://cmake.org/cmake/help/latest/command/cmake_language.html#dependency-providers | 09 August 2024, 01:40:20 UTC |
3ed55b4 | Alex Reinking | 08 August 2024, 18:23:44 UTC | Move dependencies/wasm to use sites (#8377) Also replace WITH_WABT and WITH_V8 with Halide_WASM_BACKEND, which can be either wabt, V8, or a CMake false-y value such as OFF. Deprecation notices are provided to ease user transitions. | 08 August 2024, 18:23:44 UTC |
8feee81 | Alex Reinking | 08 August 2024, 03:39:50 UTC | Use a Find module for NodeJS (#8374) | 08 August 2024, 03:39:50 UTC |
206c03f | Alex Reinking | 08 August 2024, 03:39:31 UTC | Use a Find module for V8 (#8373) This also adjusts the cache variable names to follow the conventions set forth in the CMake documentation, here: https://cmake.org/cmake/help/latest/manual/cmake-developer.7.html#standard-variable-names | 08 August 2024, 03:39:31 UTC |
59da730 | Alex Reinking | 07 August 2024, 22:31:19 UTC | Clean up autoscheduler dependencies (#8372) | 07 August 2024, 22:31:19 UTC |
40ab265 | Alex Reinking | 07 August 2024, 14:57:28 UTC | List headers with target_sources FILE_SETS (#8370) Removes instances of target_include_directories and installation rules based on those directories. These are now automatically computed from the BASE_DIRS (defaults to current source dir) argument to target_sources. This models the build more accurately and avoids accidental installation of unwanted headers. Also forces us to think about the linking relationships between components; ideally this will result in a more accurate build graph. | 07 August 2024, 14:57:28 UTC |
17bd517 | Alex Reinking | 06 August 2024, 21:44:19 UTC | Clean up serialization build code (#8369) | 06 August 2024, 21:44:19 UTC |
37ab461 | Alex Reinking | 06 August 2024, 15:36:19 UTC | Distribute GenGen as a static library (#8367) Also use a mutex and timestamp checking to ensure that multiple generators in the same directory do not race to place Halide.dll next to them on Windows. | 06 August 2024, 15:36:19 UTC |
1a7b914 | Alex Reinking | 02 August 2024, 18:45:58 UTC | Quick CMake fixes enabled by 3.28 (#8365) * Use FindCUDAToolkit in apps/cuda_mat_mul * Replace dummy FindHalide.cmake with pkg redirects Having the dummy file on disk is confusing and is easy to accidentally install when modifying CMake install rules. Better to use the CMake 3.24+ feature of the package redirects dir to truly disable find_package for Halide inside the build. * Avoid creating a dummy file for Halide_Python * Fix formatting in CMakeLists.txt * Add SpirvIR.h to the list of Halide sources * Use BUILD_LOCAL_INTERFACE in PyStubs * Consistently use HALIDE_H variable * Comment overriding POSITION_INDEPENDENT_CODE * Check Halide_STATIC_DEFINE at configure time. This avoids sending a generator expression downstream. These are functionally identical, but it's just one less thing to evaluate. * Use BUILD_LOCAL_INTERFACE for SPIRV-Headers | 02 August 2024, 18:45:58 UTC |
14035e3 | Ralf W. Grosse-Kunstleve | 02 August 2024, 17:29:08 UTC | Make pybind11 minimum version check compatible with pybind11 v3. (#8366) Concretely: https://github.com/pybind/pybind11/blob/48f25275c44d52d0ceade122e328dc1f2e48ef44/include/pybind11/detail/common.h#L12-L14 This is needed for a Google-internal deployment, but is a useful fix regardless. | 02 August 2024, 17:29:08 UTC |
1872788 | Steven Johnson | 01 August 2024, 16:11:07 UTC | Add helper functions to query properties of the lowered Target (#8192) (#8359) * Add helper functions to query properties of the lowered Target (#8192) * Add Python bindings * clang-format * clang-format * Add comments | 01 August 2024, 16:11:07 UTC |
837308f | Alex Reinking | 01 August 2024, 13:41:55 UTC | Bump CMake minimum version to 3.28 (#8363) This is in line with our policy to track the version included in the latest Ubuntu LTS. Version 24.04 LTS now includes CMake 3.28. | 01 August 2024, 13:41:55 UTC |
77e5dd1 | Alex Reinking | 01 August 2024, 02:25:59 UTC | Remove warning for unsupported compilers (#8362) | 01 August 2024, 02:25:59 UTC |
423df3c | Roman Lebedev | 31 July 2024, 16:15:47 UTC | `Python_bindings`-test-as-installed (#8355) Support not building python bindings, while running python tests against installed halide, and call `enable_testing()` there so that `ctest` can work. | 31 July 2024, 16:15:47 UTC |
15c181f | Steven Johnson | 29 July 2024, 21:04:49 UTC | Drop support for LLVM 16 in main (#8358) * Drop support for LLVM 16 in main Per policy, Halide 19 will support LLVM 17, 18, 19 (plus top-of-tree which is 20) * clang-format | 29 July 2024, 21:04:49 UTC |
c7e1b99 | Steven Johnson | 29 July 2024, 20:52:40 UTC | Bump Halide version to 19 in main branch (#8357) * Bump Halide version to 19 in main branch * Update setup.py | 29 July 2024, 20:52:40 UTC |
e9b9bdc | Steven Johnson | 26 July 2024, 18:13:49 UTC | Don't use le32/le64 (#8344) Use i386/x86-64 and wasm32/wasm64 targets instead of le32/le64 for the runtime. | 26 July 2024, 18:13:49 UTC |
5d1472f | Steven Johnson | 23 July 2024, 21:56:54 UTC | Allow LLVM 20 (#8352) | 23 July 2024, 21:56:54 UTC |
bebb888 | Steven Johnson | 23 July 2024, 17:37:37 UTC | Add ARMv8.x feature flags (#4489) * Add ARMv8.3a feature flag This allows selecting the ARMv8.3a feature set via a new Feature flag. We don't (yet) add any specialization to our codegen (beyond what LLVM will do for us under the hood). * Update CodeGen_ARM.cpp * Update CodeGen_ARM.cpp * Update CodeGen_ARM.cpp * Add Features for all the ARM v8.x architectures * Update CodeGen_ARM.cpp * Fixes * get_runtime_compatible_target() should use meet * Add ARMv8a * trigger buildbots | 23 July 2024, 17:37:37 UTC |
b741d9c | Martijn Courteaux | 16 July 2024, 15:34:53 UTC | Adaptive Dark colorscheme for Stmt HTML. Ability to programmatically export conceptual stmt files. (#8327) * A few color tweaks for a darker colorscheme. * Dark color scheme for Stmt HTML. Ability to programmatically export the conceptual stmt files. * Toolbar for HTML Stmt viewer with various settings. * Cleanup. | 16 July 2024, 15:34:53 UTC |
a05f459 | Martijn Courteaux | 16 July 2024, 15:34:08 UTC | Fix injection of GPU buffers that do not go by a Func name (i.e. alloc groups). (#8333) * Fix injection of GPU buffers that do not go by a Func name (i.e. alloc groups). * Cleanup | 16 July 2024, 15:34:08 UTC |
0f34e2f | Alex Reinking | 15 July 2024, 16:15:40 UTC | Detect ARM CPU features for host target and in runtime (#8298) Adds feature detection for ARM CPUs to the runtime library and to the host target feature computation. Supports Windows, macOS, Linux, iOS, and Android. Also fix bug in Type::max() and Type::min() for float16. Fixes #4727 Fixes #6106 Fixes #7901 Fixes #7979 Fixes #8340 | 15 July 2024, 16:15:40 UTC |
461c128 | Li-Huai (Allan) Lin | 28 June 2024, 05:21:44 UTC | Fix incorrect output in Python tutorial, lesson 5 (#8331) | 28 June 2024, 05:21:44 UTC |
a6f5ca4 | Steven Johnson | 27 June 2024, 00:01:10 UTC | Remove remaining dregs of tuple_select (oops) (#8329) * Remove remaining dregs of tuple_select (oops) * Update tuple_select.py | 27 June 2024, 00:01:10 UTC |
a4a7531 | Andrew Adams | 26 June 2024, 21:30:20 UTC | Consider *all* Exprs a func uses, not just the RHS, in Li2018 (#8326) Fixes #8312 | 26 June 2024, 21:30:20 UTC |
cab27d8 | Andrew Adams | 26 June 2024, 16:08:15 UTC | Fix horrifying bug in lossless_cast of a subtract (#8155) * Fix horrifying bug in lossless_cast of a subtract * Use constant integer intervals to analyze safety for lossless_cast TODO: - Dedup the constant integer code with the same code in the simplifier. - Move constant interval arithmetic operations out of the class. - Make the ConstantInterval part of the return type of lossless_cast (and turn it into an inner helper) so that it isn't constantly recomputed. * Fix ARM and HVX instruction selection Also added more TODOs * Using constant_integer_bounds to strengthen FindIntrinsics In particular, we can do better instruction selection for pmulhrsw * Move new classes to new files Also fix up Monotonic.cpp * Make the simplifier use ConstantInterval * Handle bounds of narrower types in the simplifier too * Fix * operator. Add min/max/mod * Add cache for constant bounds queries * Fix ConstantInterval multiplication * Add a simplifier rule which is apparently now necessary * Misc cleanups and test improvements * Add missing files * Account for more aggressive simplification in fuse test * Remove redundant helpers * Add missing comment * clear_bounds_info -> clear_expr_info * Remove bad TODO I can't think of a single case that could cause this * It's too late to change the semantics of fixed point intrinsics * Fix some UB * Stronger assert in Simplify_Div * Delete bad rewrite rules * Fix bad test when lowering mul_shift_right b_shift + b_shift < missing_q * Avoid UB in lowering of rounding_shift_right/left * Add shifts to the lossless cast fuzzer This required a more careful signed-integer-overflow detection routine * Fix bug in lossless_negate * Add constant interval test * Rework find_mpy_ops to handle more structures * Fix bugs in lossless_cast * Fix mul_shift_right expansion * Delete commented-out code * Don't introduce out-of-range shifts in lossless_cast * Some constant folding only happens after lowering 
intrinsics in codegen --------- Co-authored-by: Steven Johnson <srj@google.com> | 26 June 2024, 16:08:15 UTC |
9b703f3 | Alex Reinking | 25 June 2024, 23:59:43 UTC | Provide a minimum OS version for MachO objects (#8323) This gives LLVM enough information to generate a "platform load-command" in the object file. Fixes #7941 | 25 June 2024, 23:59:43 UTC |
dd6c98b | Steven Johnson | 25 June 2024, 23:02:16 UTC | Correct the Halide version number in setup.py (#8325) | 25 June 2024, 23:02:16 UTC |
3d20677 | Steven Johnson | 25 June 2024, 22:24:12 UTC | Remove deprecated operators (#8321) tuple_select and the Internal versions of various fixed-point helpers were deprecated in Halide 17; we should remove them entirely for Halide 18. | 25 June 2024, 22:24:12 UTC |
a0e1dc0 | Martijn Courteaux | 25 June 2024, 22:23:57 UTC | Fix device slices for Buffer with fixed dimensionality in template. (#8313) Co-authored-by: Steven Johnson <srj@google.com> | 25 June 2024, 22:23:57 UTC |
8ff261e | Andrew Adams | 25 June 2024, 20:10:43 UTC | Per-pipeline-invocation profiling (#8153) * Profiler tracks per-invocation state, instead of global state This should give better results when multiple Halide pipelines are running at the same time. * Profiler improvements - Don't profile bounds queries - Simplify layout calculation - Bill time after decrementing main thread as overhead, not waiting on parallel tasks - Change waiting on parallel tasks label * name hygiene * Fix signature * Fix tracking of pipeline-level memory statistics * Address review comments * Pacify clang-tidy * [Hexagon] Profiling changes for abadams/per_instance_profiling (#8187) * Get abadams/per_instance_profiling working on hvx * More changes * Add Hexagon libraries * Fix multiple instances of profiler_state * Update hexagon libraries * clang-format --------- Co-authored-by: Steven Johnson <srj@google.com> Co-authored-by: aankit-quic <166656642+aankit-quic@users.noreply.github.com> | 25 June 2024, 20:10:43 UTC |
1449692 | Steven Johnson | 25 June 2024, 17:30:08 UTC | Remove Introspection (#8273) * Remove Introspection Introspection (to provide better error messages + automatic var/func/etc names) has always been kinda handy but kinda fragile, and with the evolution of the DWARF standard it's become broken for newer compilers. We don't have the bandwidth to fix it, and many large customers (e.g. Google) have never been able to rely on it, and given that it can cause crashes in some unusual situations (e.g. when embedded inside a Go app), it's time to say goodbye. Alas! Poor Introspection. I knew him, Horatio. A feature of infinite jest, of most excellent fancy. It hath borne me on his back a thousand times. * Update Deserialization.cpp --------- Co-authored-by: Andrew Adams <andrew.b.adams@gmail.com> | 25 June 2024, 17:30:08 UTC |
8c836b3 | Yueming Hao | 25 June 2024, 16:44:18 UTC | Update README_cmake.md (#8322) The requirements.txt is in the root of the repository now. | 25 June 2024, 16:44:18 UTC |
84bb8ee | Steven Johnson | 24 June 2024, 23:15:59 UTC | Fixes for top-of-tree LLVM (#8314) | 24 June 2024, 23:15:59 UTC |
5f6fc26 | Derek Gerstmann | 23 June 2024, 21:00:55 UTC | [vulkan] Dynamically load Vulkan loader library. Avoid Validation Layer crash on exit. (#8289) * Remove the compile-time link dependency for the Vulkan loader, and resolve the instance methods dynamically. Update the Vulkan readme to match the latest information regarding the SDK packages. * Formatting pass. * Add runtime check to verify shared memory amount used in pipeline can be run on device * Fix platform ifdefs for Vulkan library names (normal ones aren't defined when the runtime is compiled). * Detect if VK_LAYER_KHRONOS_validation is enabled, and bypass the module destructor which calls halide_vulkan_device_release() to avoid a segfault (at the cost of leaking!). See https://github.com/halide/Halide/issues/8290. Refactor and cleanup halide_vulkan_device_release(). Add vk_destroy_context() methods. * Fix GPU object lifetime AOT test to use TEST_VULKAN macro. * Fix clang tidy warning for usage of static in anonymous namespace * Disable Vulkan validation layer for leak tests (or we'll leak). * Add vk_validate_shader_for_device() method to check shader bindings against device limits prior to compiling to verify shader compatibility. --------- Co-authored-by: Derek Gerstmann <dgerstmann@adobe.com> Co-authored-by: Andrew Adams <andrew.b.adams@gmail.com> | 23 June 2024, 21:00:55 UTC |
61df9ba | Andrew Adams | 23 June 2024, 02:37:18 UTC | Add ability to pass explicit RDom to Function::define_update (#8284) * Add ability to pass explicit RDom to Function::define_update And use it in rfactor. There are cases where an RDom is attached to the original Func but not actually referred to in the LHS or RHS. Fixes #8282 * Fix comment | 23 June 2024, 02:37:18 UTC |
22367de | Andrew Adams | 23 June 2024, 02:37:05 UTC | Don't try to codegen predicated atomic stores (#8285) * Don't try to codegen predicated atomic stores By disabling predication if an Atomic node is found. Fixes #8280. * Add clarifying comment | 23 June 2024, 02:37:05 UTC |
155d693 | Andrew Adams | 22 June 2024, 23:15:23 UTC | Fix incorrect type in emulation of float16 is_inf/nan (#8310) Fixes #8309 | 22 June 2024, 23:15:23 UTC |
198c25e | Martijn Courteaux | 21 June 2024, 23:03:52 UTC | scoped_truth for the loop variable being always less than the loop extent. (#8306) * scoped_truth for the loop variable being always less than the loop extent. * Correctify the range. * Complementary scoped_truth for the loop lower bound. | 21 June 2024, 23:03:52 UTC |
b921710 | Alex Reinking | 20 June 2024, 19:12:15 UTC | Fix OpenCL positive and negative INF constants. (#8266) | 20 June 2024, 19:12:15 UTC |
ea775cc | Alex Reinking | 20 June 2024, 15:59:42 UTC | Use upstream interface for consuming SPIR-V (#8265) | 20 June 2024, 15:59:42 UTC |
f9ccd5c | Shoaib Kamil | 14 June 2024, 18:24:17 UTC | No longer silently hide errors in Metal completion handlers (alternative approach) (#8240) * No longer silently hide errors in Metal completion handlers * Actually implement alternative * clang-format * Implement new API * Implement test and refine the API * Format. * Remove some debug code * Add missing includes. * Add comment noting why we manually null-terminate after strncpy * Reverse engineer Objective-C API for passing void* in a block; it turns out to be much simpler than I thought * Formatting * Don't add const-ness to declaration. --------- Co-authored-by: Steven Johnson <srj@google.com> | 14 June 2024, 18:24:17 UTC |
6c8a491 | Andrew Adams | 10 June 2024, 16:38:57 UTC | Fix typo in Simplify_Let.cpp (#8274) | 10 June 2024, 16:38:57 UTC |
340136f | Andrew Adams | 07 June 2024, 18:28:21 UTC | Stop region costs from complaining about new intrinsics (#8262) Now by default it will treat them as cost one, unless you tell it otherwise. | 07 June 2024, 18:28:21 UTC |
4b67712 | Derek Gerstmann | 05 June 2024, 21:43:13 UTC | [vulkan] Fix Vulkan SIMT mappings for GPU loop vars. (#8259) * Fix Vulkan SIMT mappings for GPU loop vars. Previous refactoring accidentally used the fully qualified var name rather than the categorized vulkan intrinsic name. * Avoid formatting the GPU kernel to a string for Vulkan (since its binary SPIR-V needs to remain intact). --------- Co-authored-by: Derek Gerstmann <dgerstmann@adobe.com> Co-authored-by: Steven Johnson <srj@google.com> | 05 June 2024, 21:43:13 UTC |
74b9044 | Andrew Adams | 05 June 2024, 16:24:06 UTC | It's generally a bad idea for simplifier rules to multiply constants (#8234) Fixes #8227 but may break other things. Needs thorough testing. Also, there are more rules like this lurking. | 05 June 2024, 16:24:06 UTC |
46e866d | Martijn Courteaux | 04 June 2024, 16:32:54 UTC | Report useful error to user if the promise_clamp all fails to losslessly cast. (#8238) Co-authored-by: Steven Johnson <srj@google.com> | 04 June 2024, 16:32:54 UTC |
775bfbf | Jason Ansel | 04 June 2024, 16:31:30 UTC | Python binding support for int64 literals (#8254) This makes >32bit python integers get mapped to `hl.i64` implicitly. Fixes #8224 | 04 June 2024, 16:31:30 UTC |
9c75554 | Shoaib Kamil | 04 June 2024, 15:21:04 UTC | Fix Metal handling for float16 literals (#8260) * Fix Metal handling of float16 from bits, infinity, neg infinity, and nans * Disable test for OpenCL half for now * Formatting | 04 June 2024, 15:21:04 UTC |
7ca95d8 | Jason Ansel | 02 June 2024, 21:39:44 UTC | Expose BFloat in Python bindings (#8255) There are two parts to support for BFloat16 in Python: 1) Ability to define kernels and AOT compile them [fixed in this PR] 2) Ability to call kernels from Python This fixes part 1, which is what I need for my use case. Part 2 is blocked on bfloat16 support in Python buffer protocols. See #6849 for more details. | 02 June 2024, 21:39:44 UTC |
7cf2951 | Jason Ansel | 02 June 2024, 21:34:36 UTC | Remove max size assert from Anderson2021 (#8253) Fixes #8252 | 02 June 2024, 21:34:36 UTC |
a9b8fbf | Andrew Adams | 02 June 2024, 21:33:45 UTC | Rework the simplifier to use ConstantInterval for bounds (#8222) * Update the simplifier to use ConstantInterval and track the bounds through more types * Move the simplify fuzzer back to a correctness test * Make debug_indent not static Otherwise it causes a race condition in any parallel tests * Track expr info on non-overflowing casts to int * Delete commented-out code * clang-tidy * Delete unused member * Fix cmakelists for the fuzzer removal * Handle contradictions more gracefully in learn_true The contradiction was arising from: if (extent > 0) { ... } else { for (x = 0; x < extent; x++) { In here we can assume extent > 0, but we also know from the if statement that extent <= 0 } } * Better comments * Address review comments * Fix failure to pop loop var info | 02 June 2024, 21:33:45 UTC |
35143d2 | Martijn Courteaux | 02 June 2024, 21:19:04 UTC | Mark host_dirty() and device_dirty() with no_discard. (#8248) Co-authored-by: Steven Johnson <srj@google.com> | 02 June 2024, 21:19:04 UTC |
711dc88 | Cheng Wang | 31 May 2024, 17:53:47 UTC | Add HVX_v68 target to support Hexagon HVX v68. (#8232) | 31 May 2024, 17:53:47 UTC |
33d5ba9 | Andrew Adams | 24 May 2024, 19:56:03 UTC | Fix saturating add matching in associativity checking (#8220) * Fix saturating add matching in associativity checking The associative ops table defined saturating add as saturating_narrow(widen(x + y)), instead of saturating_narrow(widen(x) + y) | 24 May 2024, 19:56:03 UTC |
b5f5065 | Andrew Adams | 23 May 2024, 18:17:49 UTC | Add some EVAL_IN_LAMBDAs to Simplify_Sub.cpp (#8230) Massively reduces compile time and peak cl.exe memory consumption on Windows (from 9.5 GB down to 2.3 GB). Simplify_LT.cpp has these same EVAL_IN_LAMBDAs, which is probably why it hasn't been causing build problems. | 23 May 2024, 18:17:49 UTC |
e9f8b04 | Steven Johnson | 15 May 2024, 21:43:17 UTC | Fix for top-of-tree LLVM (#8223) * Fix for top-of-tree LLVM * Update LLVM_Runtime_Linker.cpp | 15 May 2024, 21:43:17 UTC |
16d77e9 | Andrew Adams | 15 May 2024, 17:43:34 UTC | Fix give-up case in ModulusRemainder (#8221) A default-constructed ModulusRemainder means no information, which is what we want here. ModulusRemainder{0, 1} means the constant one! | 15 May 2024, 17:43:34 UTC |
211bafa | Alexander Root | 14 May 2024, 20:15:57 UTC | Fix Reinterpret cmp in IREquality (#8217) fix Reinterpret cmp | 14 May 2024, 20:15:57 UTC |
dfaf6ad | Steven Johnson | 30 April 2024, 15:08:26 UTC | Insert apparently-missing `break;` in IREquality.cpp (#8211) * Insert apparently-missing `break;` in IREquality.cpp * Enable -Wimplicit-fallthrough * Also add -Wimplicit-fallthrough to runtime builds * Add missing break to runtime/webgpu.cpp * Also add flag to Makefile --------- Co-authored-by: Andrew Adams <andrew.b.adams@gmail.com> | 30 April 2024, 15:08:26 UTC |
8141197 | Alexander Root | 30 April 2024, 13:38:30 UTC | [x86 & HVX & WASM] Use bounds inference for saturating_narrow instruction selection (#7805) * x86 bounds inference for saturating_narrow * bounds inference for HVX too * use can_represent(ConstantInterval) + clang-format * use bounds inference for WASM IS too + add tests * add tracking issue for scoped constant bounds * add TODO about lossless_cast usage --------- Co-authored-by: Steven Johnson <srj@google.com> | 30 April 2024, 13:38:30 UTC |
d55d82b | Steven Johnson | 29 April 2024, 16:38:30 UTC | Update debug_to_file API to remove type_code (#8183) * Add .npy support to halide_image_io The .npy format is NumPy's native format for storing multidimensional arrays (aka tensors/buffers). Being able to load/save in this format makes it (potentially) a lot easier to interchange data with the Python ecosystem, as well as providing a file format that supports floating-point data more robustly than any of the others that we currently support. This adds load/save support for a useful subset: - We support the int/uint/float types common in Halide (except for f16/bf16 for now) - We don't support reading or writing files that are in `fortran_order` - We don't support any object/struct/etc files, only numeric primitives - We only support loading files that are in the host's endianness (typically little-endian) Note that at present this doesn't support f16 / bf16 formats, but that could likely be added with minimal difficulty. The tricky bit of this is that the reading code has to parse a (limited) Python dict in text form. Please review that part carefully. TODO: we could probably add this as an option for `debug_to_file()` without too much pain in a followup PR. * clang-tidy * clang-tidy * Address review comments * Allow for "keys" as well as 'keys' * Add .npy support to debug_to_file() Built on top of https://github.com/halide/Halide/pull/8175, this adds .npy as an option. This is actually pretty great because it's easy to do something like ``` ss = numpy.load("my_file.npy") print(ss) ``` in Python and get nicely-formatted output, which can sometimes be a lot easier for debugging than inserting lots of print() statements (see https://github.com/halide/Halide/issues/8176) Did a drive-by change to the correctness test to use this format instead of .mat. 
* Add float16 support * Add support for Float16 images in npy * Assume little-endian * Remove redundant halide_error() * naming convention * naming convention * Test both mat and npy * Don't call halide_error() * Use old-school parser * clang-tidy * Update debug_to_file API to remove type_code * Clean up into single table * Update CodeGen_LLVM.cpp * Fix tmp codes * Update InjectHostDevBufferCopies.cpp * Update InjectHostDevBufferCopies.cpp * trigger buildbots | 29 April 2024, 16:38:30 UTC |
8202163 | Andrew Adams | 28 April 2024, 21:39:41 UTC | More aggressively unify duplicate lets (#8204) * Make unify_duplicate_lets more aggressive The simplifier can also clean up most of these, but it's harder for it because it has to consider that other mutations may have taken place. Beefing this up has no impact on lowering times for most apps, but something pathological was going on for local_laplacian. At 20 pyramid levels, this speeds up lowering by 1.3x. At 50 pyramid levels it's 2.3x. At 100 pyramid levels it's 4.1x. It also slightly reduces binary size. * Clarify comment; Avoid double-lookup into the scope Looking up with an Expr key and deep equality is expensive, so this was bad. * Add a std::move | 28 April 2024, 21:39:41 UTC |
64caf31 | Andrew Adams | 28 April 2024, 21:38:54 UTC | Faster vars used tracking in simplify let visitor (#8205) * Speed up the vars_used visitor in the simplifier let visitor This visitor shows up as the main cost of lowering in very large pipelines. This visitor is for tracking which lets are actually used for real inside the body of a let block (as opposed to the tracking we do when mutating, which is approximate, because we could construct an Expr that uses a Var and then discard it in a later mutation). The old implementation made a map of all variables referenced, and then checked each let name against that map one by one. If there are a small number of lets outside a huge Stmt, this is bad, because the data structure has to hold a number of names proportional to the stmt size instead of proportional to the number of lets. This new implementation instead makes a hash set of the let names, and then traverses the Stmt, removing names from the set as they are encountered. This is a big speed-up. We then make the speed-up larger by about the same factor again doing the following: 1) Only add names to the map that might be used based on the recursive mutate call. These are very very likely to be used, because we saw them at least once, and mutations that remove *all* uses of a Var are rare. 2) The visitor should early out when the map becomes empty. The let variables are often all used immediately, so this is frequent. Speeds up lowering of local laplacian by 1.44x, 2.6x, and 4.8x respectively for 20, 50, and 100 pyramid levels. Speeds up lowering of resnet50 by 1.04x. Speeds up lowering of lens blur by 1.06x * Exploit the ref count of the replacement Expr * Fix is_sole_reference logic in Simplify_Let.cpp * Reduce hash map size | 28 April 2024, 21:38:54 UTC |
302aa1c | Andrew Adams | 25 April 2024, 18:58:23 UTC | Refactor ConstantInterval (#8179) * Make ConstantInterval more of a first-class thing and use it in Monotonic.cpp * Restore bound_correlated_differences calls * Elaborate on TODO * Handle some TODOs Also explicitly ignore lossless_cast bugs that will be fixed in #8155 * Fix constant interval mod, clean up constant interval saturating cast * Improve comment * Avoid unsigned overflow * Fix the most obvious bug in lossless_cast, to make the fuzzer pass more * Skip over pipelines that fail the lossless_cast check * Drop iteration count on lossless_cast test * Add test to CMakeLists.txt * Avoid UB in constant_interval test (signed integer overflow of the scalars) * Restore accidentally-deleted line from CMakeLists.txt * Print on success * Handle Lets in constant_integer_bounds Also, plumb the cache through the recursive calls * Delete duplicate operator<< * Just always cast the bounds back to the range of the op type * Address review comments * Redo operator<< for ConstantIntervals * Improve comment; disable buggy code for now | 25 April 2024, 18:58:23 UTC |
e39497b | Andrew Adams | 21 April 2024, 03:43:38 UTC | Make Interval::is_single_point check for deep equality (#8202) * Make is_single_point compare min and max by deep equality Interval::is_single_point() used to only compare expressions by shallow equality to see if they are the same Expr object. However, bounds_of_expr_in_scope is really improved if it uses deep equality instead, so it has a prepass that goes over the provided scope, calls equal(min, max) on everything, and fixes up anything where deep equality is true but shallow equality is not. This prepass costs O(n) for n things in scope, regardless of how complex the expression being analyzed is. So if you ask for the bounds of '4' say in a context where there are lots of things in the scope, it's absurdly slow. We were doing this! BoxTouched calls bounds_of_expr_in_scope lots of times on small index Exprs within the same very large scope. It's better to just make Interval::is_single_point() check deep equality. This speeds up local laplacian lowering by 1.1x, and resnet50 lowering by 1.5x. There were also places where intervals that were a single point were diverging due to carelessly written code. E.g. the interval [40*8, 40*8], where both of those 40*8s are the same Mul node, was being simplified like this: interval.min = simplify(interval.min); interval.max = simplify(interval.max); Not only does this do double the simplification work it should, but it also caused something that was a single point to diverge into not being a single point, because the repeated constant-folding creates a new Expr. With the new is_single_point this matters a lot less, but even so, I centralized simplification of intervals into a single helper that doesn't do the pointless double-simplification for single points. Some of these shallowly-unequal but deeply-equal Intervals were being created in bounds inference itself after the prepass, which may have been generating suboptimal bounds. 
This change should fix that in addition to the compile-time benefits. Also added a simplify call in SkipStages because I noticed when it processed specializations it was creating things like (condition) || (!condition). | 21 April 2024, 03:43:38 UTC |
31c52ab | Andrew Adams | 19 April 2024, 19:59:34 UTC | Faster substitute_facts (#8200) * Fix computational complexity of substitute_facts It was O(n) for n facts. This makes it O(log(n)) This was particularly bad for pipelines with lots of inputs or outputs, because those pipelines have lots of asserts, which make for lots of facts to substitute in. Speeds up lowering of local laplacian with 20 pyramid levels (which has only one input and one output) by 1.09x Speeds up lowering of the adams 2019 cost model training pipeline (lots of weight inputs and lots of outputs due to derivatives) by 1.5x Speeds up resnet50 (tons of weight inputs) lowering by 7.3x! * Add missing switch breaks * Add missing comments * Elaborate on why we treat NaNs as equal | 19 April 2024, 19:59:34 UTC |
dd1d0e8 | aankit-quic | 19 April 2024, 17:33:44 UTC | [HEXAGON] Keep support for hexagon_remote/Makefile (#8186) Update hexagon_remote/Makefile | 19 April 2024, 17:33:44 UTC |
4e0b313 | Andrew Adams | 18 April 2024, 19:48:59 UTC | Rewrite IREquality to use a more compact stack instead of deep recursion (#8198) * Rewrite IREquality to use a more compact stack instead of deep recursion Deletes a bunch of code and speeds up lowering time of local laplacian with 20 pyramid levels by ~2.5% * clang-tidy * Fold in the version of equal in IRMatch.h/cpp * Add missing switch breaks * Add missing comments * Elaborate on why we treat NaNs as equal | 18 April 2024, 19:48:59 UTC |
7994e70 | Andrew Adams | 16 April 2024, 21:27:43 UTC | Fix corner case in if_then_else simplification (#8189) Co-authored-by: Steven Johnson <srj@google.com> | 16 April 2024, 21:27:43 UTC |
f4c7831 | Andrew Adams | 11 April 2024, 22:07:20 UTC | Don't print on parallel task entry/exit with -debug flag (#8185) Fixes #8184 | 11 April 2024, 22:07:20 UTC |
dc83707 | Steven Johnson | 11 April 2024, 18:04:42 UTC | Add .npy support to debug_to_file() (#8177) * Add .npy support to halide_image_io The .npy format is NumPy's native format for storing multidimensional arrays (aka tensors/buffers). Being able to load/save in this format makes it (potentially) a lot easier to interchange data with the Python ecosystem, as well as providing a file format that supports floating-point data more robustly than any of the others that we currently support. This adds load/save support for a useful subset: - We support the int/uint/float types common in Halide (except for f16/bf16 for now) - We don't support reading or writing files that are in `fortran_order` - We don't support any object/struct/etc files, only numeric primitives - We only support loading files that are in the host's endianness (typically little-endian) Note that at present this doesn't support f16 / bf16 formats, but that could likely be added with minimal difficulty. The tricky bit of this is that the reading code has to parse a (limited) Python dict in text form. Please review that part carefully. TODO: we could probably add this as an option for `debug_to_file()` without too much pain in a followup PR. * clang-tidy * clang-tidy * Address review comments * Allow for "keys" as well as 'keys' * Add .npy support to debug_to_file() Built on top of https://github.com/halide/Halide/pull/8175, this adds .npy as an option. This is actually pretty great because it's easy to do something like ``` ss = numpy.load("my_file.npy") print(ss) ``` in Python and get nicely-formatted output, which can sometimes be a lot easier for debugging than inserting lots of print() statements (see https://github.com/halide/Halide/issues/8176) Did a drive-by change to the correctness test to use this format instead of .mat. 
* Add float16 support * Add support for Float16 images in npy * Assume little-endian * Remove redundant halide_error() * naming convention * naming convention * Test both mat and npy * Don't call halide_error() * Use old-school parser * clang-tidy | 11 April 2024, 18:04:42 UTC |
8f3f6cf | Fabian Schuetze | 11 April 2024, 16:58:36 UTC | Update Hexagon Install Instructions (#8182) update Hexagon install instructions | 11 April 2024, 16:58:36 UTC |