https://github.com/halide/Halide
- HEAD
- refs/heads/Halide_unsharp
- refs/heads/abadams-patch-1
- refs/heads/abadams/aggressive_is_single_point
- refs/heads/abadams/aggressive_unify_duplicate_lets
- refs/heads/abadams/align_strided_const_loads
- refs/heads/abadams/alloca
- refs/heads/abadams/atomic_parallel_compiled_in
- refs/heads/abadams/atomic_vector_non_recursive
- refs/heads/abadams/averaging_tree
- refs/heads/abadams/avoid_name_mangling_in_cross_module_dependencies
- refs/heads/abadams/better_absd
- refs/heads/abadams/better_codegen_for_non_const_ramps
- refs/heads/abadams/bgu_cholesky
- refs/heads/abadams/braces_around_statements
- refs/heads/abadams/cache_tighten_producer_consumer_nodes
- refs/heads/abadams/check_reorder_dups
- refs/heads/abadams/clarify_broadcast_shuffle
- refs/heads/abadams/compositing_app
- refs/heads/abadams/cond_wait_spin
- refs/heads/abadams/constant_interval_simplifier
- refs/heads/abadams/cse_in_unroll_split_tuples
- refs/heads/abadams/custom_cuda_context
- refs/heads/abadams/custom_cuda_context_2
- refs/heads/abadams/custom_cuda_context_3
- refs/heads/abadams/d3d12abi
- refs/heads/abadams/deflake_mullapudi_reorder
- refs/heads/abadams/delete_division_rule_that_makes_large_constants
- refs/heads/abadams/delete_prepare_for_early_exit
- refs/heads/abadams/depthwise_separable_conv
- refs/heads/abadams/diagnose_boundary_condition_failure
- refs/heads/abadams/disable_onnx_app_on_mac
- refs/heads/abadams/divide_using_pavgw
- refs/heads/abadams/dont_link_to_cudart
- refs/heads/abadams/dont_reinterpret_concat
- refs/heads/abadams/early_out
- refs/heads/abadams/enable_f16c
- refs/heads/abadams/extract_concat_bits
- refs/heads/abadams/fast_integer_divide_round_to_zero
- refs/heads/abadams/faster_runtime_integer_division
- refs/heads/abadams/faster_substitute_facts
- refs/heads/abadams/faster_unroll
- refs/heads/abadams/faster_vars_used_in_simplify_let
- refs/heads/abadams/fix-arm-seg2
- refs/heads/abadams/fix_4211
- refs/heads/abadams/fix_5323
- refs/heads/abadams/fix_5329
- refs/heads/abadams/fix_5889
- refs/heads/abadams/fix_6984
- refs/heads/abadams/fix_7229
- refs/heads/abadams/fix_7260
- refs/heads/abadams/fix_7365
- refs/heads/abadams/fix_7374
- refs/heads/abadams/fix_7504
- refs/heads/abadams/fix_7514
- refs/heads/abadams/fix_7531
- refs/heads/abadams/fix_7584
- refs/heads/abadams/fix_7584_v2
- refs/heads/abadams/fix_7742
- refs/heads/abadams/fix_7756
- refs/heads/abadams/fix_7761
- refs/heads/abadams/fix_7768
- refs/heads/abadams/fix_7786
- refs/heads/abadams/fix_7810
- refs/heads/abadams/fix_7811
- refs/heads/abadams/fix_7815
- refs/heads/abadams/fix_7867
- refs/heads/abadams/fix_7871
- refs/heads/abadams/fix_7872
- refs/heads/abadams/fix_7873
- refs/heads/abadams/fix_7888
- refs/heads/abadams/fix_7890
- refs/heads/abadams/fix_7891
- refs/heads/abadams/fix_7892
- refs/heads/abadams/fix_7893
- refs/heads/abadams/fix_7906
- refs/heads/abadams/fix_7909
- refs/heads/abadams/fix_7968
- refs/heads/abadams/fix_8038
- refs/heads/abadams/fix_8054
- refs/heads/abadams/fix_8170
- refs/heads/abadams/fix_8184
- refs/heads/abadams/fix_8280
- refs/heads/abadams/fix_8309
- refs/heads/abadams/fix_8312
- refs/heads/abadams/fix_arm_fcvtmp
- refs/heads/abadams/fix_associative_ops_saturating_add
- refs/heads/abadams/fix_autoschedule_feature_transposition
- refs/heads/abadams/fix_cse_name_collisions
- refs/heads/abadams/fix_cuda_mat_mul_assert
- refs/heads/abadams/fix_deinterleave_bug
- refs/heads/abadams/fix_deinterleave_for_reinterpret
- refs/heads/abadams/fix_div_round_to_zero
- refs/heads/abadams/fix_fft_compile_time_regression
- refs/heads/abadams/fix_generate_output_snippets
- refs/heads/abadams/fix_if_nesting_condition
- refs/heads/abadams/fix_leaks_in_memoize_test
- refs/heads/abadams/fix_lgtm_warnings
- refs/heads/abadams/fix_links_to_master
- refs/heads/abadams/fix_load_of_broadcast
- refs/heads/abadams/fix_onnx_app
- refs/heads/abadams/fix_pointless_lower_condition
- refs/heads/abadams/fix_potential_gpu_deadlock
- refs/heads/abadams/fix_realize_condition_depends_on_tuple
- refs/heads/abadams/fix_reduce_expr_modulo_of_vector
- refs/heads/abadams/fix_riscv_vx_vi
- refs/heads/abadams/fix_round
- refs/heads/abadams/fix_stencil_chain_gpu_schedule
- refs/heads/abadams/fix_track_bounds_intervals
- refs/heads/abadams/fix_tutorial_2
- refs/heads/abadams/fix_ub_in_lower_rounding_shift_right
- refs/heads/abadams/forward_partition_methods
- refs/heads/abadams/fully_fused_depthwise_separable_conv
- refs/heads/abadams/future_clang_tidy_fixes
- refs/heads/abadams/fuzz_sliding_window
- refs/heads/abadams/gaussian_blur_app
- refs/heads/abadams/generator_infinite_default_timeout
- refs/heads/abadams/gpu_autoscheduler_parallel_random_probes
- refs/heads/abadams/include_riscv_in_readme
- refs/heads/abadams/interleave_nested_vector
- refs/heads/abadams/ir_match_by_ref
- refs/heads/abadams/lerp_plus_cast
- refs/heads/abadams/local_laplacian_code_size
- refs/heads/abadams/lower_halving_sub
- refs/heads/abadams/lower_rounding_shift_right
- refs/heads/abadams/mac-arm-fixes
- refs/heads/abadams/make_fast_inverse_test_throughput_limited
- refs/heads/abadams/makefile_serialization_support
- refs/heads/abadams/mismatched_new_delete
- refs/heads/abadams/mixed_sign_mul_shift_right
- refs/heads/abadams/mixed_width_mul_shift_right
- refs/heads/abadams/multiple_scatter
- refs/heads/abadams/mux_intrinsic
- refs/heads/abadams/name_helpers
- refs/heads/abadams/narrow_predicates
- refs/heads/abadams/nested_vectorization_compile_time_regression_fix
- refs/heads/abadams/nested_vectorization_tweaks
- refs/heads/abadams/parallel_simd_op_check
- refs/heads/abadams/partially_backport_lossless_cast_fix
- refs/heads/abadams/precompute_shared_mem_size
- refs/heads/abadams/prefer_no_gather
- refs/heads/abadams/print_uncaught_exception
- refs/heads/abadams/promise_clamped_is_a_tag
- refs/heads/abadams/promote_fixed_point_intrinsics
- refs/heads/abadams/psabdw
- refs/heads/abadams/random_pipelines
- refs/heads/abadams/rationalize_gpu_for_loop_names
- refs/heads/abadams/reenable_unscheduled_stage_warning
- refs/heads/abadams/refactor_constant_interval
- refs/heads/abadams/reinterpret_vector
- refs/heads/abadams/remove_arch_os_for_shaders
- refs/heads/abadams/remove_bad_pruning
- refs/heads/abadams/remove_llvm_zstd
- refs/heads/abadams/remove_parameter_self_references
- refs/heads/abadams/remove_readnone_on_functions
- refs/heads/abadams/remove_use_of_python_config_in_onnx_makefile
- refs/heads/abadams/reschedule_bgu
- refs/heads/abadams/reschedule_bilateral_grid
- refs/heads/abadams/rewrite_atomic_pass
- refs/heads/abadams/rewrite_ir_equality
- refs/heads/abadams/rounding_shift_right_use_average
- refs/heads/abadams/rungenmain_error
- refs/heads/abadams/sampling_profiler_overhead_v2
- refs/heads/abadams/scope_improvements
- refs/heads/abadams/simpler_broadcasts
- refs/heads/abadams/simplify_correlated_pyramid
- refs/heads/abadams/simplify_sub_eval_in_lambda
- refs/heads/abadams/siotas_20
- refs/heads/abadams/sioutas_20
- refs/heads/abadams/slide_over_split_loop
- refs/heads/abadams/sorting_network_working_branch
- refs/heads/abadams/stable_topological_order
- refs/heads/abadams/string_view
- refs/heads/abadams/strip_asserts_last
- refs/heads/abadams/switch_stmt
- refs/heads/abadams/target_specific_lerp
- refs/heads/abadams/time_lowering_passes
- refs/heads/abadams/track_failedness_through_solver_lets
- refs/heads/abadams/turn_off_slp_vectorization_for_avx512
- refs/heads/abadams/tweak_unpack_buffers
- refs/heads/abadams/undo_pointless_widening
- refs/heads/abadams/unordered_blocks
- refs/heads/abadams/unsigned_demosaic
- refs/heads/abadams/update_makefile_for_llvm_19
- refs/heads/abadams/use_arm_for_runtime_triple
- refs/heads/abadams/use_pmaddubsw_for_downsample
- refs/heads/abadams/validate_gpu_schedules
- refs/heads/abadams/vector_reduce_hexagon_predicate
- refs/heads/abadams/vector_scan
- refs/heads/abadams/vst_type_fix
- refs/heads/abadams/widening_let_bug
- refs/heads/abadams/x86_avg
- refs/heads/abadams/zen4
- refs/heads/adadams/profile_allocator
- refs/heads/add_image_checks_after_bounds_inference_plus_new_rules
- refs/heads/add_outermost_to_extern
- refs/heads/add_vectorization_to_search_space
- refs/heads/aelphy/broadcast_q8
- refs/heads/aelphy/f16x16
- refs/heads/aelphy/feature_cadence_changes
- refs/heads/aelphy/float_extracts
- refs/heads/aelphy/i32_sat_widening_shift_right
- refs/heads/aelphy/sqrt_f16
- refs/heads/aelphy/vector_loads_f16_f32
- refs/heads/alexreinking/pip-metadata
- refs/heads/align_loads_comment_fix
- refs/heads/alina-strided-store
- refs/heads/another_buffer_copy_fix
- refs/heads/arm_sve_redux
- refs/heads/ataei-block_asserts-codegen
- refs/heads/ataei-debug_info
- refs/heads/ataei-fix-pow
- refs/heads/ataei-gen_str_param
- refs/heads/ataei-implicit_lhs_vars
- refs/heads/ataei-onnx
- refs/heads/ataei-onnx_converter_update
- refs/heads/ataei-onnx_pybind
- refs/heads/ataei-resnet50_benchmarks
- refs/heads/ataei-standalone_autoscheduler
- refs/heads/ataei_lots_of_inputs
- refs/heads/auto_sched_benchmarks
- refs/heads/auto_sched_estimates
- refs/heads/auto_sched_inline
- refs/heads/auto_sched_test_notparallel
- refs/heads/autoschedule_top_down
- refs/heads/autoschedule_with_convnet
- refs/heads/autoscheduler_scalar_imageparam_fix
- refs/heads/backports/10.x
- refs/heads/backports/11.x
- refs/heads/backports/12.x
- refs/heads/backports/13.x
- refs/heads/balance_expressions
- refs/heads/bazel
- refs/heads/benchmarks
- refs/heads/blaze
- refs/heads/bounds_buffer_lets_fix
- refs/heads/bounds_correct_vs_bounds_loaded_reduced
- refs/heads/buffer_device_api_target
- refs/heads/bug_device_free
- refs/heads/bug_inline_unbounded
- refs/heads/build/bundle-static
- refs/heads/build/pip-packaging
- refs/heads/circ_buffer
- refs/heads/cmake-no-runtime-debug-symbols
- refs/heads/cmake_wasm_features
- refs/heads/compute_at_guard_with_if_goes_on_stack
- refs/heads/compute_with_at
- refs/heads/compute_with_check
- refs/heads/compute_with_excessive_bounds
- refs/heads/compute_with_inlined
- refs/heads/compute_with_remove_is_right_level
- refs/heads/cpack/nuget
- refs/heads/ctest/wrappers
- refs/heads/cuda-constant
- refs/heads/d3d12-allocation-cache
- refs/heads/deferred_cse_after_inlining
- refs/heads/destructor_calls_deinit
- refs/heads/dg/deserialize_unmapped_objects
- refs/heads/dg/fix_vulkan_codegen_bool_conversion
- refs/heads/dg/fix_vulkan_gpu_vars
- refs/heads/dg/vulkan_conform_api
- refs/heads/dg/vulkan_load_lib
- refs/heads/dg/vulkan_region_allocator_fixes
- refs/heads/dgerstmann/fix-vulkan-memory-config-init
- refs/heads/disable_acquire_release_test_vulkan
- refs/heads/distinct_wrapper_names
- refs/heads/dkg/6863_asan_fixes
- refs/heads/dkg/vulkan
- refs/heads/dpalermo_dmabuf
- refs/heads/dpalermo_dmabuf_libion
- refs/heads/dpalermo_hexagon_remote_202003
- refs/heads/dpalermo_sdk4_2_0_2
- refs/heads/ds/buffer-get-pure
- refs/heads/ds/opt-tile-size
- refs/heads/ds/tail-none
- refs/heads/ds/while
- refs/heads/dsharletg/bitwise-intrinsics
- refs/heads/dsharletg/find-vector-reduce
- refs/heads/dsharletg/jit-optimization
- refs/heads/dsharletg/memcpy-copy_from
- refs/heads/dsharletg/pattern-headroom
- refs/heads/dsharletg/refactor-host-alignment
- refs/heads/dsharletg/runtime-size
- refs/heads/dsharletg/simplify-abs
- refs/heads/dsharletg/simplify-type-bounds
- refs/heads/dsharletg/specialize-bounds
- refs/heads/dsharletg/upsample-channels
- refs/heads/empty_prefetch
- refs/heads/emscripten_vector_fix
- refs/heads/export_all-wsmoses
- refs/heads/expr_auto_sched
- refs/heads/extern_bugs
- refs/heads/extern_host_alloc
- refs/heads/factor_parallel_codegen_hack
- refs/heads/fast_sync_tsan
- refs/heads/faster_integer_division
- refs/heads/feature/apps-external
- refs/heads/feature/cmake-presets
- refs/heads/feature/convert
- refs/heads/feature/f16_interleave
- refs/heads/feature/gather_load_q7
- refs/heads/feature/gather_load_undefined_ramp
- refs/heads/feature/llvm-codemodel
- refs/heads/feature/load_predicated
- refs/heads/feature/luma_regression
- refs/heads/feature/maintanence
- refs/heads/feature/reinterprets
- refs/heads/feature/tcm_bump_allocator
- refs/heads/feature/xtensa_fix_interleave_q8
- refs/heads/feature/xtensa_q8_tests
- refs/heads/find_intrinsics_issue
- refs/heads/find_intrinsics_widening_lets
- refs/heads/fix-7854
- refs/heads/fix-floated-pure-stage
- refs/heads/fix-race-condition
- refs/heads/fix_hexagon_alignment
- refs/heads/fix_hvx_intrinsics
- refs/heads/fix_prefetch_test
- refs/heads/fix_windows_vs15_build
- refs/heads/fixed_length_vectors
- refs/heads/fixed_point_local_laplac
- refs/heads/gemmlowp
- refs/heads/generate
- refs/heads/gha/pip
- refs/heads/gpu_canon_fix
- refs/heads/halide_extended_exp
- refs/heads/halide_ir_flatbuffer
- refs/heads/hex_dma2_async
- refs/heads/hexagon_le_runtime
- refs/heads/hexagon_priority
- refs/heads/hexagon_setpriority
- refs/heads/hexagon_strided_pred_load
- refs/heads/hexagon_sysmon_markers
- refs/heads/imaging-synthesis
- refs/heads/includes_fix
- refs/heads/ios_fast_sync_fix
- refs/heads/jia-kai-fix-runtime-cuda-init
- refs/heads/kamil-openglcompute-infinity
- refs/heads/kamil/name_pthread_workers
- refs/heads/kp_bit_shift
- refs/heads/line_buffer
- refs/heads/loop_carry_not_working
- refs/heads/lower_on_huge_stack
- refs/heads/main
- refs/heads/memoize_with_extents
- refs/heads/metal_float16
- refs/heads/metaprogrammed_simplifier_mod
- refs/heads/mohamedadaly-vmlal
- refs/heads/more_powerful_sliding
- refs/heads/new_autoschedule_with_new_simplifier_arm_worker_branch
- refs/heads/new_autoscheduler
- refs/heads/new_simplifier_rule_testing
- refs/heads/newer_ion_ioctl
- refs/heads/no_bounds_query_when_bounds_used
- refs/heads/opengl_compute_buffer_types_fix
- refs/heads/openglcompute_reuse_shared_allocations
- refs/heads/optmize_reorder
- refs/heads/par_for_opt
- refs/heads/pdb/fix_7806
- refs/heads/pdb/hexagon_remote_cmake
- refs/heads/pdb_add_libcpp_makefile_inc
- refs/heads/pdb_eliminate_interleaves_test
- refs/heads/pdb_fix_clang_build
- refs/heads/pdb_fix_install_qc
- refs/heads/pdb_fix_loop_carry
- refs/heads/pdb_fix_simd_op_check_hvx
- refs/heads/pdb_mul_div_mod_multi_thread
- refs/heads/pdb_remove_hvx_v64
- refs/heads/perform_inline_with_order
- refs/heads/performance_linters
- refs/heads/pr/2572
- refs/heads/pr/2676
- refs/heads/pr/2975
- refs/heads/pr/3017
- refs/heads/pr/3081
- refs/heads/pr/3387
- refs/heads/pr/3939
- refs/heads/pr/3960
- refs/heads/pr/4380
- refs/heads/pr/4414
- refs/heads/pr/5331
- refs/heads/pr/5438
- refs/heads/pr/5455
- refs/heads/pr/5758_2
- refs/heads/predicated_vector
- refs/heads/prefetch_specialize
- refs/heads/print_schedule
- refs/heads/profile_hardware_counters
- refs/heads/random-pipelines
- refs/heads/rdom_with_pure_vars
- refs/heads/readme-fix-gcd
- refs/heads/realization_order
- refs/heads/refactor_module
- refs/heads/register_promotion
- refs/heads/release/10.x
- refs/heads/release/11.x
- refs/heads/release/12.x
- refs/heads/release/13.x
- refs/heads/release/14.x
- refs/heads/release/15.x
- refs/heads/release/16.x
- refs/heads/release/17.x
- refs/heads/release/18.x
- refs/heads/release/8.x
- refs/heads/remove_max_on_fuse_factor
- refs/heads/reorder_rvar
- refs/heads/reset_unique_counter
- refs/heads/revert-3612-ataei-speedup_compiletime
- refs/heads/revert-7009-rootjalex/distribute-w_shl
- refs/heads/revert-7601-compile_hexagon_remote
- refs/heads/riscv_update
- refs/heads/rl_simplifier_rules
- refs/heads/rootjalex/add_simpl_rules
- refs/heads/rootjalex/arm-optimize
- refs/heads/rootjalex/autoscheduler_mcts
- refs/heads/rootjalex/bounds-rewriter
- refs/heads/rootjalex/bounds_synthesis
- refs/heads/rootjalex/cbounds
- refs/heads/rootjalex/cbounds_predicated
- refs/heads/rootjalex/fix-reinterpret-cmp
- refs/heads/rootjalex/fix-sat-overflow
- refs/heads/rootjalex/fix_estimate_issue
- refs/heads/rootjalex/fix_failed_unrolls
- refs/heads/rootjalex/gsoc_codegen
- refs/heads/rootjalex/improve_cbounds_fixed
- refs/heads/rootjalex/improve_constant_bounds
- refs/heads/rootjalex/pitchfork-arm
- refs/heads/rootjalex/reinterpret-simplify
- refs/heads/rootjalex/rts
- refs/heads/rootjalex/super_simplify_bounds
- refs/heads/rootjalex/test_cbounds_fixed
- refs/heads/rootjalex/test_constant_bounds
- refs/heads/rootjalex/trs-codegen
- refs/heads/rootjalex/trs-codegen-cross
- refs/heads/rootjalex/trs-merge
- refs/heads/rootjalex/uint32-int32-cast
- refs/heads/rootjalex/x86-hadds
- refs/heads/rootjalex/x86-optimize
- refs/heads/rootjalex/x86-optimize-test
- refs/heads/rootjalex/x86-test
- refs/heads/rule_removal_experiments
- refs/heads/schedule-output-storage
- refs/heads/separate_bounds_query_entrypoint
- refs/heads/shallow
- refs/heads/shift_amount_type_change
- refs/heads/shoaibkamil/cmake-without-arm
- refs/heads/shoaibkamil/correct_memory_fences
- refs/heads/shoaibkamil/d3d-fixes
- refs/heads/shoaibkamil/deprecate_openglcompute
- refs/heads/shoaibkamil/json
- refs/heads/shoaibkamil/llvm_clone_tag
- refs/heads/shoaibkamil/minor-vcpkg-doc-change
- refs/heads/shoaibkamil/opengl_compute_tests
- refs/heads/shoaibkamil/performance_tests_as_generators
- refs/heads/shoaibkamil/rule_removal_experiments
- refs/heads/shoaibkamil/super_simplify_with_interpreter
- refs/heads/shoaibkamil/windows-arm-fix-attributes
- refs/heads/sim_shlib_addr_print
- refs/heads/simplify-nested-broadcasts
- refs/heads/simplify-vectorreduce-shuffles2
- refs/heads/simplify_mod
- refs/heads/sioutas_2020
- refs/heads/sioutas_2020_autoscheduler
- refs/heads/slomp/gpu-codegen-profiling
- refs/heads/slomp/msvc-static-analysis
- refs/heads/solve_div
- refs/heads/solve_div_simplifier_test
- refs/heads/sr/python-late-binding-defaults
- refs/heads/srj-aaa
- refs/heads/srj-alloc
- refs/heads/srj-alloca
- refs/heads/srj-appmake2
- refs/heads/srj-aslog
- refs/heads/srj-assert
- refs/heads/srj-assoc
- refs/heads/srj-auto-multi
- refs/heads/srj-auto-multi2
- refs/heads/srj-auto_schedule_mat_mul
- refs/heads/srj-autosched
- refs/heads/srj-b2cpphide
- refs/heads/srj-barr
- refs/heads/srj-bits
- refs/heads/srj-blacklist
- refs/heads/srj-bounds
- refs/heads/srj-bufcalltype
- refs/heads/srj-bufcallwrap
- refs/heads/srj-bufcallwrap2
- refs/heads/srj-buffer
- refs/heads/srj-bv
- refs/heads/srj-classic-autotune
- refs/heads/srj-clean
- refs/heads/srj-constcall
- refs/heads/srj-crosscompile
- refs/heads/srj-ctlz
- refs/heads/srj-cvec-patch
- refs/heads/srj-dag
- refs/heads/srj-debug-to-file
- refs/heads/srj-deir
- refs/heads/srj-f16
- refs/heads/srj-fp16
- refs/heads/srj-fsch
- refs/heads/srj-fthru
- refs/heads/srj-g2
- refs/heads/srj-g3
- refs/heads/srj-gha-test-fixes
- refs/heads/srj-hidden
- refs/heads/srj-hide2
- refs/heads/srj-hvx
- refs/heads/srj-hvx-bug
- refs/heads/srj-hvx-codegen-bug
- refs/heads/srj-hvx-nocopy
- refs/heads/srj-hvxshift
- refs/heads/srj-iib
- refs/heads/srj-initshape
- refs/heads/srj-inv
- refs/heads/srj-ir
- refs/heads/srj-irmut2
- refs/heads/srj-iwyu
- refs/heads/srj-iwyu3
- refs/heads/srj-javascript_work_in_progress
- refs/heads/srj-lensblur
- refs/heads/srj-lessinc
- refs/heads/srj-llvm-loop-opt
- refs/heads/srj-mak
- refs/heads/srj-maxthreads
- refs/heads/srj-mod
- refs/heads/srj-msan
- refs/heads/srj-msan-call
- refs/heads/srj-muldivmod
- refs/heads/srj-mut
- refs/heads/srj-outputs-2
- refs/heads/srj-parse
- refs/heads/srj-pch
- refs/heads/srj-printfunc
- refs/heads/srj-pygp
- refs/heads/srj-revertbits
- refs/heads/srj-schedule-storage
- refs/heads/srj-shl-shr-2
- refs/heads/srj-sio
- refs/heads/srj-static-const
- refs/heads/srj-strided-store
- refs/heads/srj-tidyh
- refs/heads/srj-tiff
- refs/heads/srj-trace
- refs/heads/srj-tutorial
- refs/heads/srj-using
- refs/heads/srj-wasmfix
- refs/heads/srj-xor2
- refs/heads/srj/abstract-gen-without-get-output-func-KEEP
- refs/heads/srj/aligned-alloc
- refs/heads/srj/aligned-alloc-2
- refs/heads/srj/aligned-malloc-with-aligned-alloc
- refs/heads/srj/all-explicit-ctor
- refs/heads/srj/anderson-thread-info-ptr
- refs/heads/srj/aot-perf
- refs/heads/srj/argv-signatures
- refs/heads/srj/argv-types
- refs/heads/srj/arm64e
- refs/heads/srj/async-test
- refs/heads/srj/atomic-32
- refs/heads/srj/b2cpp-const-data
- refs/heads/srj/better-xt-dispatch
- refs/heads/srj/bfloat1
- refs/heads/srj/bp
- refs/heads/srj/build_halide_h
- refs/heads/srj/c-bool
- refs/heads/srj/cache-clear
- refs/heads/srj/clang-fmt-ignore
- refs/heads/srj/clang-tidy
- refs/heads/srj/clear-c-cache
- refs/heads/srj/cmake-asan
- refs/heads/srj/cmake-asan2
- refs/heads/srj/cmake-jit-generators
- refs/heads/srj/configure-cmake
- refs/heads/srj/cpp-generator-v2-experiment-KEEP
- refs/heads/srj/crosscompile
- refs/heads/srj/csv
- refs/heads/srj/ctad
- refs/heads/srj/depr
- refs/heads/srj/deprecation
- refs/heads/srj/device-copy
- refs/heads/srj/example
- refs/heads/srj/experiment
- refs/heads/srj/experiment-6967
- refs/heads/srj/exporting
- refs/heads/srj/expr_t
- refs/heads/srj/external-tensors
- refs/heads/srj/f16-convert
- refs/heads/srj/fix-pytorch
- refs/heads/srj/fixed-rollback
- refs/heads/srj/fopen-fix
- refs/heads/srj/forward-name
- refs/heads/srj/gen-func
- refs/heads/srj/gen-func-2
- refs/heads/srj/gen-func-3
- refs/heads/srj/gen2-1
- refs/heads/srj/gen_closure
- refs/heads/srj/generator_aot_gpu_multi_context_threaded
- refs/heads/srj/globals
- refs/heads/srj/halide-buffer-crop
- refs/heads/srj/halide-malloc-alignment
- refs/heads/srj/halide-must-use
- refs/heads/srj/halide-runtime-must-use-result
- refs/heads/srj/hallmark
- refs/heads/srj/hang-repro
- refs/heads/srj/hannk
- refs/heads/srj/hannk-aliasing
- refs/heads/srj/hannk-error-checking
- refs/heads/srj/hannk-errors
- refs/heads/srj/hannk-inplace
- refs/heads/srj/hannk-mmap
- refs/heads/srj/hannk-tflite-27
- refs/heads/srj/hannk-verbosity
- refs/heads/srj/hdrs
- refs/heads/srj/html-becomes-viz
- refs/heads/srj/implicit-mult-widening
- refs/heads/srj/issue-7076
- refs/heads/srj/iwyu
- refs/heads/srj/iwyu-2
- refs/heads/srj/iwyu-6
- refs/heads/srj/libHANNK
- refs/heads/srj/linter
- refs/heads/srj/llvm_type_of
- refs/heads/srj/lossless-test
- refs/heads/srj/maybe-unused
- refs/heads/srj/meanop
- refs/heads/srj/metadata-calling-convention
- refs/heads/srj/more-tidy
- refs/heads/srj/msan-dtf
- refs/heads/srj/multimeta
- refs/heads/srj/nanobind
- refs/heads/srj/new-rt-1
- refs/heads/srj/no-threadpool
- refs/heads/srj/no-timeout-thread
- refs/heads/srj/oglc-mutexed
- refs/heads/srj/param-map
- refs/heads/srj/pip-15.x
- refs/heads/srj/pip-cron
- refs/heads/srj/possible-uninited
- refs/heads/srj/pr-7566
- refs/heads/srj/printer-size
- refs/heads/srj/profiler-data-race
- refs/heads/srj/ptr-int-cast
- refs/heads/srj/py-float32
- refs/heads/srj/pyapps
- refs/heads/srj/pyext-fix
- refs/heads/srj/pygen-class
- refs/heads/srj/pygen-deux
- refs/heads/srj/pygen-func
- refs/heads/srj/pygen-native-types
- refs/heads/srj/pyinstall
- refs/heads/srj/pystuff
- refs/heads/srj/python-buffer-unpack
- refs/heads/srj/python-tutorial
- refs/heads/srj/reshape
- refs/heads/srj/rt-error-smallify
- refs/heads/srj/rt-return-types
- refs/heads/srj/runtime-error-handling
- refs/heads/srj/sat-fixes-exp
- refs/heads/srj/sat-fixes-exp-2
- refs/heads/srj/shadow-field
- refs/heads/srj/snprintf
- refs/heads/srj/spirv-license
- refs/heads/srj/stat-buf-deprecations
- refs/heads/srj/static-buffer-generators
- refs/heads/srj/stmt-html
- refs/heads/srj/stringify
- refs/heads/srj/synth-gen-params
- refs/heads/srj/synth-params-python
- refs/heads/srj/test-arm_sve_redux
- refs/heads/srj/test-intrinsics-bounds
- refs/heads/srj/test8076
- refs/heads/srj/test8078
- refs/heads/srj/test8094
- refs/heads/srj/test8105a
- refs/heads/srj/test8115
- refs/heads/srj/test_tmpdir_fix
- refs/heads/srj/tidy
- refs/heads/srj/tidy-format-14
- refs/heads/srj/tidymore
- refs/heads/srj/tidymore2
- refs/heads/srj/tls
- refs/heads/srj/tls-3
- refs/heads/srj/tls-4
- refs/heads/srj/tls-ucon
- refs/heads/srj/tmp-unschedule-experiment
- refs/heads/srj/tot-fix
- refs/heads/srj/try-revert-sat
- refs/heads/srj/type-traits
- refs/heads/srj/typed-func
- refs/heads/srj/ucon-all-const
- refs/heads/srj/ucon-non-const
- refs/heads/srj/version
- refs/heads/srj/visit-warnings
- refs/heads/srj/wasm-atomic2
- refs/heads/srj/wasm-simd
- refs/heads/srj/wasm-stuff
- refs/heads/srj/wasm-threads
- refs/heads/srj/wasm-updates
- refs/heads/srj/wasm-work
- refs/heads/srj/wip
- refs/heads/srj/x-rounding
- refs/heads/srj/xbuf
- refs/heads/srj/xc+plus+size+tmp
- refs/heads/srj/xc-types
- refs/heads/srj/xt-uint-cast-test
- refs/heads/srj/xtensa-arch
- refs/heads/srj/xtensa-merge
- refs/heads/srj/xvc-experimetn
- refs/heads/srj/zlib-embed
- refs/heads/standalone_autoscheduler
- refs/heads/standalone_autoscheduler_arm_worker
- refs/heads/standalone_autoscheduler_arm_worker_amazon
- refs/heads/standalone_autoscheduler_gpu
- refs/heads/standalone_autoscheduler_hexagon
- refs/heads/sticky_task_assignments
- refs/heads/store_with
- refs/heads/store_with_solver_for_super_simplify
- refs/heads/strict_float_cse_fix
- refs/heads/super_simplify
- refs/heads/super_simplify_v2
- refs/heads/super_simplify_v3
- refs/heads/transitive_wrapper
- refs/heads/trigger-release-v16
- refs/heads/tzumao-autodiff-boundarycond
- refs/heads/tzumao-gradient-autoscheduler-bug
- refs/heads/tzumao-predicate-store-load
- refs/heads/tzumao-python-buffer
- refs/heads/tzumao_autodiff_unbounded
- refs/heads/tzumao_improve_gradient_autoscheduler
- refs/heads/tzumao_issue_4297
- refs/heads/tzumao_licm_before_BI
- refs/heads/unbounded_bugs
- refs/heads/undo_async_copy_chain_black_list
- refs/heads/use_string_literals_for_blobs
- refs/heads/users/lukas/python-pip
- refs/heads/validate_sched_error_msg
- refs/heads/var_ir_fix
- refs/heads/vksnk/async-experiment
- refs/heads/vksnk/async-multiple-producers
- refs/heads/vksnk/async-order
- refs/heads/vksnk/better-loop-carry
- refs/heads/vksnk/better-message
- refs/heads/vksnk/bound-storage
- refs/heads/vksnk/bounds-widen-right
- refs/heads/vksnk/c-print-type
- refs/heads/vksnk/c-round
- refs/heads/vksnk/check-return-result
- refs/heads/vksnk/compute-with-bug
- refs/heads/vksnk/compute_with_async
- refs/heads/vksnk/dma-limit-channels
- refs/heads/vksnk/dma-min-max
- refs/heads/vksnk/expr-match-shuffle
- refs/heads/vksnk/extract-from-scalar
- refs/heads/vksnk/f16-load
- refs/heads/vksnk/fix-packvr
- refs/heads/vksnk/fix_halide_xtensa_narrow_with_rounding_shift_i16
- refs/heads/vksnk/fused-compute-with
- refs/heads/vksnk/hoist-storage-bug
- refs/heads/vksnk/lerp-intrinsics
- refs/heads/vksnk/lower-signed-shifts
- refs/heads/vksnk/missing-exception
- refs/heads/vksnk/non-widening-halves
- refs/heads/vksnk/optimize-shuffles
- refs/heads/vksnk/replace-all
- refs/heads/vksnk/restrict
- refs/heads/vksnk/roll-buffer
- refs/heads/vksnk/roundeven-arm
- refs/heads/vksnk/rvar-bounds
- refs/heads/vksnk/simplify-slice
- refs/heads/vksnk/skip-semaphores
- refs/heads/vksnk/storage-folding
- refs/heads/vksnk/strided-load-of-4_2
- refs/heads/vksnk/typed-scope
- refs/heads/vksnk/update-simd-driver
- refs/heads/vksnk/vectorize-bug
- refs/heads/vksnk/vectorize-scalarize
- refs/heads/vksnk/widening_absd
- refs/heads/vksnk/xtensa-codegen-fp16
- refs/heads/vksnk/xtensa-dma-improvements
- refs/heads/vksnk/xtensa-regroup-pass
- refs/heads/vksnk/xtensa/lift-allocs
- refs/heads/vulkan
- refs/heads/vulkan-diagnose-alloc-failures
- refs/heads/vulkan-phase0-adts
- refs/heads/vulkan-phase1-spirv
- refs/heads/vulkan-phase2-runtime
- refs/heads/vulkan2
- refs/heads/vulkan_fix_gpu_dynamic_shared_test
- refs/heads/vulkan_fix_subregion_memory_offsets
- refs/heads/webassembly-old
- refs/heads/winograd
- refs/heads/wording_fix
- refs/heads/xtensa-codegen
- refs/heads/xtensa-codegen-parallel
- refs/heads/xuanda/fix-serialize-bad-partition-always
- refs/remotes/origin/rootjalex/add_autosched_caching
- refs/tags/release_2018_02_15
- refs/tags/release_2019_08_27
- refs/tags/release_8.0.0
- refs/tags/v10.0.0
- refs/tags/v10.0.1
- refs/tags/v11.0.0
- refs/tags/v11.0.1
- refs/tags/v12.0.0
- refs/tags/v12.0.1
- refs/tags/v13.0.0
- refs/tags/v13.0.1
- refs/tags/v13.0.2
- refs/tags/v13.0.3
- refs/tags/v13.0.4
- refs/tags/v14.0.0
- refs/tags/v15.0.0
- refs/tags/v15.0.1
- refs/tags/v16.0.0
- refs/tags/v17.0.0
- refs/tags/v17.0.1
- refs/tags/v17.0.2
- refs/tags/v18.0.0
- refs/tags/v8.0.0
Take a new snapshot of a software origin
If the archived software origin currently browsed is not synchronized with its upstream version (for instance when new commits have been issued), you can explicitly request Software Heritage to take a new snapshot of it.
Use the form below to proceed. Once a request has been submitted and accepted, it will be processed as soon as possible. You can then check its processing state by visiting this dedicated page.Processing "take a new snapshot" request ...
Permalinks
To reference or cite the objects present in the Software Heritage archive, permalinks based on SoftWare Hash IDentifiers (SWHIDs) must be used.
Select below a type of object currently browsed in order to display its associated SWHID and permalink.
Revision | Author | Date | Message | Commit Date |
---|---|---|---|---|
f65053a | Steven Johnson | 25 January 2022, 01:13:08 UTC | Merge branch 'srj/static-buffer-dimensions' into srj/xc+plus+size+tmp | 25 January 2022, 01:13:08 UTC |
fbe3dd9 | Steven Johnson | 25 January 2022, 01:12:35 UTC | Merge branch 'master' into xtensa-codegen | 25 January 2022, 01:12:35 UTC |
b2fe5cc | Steven Johnson | 25 January 2022, 01:12:26 UTC | Merge branch 'master' into srj/static-buffer-dimensions | 25 January 2022, 01:12:26 UTC |
eff874a | Andrew Adams | 20 January 2022, 16:41:42 UTC | Remove incorrect not-multiple-of-16 claim | 22 January 2022, 22:34:32 UTC |
6a09503 | Steven Johnson | 21 January 2022, 23:21:25 UTC | Merge branch 'master' into xtensa-codegen | 21 January 2022, 23:21:25 UTC |
6a63609 | Steven Johnson | 21 January 2022, 23:20:41 UTC | Allow calling scheduling methods on Output<Buffer[]> (#6577) The constness of the inherited operator[] meant you couldn't call scheduling methods on array-of-output-buffer in a Generator. Fixed and added a test case. | 21 January 2022, 23:20:41 UTC |
954e740 | Steven Johnson | 21 January 2022, 22:48:33 UTC | DynamicDims -> BufferDimsUnconstrained | 21 January 2022, 22:48:33 UTC |
f1f5c53 | Steven Johnson | 21 January 2022, 22:08:43 UTC | Fix for top-of-tree LLVM (#6579) | 21 January 2022, 22:08:43 UTC |
67e75af | Steven Johnson | 20 January 2022, 23:44:01 UTC | Update Generator.h | 20 January 2022, 23:44:01 UTC |
1e7a5ef | Steven Johnson | 20 January 2022, 23:31:37 UTC | Update Generator.h | 20 January 2022, 23:31:37 UTC |
c016a10 | Steven Johnson | 20 January 2022, 23:01:36 UTC | Update metadata_tester_generator.cpp | 20 January 2022, 23:01:36 UTC |
279ecd6 | Steven Johnson | 20 January 2022, 22:57:41 UTC | wip | 20 January 2022, 22:57:41 UTC |
5fba150 | Steven Johnson | 20 January 2022, 18:04:36 UTC | Add `explicit` to a handful of Generator-related ctors. (#6569) * Update Generator.h * Update generators using now-forbidden ctors | 20 January 2022, 18:04:36 UTC |
1d46a28 | Steven Johnson | 20 January 2022, 00:27:14 UTC | Fix typo in comment in HalideBuffer.h (#6570) | 20 January 2022, 00:27:14 UTC |
9af74e7 | Volodymyr Kysenko | 19 January 2022, 22:39:59 UTC | Merge branch 'xtensa-codegen' of https://github.com/halide/Halide into xtensa-codegen Change-Id: I66d73eabd0fca1b3f1e55b35f8757bb436d9ec70 | 19 January 2022, 22:39:59 UTC |
40d7b7c | Volodymyr Kysenko | 19 January 2022, 22:33:07 UTC | Changes: * rounding_shift_right for i32 * interleave for wider types * remove buffer size check Change-Id: If08e6f2925462b6fa73d24377aece9d87392a55e | 19 January 2022, 22:33:07 UTC |
b32e035 | Volodymyr Kysenko | 19 January 2022, 22:26:32 UTC | Handle strict_float in InjectDmaTransfer Change-Id: Iff4a52d63e7067bfb083bc10c4c27eee4d82c6ea | 19 January 2022, 22:26:32 UTC |
6cbcebd | Steven Johnson | 19 January 2022, 17:14:32 UTC | Merge branch 'master' into xtensa-codegen | 19 January 2022, 17:14:32 UTC |
030fe5a | Steven Johnson | 18 January 2022, 21:23:49 UTC | wasm simd cleanup (#6566) - f64x2.convert_low_i32x4_s/u is handled properly by LLVM 14, so remove the TODO stuff and leave the test special-cased for that version (don't bother trying to make it work in LLVM13) - our `float_to_double` glue is only needed for LLVM13, so update comments and ifdefs appropriately | 18 January 2022, 21:23:49 UTC |
77de08c | Masahiro Masuda | 18 January 2022, 17:43:25 UTC | [APP] Fix `hexagon_benchmarks` build (use two-var prefetch) (#6563) * Fix hexagon_benchmark build (use two-var prefetch) * tweak indent | 18 January 2022, 17:43:25 UTC |
cf4565b | Steven Johnson | 17 January 2022, 01:19:27 UTC | Revert "Fix for top-of-tree LLVM (#6561)" (#6564) This reverts commit a86e13deb304ae9980f941eff6adf964b013ac60. | 17 January 2022, 01:19:27 UTC |
1aa68fb | Steven Johnson | 16 January 2022, 20:58:36 UTC | Enable simd_op_check test for wasm i8x16.popcnt (#6562) LLVM does in fact generate this correctly, but only for 8x16 sized vectors | 16 January 2022, 20:58:36 UTC |
e581669 | Andrew Adams | 14 January 2022, 17:48:42 UTC | Make HALIDE_REGISTER_GENERATOR work with multiple template args (#6556) * Make HALIDE_REGISTER_GENERATOR work with multiple template args * Add missing generator to cmakefile * Can't put parens around this usage * Another attempt at stripping parens from a type Co-authored-by: Steven Johnson <srj@google.com> | 14 January 2022, 17:48:42 UTC |
a86e13d | Steven Johnson | 13 January 2022, 23:34:30 UTC | Fix for top-of-tree LLVM (#6561) * Fix for top-of-tree LLVM * Update LLVM_Runtime_Linker.cpp | 13 January 2022, 23:34:30 UTC |
b9e0e72 | Steven Johnson | 13 January 2022, 04:35:23 UTC | Use add_halide_generator() everywhere in apps/ (#6554) * Use add_halide_generator() everywhere in apps/ Existing CMake code uses add_executable + target_link_libraries to create a Generator, but the somewhat-recently-added add_halide_generator() helper is a cleaner way to do this. Move to using it everywhere in apps/ instead. (Note that add_halide_generator()) doesn't yet work for in-tree builds, unfortunately.) * fixes * fully-qualify names * Update CMakeLists.txt * Fully Qualify * LINK_LIBRARIES * make add_halide_library() do some guessing * Update HalideGeneratorHelpers.cmake | 13 January 2022, 04:35:23 UTC |
b811f09 | Steven Johnson | 12 January 2022, 02:16:43 UTC | Fix deprecation warnings in Python tutorials (#6552) | 12 January 2022, 02:16:43 UTC |
5807c57 | Steven Johnson | 11 January 2022, 00:26:42 UTC | Merge branch 'master' into xtensa-codegen | 11 January 2022, 00:26:42 UTC |
34cfde3 | Steven Johnson | 10 January 2022, 19:44:47 UTC | Fixes for top-of-tree LLVM (#6546) (#6548) | 10 January 2022, 19:44:47 UTC |
bc5b557 | Andrew Adams | 09 January 2022, 02:55:57 UTC | Better default lowering of absd (#6545) * Better default lowering of absd This produces 3 instructions on x86, whereas the previous version produced 5. * Switch to an x86-only solution * Fixes * Another fix. Add reference. * typo | 09 January 2022, 02:55:57 UTC |
4ec870d | Dillon Sharlet | 07 January 2022, 22:03:34 UTC | Fix description of rounding_shift_left/rounding_shift_right (#6549) | 07 January 2022, 22:03:34 UTC |
3732f1a | Steven Johnson | 07 January 2022, 21:39:50 UTC | Merge branch 'master' into xtensa-codegen | 07 January 2022, 21:39:50 UTC |
b8a711a | Andrew Adams | 07 January 2022, 18:57:42 UTC | Attempted redo of faster noise (#6539) * Make random 2x faster by putting the innermost var last * Improve period of low bits of random noise * Add new rewrite rules for quadratics By pulling constant additions outside of quadratics, we can shave off a few add instructions in the inner loop for random number generation, which uses a quadratic modulo 2^32 I also removed the !overflows predicates, because rules already fail to match if a fold overflows. New rules formally verified. * Make expensive_zero actually always zero * Partially revert new simplifier rules * Enable new rules * Revert test Co-authored-by: Steven Johnson <srj@google.com> | 07 January 2022, 18:57:42 UTC |
c0cffc9 | Steven Johnson | 07 January 2022, 00:59:23 UTC | Add simd_op_check_xtensa.cpp to CMakeLists.txt | 07 January 2022, 00:59:23 UTC |
d9bef35 | Steven Johnson | 07 January 2022, 00:57:22 UTC | Run clang-tidy and clang-format on xtensa_codegen branch | 07 January 2022, 00:57:22 UTC |
5a776b3 | Steven Johnson | 07 January 2022, 00:34:32 UTC | Merge branch 'master' into xtensa-codegen | 07 January 2022, 00:34:32 UTC |
6a9dfde | Volodymyr Kysenko | 07 January 2022, 00:34:15 UTC | Alternative implementation of halide_xtensa_narrow_with_rounding_shift_i16 (#6547) Change-Id: I2c3f8a40c5279ec09bcc18b077b9badd7ee253fe | 07 January 2022, 00:34:15 UTC |
daa5b7c | Steven Johnson | 06 January 2022, 18:05:15 UTC | Convert apps/hannk/Elementwise to use generate() (#6543) It's the only Generator in Halide that uses build() instead of generate() (aside from tests designed specifically to test build()); convert it to keep things nice and consistent. | 06 January 2022, 18:05:15 UTC |
f694314 | Volodymyr Kysenko | 06 January 2022, 17:45:18 UTC | Merge branch 'xtensa-codegen' of https://github.com/halide/Halide into xtensa-codegen Change-Id: I54c3f9ff3ca3eec9b4f294c6833645a1ca3e14b3 | 06 January 2022, 17:45:18 UTC |
8688394 | Volodymyr Kysenko | 06 January 2022, 17:44:37 UTC | Disable halide_xtensa_narrow_with_rounding_shift_i16 due to (likely) a compiler bug Change-Id: Ib88961d8d08332e34ee69cd136eff9647387a965 | 06 January 2022, 17:44:37 UTC |
7484f22 | Steven Johnson | 06 January 2022, 01:32:14 UTC | Update CodeGen_Xtensa.cpp | 06 January 2022, 01:32:14 UTC |
1212efb | Steven Johnson | 06 January 2022, 00:46:45 UTC | Avoid unused-var warning/error | 06 January 2022, 00:46:45 UTC |
b244a83 | Steven Johnson | 06 January 2022, 00:13:02 UTC | Merge branch 'master' into xtensa-codegen | 06 January 2022, 00:13:02 UTC |
6f7d5ce | Steven Johnson | 06 January 2022, 00:12:51 UTC | Revert "Make it possible to interpret a wide type as multiple smaller elements (#6506)" (#6541) This reverts commit 1b180a8e93339aac2d19db57d2ef99b67253a0bc. | 06 January 2022, 00:12:51 UTC |
3f4feb6 | Steven Johnson | 05 January 2022, 21:46:37 UTC | Merge branch 'master' into xtensa-codegen | 05 January 2022, 21:46:37 UTC |
935c05e | Steven Johnson | 05 January 2022, 21:46:06 UTC | Fix GeneratorOutput_Buffer::set_estimates() (#6540) The existing wrapper wouldn't work for Outputs that have Tuple-valued elemets. | 05 January 2022, 21:46:06 UTC |
95737be | Alex Reinking | 05 January 2022, 21:45:18 UTC | Update CMake documentation (#6535) * Allow third parties to externally override SOVERSION Debian and other third-party packagers might need or want to patch our sources for a variety of reasons. In those cases, they might also need to override the SOVERSION. See here for a practical example: https://salsa.debian.org/pkg-llvm-team/halide/-/blob/f881de70cd83095053e13047b63f61faf6bc7a36/debian/patches/0006-Fixup-libhalide-version-soversion-for-debian-package.patch * Update CMake documentation. * Fix typo * Add link to ToC | 05 January 2022, 21:45:18 UTC |
7bb8198 | Alex Reinking | 05 January 2022, 21:44:36 UTC | Allow third parties to externally override SOVERSION (#6534) Debian and other third-party packagers might need or want to patch our sources for a variety of reasons. In those cases, they might also need to override the SOVERSION. See here for a practical example: https://salsa.debian.org/pkg-llvm-team/halide/-/blob/f881de70cd83095053e13047b63f61faf6bc7a36/debian/patches/0006-Fixup-libhalide-version-soversion-for-debian-package.patch | 05 January 2022, 21:44:36 UTC |
4960a00 | Volodymyr Kysenko | 05 January 2022, 21:38:29 UTC | make format Change-Id: I3aa5d16074b5cf64c6bad76a3168848e92d1ee53 | 05 January 2022, 21:38:29 UTC |
fee1abb | Volodymyr Kysenko | 05 January 2022, 21:33:02 UTC | Add reinterpret to the list of ops which don't need slicing Change-Id: I315fc4f9af3e6e1398fdd0b1bec3be6f82123679 | 05 January 2022, 21:33:02 UTC |
f8459da | Steven Johnson | 05 January 2022, 19:10:34 UTC | Remove unnecessary `std::move` calls (#6537) Compilers with `-Werror` will fail with `error: moving a temporary object prevents copy elision` | 05 January 2022, 19:10:34 UTC |
50edb64 | Steven Johnson | 05 January 2022, 19:10:03 UTC | Revert "Make random faster by putting the innermost var last (#6504)" (#6538) This reverts commit 00211656fd208c5e6eb28f943dbbe8c65b45622f. | 05 January 2022, 19:10:03 UTC |
8e4b09f | Volodymyr Kysenko | 05 January 2022, 05:33:30 UTC | Optimizations: * better type conversions * narrowing rounding right shift * remove debug pring from DMA initializer * 2x and 4x vector reduce patterns. Change-Id: I460233c765a95aebcd906da4c1b16db751d91bfc | 05 January 2022, 05:33:30 UTC |
7d2713a | Steven Johnson | 05 January 2022, 00:21:45 UTC | Add forwarding method & python wrapper for Func::dma() | 05 January 2022, 00:21:45 UTC |
60f9c32 | Steven Johnson | 05 January 2022, 00:10:13 UTC | Merge branch 'master' into xtensa-codegen | 05 January 2022, 00:10:13 UTC |
3a4e4c7 | Infinoid | 04 January 2022, 23:35:51 UTC | If cmake built a python module, teach cmake to install the python module. (#6523) | 04 January 2022, 23:35:51 UTC |
b8eb22d | Roman Lebedev | 04 January 2022, 21:56:22 UTC | Fix Python GIL lock handling (Fixes #6524, Fixes #5631) (#6525) * Fix Python GIL lock handling (Fixes #6524, Fixes #5631) As disscussed in https://github.com/halide/Halide/pull/6523#issuecomment-1003545664 and later in https://github.com/halide/Halide/issues/6524, pybind11 v2.8.1 added some defensive checks that fail for halide, namely in `python_tutorial_lesson_04_debugging_2` and `python_tutorial_lesson_05_scheduling_1`. https://github.com/halide/Halide/issues/6524#issuecomment-1003569810 notes: > * Python calls a Halide-JIT-generated function , which runs with the GIL held. > * Halide runtime spawns worker threads. > * The worker threads try to call pybind11's py::print function to emit traces. > * Pybind11 complains, correctly, that the worker thread doesn't hold the GIL. > > Trying to acquire the GIL hangs, because the main thread is still holding it. I tried teaching the main thread to release the GIL (as suggested in #5631), but I still saw hangs when I tried this. I have tried, and just dropping the lock before calling into halide, or just acquiring it in `halide_python_print` doesn't work, we need to do both. I have verified that the two tests fail without this fix, and pass with it. | 04 January 2022, 21:56:22 UTC |
bce2ef4 | Roman Lebedev | 04 January 2022, 21:06:29 UTC | Install Python tutorials (#6530) * Install Python tutorials I know we have previously discussed that `TYPE DOC` should be used, but unfortunately i'm not sure that will work here, because doc/tutorial directory is already occupied by C++ tutorials, and i don't think they should be mixed. I'm open to alternative suggestions. | 04 January 2022, 21:06:29 UTC |
0021165 | Andrew Adams | 04 January 2022, 16:40:23 UTC | Make random faster by putting the innermost var last (#6504) * Make random 2x faster by putting the innermost var last * Improve period of low bits of random noise * Add new rewrite rules for quadratics By pulling constant additions outside of quadratics, we can shave off a few add instructions in the inner loop for random number generation, which uses a quadratic modulo 2^32 I also removed the !overflows predicates, because rules already fail to match if a fold overflows. New rules formally verified. * Make expensive_zero actually always zero | 04 January 2022, 16:40:23 UTC |
f11d820 | Roman Lebedev | 04 January 2022, 16:34:58 UTC | Implement SanitizerCoverage support (Refs. #6513) (#6517) * Implement SanitizerCoverage support (Refs. #6513) Please refer to https://clang.llvm.org/docs/SanitizerCoverage.html TLDR: `ModuleSanitizerCoveragePass` instruments the IR by inserting calls to callbacks at certain constructs. What the callbacks should do is up to the implementation. They are effectively required for fuzzing to be effective, and are provided by e.g. libfuzzer. One huge caveat is `SanitizerCoverageOptions` which controls which which callbacks should actually be inserted. I just don't know what to do about it. Right now i have hardcoded the set that would have been enabled by `-fsanitize=fuzzer-no-link`, because the alternative, due to halide unflexibility, would be to introduce ~16 suboptions to control each one. * Simplify test * sancov test: avoid potential signedness warnings. * Rename all instances of sancov to sanitizecoverage * Adjust spelling of "SanitizerCoverage" in some places * Actually adjust the feature name in build system for the test * Hopefully fix Makefile build Co-authored-by: Steven Johnson <srj@google.com> | 04 January 2022, 16:34:58 UTC |
7eb9949 | Roman Lebedev | 04 January 2022, 16:32:52 UTC | [NFC-ish] Finish MSAN handling (#6516) Somehow, initially i missed that there was MSan support, so it might be good to actually mention that we don't need to run any MSan passes here, and that we didn't forget to run them. Secondly, it seems inconsistent not annotate the functions with `Attribute::SanitizeMemory`, like we do for others. I suppose it isn't strictly required, since they are used to actually drive the instrumentation passes, and we don't run MSan pass, but they are also used to disable some LLVM optimizations, and that //might// be important. Or not, but then i suppose there should be a comment about it? Co-authored-by: Steven Johnson <srj@google.com> | 04 January 2022, 16:32:52 UTC |
5c33902 | Andrew Adams | 04 January 2022, 16:08:43 UTC | free shape storage last (#6511) Some decref-triggered runtime methods need the shape Fixes #6509 Co-authored-by: Steven Johnson <srj@google.com> | 04 January 2022, 16:08:43 UTC |
0089de9 | Andrew Adams | 04 January 2022, 16:08:28 UTC | Handle mixed-width args to mul-shift-right (#6526) and codegen it to pmulhuw on x86 Co-authored-by: Steven Johnson <srj@google.com> | 04 January 2022, 16:08:28 UTC |
1b180a8 | Andrew Adams | 03 January 2022, 23:04:31 UTC | Make it possible to interpret a wide type as multiple smaller elements (#6506) * Make it possible to interpret a wide type as multiple smaller elements This is helpful for things like reinterpreting 32-bit packed rgba values as individual components for free. * clang-format | 03 January 2022, 23:04:31 UTC |
f9ea2d4 | Steven Johnson | 03 January 2022, 22:12:35 UTC | Fix use-after-free bug in SlidingWindow.cpp (#6527) | 03 January 2022, 22:12:35 UTC |
2651402 | Steven Johnson | 03 January 2022, 20:45:32 UTC | Fix simd-op-check for top-of-tree LLVM (#6529) * Fix simd-op-check for top-of-tree LLVM * Update simd_op_check.cpp | 03 January 2022, 20:45:32 UTC |
9a530b1 | Roman Lebedev | 29 December 2021, 22:58:10 UTC | Fix weird CMake issue with custom LLVM (#6519) Without this, cmake fails with: ``` CMake Error in dependencies/llvm/CMakeLists.txt: Target "Halide_LLVM" INTERFACE_INCLUDE_DIRECTORIES property contains path: "/repositories/halide/dependencies/llvm/" which is prefixed in the source directory. ``` `LLVM_INCLUDE_DIRS` there is `/repositories/llvm-project/llvm/include;/builddirs/llvm-project/build-Clang13/include`, and `INTERFACE_INCLUDE_DIRECTORIES`'s property beforehand is `` (empty), but after this line it suddenly becomes `/repositories/halide/dependencies/llvm/$<BUILD_INTERFACE:/repositories/llvm-project/llvm/include;/builddirs/llvm-project/build-Clang13/include>`. This is quite obscure. I don't really understand what is going on, but with the patch it builds fine. | 29 December 2021, 22:58:10 UTC |
6ed65ba | Roman Lebedev | 29 December 2021, 22:57:41 UTC | Mullapudi2016: don't hardcode the list of supported targets (#6520) As discussed in https://github.com/halide/Halide/issues/6518, this is a bit dubious, and e.g. prevents building on RISC-V, because there is no way to not build autoschedulers currently. | 29 December 2021, 22:57:41 UTC |
1d1f06a | Jin Yue | 23 December 2021, 15:02:14 UTC | Support new warp shuffle intrinsics after CUDA Volta architecture (#6505) * warp shuffle for volta. * Add a warp shuffle test. * Remove TODO because we have HoistWarpShuffles. * Fix test case position. * Pass target to lower_warp_shuffles. * format Co-authored-by: jinyue.jy <jinyue.jy@alibaba-inc.com> | 23 December 2021, 15:02:14 UTC |
e7f655b | Andrew Adams | 22 December 2021, 02:41:03 UTC | Fix a missing case in clamp_unsafe_accesses (#6508) * Fix a missing case in clamp_unsafe_accesses * Don't check func_value_bounds of images | 22 December 2021, 02:41:03 UTC |
b0f4681 | Roman Lebedev | 19 December 2021, 00:59:16 UTC | Try to fix riscv64 build (#6503) https://buildd.debian.org/status/fetch.php?pkg=halide&arch=riscv64&ver=13.0.2-1&stamp=1639833165&raw=0 ``` [1283/3260] /usr/bin/clang++-13 -DHALIDE_ENABLE_RTTI -DHALIDE_WITH_EXCEPTIONS -DHalide_EXPORTS -DLLVM_VERSION=130 -DWITH_AARCH64 -DWITH_AMDGPU -DWITH_ARM -DWITH_D3D12 -DWITH_HEXAGON -DWITH_INTROSPECTION -DWITH_METAL -DWITH_MIPS -DWITH_NVPTX -DWITH_OPENCL -DWITH_OPENGLCOMPUTE -DWITH_POWERPC -DWITH_RISCV -DWITH_WEBASSEMBLY -DWITH_X86 -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/usr/lib/llvm-13/include -g -O3 -DNDEBUG -fPIC -Wall -Wcast-qual -Wignored-qualifiers -Woverloaded-virtual -Winconsistent-missing-destructor-override -Winconsistent-missing-override -Wno-deprecated-declarations -Wno-double-promotion -Wno-float-conversion -Wno-float-equal -Wno-missing-field-initializers -Wno-old-style-cast -Wno-shadow -Wno-sign-conversion -Wno-switch-enum -Wno-undef -Wno-unused-function -Wno-unused-macros -Wno-unused-parameter -Wno-c++98-compat-pedantic -Wno-c++98-compat -Wno-cast-align -Wno-comma -Wno-covered-switch-default -Wno-documentation-unknown-command -Wno-documentation -Wno-exit-time-destructors -Wno-global-constructors -Wno-implicit-float-conversion -Wno-implicit-int-conversion -Wno-implicit-int-float-conversion -Wno-missing-prototypes -Wno-nonportable-system-include-path -Wno-reserved-id-macro -Wno-return-std-move-in-c++11 -Wno-shadow-field-in-constructor -Wno-shadow-field -Wno-shorten-64-to-32 -Wno-undefined-func-template -Wno-unused-member-function -Wno-unused-template -pthread -std=c++17 -MD -MT src/CMakeFiles/Halide.dir/Target.cpp.o -MF src/CMakeFiles/Halide.dir/Target.cpp.o.d -o src/CMakeFiles/Halide.dir/Target.cpp.o -c /<<PKGBUILDDIR>>/src/Target.cpp FAILED: src/CMakeFiles/Halide.dir/Target.cpp.o /usr/bin/clang++-13 -DHALIDE_ENABLE_RTTI -DHALIDE_WITH_EXCEPTIONS -DHalide_EXPORTS -DLLVM_VERSION=130 -DWITH_AARCH64 -DWITH_AMDGPU -DWITH_ARM -DWITH_D3D12 -DWITH_HEXAGON -DWITH_INTROSPECTION -DWITH_METAL -DWITH_MIPS -DWITH_NVPTX -DWITH_OPENCL -DWITH_OPENGLCOMPUTE -DWITH_POWERPC -DWITH_RISCV -DWITH_WEBASSEMBLY -DWITH_X86 -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/usr/lib/llvm-13/include -g -O3 -DNDEBUG -fPIC -Wall -Wcast-qual -Wignored-qualifiers -Woverloaded-virtual -Winconsistent-missing-destructor-override -Winconsistent-missing-override -Wno-deprecated-declarations -Wno-double-promotion -Wno-float-conversion -Wno-float-equal -Wno-missing-field-initializers -Wno-old-style-cast -Wno-shadow -Wno-sign-conversion -Wno-switch-enum -Wno-undef -Wno-unused-function -Wno-unused-macros -Wno-unused-parameter -Wno-c++98-compat-pedantic -Wno-c++98-compat -Wno-cast-align -Wno-comma -Wno-covered-switch-default -Wno-documentation-unknown-command -Wno-documentation -Wno-exit-time-destructors -Wno-global-constructors -Wno-implicit-float-conversion -Wno-implicit-int-conversion -Wno-implicit-int-float-conversion -Wno-missing-prototypes -Wno-nonportable-system-include-path -Wno-reserved-id-macro -Wno-return-std-move-in-c++11 -Wno-shadow-field-in-constructor -Wno-shadow-field -Wno-shorten-64-to-32 -Wno-undefined-func-template -Wno-unused-member-function -Wno-unused-template -pthread -std=c++17 -MD -MT src/CMakeFiles/Halide.dir/Target.cpp.o -MF src/CMakeFiles/Halide.dir/Target.cpp.o.d -o src/CMakeFiles/Halide.dir/Target.cpp.o -c /<<PKGBUILDDIR>>/src/Target.cpp warning: unknown warning option '-Wno-return-std-move-in-c++11' [-Wunknown-warning-option] /<<PKGBUILDDIR>>/src/Target.cpp:114:5: error: use of undeclared identifier 'cpuid' cpuid(info, 1, 0); ^ /<<PKGBUILDDIR>>/src/Target.cpp:148:9: error: use of undeclared identifier 'cpuid' cpuid(info2, 7, 0); ^ /<<PKGBUILDDIR>>/src/Target.cpp:181:17: error: use of undeclared identifier 'cpuid' cpuid(info3, 7, 1); ^ 1 warning and 3 errors generated. ``` ... which doesn't make sense because that code is supposed to only compile for X86. But that is because RISCV header guard is wrong, https://github.com/riscv-non-isa/riscv-toolchain-conventions says: ``` C/C++ preprocessor definitions * __riscv: defined for any RISC-V target. Older versions of the GCC toolchain defined __riscv__. ``` | 19 December 2021, 00:59:16 UTC |
1d86751 | Steven Johnson | 16 December 2021, 19:30:06 UTC | Grab Bag of minor cleanups to LowerParallelTasks (#6498) * Grab Bag of minor cleanups to LowerParallelTasks Basically OCD code stuff I noted down when debugging the issues, this restructures the inner loop to avoid calling a local function that has non-obvious side effects (setting just the right slot in the closure args), as well as consolidating via helper functions, hoisting common stuff used in both paths, using std::move where seemingly appropriate, adding some (hopefully correct) comments about arg expectations, and other things that aren't likely to really move the needle in terms of Halide compile speed, but (hopefully) make the code a little bit more understandable after some time away. (There was a todo about "find a better place for generate_closure_ir()"; this PR eliminates it entirely, just inlining it into the caller, which I think is reasonable given thhe number of assumptions the caller has to make in the first place...) * Update LowerParallelTasks.cpp | 16 December 2021, 19:30:06 UTC |
dffae98 | Steven Johnson | 16 December 2021, 01:24:59 UTC | Update simd_op_check for arm64 upz1 code generation (#6499) (#6500) | 16 December 2021, 01:24:59 UTC |
084236c | Steven Johnson | 16 December 2021, 01:24:33 UTC | Fix size_t -> int conversion warning (#6501) | 16 December 2021, 01:24:33 UTC |
45e1809 | Steven Johnson | 15 December 2021, 20:41:39 UTC | Update WABT to 1.0.25 (#6497) * Update WABT to 1.0.25 (cannot land until https://github.com/WebAssembly/wabt/pull/1788 lands) * tickle buildbots | 15 December 2021, 20:41:39 UTC |
7e233f1 | Steven Johnson | 15 December 2021, 02:16:34 UTC | Update Codegen_Xtensa::print_assignment() from #6195 Handles need to be `auto *` for the previous PR to work properly. (This should be refactored more intelligently to reduce code reuse; this is just a quick-fix to unbreak.) | 15 December 2021, 02:16:34 UTC |
d518030 | Steven Johnson | 14 December 2021, 21:39:30 UTC | Update XtensaOptimize.cpp | 14 December 2021, 21:39:30 UTC |
79724a5 | Steven Johnson | 14 December 2021, 21:31:36 UTC | Merge branch 'master' into xtensa-codegen | 14 December 2021, 21:31:36 UTC |
c3ff4d2 | Steven Johnson | 14 December 2021, 00:32:31 UTC | Restore support for using V8 as the Wasm JIT interpreter (#6478) * Support using V8 as the Wasm JIT interpreter This is a partial revert of https://github.com/halide/Halide/pull/5097. It brings back a bunch of the code in WasmExecutor to set up and use V8 to run Wasm code. All of the code is copy-pasted. There are some small cleanups to move common code (like BDMalloc, structs, asserts) to a common area guarded by `if WITH_WABT || WITH_V8`. Enabling V8 requires setting 2 CMake options: - V8_INCLUDE_PATH - V8_LIB_PATH The first is a path to v8 include folder, to find headers, the second is the monolithic v8 library. This is because it's pretty difficult to build v8, and there are various flags you can set. Comments around those options provide some instructions for building v8. By default, we still use the wabt for running Wasm code, but we can use V8 by setting WITH_WABT=OFF WITH_V8=ON. Maybe in the future, with more testing, we can flip this. Right now this requires a locally patched build of V8 due to https://crbug.com/v8/10461, but once that is resolved, the version of V8 that includes the fix will be fine. Also enable a single test, block_transpose, to run on V8, with these results: $ HL_JIT_TARGET=wasm-32-wasmrt-wasm_simd128 \ ./test/performance/performance_block_transpose Dummy Func version: Scalar transpose bandwidth 3.45061e+08 byte/s. Wrapper version: Scalar transpose bandwidth 3.38931e+08 byte/s. Dummy Func version: Transpose vectorized in y bandwidth 6.74143e+08 byte/s. Wrapper version: Transpose vectorized in y bandwidth 3.54331e+08 byte/s. Dummy Func version: Transpose vectorized in x bandwidth 3.50053e+08 byte/s. Wrapper version: Transpose vectorized in x bandwidth 6.73421e+08 byte/s. Success! For comparison, when targeting host: $ ./test/performance/performance_block_transpose Dummy Func version: Scalar transpose bandwidth 1.33689e+09 byte/s. Wrapper version: Scalar transpose bandwidth 1.33583e+09 byte/s. Dummy Func version: Transpose vectorized in y bandwidth 2.20278e+09 byte/s. Wrapper version: Transpose vectorized in y bandwidth 1.45959e+09 byte/s. Dummy Func version: Transpose vectorized in x bandwidth 1.45921e+09 byte/s. Wrapper version: Transpose vectorized in x bandwidth 2.21746e+09 byte/s. Success! For comparison, running with wabt: Dummy Func version: Scalar transpose bandwidth 828715 byte/s. Wrapper version: Scalar transpose bandwidth 826204 byte/s. Dummy Func version: Transpose vectorized in y bandwidth 1.12008e+06 byte/s. Wrapper version: Transpose vectorized in y bandwidth 874958 byte/s. Dummy Func version: Transpose vectorized in x bandwidth 879031 byte/s. Wrapper version: Transpose vectorized in x bandwidth 1.10525e+06 byte/s. Success! * Add instructions to build V8 * Formatting * More documentation * Update README_webassembly.md * Update README_webassembly.md * Update WasmExecutor.cpp * Update WasmExecutor.cpp * Skip tests * Update WasmExecutor.cpp * Skip performance tests * Update WasmExecutor.cpp * Address review comments * 9.8.147 -> 9.8.177 Co-authored-by: Ng Zhi An <zhin@google.com> | 14 December 2021, 00:32:31 UTC |
46d8ca8 | Zalman Stern | 14 December 2021, 00:31:45 UTC | Move parallel/async lowering from LLVM codegen to a standard Halide IR lowering pass. (#6195) * First cut at factoring parallel task compilation, including closure generating and calling, into a normal IR to IR lowering pass. Includes adding struct handling intrinsics to LLVM and C++ backends. Still a work in progress. * Fix formating that got munged by emacs somehow. * Checkpoint progress. * Small fixes. * Checkpoint progress. * Checkpoint preogress. * Checkpoint progress. * Checkpoint progress. Debugging code will be removed. * Try a fix for make_typed_struct in C++ codegen. * Another attempt to fix C++ codegen. * Another C codegen fix. * Checkpoint progress. * Use make_typed_struct rather than make_struct to construct closure. Ensure all types are carried through exactly the same to both make_struct_type and make_typed_struct. * Checkpoint. * Uniqueify closure names because LLVM was doing that to function names. * Small formatting cleanups. Fixes to call graph checker. Disable tests related to this while Andrew and I figure out how to get it to work across closures. * Get generated C++ to compile via a combination of fixing types and bludgeoning types into submission via subterfuge. * Typo fix to a typo fix. * Restore inadvertently deleted code. * Rename make_struct_type to declare_struct_type. * Add new file to CMake. * Add fixes for Hexagon offload and any passes that might add additional LoweredFunctions in the future. * Add comment with a bit of info for the future.. * Typo fix. * Don't duplicate the closure call to test the error return. Don't declare halide_buffer_t type if there are no buffers in closure. * Use _ucon in C++ code to get rid of constness casting ugliness. * Change resolve_function_name intrinsic to use a Call node to designate the function. This makes generating the C++ declaration in the C++ backend trivial. Few more changes to type handling and naming. * Small C++ backend output formating change. Don't generate For loops with no variable. Update internal test for C++ output. * Add halide_semaphore_acquire_t as a well known type for use inside compiler. * Add handling for halide_semaphore_t allocation. Formating fixes. * Fix type for halide_semaphore_t. * Reapply C++ backend formatting fix. * Add support for calling legacy halide_do_par_for runtime routine in cases where it is valid. * Formatting fixes. * Format and tidy fixes. * Attempt to pass formatting check. * Fix last set of test failures. * Formatting whitespace fixes. * Update comments. * Attempt to fix pointer cast error with some versions of LLVM. * Another attempt at fixing bool compatibility casting. * Another iteration. * Remove likely useless extern argument check logic. * Add hacky fix for losing global variables. * Comment typo fixes. * Remove no-longer-used Closure code from Codegen_Internal * Remove unused MayBlock visitor class * clang-tidy * Attempt to fix parallel offloads for HVX * Update parallel_nested_1.cpp * Augment Closure debugging * Add some std::move usage * Fix hvx lock/unlock semantics for PR #6457 (#6462) Fix qurt_hvx_lock issues * Sort IntrinsicOp and corresponding names * Remove unused `is_const_pointer()` function * Minor hygiene in LowerParallelTasks - normalize local functions to snake_case - put all local functions & classes in anon namespace - move MinThreads visitor to file scope to reduce nestedness of code * use Closure::include * Switch to PureIntrinsics per review feedback. * Minor cleanup of parallel refactor intrinsics (#6465) * Minor cleanup of parallel refactor intrinsics - Renamed `load_struct_member` to `load_typed_struct_member` to make it more clear that it is intended for use only with the results of `make_typed_struct`. - Split `declare_struct_type` into two intrinsics, `define_typed_struct` and `forward_declare_typed_struct`, removing the need for the underdocumented `mode` argument and hopefully making usage clearer - Added / clarified comments for the intrinsics modified above * Update comments * Fix comments * Update CodeGen_C.cpp * Remove 'foo.buffer' from Closure entirely This is a direct adaptation of what #6481 does for the old LLVM-based code, and allows elimination of one use of `get_pointer_or_null()`. PR is meant to be merged into factor_parallel_codegen, not master. * Update LowerParallelTasks.cpp * Keep track of task_parent inside LowerParallelTasks; remove no-longer-needed get_pointer_or_symbol() intrinsic (#6486) * Fix potential issue with additional LoweredFuncs (#6490) I think this is the right fix for this case; that said, I can't find a single instance in any of our test cases that actually triggers this. * factor parallel codegen with fewer intrinsics (#6487) * Rework some of parallel closure lowering to avoid some intrinsics This version relies more heavily on the existing make_struct, and puts function names in the Variable namespace as globals. Co-authored-by: Steven Johnson <srj@google.com> Co-authored-by: dsharletg <dsharlet@google.com> Co-authored-by: Steven Johnson <srj@google.com> Co-authored-by: Andrew Adams <andrew.b.adams@gmail.com> | 14 December 2021, 00:31:45 UTC |
e23b6f0 | Andrew Adams | 13 December 2021, 20:00:52 UTC | rounding shift rights should use rounding halving add (#6494) * rounding shift rights should use rounding halving add On x86 currently we lower cast<uint8_t>((cast<uint16_t>(x) + 8) / 16) to: cast<uint8_t>(shift_right(widening_add(x, 8), 4)) This compiles to 8 instructions on x86: Widen each half of the input vector, add 8 to each half-vector, shift each half-vector, then narrow each half-vector. First, this should have been a rounding_shift_right. Some patterns were missing in FindIntrinsics. Second, rounding_shift_right had suboptimal codegen in the case where the second arg is a positive const. On archs without a rounding shift right instruction you can further rewrite this to: shift_right(rounding_halving_add(x, 7), 3) which is just two instructions on x86. | 13 December 2021, 20:00:52 UTC |
11448b2 | Steven Johnson | 10 December 2021, 19:31:01 UTC | Document the usage of llvm::legacy::PassManager (#6491) * Document the usage of llvm::legacy::PassManager There is some confusion about whether this usage is acceptable. TL;DR: it's not just acceptable, it's required for the forseeable future. Add comments to capture this to avoid future such questions. (With great thanks to Alina for pointing me at the relevant LLVM discussion links!) * Add date | 10 December 2021, 19:31:01 UTC |
7fe1e2c | Andrew Adams | 10 December 2021, 15:06:30 UTC | Let lerp lowering incorporate a final cast. (#6480) * Let lerp lowering incorporate a final cast This lets it save a few instructions on x86 and arm. cast(UInt(16), lerp(some_u8s)) produces the following, before and after this PR Before: x86: vmovdqu (%r15,%r13), %xmm4 vpmovzxbw -2(%r15,%r13), %ymm5 vpxor %xmm0, %xmm4, %xmm6 vpmovzxbw %xmm6, %ymm6 vpmovzxbw -1(%r15,%r13), %ymm7 vpmullw %ymm6, %ymm5, %ymm5 vpmovzxbw %xmm4, %ymm4 vpmullw %ymm4, %ymm7, %ymm4 vpaddw %ymm4, %ymm5, %ymm4 vpaddw %ymm1, %ymm4, %ymm4 vpmulhuw %ymm2, %ymm4, %ymm4 vpsrlw $7, %ymm4, %ymm4 vpand %ymm3, %ymm4, %ymm4 vmovdqu %ymm4, (%rbx,%r13,2) addq $16, %r13 decq %r10 jne .LBB0_10 arm: ldr q0, [x17] ldur q2, [x17, #-1] ldur q1, [x17, #-2] subs x0, x0, #1 // =1 mvn v3.16b, v0.16b umull v4.8h, v2.8b, v0.8b umull2 v0.8h, v2.16b, v0.16b umlal v4.8h, v1.8b, v3.8b umlal2 v0.8h, v1.16b, v3.16b urshr v1.8h, v4.8h, #8 urshr v2.8h, v0.8h, #8 raddhn v1.8b, v1.8h, v4.8h raddhn v0.8b, v2.8h, v0.8h ushll v0.8h, v0.8b, #0 ushll v1.8h, v1.8b, #0 add x17, x17, #16 // =16 stp q1, q0, [x18, #-16] add x18, x18, #32 // =32 b.ne .LBB0_10 After: x86: vpmovzxbw -2(%r15,%r13), %ymm3 vmovdqu (%r15,%r13), %xmm4 vpxor %xmm0, %xmm4, %xmm5 vpmovzxbw %xmm5, %ymm5 vpmullw %ymm5, %ymm3, %ymm3 vpmovzxbw -1(%r15,%r13), %ymm5 vpmovzxbw %xmm4, %ymm4 vpmullw %ymm4, %ymm5, %ymm4 vpaddw %ymm4, %ymm3, %ymm3 vpaddw %ymm1, %ymm3, %ymm3 vpmulhuw %ymm2, %ymm3, %ymm3 vpsrlw $7, %ymm3, %ymm3 vmovdqu %ymm3, (%rbp,%r13,2) addq $16, %r13 decq %r10 jne .LBB0_10 arm: ldr q0, [x17] ldur q2, [x17, #-1] ldur q1, [x17, #-2] subs x0, x0, #1 // =1 mvn v3.16b, v0.16b umull v4.8h, v2.8b, v0.8b umull2 v0.8h, v2.16b, v0.16b umlal v4.8h, v1.8b, v3.8b umlal2 v0.8h, v1.16b, v3.16b ursra v4.8h, v4.8h, #8 ursra v0.8h, v0.8h, #8 urshr v1.8h, v4.8h, #8 urshr v0.8h, v0.8h, #8 add x17, x17, #16 // =16 stp q1, q0, [x18, #-16] add x18, x18, #32 // =32 b.ne .LBB0_10 So on X86 we skip a pointless and instruction, and on ARM we get a rounding add and shift right instead of a rounding narrowing add shift right followed by a widen. * Add test * Fix bug in test * Don't produce out-of-range lerp values | 10 December 2021, 15:06:30 UTC |
bcfd6af | Steven Johnson | 09 December 2021, 23:06:05 UTC | Fail if no_bounds_query specified for HL_JIT_TARGET (#6489) * Fail if no_bounds_query specified for HL_JIT_TARGET JIT requires the use of bounds_query; disabling it will almost certainly fail in JIT mode, either with a confusing assert message, or a crash (if you also specify no_asserts). This adds a more useful failure message. * Update Target.cpp | 09 December 2021, 23:06:05 UTC |
59118de | Steven Johnson | 08 December 2021, 22:12:53 UTC | Deal with Printer::scratch (#6469) (#6472) Instead of trying to optimize every Printer instance to use stack (and failing), move the StackPrinter concept into printer.h directly and require opt-in at the point of compilation to use stack instead of malloc. This PR also does a few other drive-by cleanups: - Ensures that all Printer ctors are explicit - Makes some template aliases to make using (e.g.) ErrorPrinter with a custom buffer size slightly cleaner syntax - Have tracing use the `.str()` method, which already deals with MSAN internally - Make all the Printer data members private - Fix some evil code in opencl.cpp that previously used the now-private data members | 08 December 2021, 22:12:53 UTC |
5516457 | Volodymyr Kysenko | 08 December 2021, 19:32:57 UTC | Put back handling of halide_xtensa_* Change-Id: I1cae77dea757d5480c50aa6329e33a31565eadbd | 08 December 2021, 19:32:57 UTC |
d089588 | Steven Johnson | 06 December 2021, 17:02:21 UTC | Move null check from Printer to halide_string_to_string() The Printer is (currently) usually inlined into every module, so this check is repeated in multiple chunks of code. Since the goal is to avoid crashing when debugging, let's move it to halide_string_to_string() (which will catch all these, and possibly more) and save some code size. (Further improvements in Printer code size on the way; this change seems worthy of considering separately.) | 08 December 2021, 19:11:28 UTC |
ac68384 | Steven Johnson | 08 December 2021, 19:00:52 UTC | Merge branch 'master' into xtensa-codegen | 08 December 2021, 19:00:52 UTC |
7199e7d | Andrew Adams | 08 December 2021, 01:45:10 UTC | Try removing optional buffer added to closure | 08 December 2021, 18:53:35 UTC |
17a3dd6 | Volodymyr Kysenko | 08 December 2021, 02:50:27 UTC | Vector intrinsics for round, sqrt, floor Change-Id: I9c023ddf93df99e8236dff5c1aeae65b67fcd2f3 | 08 December 2021, 02:50:27 UTC |
8f28c7a | Volodymyr Kysenko | 08 December 2021, 02:41:19 UTC | Merge branch 'master' into xtensa-codegen Change-Id: I179ce7fc5b59cd22ffcd0fef4258c3b527d97469 | 08 December 2021, 02:41:19 UTC |
7992369 | Andrew Adams | 07 December 2021, 16:16:50 UTC | Add a fast integer divide that rounds to zero (#6455) * Add a version of fast_integer_divide that rounds towards zero * clang-format * Fix test condition * Clean up debugging code * Add explanatory comment to performance test * Pacify clang tidy | 07 December 2021, 16:16:50 UTC |
fb305fd | Roman Lebedev | 07 December 2021, 02:15:18 UTC | `apps/linear_algebra/benchmarks/macros.h`: don't forget SSE guard (#6471) This is breaking i386 build: https://buildd.debian.org/status/fetch.php?pkg=halide&arch=i386&ver=13.0.1-3&stamp=1638786518&raw=0 | 07 December 2021, 02:15:18 UTC |
09fd580 | Volodymyr Kysenko | 06 December 2021, 23:15:10 UTC | Merge branch 'xtensa-codegen' of https://partner-code.googlesource.com/halide into xtensa-codegen Change-Id: I7939da54006b8687ba1aba22ae6730e6aa60be1c | 06 December 2021, 23:15:10 UTC |
4878c54 | Volodymyr Kysenko | 06 December 2021, 22:21:58 UTC | Merge branch 'master' into xtensa-codegen Change-Id: Ic142db3c1d70359dc9574b37ac122f03efb28540 | 06 December 2021, 22:21:58 UTC |
e0df687 | Marcos Slomp | 06 December 2021, 20:34:44 UTC | decommissioning StackPrinter (#6470) | 06 December 2021, 20:34:44 UTC |
392430d | Steven Johnson | 02 December 2021, 21:23:25 UTC | Fix Closure API (#6464) The current API requires calling a Visitor from the Closure ctor, which means we implicitly call virtual methods from the class ctor, which is a no-no for a non-final class (see comments on https://github.com/halide/Halide/pull/6443). | 02 December 2021, 21:23:25 UTC |
0ed461b | Steven Johnson | 02 December 2021, 18:38:50 UTC | Add operator<< for Closure (#6443) * Add operator<< for Closure Moves the ad-hoc implementation our of HostClosure::arguments() for easier debugging usage. Also, drive-by elimination of the body of HostClosure ctor, which was identical to the one inherited from Closure. * Update DeviceArgument.cpp * Add explanatory comment | 02 December 2021, 18:38:50 UTC |
a0a1421 | Steven Johnson | 02 December 2021, 18:12:01 UTC | Fix match_xtensa_patterns() for recent changes | 02 December 2021, 18:12:01 UTC |
22a5ddb | Steven Johnson | 02 December 2021, 18:01:48 UTC | Merge branch 'master' into xtensa-codegen | 02 December 2021, 18:01:48 UTC |
a6f4cf4 | Steven Johnson | 02 December 2021, 18:00:41 UTC | Update CMakeLists.txt | 02 December 2021, 18:00:41 UTC |