https://github.com/halide/Halide
- HEAD
- refs/heads/Halide_unsharp
- refs/heads/abadams/align_strided_const_loads
- refs/heads/abadams/alloca
- refs/heads/abadams/atomic_parallel_compiled_in
- refs/heads/abadams/atomic_vector_non_recursive
- refs/heads/abadams/averaging_tree
- refs/heads/abadams/avoid_name_mangling_in_cross_module_dependencies
- refs/heads/abadams/better_absd
- refs/heads/abadams/better_codegen_for_non_const_ramps
- refs/heads/abadams/bgu_cholesky
- refs/heads/abadams/braces_around_statements
- refs/heads/abadams/cache_tighten_producer_consumer_nodes
- refs/heads/abadams/check_reorder_dups
- refs/heads/abadams/clarify_broadcast_shuffle
- refs/heads/abadams/compositing_app
- refs/heads/abadams/cond_wait_spin
- refs/heads/abadams/cse_in_unroll_split_tuples
- refs/heads/abadams/custom_cuda_context
- refs/heads/abadams/custom_cuda_context_2
- refs/heads/abadams/custom_cuda_context_3
- refs/heads/abadams/d3d12abi
- refs/heads/abadams/deflake_mullapudi_reorder
- refs/heads/abadams/delete_prepare_for_early_exit
- refs/heads/abadams/depthwise_separable_conv
- refs/heads/abadams/diagnose_boundary_condition_failure
- refs/heads/abadams/disable_onnx_app_on_mac
- refs/heads/abadams/divide_using_pavgw
- refs/heads/abadams/dont_link_to_cudart
- refs/heads/abadams/dont_reinterpret_concat
- refs/heads/abadams/early_out
- refs/heads/abadams/enable_f16c
- refs/heads/abadams/extract_concat_bits
- refs/heads/abadams/fast_integer_divide_round_to_zero
- refs/heads/abadams/faster_runtime_integer_division
- refs/heads/abadams/faster_unroll
- refs/heads/abadams/fix-arm-seg2
- refs/heads/abadams/fix_4211
- refs/heads/abadams/fix_5323
- refs/heads/abadams/fix_5329
- refs/heads/abadams/fix_5889
- refs/heads/abadams/fix_6984
- refs/heads/abadams/fix_7229
- refs/heads/abadams/fix_7260
- refs/heads/abadams/fix_7365
- refs/heads/abadams/fix_7374
- refs/heads/abadams/fix_7504
- refs/heads/abadams/fix_7514
- refs/heads/abadams/fix_7531
- refs/heads/abadams/fix_7584
- refs/heads/abadams/fix_7584_v2
- refs/heads/abadams/fix_7742
- refs/heads/abadams/fix_7756
- refs/heads/abadams/fix_7761
- refs/heads/abadams/fix_7768
- refs/heads/abadams/fix_7786
- refs/heads/abadams/fix_7810
- refs/heads/abadams/fix_7811
- refs/heads/abadams/fix_7815
- refs/heads/abadams/fix_7867
- refs/heads/abadams/fix_7871
- refs/heads/abadams/fix_7872
- refs/heads/abadams/fix_7873
- refs/heads/abadams/fix_7888
- refs/heads/abadams/fix_7890
- refs/heads/abadams/fix_7891
- refs/heads/abadams/fix_7892
- refs/heads/abadams/fix_7893
- refs/heads/abadams/fix_7906
- refs/heads/abadams/fix_7909
- refs/heads/abadams/fix_7968
- refs/heads/abadams/fix_8038
- refs/heads/abadams/fix_8054
- refs/heads/abadams/fix_arm_fcvtmp
- refs/heads/abadams/fix_autoschedule_feature_transposition
- refs/heads/abadams/fix_cse_name_collisions
- refs/heads/abadams/fix_cuda_mat_mul_assert
- refs/heads/abadams/fix_deinterleave_bug
- refs/heads/abadams/fix_deinterleave_for_reinterpret
- refs/heads/abadams/fix_div_round_to_zero
- refs/heads/abadams/fix_fft_compile_time_regression
- refs/heads/abadams/fix_generate_output_snippets
- refs/heads/abadams/fix_if_nesting_condition
- refs/heads/abadams/fix_leaks_in_memoize_test
- refs/heads/abadams/fix_lgtm_warnings
- refs/heads/abadams/fix_links_to_master
- refs/heads/abadams/fix_load_of_broadcast
- refs/heads/abadams/fix_lossless_cast_of_sub
- refs/heads/abadams/fix_onnx_app
- refs/heads/abadams/fix_pointless_lower_condition
- refs/heads/abadams/fix_potential_gpu_deadlock
- refs/heads/abadams/fix_realize_condition_depends_on_tuple
- refs/heads/abadams/fix_reduce_expr_modulo_of_vector
- refs/heads/abadams/fix_riscv_vx_vi
- refs/heads/abadams/fix_round
- refs/heads/abadams/fix_stencil_chain_gpu_schedule
- refs/heads/abadams/fix_track_bounds_intervals
- refs/heads/abadams/fix_tutorial_2
- refs/heads/abadams/forward_partition_methods
- refs/heads/abadams/fully_fused_depthwise_separable_conv
- refs/heads/abadams/fuzz_sliding_window
- refs/heads/abadams/gaussian_blur_app
- refs/heads/abadams/generator_infinite_default_timeout
- refs/heads/abadams/gpu_autoscheduler_parallel_random_probes
- refs/heads/abadams/include_riscv_in_readme
- refs/heads/abadams/interleave_nested_vector
- refs/heads/abadams/ir_match_by_ref
- refs/heads/abadams/lerp_plus_cast
- refs/heads/abadams/local_laplacian_code_size
- refs/heads/abadams/lower_halving_sub
- refs/heads/abadams/lower_rounding_shift_right
- refs/heads/abadams/mac-arm-fixes
- refs/heads/abadams/make_fast_inverse_test_throughput_limited
- refs/heads/abadams/makefile_serialization_support
- refs/heads/abadams/mismatched_new_delete
- refs/heads/abadams/mixed_sign_mul_shift_right
- refs/heads/abadams/mixed_width_mul_shift_right
- refs/heads/abadams/multiple_scatter
- refs/heads/abadams/mux_intrinsic
- refs/heads/abadams/name_helpers
- refs/heads/abadams/narrow_predicates
- refs/heads/abadams/nested_vectorization_compile_time_regression_fix
- refs/heads/abadams/nested_vectorization_tweaks
- refs/heads/abadams/parallel_simd_op_check
- refs/heads/abadams/per_instance_profiling
- refs/heads/abadams/precompute_shared_mem_size
- refs/heads/abadams/prefer_no_gather
- refs/heads/abadams/print_uncaught_exception
- refs/heads/abadams/promote_fixed_point_intrinsics
- refs/heads/abadams/psabdw
- refs/heads/abadams/random_pipelines
- refs/heads/abadams/rationalize_gpu_for_loop_names
- refs/heads/abadams/reenable_unscheduled_stage_warning
- refs/heads/abadams/reinterpret_vector
- refs/heads/abadams/remove_arch_os_for_shaders
- refs/heads/abadams/remove_bad_pruning
- refs/heads/abadams/remove_parameter_self_references
- refs/heads/abadams/remove_readnone_on_functions
- refs/heads/abadams/remove_use_of_python_config_in_onnx_makefile
- refs/heads/abadams/reschedule_bgu
- refs/heads/abadams/reschedule_bilateral_grid
- refs/heads/abadams/rewrite_atomic_pass
- refs/heads/abadams/rounding_shift_right_use_average
- refs/heads/abadams/rungenmain_error
- refs/heads/abadams/sampling_profiler_overhead_v2
- refs/heads/abadams/scope_improvements
- refs/heads/abadams/simpler_broadcasts
- refs/heads/abadams/simplify_correlated_pyramid
- refs/heads/abadams/siotas_20
- refs/heads/abadams/sioutas_20
- refs/heads/abadams/slide_over_split_loop
- refs/heads/abadams/sorting_network_working_branch
- refs/heads/abadams/stable_topological_order
- refs/heads/abadams/string_view
- refs/heads/abadams/strip_asserts_last
- refs/heads/abadams/switch_stmt
- refs/heads/abadams/target_specific_lerp
- refs/heads/abadams/time_lowering_passes
- refs/heads/abadams/track_failedness_through_solver_lets
- refs/heads/abadams/turn_off_slp_vectorization_for_avx512
- refs/heads/abadams/tweak_unpack_buffers
- refs/heads/abadams/undo_pointless_widening
- refs/heads/abadams/unordered_blocks
- refs/heads/abadams/unsigned_demosaic
- refs/heads/abadams/update_makefile_for_llvm_19
- refs/heads/abadams/use_arm_for_runtime_triple
- refs/heads/abadams/use_pmaddubsw_for_downsample
- refs/heads/abadams/validate_gpu_schedules
- refs/heads/abadams/vector_reduce_hexagon_predicate
- refs/heads/abadams/vector_scan
- refs/heads/abadams/vst_type_fix
- refs/heads/abadams/widening_let_bug
- refs/heads/abadams/x86_avg
- refs/heads/abadams/zen4
- refs/heads/adadams/profile_allocator
- refs/heads/add_image_checks_after_bounds_inference_plus_new_rules
- refs/heads/add_outermost_to_extern
- refs/heads/add_vectorization_to_search_space
- refs/heads/aelphy/feature_cadence_changes
- refs/heads/aelphy/float_extracts
- refs/heads/align_loads_comment_fix
- refs/heads/alina-strided-store
- refs/heads/another_buffer_copy_fix
- refs/heads/arm_sve_redux
- refs/heads/ataei-block_asserts-codegen
- refs/heads/ataei-debug_info
- refs/heads/ataei-fix-pow
- refs/heads/ataei-gen_str_param
- refs/heads/ataei-implicit_lhs_vars
- refs/heads/ataei-onnx
- refs/heads/ataei-onnx_converter_update
- refs/heads/ataei-onnx_pybind
- refs/heads/ataei-resnet50_benchmarks
- refs/heads/ataei-standalone_autoscheduler
- refs/heads/ataei_lots_of_inputs
- refs/heads/auto_sched_benchmarks
- refs/heads/auto_sched_estimates
- refs/heads/auto_sched_inline
- refs/heads/auto_sched_test_notparallel
- refs/heads/autoschedule_top_down
- refs/heads/autoschedule_with_convnet
- refs/heads/autoscheduler_scalar_imageparam_fix
- refs/heads/backports/10.x
- refs/heads/backports/11.x
- refs/heads/backports/12.x
- refs/heads/backports/13.x
- refs/heads/balance_expressions
- refs/heads/bazel
- refs/heads/benchmarks
- refs/heads/blaze
- refs/heads/bounds_buffer_lets_fix
- refs/heads/bounds_correct_vs_bounds_loaded_reduced
- refs/heads/buffer_device_api_target
- refs/heads/bug_device_free
- refs/heads/bug_inline_unbounded
- refs/heads/build/fix-xcode-2
- refs/heads/build/manylinux-fixes
- refs/heads/circ_buffer
- refs/heads/cmake-no-runtime-debug-symbols
- refs/heads/cmake/asan
- refs/heads/cmake/deps-cleanup
- refs/heads/cmake/find-modules
- refs/heads/cmake/spirv
- refs/heads/cmake_wasm_features
- refs/heads/compute_at_guard_with_if_goes_on_stack
- refs/heads/compute_with_at
- refs/heads/compute_with_check
- refs/heads/compute_with_excessive_bounds
- refs/heads/compute_with_inlined
- refs/heads/compute_with_remove_is_right_level
- refs/heads/cpack/nuget
- refs/heads/ctest/wrappers
- refs/heads/cuda-constant
- refs/heads/d3d12-allocation-cache
- refs/heads/deferred_cse_after_inlining
- refs/heads/destructor_calls_deinit
- refs/heads/dg/deserialize_unmapped_objects
- refs/heads/dg/fix_vulkan_codegen_bool_conversion
- refs/heads/dg/vulkan_conform_api
- refs/heads/dg/vulkan_region_allocator_fixes
- refs/heads/dgerstmann/fix-vulkan-memory-config-init
- refs/heads/disable_acquire_release_test_vulkan
- refs/heads/distinct_wrapper_names
- refs/heads/dkg/6863_asan_fixes
- refs/heads/dkg/vulkan
- refs/heads/dpalermo_dmabuf
- refs/heads/dpalermo_dmabuf_libion
- refs/heads/dpalermo_hexagon_remote_202003
- refs/heads/dpalermo_sdk4_2_0_2
- refs/heads/ds/buffer-get-pure
- refs/heads/ds/opt-tile-size
- refs/heads/ds/tail-none
- refs/heads/ds/while
- refs/heads/dsharletg/bitwise-intrinsics
- refs/heads/dsharletg/find-vector-reduce
- refs/heads/dsharletg/jit-optimization
- refs/heads/dsharletg/memcpy-copy_from
- refs/heads/dsharletg/pattern-headroom
- refs/heads/dsharletg/refactor-host-alignment
- refs/heads/dsharletg/runtime-size
- refs/heads/dsharletg/simplify-abs
- refs/heads/dsharletg/simplify-type-bounds
- refs/heads/dsharletg/specialize-bounds
- refs/heads/dsharletg/upsample-channels
- refs/heads/empty_prefetch
- refs/heads/emscripten_vector_fix
- refs/heads/export_all-wsmoses
- refs/heads/expr_auto_sched
- refs/heads/extern_bugs
- refs/heads/extern_host_alloc
- refs/heads/factor_parallel_codegen_hack
- refs/heads/fast_sync_tsan
- refs/heads/faster_integer_division
- refs/heads/feature/apps-external
- refs/heads/feature/cmake-presets
- refs/heads/feature/convert
- refs/heads/feature/f16_interleave
- refs/heads/feature/gather_load_q7
- refs/heads/feature/llvm-codemodel
- refs/heads/feature/load_predicated
- refs/heads/feature/luma_regression
- refs/heads/feature/maintanence
- refs/heads/feature/reinterprets
- refs/heads/feature/tcm_bump_allocator
- refs/heads/feature/xtensa_fix_interleave_q8
- refs/heads/feature/xtensa_q8_tests
- refs/heads/find_intrinsics_issue
- refs/heads/find_intrinsics_widening_lets
- refs/heads/fix-floated-pure-stage
- refs/heads/fix-race-condition
- refs/heads/fix_hexagon_alignment
- refs/heads/fix_hvx_intrinsics
- refs/heads/fix_prefetch_test
- refs/heads/fix_windows_vs15_build
- refs/heads/fixed_length_vectors
- refs/heads/fixed_point_local_laplac
- refs/heads/gemmlowp
- refs/heads/generate
- refs/heads/gha/pip
- refs/heads/gpu_canon_fix
- refs/heads/halide_ir_flatbuffer
- refs/heads/hex_dma2_async
- refs/heads/hexagon_le_runtime
- refs/heads/hexagon_priority
- refs/heads/hexagon_setpriority
- refs/heads/hexagon_strided_pred_load
- refs/heads/hexagon_sysmon_markers
- refs/heads/imaging-synthesis
- refs/heads/includes_fix
- refs/heads/ios_fast_sync_fix
- refs/heads/jia-kai-fix-runtime-cuda-init
- refs/heads/kamil-openglcompute-infinity
- refs/heads/kamil/name_pthread_workers
- refs/heads/kp_bit_shift
- refs/heads/line_buffer
- refs/heads/loop_carry_not_working
- refs/heads/lower_on_huge_stack
- refs/heads/main
- refs/heads/master
- refs/heads/memoize_with_extents
- refs/heads/metal_float16
- refs/heads/metaprogrammed_simplifier_mod
- refs/heads/mohamedadaly-vmlal
- refs/heads/more_powerful_sliding
- refs/heads/new_autoschedule_with_new_simplifier_arm_worker_branch
- refs/heads/new_autoscheduler
- refs/heads/new_simplifier_rule_testing
- refs/heads/newer_ion_ioctl
- refs/heads/no_bounds_query_when_bounds_used
- refs/heads/opengl_compute_buffer_types_fix
- refs/heads/openglcompute_reuse_shared_allocations
- refs/heads/optmize_reorder
- refs/heads/par_for_opt
- refs/heads/pdb/fix_7806
- refs/heads/pdb/hexagon_remote_cmake
- refs/heads/pdb_add_libcpp_makefile_inc
- refs/heads/pdb_eliminate_interleaves_test
- refs/heads/pdb_fix_clang_build
- refs/heads/pdb_fix_install_qc
- refs/heads/pdb_fix_loop_carry
- refs/heads/pdb_fix_simd_op_check_hvx
- refs/heads/pdb_mul_div_mod_multi_thread
- refs/heads/pdb_remove_hvx_v64
- refs/heads/perform_inline_with_order
- refs/heads/pr/2572
- refs/heads/pr/2676
- refs/heads/pr/2975
- refs/heads/pr/3017
- refs/heads/pr/3081
- refs/heads/pr/3387
- refs/heads/pr/3939
- refs/heads/pr/3960
- refs/heads/pr/4380
- refs/heads/pr/4414
- refs/heads/pr/5331
- refs/heads/pr/5438
- refs/heads/pr/5455
- refs/heads/pr/5758_2
- refs/heads/predicated_vector
- refs/heads/prefetch_specialize
- refs/heads/print_schedule
- refs/heads/profile_hardware_counters
- refs/heads/random-pipelines
- refs/heads/rdom_with_pure_vars
- refs/heads/readme-fix-gcd
- refs/heads/realization_order
- refs/heads/refactor_module
- refs/heads/register_promotion
- refs/heads/release/10.x
- refs/heads/release/11.x
- refs/heads/release/12.x
- refs/heads/release/13.x
- refs/heads/release/14.x
- refs/heads/release/15.x
- refs/heads/release/16.x
- refs/heads/release/17.x
- refs/heads/release/8.x
- refs/heads/remove_max_on_fuse_factor
- refs/heads/reorder_rvar
- refs/heads/reset_unique_counter
- refs/heads/revert-3612-ataei-speedup_compiletime
- refs/heads/revert-7009-rootjalex/distribute-w_shl
- refs/heads/revert-7601-compile_hexagon_remote
- refs/heads/riscv_update
- refs/heads/rl_simplifier_rules
- refs/heads/rootjalex/add_simpl_rules
- refs/heads/rootjalex/arm-optimize
- refs/heads/rootjalex/autoscheduler_mcts
- refs/heads/rootjalex/bounds-rewriter
- refs/heads/rootjalex/bounds_synthesis
- refs/heads/rootjalex/cbounds
- refs/heads/rootjalex/cbounds_predicated
- refs/heads/rootjalex/fix-sat-overflow
- refs/heads/rootjalex/fix_estimate_issue
- refs/heads/rootjalex/fix_failed_unrolls
- refs/heads/rootjalex/gsoc_codegen
- refs/heads/rootjalex/improve_cbounds_fixed
- refs/heads/rootjalex/improve_constant_bounds
- refs/heads/rootjalex/pitchfork-arm
- refs/heads/rootjalex/reinterpret-simplify
- refs/heads/rootjalex/rts
- refs/heads/rootjalex/super_simplify_bounds
- refs/heads/rootjalex/test_cbounds_fixed
- refs/heads/rootjalex/test_constant_bounds
- refs/heads/rootjalex/trs-codegen
- refs/heads/rootjalex/trs-codegen-cross
- refs/heads/rootjalex/trs-merge
- refs/heads/rootjalex/uint32-int32-cast
- refs/heads/rootjalex/x86-hadds
- refs/heads/rootjalex/x86-optimize
- refs/heads/rootjalex/x86-optimize-test
- refs/heads/rootjalex/x86-sat
- refs/heads/rootjalex/x86-test
- refs/heads/rule_removal_experiments
- refs/heads/schedule-output-storage
- refs/heads/separate_bounds_query_entrypoint
- refs/heads/shallow
- refs/heads/shift_amount_type_change
- refs/heads/shoaibkamil/cmake-without-arm
- refs/heads/shoaibkamil/correct_memory_fences
- refs/heads/shoaibkamil/d3d-fixes
- refs/heads/shoaibkamil/deprecate_openglcompute
- refs/heads/shoaibkamil/json
- refs/heads/shoaibkamil/llvm_clone_tag
- refs/heads/shoaibkamil/minor-vcpkg-doc-change
- refs/heads/shoaibkamil/opengl_compute_tests
- refs/heads/shoaibkamil/performance_tests_as_generators
- refs/heads/shoaibkamil/rule_removal_experiments
- refs/heads/shoaibkamil/super_simplify_with_interpreter
- refs/heads/shoaibkamil/windows-arm-fix-attributes
- refs/heads/sim_shlib_addr_print
- refs/heads/simplify-nested-broadcasts
- refs/heads/simplify-vectorreduce-shuffles2
- refs/heads/simplify_mod
- refs/heads/sioutas_2020
- refs/heads/sioutas_2020_autoscheduler
- refs/heads/slomp/gpu-codegen-profiling
- refs/heads/slomp/msvc-static-analysis
- refs/heads/solve_div
- refs/heads/solve_div_master
- refs/heads/solve_div_simplifier_test
- refs/heads/sr/python-late-binding-defaults
- refs/heads/srj-aaa
- refs/heads/srj-alloc
- refs/heads/srj-alloca
- refs/heads/srj-appmake2
- refs/heads/srj-armv83a
- refs/heads/srj-aslog
- refs/heads/srj-assert
- refs/heads/srj-assoc
- refs/heads/srj-auto-multi
- refs/heads/srj-auto-multi2
- refs/heads/srj-auto_schedule_mat_mul
- refs/heads/srj-autosched
- refs/heads/srj-b2cpphide
- refs/heads/srj-barr
- refs/heads/srj-bits
- refs/heads/srj-blacklist
- refs/heads/srj-bounds
- refs/heads/srj-bufcalltype
- refs/heads/srj-bufcallwrap
- refs/heads/srj-bufcallwrap2
- refs/heads/srj-buffer
- refs/heads/srj-bv
- refs/heads/srj-classic-autotune
- refs/heads/srj-clean
- refs/heads/srj-constcall
- refs/heads/srj-crosscompile
- refs/heads/srj-ctlz
- refs/heads/srj-cvec-patch
- refs/heads/srj-dag
- refs/heads/srj-debug-to-file
- refs/heads/srj-deir
- refs/heads/srj-f16
- refs/heads/srj-fp16
- refs/heads/srj-fsch
- refs/heads/srj-fthru
- refs/heads/srj-g2
- refs/heads/srj-g3
- refs/heads/srj-gha-test-fixes
- refs/heads/srj-hidden
- refs/heads/srj-hide2
- refs/heads/srj-hvx
- refs/heads/srj-hvx-bug
- refs/heads/srj-hvx-codegen-bug
- refs/heads/srj-hvx-nocopy
- refs/heads/srj-hvxshift
- refs/heads/srj-iib
- refs/heads/srj-initshape
- refs/heads/srj-inv
- refs/heads/srj-ir
- refs/heads/srj-irmut2
- refs/heads/srj-iwyu
- refs/heads/srj-iwyu3
- refs/heads/srj-javascript_work_in_progress
- refs/heads/srj-lensblur
- refs/heads/srj-lessinc
- refs/heads/srj-llvm-loop-opt
- refs/heads/srj-mak
- refs/heads/srj-maxthreads
- refs/heads/srj-mod
- refs/heads/srj-msan
- refs/heads/srj-msan-call
- refs/heads/srj-muldivmod
- refs/heads/srj-mut
- refs/heads/srj-outputs-2
- refs/heads/srj-parse
- refs/heads/srj-pch
- refs/heads/srj-printfunc
- refs/heads/srj-pygp
- refs/heads/srj-revertbits
- refs/heads/srj-schedule-storage
- refs/heads/srj-shl-shr-2
- refs/heads/srj-sio
- refs/heads/srj-static-const
- refs/heads/srj-strided-store
- refs/heads/srj-tidyh
- refs/heads/srj-tiff
- refs/heads/srj-trace
- refs/heads/srj-tutorial
- refs/heads/srj-using
- refs/heads/srj-wasmfix
- refs/heads/srj-xor2
- refs/heads/srj/abstract-gen-without-get-output-func-KEEP
- refs/heads/srj/aligned-alloc
- refs/heads/srj/aligned-alloc-2
- refs/heads/srj/aligned-malloc-with-aligned-alloc
- refs/heads/srj/all-explicit-ctor
- refs/heads/srj/anderson-thread-info-ptr
- refs/heads/srj/aot-perf
- refs/heads/srj/argv-signatures
- refs/heads/srj/argv-types
- refs/heads/srj/async-test
- refs/heads/srj/b2cpp-const-data
- refs/heads/srj/better-xt-dispatch
- refs/heads/srj/bfloat1
- refs/heads/srj/bp
- refs/heads/srj/build_halide_h
- refs/heads/srj/c-bool
- refs/heads/srj/cache-clear
- refs/heads/srj/clang-fmt-ignore
- refs/heads/srj/clang-tidy
- refs/heads/srj/clear-c-cache
- refs/heads/srj/cmake-asan
- refs/heads/srj/cmake-asan2
- refs/heads/srj/cmake-jit-generators
- refs/heads/srj/configure-cmake
- refs/heads/srj/cpp-generator-v2-experiment-KEEP
- refs/heads/srj/crosscompile
- refs/heads/srj/ctad
- refs/heads/srj/depr
- refs/heads/srj/deprecation
- refs/heads/srj/device-copy
- refs/heads/srj/example
- refs/heads/srj/experiment
- refs/heads/srj/experiment-6967
- refs/heads/srj/exporting
- refs/heads/srj/expr_t
- refs/heads/srj/external-tensors
- refs/heads/srj/fix-pytorch
- refs/heads/srj/fixed-rollback
- refs/heads/srj/fopen-fix
- refs/heads/srj/forward
- refs/heads/srj/forward-name
- refs/heads/srj/gen-func
- refs/heads/srj/gen-func-2
- refs/heads/srj/gen-func-3
- refs/heads/srj/gen2-1
- refs/heads/srj/gen_closure
- refs/heads/srj/generator_aot_gpu_multi_context_threaded
- refs/heads/srj/globals
- refs/heads/srj/halide-buffer-crop
- refs/heads/srj/halide-malloc-alignment
- refs/heads/srj/halide-must-use
- refs/heads/srj/halide-runtime-must-use-result
- refs/heads/srj/hang-repro
- refs/heads/srj/hannk
- refs/heads/srj/hannk-aliasing
- refs/heads/srj/hannk-error-checking
- refs/heads/srj/hannk-errors
- refs/heads/srj/hannk-inplace
- refs/heads/srj/hannk-mmap
- refs/heads/srj/hannk-tflite-27
- refs/heads/srj/hannk-verbosity
- refs/heads/srj/hdrs
- refs/heads/srj/html-becomes-viz
- refs/heads/srj/implicit-mult-widening
- refs/heads/srj/issue-7076
- refs/heads/srj/iwyu
- refs/heads/srj/iwyu-2
- refs/heads/srj/iwyu-6
- refs/heads/srj/libHANNK
- refs/heads/srj/llvm_type_of
- refs/heads/srj/maybe-unused
- refs/heads/srj/meanop
- refs/heads/srj/metadata-calling-convention
- refs/heads/srj/more-tidy
- refs/heads/srj/msan-dtf
- refs/heads/srj/multimeta
- refs/heads/srj/nanobind
- refs/heads/srj/new-rt-1
- refs/heads/srj/no-threadpool
- refs/heads/srj/no-timeout-thread
- refs/heads/srj/oglc-mutexed
- refs/heads/srj/param-map
- refs/heads/srj/pip-15.x
- refs/heads/srj/pip-cron
- refs/heads/srj/possible-uninited
- refs/heads/srj/pr-7566
- refs/heads/srj/printer-size
- refs/heads/srj/profiler-data-race
- refs/heads/srj/ptr-int-cast
- refs/heads/srj/pyapps
- refs/heads/srj/pyext-fix
- refs/heads/srj/pygen-class
- refs/heads/srj/pygen-deux
- refs/heads/srj/pygen-func
- refs/heads/srj/pygen-native-types
- refs/heads/srj/pyinstall
- refs/heads/srj/pypi-try
- refs/heads/srj/pystuff
- refs/heads/srj/python-buffer-unpack
- refs/heads/srj/python-tutorial
- refs/heads/srj/reshape
- refs/heads/srj/rt-error-smallify
- refs/heads/srj/rt-return-types
- refs/heads/srj/runtime-error-handling
- refs/heads/srj/sat-fixes-exp
- refs/heads/srj/sat-fixes-exp-2
- refs/heads/srj/shadow-field
- refs/heads/srj/snprintf
- refs/heads/srj/spirv-license
- refs/heads/srj/stat-buf-deprecations
- refs/heads/srj/static-buffer-generators
- refs/heads/srj/stmt-html
- refs/heads/srj/stringify
- refs/heads/srj/synth-gen-params
- refs/heads/srj/synth-params-python
- refs/heads/srj/test-arm_sve_redux
- refs/heads/srj/test-intrinsics-bounds
- refs/heads/srj/test8076
- refs/heads/srj/test8078
- refs/heads/srj/test8094
- refs/heads/srj/test8105a
- refs/heads/srj/test8115
- refs/heads/srj/test_tmpdir_fix
- refs/heads/srj/tidy
- refs/heads/srj/tidy-format-14
- refs/heads/srj/tidymore
- refs/heads/srj/tidymore2
- refs/heads/srj/tls
- refs/heads/srj/tls-3
- refs/heads/srj/tls-4
- refs/heads/srj/tls-ucon
- refs/heads/srj/tmp-unschedule-experiment
- refs/heads/srj/tot-fix
- refs/heads/srj/try-revert-sat
- refs/heads/srj/type-traits
- refs/heads/srj/typed-func
- refs/heads/srj/ucon-all-const
- refs/heads/srj/ucon-non-const
- refs/heads/srj/visit-warnings
- refs/heads/srj/wasm-atomic2
- refs/heads/srj/wasm-simd
- refs/heads/srj/wasm-stuff
- refs/heads/srj/wasm-threads
- refs/heads/srj/wasm-updates
- refs/heads/srj/wasm-work
- refs/heads/srj/wip
- refs/heads/srj/x-rounding
- refs/heads/srj/xbuf
- refs/heads/srj/xc+plus+size+tmp
- refs/heads/srj/xc-types
- refs/heads/srj/xt-uint-cast-test
- refs/heads/srj/xtensa-arch
- refs/heads/srj/xtensa-merge
- refs/heads/srj/xvc-experimetn
- refs/heads/srj/zlib-embed
- refs/heads/standalone_autoscheduler
- refs/heads/standalone_autoscheduler_arm_worker
- refs/heads/standalone_autoscheduler_arm_worker_amazon
- refs/heads/standalone_autoscheduler_gpu
- refs/heads/standalone_autoscheduler_hexagon
- refs/heads/sticky_task_assignments
- refs/heads/store_with
- refs/heads/store_with_solver_for_super_simplify
- refs/heads/strict_float_cse_fix
- refs/heads/super_simplify
- refs/heads/super_simplify_v2
- refs/heads/super_simplify_v3
- refs/heads/transitive_wrapper
- refs/heads/trigger-release-v16
- refs/heads/tzumao-autodiff-boundarycond
- refs/heads/tzumao-gradient-autoscheduler-bug
- refs/heads/tzumao-predicate-store-load
- refs/heads/tzumao-python-buffer
- refs/heads/tzumao_autodiff_unbounded
- refs/heads/tzumao_improve_gradient_autoscheduler
- refs/heads/tzumao_issue_4297
- refs/heads/tzumao_licm_before_BI
- refs/heads/unbounded_bugs
- refs/heads/undo_async_copy_chain_black_list
- refs/heads/use_string_literals_for_blobs
- refs/heads/users/lukas/python-pip
- refs/heads/validate_sched_error_msg
- refs/heads/var_ir_fix
- refs/heads/vksnk/async-experiment
- refs/heads/vksnk/async-multiple-producers
- refs/heads/vksnk/async-order
- refs/heads/vksnk/better-loop-carry
- refs/heads/vksnk/better-message
- refs/heads/vksnk/bound-storage
- refs/heads/vksnk/bounds-widen-right
- refs/heads/vksnk/c-print-type
- refs/heads/vksnk/c-round
- refs/heads/vksnk/check-return-result
- refs/heads/vksnk/compute-with-bug
- refs/heads/vksnk/compute_with_async
- refs/heads/vksnk/dma-limit-channels
- refs/heads/vksnk/dma-min-max
- refs/heads/vksnk/expr-match-shuffle
- refs/heads/vksnk/extract-from-scalar
- refs/heads/vksnk/f16-load
- refs/heads/vksnk/fix-packvr
- refs/heads/vksnk/fix_halide_xtensa_narrow_with_rounding_shift_i16
- refs/heads/vksnk/fused-compute-with
- refs/heads/vksnk/hoist-storage-bug
- refs/heads/vksnk/lerp-intrinsics
- refs/heads/vksnk/lower-signed-shifts
- refs/heads/vksnk/missing-exception
- refs/heads/vksnk/non-widening-halves
- refs/heads/vksnk/optimize-shuffles
- refs/heads/vksnk/replace-all
- refs/heads/vksnk/restrict
- refs/heads/vksnk/roll-buffer
- refs/heads/vksnk/roundeven-arm
- refs/heads/vksnk/rvar-bounds
- refs/heads/vksnk/simplify-slice
- refs/heads/vksnk/skip-semaphores
- refs/heads/vksnk/storage-folding
- refs/heads/vksnk/strided-load-of-4_2
- refs/heads/vksnk/typed-scope
- refs/heads/vksnk/update-simd-driver
- refs/heads/vksnk/vectorize-bug
- refs/heads/vksnk/vectorize-scalarize
- refs/heads/vksnk/widening_absd
- refs/heads/vksnk/xtensa-codegen-fp16
- refs/heads/vksnk/xtensa-dma-improvements
- refs/heads/vksnk/xtensa-regroup-pass
- refs/heads/vksnk/xtensa/lift-allocs
- refs/heads/vulkan
- refs/heads/vulkan-diagnose-alloc-failures
- refs/heads/vulkan-phase0-adts
- refs/heads/vulkan-phase1-spirv
- refs/heads/vulkan-phase2-runtime
- refs/heads/vulkan2
- refs/heads/vulkan_fix_gpu_dynamic_shared_test
- refs/heads/vulkan_fix_subregion_memory_offsets
- refs/heads/webassembly-old
- refs/heads/winograd
- refs/heads/wording_fix
- refs/heads/xtensa-codegen
- refs/heads/xtensa-codegen-parallel
- refs/heads/xuanda/fix-serialize-bad-partition-always
- refs/remotes/origin/rootjalex/add_autosched_caching
- refs/tags/release_2018_02_15
- refs/tags/release_2019_08_27
- refs/tags/release_8.0.0
- refs/tags/v10.0.0
- refs/tags/v10.0.1
- refs/tags/v11.0.0
- refs/tags/v11.0.1
- refs/tags/v12.0.0
- refs/tags/v12.0.1
- refs/tags/v13.0.0
- refs/tags/v13.0.1
- refs/tags/v13.0.2
- refs/tags/v13.0.3
- refs/tags/v13.0.4
- refs/tags/v14.0.0
- refs/tags/v15.0.0
- refs/tags/v15.0.1
- refs/tags/v16.0.0
- refs/tags/v17.0.0
- refs/tags/v17.0.1
- refs/tags/v8.0.0
Revision | Author | Author Date | Message | Commit Date |
---|---|---|---|---|
412a0a4 | Steven Johnson | 08 December 2021, 01:58:10 UTC | Minor tweaks to generate_closure_ir() Cherry-picking some minor changes from an otherwise-failed experimental branch: - Condense code in generate_closure_ir() via use of a lambda - Emit the calls to load_typed_struct_member in such a way that they are in ascending order (they currently get emitted in descending order, which is not wrong but is unsatisfying and suboptimal) - Call simplify() on the final Stmt, so that unused calls to load_typed_struct_member get omitted from the IR (presumably LLVM will strip these out later but it can't fail to do so if we don't emit them at all) | 08 December 2021, 01:58:10 UTC |
2923246 | Steven Johnson | 06 December 2021, 21:31:50 UTC | Update CodeGen_C.cpp | 06 December 2021, 21:31:50 UTC |
b6ba514 | Steven Johnson | 06 December 2021, 17:09:28 UTC | Minor cleanup of parallel refactor intrinsics (#6465) * Minor cleanup of parallel refactor intrinsics - Renamed `load_struct_member` to `load_typed_struct_member` to make it more clear that it is intended for use only with the results of `make_typed_struct`. - Split `declare_struct_type` into two intrinsics, `define_typed_struct` and `forward_declare_typed_struct`, removing the need for the underdocumented `mode` argument and hopefully making usage clearer - Added / clarified comments for the intrinsics modified above * Update comments * Fix comments | 06 December 2021, 17:09:28 UTC |
c21e701 | Z Stern | 03 December 2021, 01:06:23 UTC | Switch to PureIntrinsics per review feedback. | 03 December 2021, 01:06:23 UTC |
a1d267e | Steven Johnson | 02 December 2021, 21:30:37 UTC | use Closure::include | 02 December 2021, 21:30:37 UTC |
24a6eb2 | Steven Johnson | 02 December 2021, 21:24:00 UTC | Merge branch 'master' into factor_parallel_codegen | 02 December 2021, 21:24:00 UTC |
392430d | Steven Johnson | 02 December 2021, 21:23:25 UTC | Fix Closure API (#6464) The current API requires calling a Visitor from the Closure ctor, which means we implicitly call virtual methods from the class ctor, which is a no-no for a non-final class (see comments on https://github.com/halide/Halide/pull/6443). | 02 December 2021, 21:23:25 UTC |
387c58f | Steven Johnson | 02 December 2021, 19:02:31 UTC | Minor hygiene in LowerParallelTasks - normalize local functions to snake_case - put all local functions & classes in anon namespace - move MinThreads visitor to file scope to reduce nestedness of code | 02 December 2021, 19:02:31 UTC |
25b7b77 | Steven Johnson | 02 December 2021, 18:49:23 UTC | Remove unused `is_const_pointer()` function | 02 December 2021, 18:49:23 UTC |
0ed461b | Steven Johnson | 02 December 2021, 18:38:50 UTC | Add operator<< for Closure (#6443) * Add operator<< for Closure Moves the ad-hoc implementation out of HostClosure::arguments() for easier debugging usage. Also, drive-by elimination of the body of HostClosure ctor, which was identical to the one inherited from Closure. * Update DeviceArgument.cpp * Add explanatory comment | 02 December 2021, 18:38:50 UTC |
9af64ae | Steven Johnson | 02 December 2021, 17:55:04 UTC | Sort IntrinsicOp and corresponding names | 02 December 2021, 17:55:04 UTC |
d180598 | Steven Johnson | 02 December 2021, 17:46:59 UTC | Merge branch 'master' into factor_parallel_codegen | 02 December 2021, 17:46:59 UTC |
5cf9ae5 | Andrew Adams | 02 December 2021, 15:04:43 UTC | Reduce overhead of sampling profiler by having only one thread do it (#6433) * Reduce overhead of sampling profiler by having only one thread do it * Use const ref * One line per member | 02 December 2021, 15:04:43 UTC |
479d839 | Steven Johnson | 02 December 2021, 03:42:04 UTC | Add LinkageType::ExternalPlusArgv (#6452) (#6463) Allows us to skip generating metadata for offloaded hexagon funcs, which will never use it. | 02 December 2021, 03:42:04 UTC |
0faca07 | Steven Johnson | 02 December 2021, 00:42:34 UTC | Fix hvx lock/unlock semantics for PR #6457 (#6462) Fix qurt_hvx_lock issues | 02 December 2021, 00:42:34 UTC |
f02b1e9 | Steven Johnson | 01 December 2021, 19:26:03 UTC | Merge branch 'master' into factor_parallel_codegen | 01 December 2021, 19:26:03 UTC |
4877d26 | Steven Johnson | 01 December 2021, 19:22:00 UTC | Tweak Hexagon codegen output to match the pattern in Lower.cpp more accurately (for level 1 vs 2); also prefix the outputs so they are easier to read as Hexagon-specific when debugging (#6461) | 01 December 2021, 19:22:00 UTC |
7eb9f5f | Steven Johnson | 30 November 2021, 23:55:26 UTC | Add some std::move usage | 30 November 2021, 23:55:26 UTC |
e02571b | Steven Johnson | 30 November 2021, 18:10:02 UTC | Augment Closure debugging | 30 November 2021, 19:29:30 UTC |
c0192ff | Steven Johnson | 30 November 2021, 06:13:44 UTC | Re-enable performance_async_gpu for D3D12Compute (#6450) * Re-enable performance_async_gpu for D3D12Compute It's been disabled for ~2 years because of flaky failures (#3586); we should see if the many changes since then have improved things or not. * tickle buildbots | 30 November 2021, 06:13:44 UTC |
7d2c2c6 | Steven Johnson | 29 November 2021, 23:33:33 UTC | Update parallel_nested_1.cpp | 29 November 2021, 23:33:33 UTC |
e64651b | Steven Johnson | 29 November 2021, 22:52:33 UTC | Merge branch 'master' into factor_parallel_codegen | 29 November 2021, 22:52:33 UTC |
5aeea6a | Andrew Adams | 26 November 2021, 22:32:24 UTC | Fixes for c++20 (#6446) Fixes #6445 | 26 November 2021, 22:32:24 UTC |
76c0946 | Martijn Courteaux | 26 November 2021, 20:03:24 UTC | Syntax highlighting for embedded PTX code. (#6447) * Include GPU source kernels in Stmt and StmtHtml file. * Syntax highlighting for embedded PTX code. | 26 November 2021, 20:03:24 UTC |
3bde22a | Martijn Courteaux | 24 November 2021, 20:59:37 UTC | Include GPU source kernels in Stmt and StmtHtml file. (#6444) | 24 November 2021, 20:59:37 UTC |
e71bb81 | Steven Johnson | 23 November 2021, 23:07:45 UTC | Merge branch 'factor_parallel_codegen' of https://github.com/halide/Halide into factor_parallel_codegen | 23 November 2021, 23:07:45 UTC |
2fd4c9f | Steven Johnson | 23 November 2021, 23:07:36 UTC | Attempt to fix parallel offloads for HVX | 23 November 2021, 23:07:36 UTC |
8b68f85 | Andrew Adams | 23 November 2021, 21:13:48 UTC | Avoid needless gather in fast_integer_divide lowering (#6441) * Avoid needless gather in fast_integer_divide lowering fast_integer_divide did two lookups, one for a multiplier, and one for a shift. It turns out you can just use count leading zeros to compute a workable shift instead of having to do a lookup. This PR speeds up fast_integer_divide by ~70% or so in cases where the denominator varies across vector lanes, by avoiding one of the two expensive gathers. * Fix slash direction * Pacify clang-tidy * Use portable bit-counting methods * Cleaner initialization of tables | 23 November 2021, 21:13:48 UTC |
6528ba6 | Steven Johnson | 23 November 2021, 20:18:13 UTC | clang-tidy | 23 November 2021, 20:18:13 UTC |
b9ac7d6 | Steven Johnson | 23 November 2021, 19:45:20 UTC | Remove unused MayBlock visitor class | 23 November 2021, 19:45:20 UTC |
df247f9 | Steven Johnson | 23 November 2021, 19:41:00 UTC | Remove no-longer-used Closure code from Codegen_Internal | 23 November 2021, 19:41:00 UTC |
114d209 | Steven Johnson | 23 November 2021, 18:48:38 UTC | Merge branch 'master' into factor_parallel_codegen | 23 November 2021, 18:48:38 UTC |
d12fbd1 | Steven Johnson | 23 November 2021, 17:33:38 UTC | Codegen_C: buffer compilation needs to special-case scalar buffers (#6442) The existing code will emit something like `halide_dimension_t foo_buffer_shape[] = {};` for these, which is a zero-length array, which some compilers will (justifiably) say has no effect. We should be able to just use nullptr for the shape in these cases. | 23 November 2021, 17:33:38 UTC |
59d6da7 | Andrew Adams | 23 November 2021, 17:25:47 UTC | Skip custom cuda context test on older GPUs (#6437) | 23 November 2021, 17:25:47 UTC |
5733306 | Steven Johnson | 22 November 2021, 21:30:11 UTC | Merge branch 'master' into factor_parallel_codegen | 22 November 2021, 21:30:11 UTC |
a89041b | Steven Johnson | 22 November 2021, 21:29:11 UTC | Ensure that halide_start_clock() is called before halide_current_time_ns() in hexagon_host.cpp (#6438) This oversight was causing an assert with the -debug feature flag enabled (with presumably-misleading timing results as well) | 22 November 2021, 21:29:11 UTC |
57d1e05 | Steven Johnson | 22 November 2021, 19:46:52 UTC | Set up SANITIZER_FLAGS and OPTIMIZE for apps/Makefile.inc (#6435) Minor hygiene to make it easy to build AOT apps with TSAN or ASAN. | 22 November 2021, 19:46:52 UTC |
2239443 | Andrew Adams | 19 November 2021, 22:56:12 UTC | Do target-specific lowering of lerp (#6432) * Do target-specific lowering of lerp Saves instructions on x86. Before #6426 vpaddw %ymm0, %ymm1, %ymm1 vpsrlw $8, %ymm1, %ymm2 vpaddw %ymm2, %ymm1, %ymm1 vpsrlw $8, %ymm1, %ymm1 After #6426 vpsrlw $7, %ymm2, %ymm3 vpand %ymm0, %ymm3, %ymm3 vpsrlw $8, %ymm2, %ymm4 vpaddw %ymm2, %ymm4, %ymm2 vpaddw %ymm3, %ymm2, %ymm2 vpsrlw $7, %ymm2, %ymm3 vpand %ymm0, %ymm3, %ymm3 vpsrlw $8, %ymm2, %ymm2 vpaddw %ymm2, %ymm3, %ymm2 vpand %ymm1, %ymm2, %ymm2 This PR: vpaddw %ymm0, %ymm3, %ymm3 vpmulhuw %ymm1, %ymm3, %ymm3 vpsrlw $7, %ymm3, %ymm3 * Target is a struct | 19 November 2021, 22:56:12 UTC |
cfd03c9 | Steven Johnson | 19 November 2021, 17:41:05 UTC | Don't remap the function name or the target in the metadata (#6430) The remapping is only intended to be used for output argument(s), not the function name; if you have an output with the same name as the function, you can get the metadata emitted with incorrect information. (And remapping the target string is just silly.) This is almost impossible to do currently, but if you construct a Generator just right, you can make it happen. | 19 November 2021, 17:41:05 UTC |
c3040cb | Volodymyr Kysenko | 19 November 2021, 17:10:15 UTC | Rewrite integer lerp using intrinsics (#6426) * Rewrite integer lerp using intrinsics * Comment | 19 November 2021, 17:10:15 UTC |
f49f800 | Z Stern | 19 November 2021, 08:50:15 UTC | Merge branch 'master' into factor_parallel_codegen | 19 November 2021, 08:50:15 UTC |
5d93f1e | Z Stern | 19 November 2021, 08:49:17 UTC | Comment typo fixes. | 19 November 2021, 08:49:17 UTC |
0e40edc | Ashish Uthama | 18 November 2021, 21:27:53 UTC | Include LICENSE.txt in package (#6428) Co-authored-by: Ashish Uthama <you@example.com> | 18 November 2021, 21:27:53 UTC |
36dd10f | Steven Johnson | 17 November 2021, 23:14:21 UTC | Fix Introspection issues (#6424) - DWARF v5 has a slightly different header; this recognizes it so we don't fail immediately - Add support for the line_strp form - Allow for a graceful failure if a debug abbreviation is missing; I've only seen this when compiling for TSAN, and I'm honestly not entirely sure if this is a bug in the DWARF generation for those tools vs a subtle flaw in our parsing, but bailing out early and skipping introspection seems kinder than assert-fail. | 17 November 2021, 23:14:21 UTC |
00a0715 | dsharletg | 15 November 2021, 21:33:21 UTC | Add hacky fix for losing global variables. | 15 November 2021, 21:34:16 UTC |
16fa3ce | Steven Johnson | 12 November 2021, 23:17:53 UTC | [hannk] Pacify clang-tidy (#6412) * [hannk] Pacify clang-tidy * One more ASAN fix We must use use_global_gc = false to work properly with the JIT * Revert "One more ASAN fix" This reverts commit 9ed07a70b4a656790236a5ff6966155df823a319. * Rework Op::mutate() to avoid UB | 12 November 2021, 23:17:53 UTC |
b63f6af | Steven Johnson | 12 November 2021, 20:56:57 UTC | [hannk] Fix lower_tflite_fullyconnected (#6414) Fixed the bounds calculation in lower_tflite_fullyconnected() to preserve the invariants expected, and added a testcase that previously failed. | 12 November 2021, 20:56:57 UTC |
8c2dd5f | Steven Johnson | 12 November 2021, 20:34:14 UTC | One more ASAN fix (#6413) We must use use_global_gc = false to work properly with the JIT | 12 November 2021, 20:34:14 UTC |
0153c6b | Steven Johnson | 12 November 2021, 16:35:37 UTC | Revamp Hannk IR (#6379) Refactor Hannk IR and transforms to use a Mutator-based approach | 12 November 2021, 16:35:37 UTC |
79da2a0 | Steven Johnson | 12 November 2021, 16:34:30 UTC | Fix broken ASAN code (#6408) * Fix broken ASAN code Various changes and merges ended up with us using multiple ASAN passes, which was pretty crashy (we just didn't notice because it isn't tested well enough on our buildbots, but is elsewhere). I think we really only want to use the ModuleAddressSanitizerPass (not the non-Module version), which is what Clang does. * set UseAfterScope = true | 12 November 2021, 16:34:30 UTC |
02a394d | Steven Johnson | 12 November 2021, 03:25:52 UTC | x86_cpuid_halide must preserve all 64 bits of rbx/rsi (#6409) The existing code attempts to preserve ebx (since the cpuid instruction can trash it), but it only preserves the lower 32 bits; on 64-bit systems, this (amazingly) usually works OK unless you are compiling in (e.g.) ASAN mode, which can subtly change codegen such that the full 64 bits of rbx must be preserved. I'm genuinely astonished this hasn't bitten us before now! | 12 November 2021, 03:25:52 UTC |
d763406 | Volodymyr Kysenko | 12 November 2021, 01:30:05 UTC | Change implementation of round_f* in CodeGen_C to use nearbyint to match CodeGen_LLVM (#6406) | 12 November 2021, 01:30:05 UTC |
9ff87ce | Steven Johnson | 11 November 2021, 18:04:09 UTC | _halide_buffer_crop() needs to check for runtime failures (v2) (#6403) * _halide_buffer_crop() needs to check for runtime failures (v2) (Alternate to #6402) We currently assume that _halide_buffer_crop() will never fail. This is a bad assumption, as it can call device_crop(), which can fail due to unexpected runtime errors, or because a backend simply leaves the device_crop field at the default (unimplemented) case (as is currently the case for the OGLC backend). When this happens, the dst buffer is left in an inconsistent, invalid state (which was what led to the crashes fixed by #6401). This change modifies _halide_buffer_crop() to return nullptr in the event of an error, and ensures that all cropped buffers are checked for null at the right point. (This is not optimal, of course, since the specific error returned by device_crop is getting dropped on the floor, but the existence of an error is no longer ignored.) This addresses at least some of the failure issues we are seeing in performance_async_gpu with the OpenGLCompute backend. (Also: drive-by whitespace fix in CodegenC) * Oops | 11 November 2021, 18:04:09 UTC |
d343e76 | Andrew Adams | 11 November 2021, 17:06:00 UTC | Fix obscure bug in widening let substitution (#6405) Fix obscure bug in widening let substitution | 11 November 2021, 17:06:00 UTC |
8e34a35 | Steven Johnson | 09 November 2021, 23:10:40 UTC | Remove halide_abort_if_false() usage in runtime/metal (#6398) * Remove halide_abort_if_false() usage in runtime/metal This converts all the usage of `halide_abort_if_false()` in runtime/metal into either an explicit runtime check-and-return-error-code (if the check looks plausible), or `halide_debug_assert()` (if the check seems to be stating an invariant that shouldn't be possible in well-structured code). These changes are admittedly subjective, so feedback is especially welcome. Also, driveby change to sync-common.h to use `halide_debug_assert()` rather than a local equivalent. * nits | 09 November 2021, 23:10:40 UTC |
4f70271 | Steven Johnson | 09 November 2021, 22:51:05 UTC | Add defensive checks to halide_buffer_copy_already_locked (#6401) Found while debugging crashes with performance_async_gpu for OpenGLCompute: the 'if' tree wasn't robust enough for malformed buffers being passed, and could attempt to deref and use a null src->device_interface or dst->device_interface in some cases. This patch just improves this function to return an error in these cases (rather than crashing); the fact that we are getting malformed buffers passed to us is likely a separate bug. | 09 November 2021, 22:51:05 UTC |
b189722 | Steven Johnson | 09 November 2021, 21:35:24 UTC | [hannk] Upgrade hannk to use TFLite 2.7.0 by default (#6393) * [hannk] Upgrade hannk to use TFLite 2.7.0 by default * Fix unused-vars warnings | 09 November 2021, 21:35:24 UTC |
b021f87 | Steven Johnson | 09 November 2021, 21:25:26 UTC | Move PyTorch test into standalone tests (#6397) * Move PyTorch test into standalone tests It doesn't need to be internal. Also simplified to use only public API, updated the expected correctness, and avoided the need to have cuda present on the system to test for cuda output (since we can cross-compile to generate the C++ output anywhere). * fixes * Fix Windows text file endings * Update pytorch.cpp * Update pytorch.cpp | 09 November 2021, 21:25:26 UTC |
4286c78 | Steven Johnson | 09 November 2021, 17:13:09 UTC | Drop support for LLVM11 (#6396) * Drop support for LLVM11 With Halide 13 released, we should drop support for LLVM11 in Halide trunk, since we only promise to support LLVM trunk + two releases. * Update packaging.yml * Update config.cmake * Update CMakeLists.txt | 09 November 2021, 17:13:09 UTC |
d3ea755 | Steven Johnson | 09 November 2021, 17:03:17 UTC | Fix OGLC debug builds (#6399) If you try to build and run something with `openglcompute` and `debug`, you may crash with a div-by-zero, because the openglcompute runtime never calls `halide_start_clock()`, and all implementations of `halide_current_time_ns()` assume that it has been called. On (e.g.) OSX, this results in div by zero. This fixes it by inserting the correct call into openglruntime.cpp, and also adding debug-only asserts to all the `halide_current_time_ns()` implementations. (I was tempted to fix this by removing `halide_start_clock()` entirely and just lazily initing the initial value in `halide_current_time_ns()`, but I figured that would likely get pushback...) | 09 November 2021, 17:03:17 UTC |
d6f1345 | Steven Johnson | 08 November 2021, 23:13:13 UTC | Rename halide_assert -> halide_abort_if_false (#6382) * Rename halide_assert -> HALIDE_CHECK A crashing bug got mistakenly inserted because a new contributor (reasonably) assumed that the `halide_assert()` macro in our runtime code was like a C `assert()` (i.e., something that would vanish in optimized builds). This is not the case; it is a check that happens in all build modes and always triggers an `abort()` if it fires. We should remove any ambiguity about it, so this proposes to rename it to something more like the Google/Abseil-style CHECK() macro, to make it stand out more. (We may want to do a follow-up to verify that all of the uses really are unrecoverable errors that aren't better handled by returning an error.) * clang-format * Fix for top-of-tree LLVM * Fix for older versions * HALIDE_CHECK -> halide_abort_if_false * Update runtime_internal.h | 08 November 2021, 23:13:13 UTC |
1312817 | Steven Johnson | 08 November 2021, 22:01:29 UTC | Clean up CodeGen_LLVM names to match ASAN nomenclature changes (#6395) | 08 November 2021, 22:01:29 UTC |
6071cf6 | Steven Johnson | 08 November 2021, 20:13:44 UTC | Check results of all runtime function calls (#6389) * Check results of all runtime function calls This cherry-picks just the changes to callsites internal to Halide (and tests) from #6388. (It doesn't attempt to annotate runtime functions to enforce checking the results.) * Update write_debug_image.cpp * Add checks + comment to buffer_copy_aottest * Add comment to gpu_object_lifetime_aottest * Update memory_profiler_mandelbrot_aottest.cpp * Update user_context_insanity_aottest.cpp * Update process.cpp | 08 November 2021, 20:13:44 UTC |
a798909 | Steven Johnson | 08 November 2021, 20:13:28 UTC | Add halide_debug_assert() macro (#6390) * Add halide_debug_assert() macro Also convert usage of halide_assert()/HALIDE_CHECK() in hashmap.h and gpu_context_common.h to halide_debug_assert(), as all the usages looked to be appropriate for debug-mode only. (Rebased version of #6385, which this replaces) * appease clang-format | 08 November 2021, 20:13:28 UTC |
c87399c | Z Stern | 05 November 2021, 03:08:18 UTC | Merge branch 'master' into factor_parallel_codegen | 05 November 2021, 03:08:18 UTC |
656c6b5 | Steven Johnson | 04 November 2021, 23:00:28 UTC | [hannk] Have CMake emit .s, .stmt, .ll files (#6392) | 04 November 2021, 23:00:28 UTC |
26ccb54 | Omar Emara | 04 November 2021, 00:35:13 UTC | Support vectorized Select in OpenGLCompute backend (#6371) The ternary operator in GLSL does not work with vector types. While the mix function has overloads for boolean vectors, they are only supported in version 4.5, so they are not exactly portable. To work around this, we apply the ternary operator to each element of the vector type. Necessary for #6348. | 04 November 2021, 00:35:13 UTC |
c005b9f | Omar Emara | 04 November 2021, 00:32:20 UTC | Support vectorization in OpenGLCompute backend (#6348) * Support vectorization in OpenGLCompute backend This patch adds support for vector load and store operations. First, a pass identifies the buffers whose loads and stores are all dense, aligned, and have the same number of lanes. Such buffers are declared with a vector base type and accessed accordingly. Loads and stores that do not satisfy those criteria are implemented as gathers and scatters from buffers whose base type is scalar. Resolves #4976. Partially resolves #1687. * Move buffer name instead of copy (clang-tidy) | 04 November 2021, 00:32:20 UTC |
657bb03 | Steven Johnson | 03 November 2021, 23:29:38 UTC | Fix for top-of-tree LLVM (#6386) * Fix for top-of-tree LLVM * Fix for older versions | 03 November 2021, 23:29:38 UTC |
76315a2 | Omar Emara | 03 November 2021, 22:54:44 UTC | Vectorize Ramp in OpenGLCompute backend (#6372) Currently, ramps are generated as a number of independent scalar expressions that are finally gathered into a vector. For instance, indexing in vectorized code is filled with ramps like the following: ``` int _11 = int(1) * int(1); int _12 = _10 + _11; int _13 = int(2) * int(1); int _14 = _10 + _13; int _15 = int(3) * int(1); int _16 = _10 + _15; ivec4 _17 = ivec4(_10, _12, _14, _16); ``` This patch simplifies the generated code using a multiply add expression on a vector containing an arithmetic expression, such that the code is as follows: ``` ivec4 _11 = ivec4(0, 1, 2, 3) * int(1) + _10; ``` This is more performant due to vectorization, more compact, and more readable because the base and the stride are easily identifiable. | 03 November 2021, 22:54:44 UTC |
2cf3afb | Steven Johnson | 03 November 2021, 22:47:03 UTC | [hannk] Fix MeanOp (#6336) * [hannk] Fix MeanOp The `reducing()` method didn't handle negative values for indices, and didn't reverse the value of the axis as we do elsewhere, so results were incorrect. Also, we now parse and save the value of `keep_dims`, though I can't find evidence that it does much of anything: test cases pass different values for it but none of them fail (even though we ignore it), and at least one reference implementation I see doesn't seem to do anything with it. * Remove keep_dims handling for MeanOp | 03 November 2021, 22:47:03 UTC |
7ec8d70 | Steven Johnson | 03 November 2021, 22:19:15 UTC | Convert various halide_assert -> static_assert (#6383) The type-size checks in d3d12compute.cpp don't need to be runtime checks. | 03 November 2021, 22:19:15 UTC |
a227440 | Steven Johnson | 03 November 2021, 20:55:28 UTC | Remove halide_assert() from halide_default_device_wrap_native (#6381) This was inserted in https://github.com/halide/Halide/pull/6310, probably mistakenly (since `halide_assert()` in the Halide runtime is *not* a debug-only assertion). Instead of a controlled runtime failure, we just abort, which is not OK. | 03 November 2021, 20:55:28 UTC |
415ce0c | Alex Reinking | 03 November 2021, 20:27:46 UTC | Fix empty INSTALL_COMMAND in hannk super-build (#6387) * Fix empty INSTALL_COMMAND in hannk super-build * Fix 3.16 missing command * Fix the fix... | 03 November 2021, 20:27:46 UTC |
0d6b0f5 | Steven Johnson | 03 November 2021, 16:18:45 UTC | Fix for top-of-tree LLVM (#6380) | 03 November 2021, 16:18:45 UTC |
ac2673b | Alex Reinking | 03 November 2021, 00:57:12 UTC | Add super-build for cross-compiling HANNK (#6374) * Add super-build for cross-compiling HANNK * Relax CMake version | 03 November 2021, 00:57:12 UTC |
6070821 | Alex Reinking | 02 November 2021, 19:42:02 UTC | Update README for Halide 13. (#6378) | 02 November 2021, 19:42:02 UTC |
5b8f473 | Volodymyr Kysenko | 02 November 2021, 15:36:19 UTC | Fix for the crash from #6367 (#6375) * Skip empty boxes * Address the comments | 02 November 2021, 15:36:19 UTC |
4225eba | Alex Reinking | 01 November 2021, 23:03:09 UTC | Add helper for cross-compiling Halide generators. (#6366) * Add helper for cross-compiling Halide generators. Created a new function, `add_halide_generator`, that helps users write correct cross-compiling builds by establishing the following convention for creating a generator named `TARGET`: 1. Define Halide generators and libraries in the same project. 2. Assume two builds: a host build and a cross build. 3. When creating a generator, check to see if we can load a pre-built version of the target. 4. If so, just use it. 5. If not, make sure the full Halide package is loaded and create a target for the generator. a. If `CMAKE_CROSSCOMPILING` is set, then _warn_ the user (the variable is unreliable on macOS) that something seems fishy. b. Create export rules for the generator. It creates a package `PACKAGE_NAME` and appends to its `EXPORT_FILE`. c. Create a custom target also named `PACKAGE_NAME` for building the generators. d. Create an alias `${PACKAGE_NAMESPACE}${TARGET}`. 6. Users are expected to use the alias in conjunction with `add_halide_library`. Users can test the existence of `TARGET` to determine whether a pre-built one was loaded (and set additional properties if not). 7. Setting `${PACKAGE_NAME}_ROOT` is enough to load pre-built generators. `PACKAGE_NAME` is `${PROJECT_NAME}-halide_generators` by default. `PACKAGE_NAMESPACE` is `${PROJECT_NAME}::halide_generators::` by default. `EXPORT_FILE` is `${PROJECT_BINARY_DIR}/cmake/${PACKAGE_NAME}-config.cmake` by default. Users are free to avoid the helper if it would not fit their workflow. * Make HANNK use the new add_halide_generator helper | 01 November 2021, 23:03:09 UTC |
f5ce5f3 | Steven Johnson | 01 November 2021, 20:40:36 UTC | [hannk] Clean up aliasing (v2) (#6364) * wip * [hannk] Clean up aliasing (v2) The code for aliasing tensors was janky. This cleans it up and makes a clear distinction between aliasing done to overlay buffers with crop-and-translate, vs the aliasing done when we reshape tensors. We no longer allow a given tensor to do both of these, and we give preference to Reshape aliasing first. (Cherry-picked from #6321) * Move alias_type into shared ptr | 01 November 2021, 20:40:36 UTC |
1a1c97f | Steven Johnson | 01 November 2021, 20:28:50 UTC | [hannk] Add support for building/running for wasm (#6361) * [hannk] Allow disabling TFLite+Delegate build in CMake Preparatory work for allowing building of hannk with Emscripten; TFLite (and its dependencies) are problematic to build in that environment, but this will allow us to build a tflite-parser-only environment. (Note that more work is needed to get this working for wasm, as cross-compiling in CMake is still pretty painful; this work was split out to make subsequent reviews simpler.) * [hannk] Add support for building/running for wasm * HANNK_BUILD_TFLITE_DELEGATE -> HANNK_BUILD_TFLITE * Use explicit host build strategy for cross compiling HANNK (#6365) * Ignore local emsdk clone * Fix usage of CMAKE_BUILD_TYPE * Only print the Halide target info once per CMake run * Fix Halide "cmake" target detection for Emscripten * Prefer target_link_options to _link_libraries when applicable * Validate, rather than find, NODE_JS_EXECUTABLE (set by emsdk) * Emscripten already wraps tests with node. * Add dependency on Android logging library. * For cross-compiling, find host tools instead of recursive call. Rather than shelling out via execute_process and potentially guessing the toolchain options wrong, expect to find our host tools (i.e. generators) in a package called "hannk_tools". The package is created by the host build via the CMake export() command. Importing this package in the cross build creates IMPORTED targets with the same names as our generators. We then use these generators rather than creating generators for the target build. * Rework cross-compiling script. * Respond to (easy) reviewer comments. * Add HANNK_AOT_HOST_ONLY option. Use in script. * [hannk] tests should only process .tflite files (#6368) currently, random dotfiles (e.g. .DS_Store on OSX) can creep in, causing bogus failures * Add comment about node wrapping. * Rename hannk_tools to hannk-halide_generators * Add comment about exporting targets. * Bump version to Halide 14.0.0 (#6369) Co-authored-by: Steven Johnson <srj@google.com> Co-authored-by: Alex Reinking <alex_reinking@berkeley.edu> | 01 November 2021, 20:28:50 UTC |
69d8ef0 | Alex Reinking | 30 October 2021, 01:33:50 UTC | Bump version to Halide 14.0.0 (#6369) | 30 October 2021, 01:33:50 UTC |
3c52df1 | Steven Johnson | 29 October 2021, 23:46:50 UTC | [hannk] tests should only process .tflite files (#6368) currently, random dotfiles (e.g. .DS_Store on OSX) can creep in, causing bogus failures | 29 October 2021, 23:46:50 UTC |
541bc37 | Steven Johnson | 28 October 2021, 21:14:42 UTC | [hannk] Allow disabling TFLite+Delegate build in CMake (#6360) * [hannk] Allow disabling TFLite+Delegate build in CMake Preparatory work for allowing building of hannk with Emscripten; TFLite (and its dependencies) are problematic to build in that environment, but this will allow us to build a tflite-parser-only environment. (Note that more work is needed to get this working for wasm, as cross-compiling in CMake is still pretty painful; this work was split out to make subsequent reviews simpler.) * Update hannk_delegate.h * HANNK_BUILD_TFLITE_DELEGATE -> HANNK_BUILD_TFLITE | 28 October 2021, 21:14:42 UTC |
e10f104 | Steven Johnson | 28 October 2021, 17:34:27 UTC | Update Emscripten settings (#6362) The settings we use to build C++ in wasm were slightly out of date now that we've updated our runtime to Node instead of d8. Also drive-by gitignore fix. | 28 October 2021, 17:34:27 UTC |
1c7388a | Andrew Adams | 28 October 2021, 17:25:58 UTC | Allow users to use their own cuda contexts and streams in JIT mode (#6345) * Deprecate JIT runtime override methods that take void * * Make it possible to use custom cuda contexts and streams in JIT mode * Clean up comments * Tolerate null handlers in the JITUserContext These can come up if a JITUserContext is passed to something like copy_to_device before getting fully populated by passing it to a call to realize. * Remove reliance on dlsym in test and reuse the runtime's name resolution mechanism instead * Handle case where cuda and cuda-debug runtime modules both exist This change means we'll only ever create one built-in cuda context in this circumstance. * Slight simplification * Improve comments | 28 October 2021, 17:25:58 UTC |
4f573bf | Volodymyr Kysenko | 28 October 2021, 02:05:29 UTC | Add missing widening_absd patterns (#6359) * Add missing widening_absd patterns * Add a comment | 28 October 2021, 02:05:29 UTC |
8f1ae2a | Steven Johnson | 27 October 2021, 20:37:00 UTC | Use Node instead of d8 for Wasm AOT testing (#6356) * Use Node instead of d8 for Wasm AOT testing This requires that the right version of Node be installed on your system. Since EMSDK often puts a too-old version of Node in the path, allow overriding via an env var. * wip | 27 October 2021, 20:37:00 UTC |
34534f5 | Steven Johnson | 27 October 2021, 20:35:39 UTC | [hannk] Add missing call to Interpreter::prepare in benchmark app (#6358) | 27 October 2021, 20:35:39 UTC |
a15ffda | Volodymyr Kysenko | 27 October 2021, 01:33:39 UTC | Add include for size_t in constants.h (#6353) * Add include for size_t in constants.h * Change to int | 27 October 2021, 01:33:39 UTC |
86cb6c7 | Andrew Adams | 26 October 2021, 23:12:52 UTC | Deprecate JIT runtime override methods that take void * (#6344) * Deprecate JIT runtime override methods that take void * * Clean up comments | 26 October 2021, 23:12:52 UTC |
6211da9 | Andrew Adams | 26 October 2021, 23:12:34 UTC | Add --help flag to rungenmain, fixing #5323 (#6354) | 26 October 2021, 23:12:34 UTC |
06a37ca | Steven Johnson | 26 October 2021, 17:23:23 UTC | Add to various OpVisitors to avoid overload warnings for some compilers (#6337) | 26 October 2021, 17:23:23 UTC |
47fa87f | Steven Johnson | 26 October 2021, 17:22:49 UTC | [hannk] Add a prepare() method for ops and interp (#6338) * [hannk] Add a prepare() method for ops and interp This adds a new method to the Interpreter, and to all ops, which allows the interpreter (and each op) to do any one-time preparation for future executions. Previously this was lumped into either the Interpreter's ctor, or the Ops' various other methods, but this has some nice advantages at minimal cost: - Since the new prepare() returns an error value, it allows the Interpreter to do sanity checking at startup and return an error to the caller (rather than simply crashing); this makes using it in some runtime environments less painful. - Ops can use this to prep and cache information for multiple subsequent runs; initially, Conv and DepthwiseConv use this to calculate and cache the alignment requirements they need later on. This is unlikely to be a huge performance gain, but it is likely nonzero. As an added bonus, this means that e.g. the map_bounds() method is no longer susceptible to runtime failures from Halide bounds queries. * Update interpreter.cpp * Update transforms.cpp * Update transforms.cpp | 26 October 2021, 17:22:49 UTC |
667836d | Steven Johnson | 26 October 2021, 17:16:35 UTC | Harvest IWYU changes for LLVM, WABT (#6341) A couple of minor hygiene changes, extracted from https://github.com/halide/Halide/pull/6251: - Clean up LLVM_Headers.h to uniformly use <> instead of "" and to alphabetize properly - Clean up WABT includes to reflect what we need more accurately | 26 October 2021, 17:16:35 UTC |
b34919f | Omar Emara | 26 October 2021, 17:09:48 UTC | Fix wrong type in Ramp CodeGen for OpenGLCompute (#6349) The variable type of the Ramp in OpenGLCompute is assigned the type of the base member of the ramp, which is a scalar, while the ramp is a vector. Instead, we should use the type of the ramp itself to take vectorization into account. Partially resolves #1687. | 26 October 2021, 17:09:48 UTC |
ab57ab1 | Steven Johnson | 26 October 2021, 17:07:44 UTC | [hannk] augment SoftmaxOp to allow specifying axis (#6351) (basically equivalent to #6335 but for softmax) | 26 October 2021, 17:07:44 UTC |
50517cb | Steven Johnson | 26 October 2021, 17:07:00 UTC | [hannk] requantize() should never skip the operation (#6350) * [hannk] requantize() should never skip the operation Even if inq == outq, the incoming buffer can contain out-of-range values; we shouldn't try to optimize the op away, since it's cheap. * Update ops.cpp * Update ops.cpp | 26 October 2021, 17:07:00 UTC |
d6d7bbc | Steven Johnson | 26 October 2021, 17:05:48 UTC | Make halide_type_t and halide_type_of constexpr (#6340) * Make halide_type_t and halide_type_of constexpr This allows us to do a bit more at compile time in some cases; e.g., we can more reliably collapse things like `t == halide_type_t(int, 8)` into `t.as_u32() == literal-integer`, avoiding temporaries. It also makes it tractable to do a `switch` on a series of `halide_type_t`, since we can now use halide_type_t::as_u32() as a constexpr. There were a number of places that did this in an ad-hoc manner previously; I updated those, and also converted at least one more repeated-if clause into a switch. (TBH, I'm not sure if I'm wild about the syntax, though; it is a bit weedy to scan. Suggestions welcome.) * Ensure there are no uninited vars in constexpr funcs * Update HalideRuntime.h | 26 October 2021, 17:05:48 UTC |
334e27a | Volodymyr Kysenko | 26 October 2021, 05:10:41 UTC | Specify template parameter of ScopedValue (#6352) | 26 October 2021, 05:10:41 UTC |