https://github.com/halide/Halide
- HEAD
- refs/heads/Halide_unsharp
- refs/heads/abadams/align_strided_const_loads
- refs/heads/abadams/alloca
- refs/heads/abadams/atomic_parallel_compiled_in
- refs/heads/abadams/atomic_vector_non_recursive
- refs/heads/abadams/averaging_tree
- refs/heads/abadams/avoid_name_mangling_in_cross_module_dependencies
- refs/heads/abadams/better_absd
- refs/heads/abadams/better_codegen_for_non_const_ramps
- refs/heads/abadams/bgu_cholesky
- refs/heads/abadams/braces_around_statements
- refs/heads/abadams/cache_tighten_producer_consumer_nodes
- refs/heads/abadams/check_reorder_dups
- refs/heads/abadams/clarify_broadcast_shuffle
- refs/heads/abadams/compositing_app
- refs/heads/abadams/cond_wait_spin
- refs/heads/abadams/cse_in_unroll_split_tuples
- refs/heads/abadams/custom_cuda_context
- refs/heads/abadams/custom_cuda_context_2
- refs/heads/abadams/custom_cuda_context_3
- refs/heads/abadams/d3d12abi
- refs/heads/abadams/deflake_mullapudi_reorder
- refs/heads/abadams/delete_prepare_for_early_exit
- refs/heads/abadams/depthwise_separable_conv
- refs/heads/abadams/diagnose_boundary_condition_failure
- refs/heads/abadams/disable_onnx_app_on_mac
- refs/heads/abadams/divide_using_pavgw
- refs/heads/abadams/dont_link_to_cudart
- refs/heads/abadams/dont_reinterpret_concat
- refs/heads/abadams/early_out
- refs/heads/abadams/enable_f16c
- refs/heads/abadams/extract_concat_bits
- refs/heads/abadams/fast_integer_divide_round_to_zero
- refs/heads/abadams/faster_runtime_integer_division
- refs/heads/abadams/faster_unroll
- refs/heads/abadams/fix-arm-seg2
- refs/heads/abadams/fix_4211
- refs/heads/abadams/fix_5323
- refs/heads/abadams/fix_5329
- refs/heads/abadams/fix_5889
- refs/heads/abadams/fix_6984
- refs/heads/abadams/fix_7229
- refs/heads/abadams/fix_7260
- refs/heads/abadams/fix_7365
- refs/heads/abadams/fix_7374
- refs/heads/abadams/fix_7504
- refs/heads/abadams/fix_7514
- refs/heads/abadams/fix_7531
- refs/heads/abadams/fix_7584
- refs/heads/abadams/fix_7584_v2
- refs/heads/abadams/fix_7742
- refs/heads/abadams/fix_7756
- refs/heads/abadams/fix_7761
- refs/heads/abadams/fix_7768
- refs/heads/abadams/fix_7786
- refs/heads/abadams/fix_7810
- refs/heads/abadams/fix_7811
- refs/heads/abadams/fix_7815
- refs/heads/abadams/fix_7867
- refs/heads/abadams/fix_7871
- refs/heads/abadams/fix_7872
- refs/heads/abadams/fix_7873
- refs/heads/abadams/fix_7888
- refs/heads/abadams/fix_7890
- refs/heads/abadams/fix_7891
- refs/heads/abadams/fix_7892
- refs/heads/abadams/fix_7893
- refs/heads/abadams/fix_7906
- refs/heads/abadams/fix_7909
- refs/heads/abadams/fix_7968
- refs/heads/abadams/fix_8038
- refs/heads/abadams/fix_8054
- refs/heads/abadams/fix_arm_fcvtmp
- refs/heads/abadams/fix_autoschedule_feature_transposition
- refs/heads/abadams/fix_cse_name_collisions
- refs/heads/abadams/fix_cuda_mat_mul_assert
- refs/heads/abadams/fix_deinterleave_bug
- refs/heads/abadams/fix_deinterleave_for_reinterpret
- refs/heads/abadams/fix_div_round_to_zero
- refs/heads/abadams/fix_fft_compile_time_regression
- refs/heads/abadams/fix_generate_output_snippets
- refs/heads/abadams/fix_if_nesting_condition
- refs/heads/abadams/fix_leaks_in_memoize_test
- refs/heads/abadams/fix_lgtm_warnings
- refs/heads/abadams/fix_links_to_master
- refs/heads/abadams/fix_load_of_broadcast
- refs/heads/abadams/fix_lossless_cast_of_sub
- refs/heads/abadams/fix_onnx_app
- refs/heads/abadams/fix_pointless_lower_condition
- refs/heads/abadams/fix_potential_gpu_deadlock
- refs/heads/abadams/fix_realize_condition_depends_on_tuple
- refs/heads/abadams/fix_reduce_expr_modulo_of_vector
- refs/heads/abadams/fix_riscv_vx_vi
- refs/heads/abadams/fix_round
- refs/heads/abadams/fix_stencil_chain_gpu_schedule
- refs/heads/abadams/fix_track_bounds_intervals
- refs/heads/abadams/fix_tutorial_2
- refs/heads/abadams/forward_partition_methods
- refs/heads/abadams/fully_fused_depthwise_separable_conv
- refs/heads/abadams/fuzz_sliding_window
- refs/heads/abadams/gaussian_blur_app
- refs/heads/abadams/generator_infinite_default_timeout
- refs/heads/abadams/gpu_autoscheduler_parallel_random_probes
- refs/heads/abadams/include_riscv_in_readme
- refs/heads/abadams/interleave_nested_vector
- refs/heads/abadams/ir_match_by_ref
- refs/heads/abadams/lerp_plus_cast
- refs/heads/abadams/local_laplacian_code_size
- refs/heads/abadams/lower_halving_sub
- refs/heads/abadams/lower_rounding_shift_right
- refs/heads/abadams/mac-arm-fixes
- refs/heads/abadams/make_fast_inverse_test_throughput_limited
- refs/heads/abadams/makefile_serialization_support
- refs/heads/abadams/mismatched_new_delete
- refs/heads/abadams/mixed_sign_mul_shift_right
- refs/heads/abadams/mixed_width_mul_shift_right
- refs/heads/abadams/multiple_scatter
- refs/heads/abadams/mux_intrinsic
- refs/heads/abadams/name_helpers
- refs/heads/abadams/narrow_predicates
- refs/heads/abadams/nested_vectorization_compile_time_regression_fix
- refs/heads/abadams/nested_vectorization_tweaks
- refs/heads/abadams/parallel_simd_op_check
- refs/heads/abadams/per_instance_profiling
- refs/heads/abadams/precompute_shared_mem_size
- refs/heads/abadams/prefer_no_gather
- refs/heads/abadams/print_uncaught_exception
- refs/heads/abadams/promote_fixed_point_intrinsics
- refs/heads/abadams/psabdw
- refs/heads/abadams/random_pipelines
- refs/heads/abadams/rationalize_gpu_for_loop_names
- refs/heads/abadams/reenable_unscheduled_stage_warning
- refs/heads/abadams/reinterpret_vector
- refs/heads/abadams/remove_arch_os_for_shaders
- refs/heads/abadams/remove_bad_pruning
- refs/heads/abadams/remove_parameter_self_references
- refs/heads/abadams/remove_readnone_on_functions
- refs/heads/abadams/remove_use_of_python_config_in_onnx_makefile
- refs/heads/abadams/reschedule_bgu
- refs/heads/abadams/reschedule_bilateral_grid
- refs/heads/abadams/rewrite_atomic_pass
- refs/heads/abadams/rounding_shift_right_use_average
- refs/heads/abadams/rungenmain_error
- refs/heads/abadams/sampling_profiler_overhead_v2
- refs/heads/abadams/scope_improvements
- refs/heads/abadams/simpler_broadcasts
- refs/heads/abadams/simplify_correlated_pyramid
- refs/heads/abadams/siotas_20
- refs/heads/abadams/sioutas_20
- refs/heads/abadams/slide_over_split_loop
- refs/heads/abadams/sorting_network_working_branch
- refs/heads/abadams/stable_topological_order
- refs/heads/abadams/string_view
- refs/heads/abadams/strip_asserts_last
- refs/heads/abadams/switch_stmt
- refs/heads/abadams/target_specific_lerp
- refs/heads/abadams/time_lowering_passes
- refs/heads/abadams/track_failedness_through_solver_lets
- refs/heads/abadams/turn_off_slp_vectorization_for_avx512
- refs/heads/abadams/tweak_unpack_buffers
- refs/heads/abadams/undo_pointless_widening
- refs/heads/abadams/unordered_blocks
- refs/heads/abadams/unsigned_demosaic
- refs/heads/abadams/update_makefile_for_llvm_19
- refs/heads/abadams/use_arm_for_runtime_triple
- refs/heads/abadams/use_pmaddubsw_for_downsample
- refs/heads/abadams/validate_gpu_schedules
- refs/heads/abadams/vector_reduce_hexagon_predicate
- refs/heads/abadams/vector_scan
- refs/heads/abadams/vst_type_fix
- refs/heads/abadams/widening_let_bug
- refs/heads/abadams/x86_avg
- refs/heads/abadams/zen4
- refs/heads/adadams/profile_allocator
- refs/heads/add_image_checks_after_bounds_inference_plus_new_rules
- refs/heads/add_outermost_to_extern
- refs/heads/add_vectorization_to_search_space
- refs/heads/aelphy/feature_cadence_changes
- refs/heads/aelphy/float_extracts
- refs/heads/align_loads_comment_fix
- refs/heads/alina-strided-store
- refs/heads/another_buffer_copy_fix
- refs/heads/arm_sve_redux
- refs/heads/ataei-block_asserts-codegen
- refs/heads/ataei-debug_info
- refs/heads/ataei-fix-pow
- refs/heads/ataei-gen_str_param
- refs/heads/ataei-implicit_lhs_vars
- refs/heads/ataei-onnx
- refs/heads/ataei-onnx_converter_update
- refs/heads/ataei-onnx_pybind
- refs/heads/ataei-resnet50_benchmarks
- refs/heads/ataei-standalone_autoscheduler
- refs/heads/ataei_lots_of_inputs
- refs/heads/auto_sched_benchmarks
- refs/heads/auto_sched_estimates
- refs/heads/auto_sched_inline
- refs/heads/auto_sched_test_notparallel
- refs/heads/autoschedule_top_down
- refs/heads/autoschedule_with_convnet
- refs/heads/autoscheduler_scalar_imageparam_fix
- refs/heads/backports/10.x
- refs/heads/backports/11.x
- refs/heads/backports/12.x
- refs/heads/backports/13.x
- refs/heads/balance_expressions
- refs/heads/bazel
- refs/heads/benchmarks
- refs/heads/blaze
- refs/heads/bounds_buffer_lets_fix
- refs/heads/bounds_correct_vs_bounds_loaded_reduced
- refs/heads/buffer_device_api_target
- refs/heads/bug_device_free
- refs/heads/bug_inline_unbounded
- refs/heads/build/fix-xcode-2
- refs/heads/build/manylinux-fixes
- refs/heads/circ_buffer
- refs/heads/cmake-no-runtime-debug-symbols
- refs/heads/cmake/asan
- refs/heads/cmake/deps-cleanup
- refs/heads/cmake/find-modules
- refs/heads/cmake/spirv
- refs/heads/cmake_wasm_features
- refs/heads/compute_at_guard_with_if_goes_on_stack
- refs/heads/compute_with_at
- refs/heads/compute_with_check
- refs/heads/compute_with_excessive_bounds
- refs/heads/compute_with_inlined
- refs/heads/compute_with_remove_is_right_level
- refs/heads/cpack/nuget
- refs/heads/ctest/wrappers
- refs/heads/cuda-constant
- refs/heads/d3d12-allocation-cache
- refs/heads/deferred_cse_after_inlining
- refs/heads/destructor_calls_deinit
- refs/heads/dg/deserialize_unmapped_objects
- refs/heads/dg/fix_vulkan_codegen_bool_conversion
- refs/heads/dg/vulkan_conform_api
- refs/heads/dg/vulkan_region_allocator_fixes
- refs/heads/dgerstmann/fix-vulkan-memory-config-init
- refs/heads/disable_acquire_release_test_vulkan
- refs/heads/distinct_wrapper_names
- refs/heads/dkg/6863_asan_fixes
- refs/heads/dkg/vulkan
- refs/heads/dpalermo_dmabuf
- refs/heads/dpalermo_dmabuf_libion
- refs/heads/dpalermo_hexagon_remote_202003
- refs/heads/dpalermo_sdk4_2_0_2
- refs/heads/ds/buffer-get-pure
- refs/heads/ds/opt-tile-size
- refs/heads/ds/tail-none
- refs/heads/ds/while
- refs/heads/dsharletg/bitwise-intrinsics
- refs/heads/dsharletg/find-vector-reduce
- refs/heads/dsharletg/jit-optimization
- refs/heads/dsharletg/memcpy-copy_from
- refs/heads/dsharletg/pattern-headroom
- refs/heads/dsharletg/refactor-host-alignment
- refs/heads/dsharletg/runtime-size
- refs/heads/dsharletg/simplify-abs
- refs/heads/dsharletg/simplify-type-bounds
- refs/heads/dsharletg/specialize-bounds
- refs/heads/dsharletg/upsample-channels
- refs/heads/empty_prefetch
- refs/heads/emscripten_vector_fix
- refs/heads/export_all-wsmoses
- refs/heads/expr_auto_sched
- refs/heads/extern_bugs
- refs/heads/extern_host_alloc
- refs/heads/factor_parallel_codegen_hack
- refs/heads/fast_sync_tsan
- refs/heads/faster_integer_division
- refs/heads/feature/apps-external
- refs/heads/feature/cmake-presets
- refs/heads/feature/convert
- refs/heads/feature/f16_interleave
- refs/heads/feature/gather_load_q7
- refs/heads/feature/llvm-codemodel
- refs/heads/feature/load_predicated
- refs/heads/feature/luma_regression
- refs/heads/feature/maintanence
- refs/heads/feature/reinterprets
- refs/heads/feature/tcm_bump_allocator
- refs/heads/feature/xtensa_fix_interleave_q8
- refs/heads/feature/xtensa_q8_tests
- refs/heads/find_intrinsics_issue
- refs/heads/find_intrinsics_widening_lets
- refs/heads/fix-floated-pure-stage
- refs/heads/fix-race-condition
- refs/heads/fix_hexagon_alignment
- refs/heads/fix_hvx_intrinsics
- refs/heads/fix_prefetch_test
- refs/heads/fix_windows_vs15_build
- refs/heads/fixed_length_vectors
- refs/heads/fixed_point_local_laplac
- refs/heads/gemmlowp
- refs/heads/generate
- refs/heads/gha/pip
- refs/heads/gpu_canon_fix
- refs/heads/halide_ir_flatbuffer
- refs/heads/hex_dma2_async
- refs/heads/hexagon_le_runtime
- refs/heads/hexagon_priority
- refs/heads/hexagon_setpriority
- refs/heads/hexagon_strided_pred_load
- refs/heads/hexagon_sysmon_markers
- refs/heads/imaging-synthesis
- refs/heads/includes_fix
- refs/heads/ios_fast_sync_fix
- refs/heads/jia-kai-fix-runtime-cuda-init
- refs/heads/kamil-openglcompute-infinity
- refs/heads/kamil/name_pthread_workers
- refs/heads/kp_bit_shift
- refs/heads/line_buffer
- refs/heads/loop_carry_not_working
- refs/heads/lower_on_huge_stack
- refs/heads/main
- refs/heads/master
- refs/heads/memoize_with_extents
- refs/heads/metal_float16
- refs/heads/metaprogrammed_simplifier_mod
- refs/heads/mohamedadaly-vmlal
- refs/heads/more_powerful_sliding
- refs/heads/new_autoschedule_with_new_simplifier_arm_worker_branch
- refs/heads/new_autoscheduler
- refs/heads/new_simplifier_rule_testing
- refs/heads/newer_ion_ioctl
- refs/heads/no_bounds_query_when_bounds_used
- refs/heads/opengl_compute_buffer_types_fix
- refs/heads/openglcompute_reuse_shared_allocations
- refs/heads/optmize_reorder
- refs/heads/par_for_opt
- refs/heads/pdb/fix_7806
- refs/heads/pdb/hexagon_remote_cmake
- refs/heads/pdb_add_libcpp_makefile_inc
- refs/heads/pdb_eliminate_interleaves_test
- refs/heads/pdb_fix_clang_build
- refs/heads/pdb_fix_install_qc
- refs/heads/pdb_fix_loop_carry
- refs/heads/pdb_fix_simd_op_check_hvx
- refs/heads/pdb_mul_div_mod_multi_thread
- refs/heads/pdb_remove_hvx_v64
- refs/heads/perform_inline_with_order
- refs/heads/pr/2572
- refs/heads/pr/2676
- refs/heads/pr/2975
- refs/heads/pr/3017
- refs/heads/pr/3081
- refs/heads/pr/3387
- refs/heads/pr/3939
- refs/heads/pr/3960
- refs/heads/pr/4380
- refs/heads/pr/4414
- refs/heads/pr/5331
- refs/heads/pr/5438
- refs/heads/pr/5455
- refs/heads/pr/5758_2
- refs/heads/predicated_vector
- refs/heads/prefetch_specialize
- refs/heads/print_schedule
- refs/heads/profile_hardware_counters
- refs/heads/random-pipelines
- refs/heads/rdom_with_pure_vars
- refs/heads/readme-fix-gcd
- refs/heads/realization_order
- refs/heads/refactor_module
- refs/heads/register_promotion
- refs/heads/release/10.x
- refs/heads/release/11.x
- refs/heads/release/12.x
- refs/heads/release/13.x
- refs/heads/release/14.x
- refs/heads/release/15.x
- refs/heads/release/16.x
- refs/heads/release/17.x
- refs/heads/release/8.x
- refs/heads/remove_max_on_fuse_factor
- refs/heads/reorder_rvar
- refs/heads/reset_unique_counter
- refs/heads/revert-3612-ataei-speedup_compiletime
- refs/heads/revert-7009-rootjalex/distribute-w_shl
- refs/heads/revert-7601-compile_hexagon_remote
- refs/heads/riscv_update
- refs/heads/rl_simplifier_rules
- refs/heads/rootjalex/add_simpl_rules
- refs/heads/rootjalex/arm-optimize
- refs/heads/rootjalex/autoscheduler_mcts
- refs/heads/rootjalex/bounds-rewriter
- refs/heads/rootjalex/bounds_synthesis
- refs/heads/rootjalex/cbounds
- refs/heads/rootjalex/cbounds_predicated
- refs/heads/rootjalex/fix-sat-overflow
- refs/heads/rootjalex/fix_estimate_issue
- refs/heads/rootjalex/fix_failed_unrolls
- refs/heads/rootjalex/gsoc_codegen
- refs/heads/rootjalex/improve_cbounds_fixed
- refs/heads/rootjalex/improve_constant_bounds
- refs/heads/rootjalex/pitchfork-arm
- refs/heads/rootjalex/reinterpret-simplify
- refs/heads/rootjalex/rts
- refs/heads/rootjalex/super_simplify_bounds
- refs/heads/rootjalex/test_cbounds_fixed
- refs/heads/rootjalex/test_constant_bounds
- refs/heads/rootjalex/trs-codegen
- refs/heads/rootjalex/trs-codegen-cross
- refs/heads/rootjalex/trs-merge
- refs/heads/rootjalex/uint32-int32-cast
- refs/heads/rootjalex/x86-hadds
- refs/heads/rootjalex/x86-optimize
- refs/heads/rootjalex/x86-optimize-test
- refs/heads/rootjalex/x86-sat
- refs/heads/rootjalex/x86-test
- refs/heads/rule_removal_experiments
- refs/heads/schedule-output-storage
- refs/heads/separate_bounds_query_entrypoint
- refs/heads/shallow
- refs/heads/shift_amount_type_change
- refs/heads/shoaibkamil/cmake-without-arm
- refs/heads/shoaibkamil/correct_memory_fences
- refs/heads/shoaibkamil/d3d-fixes
- refs/heads/shoaibkamil/deprecate_openglcompute
- refs/heads/shoaibkamil/json
- refs/heads/shoaibkamil/llvm_clone_tag
- refs/heads/shoaibkamil/minor-vcpkg-doc-change
- refs/heads/shoaibkamil/opengl_compute_tests
- refs/heads/shoaibkamil/performance_tests_as_generators
- refs/heads/shoaibkamil/rule_removal_experiments
- refs/heads/shoaibkamil/super_simplify_with_interpreter
- refs/heads/shoaibkamil/windows-arm-fix-attributes
- refs/heads/sim_shlib_addr_print
- refs/heads/simplify-nested-broadcasts
- refs/heads/simplify-vectorreduce-shuffles2
- refs/heads/simplify_mod
- refs/heads/sioutas_2020
- refs/heads/sioutas_2020_autoscheduler
- refs/heads/slomp/gpu-codegen-profiling
- refs/heads/slomp/msvc-static-analysis
- refs/heads/solve_div
- refs/heads/solve_div_master
- refs/heads/solve_div_simplifier_test
- refs/heads/sr/python-late-binding-defaults
- refs/heads/srj-aaa
- refs/heads/srj-alloc
- refs/heads/srj-alloca
- refs/heads/srj-appmake2
- refs/heads/srj-armv83a
- refs/heads/srj-aslog
- refs/heads/srj-assert
- refs/heads/srj-assoc
- refs/heads/srj-auto-multi
- refs/heads/srj-auto-multi2
- refs/heads/srj-auto_schedule_mat_mul
- refs/heads/srj-autosched
- refs/heads/srj-b2cpphide
- refs/heads/srj-barr
- refs/heads/srj-bits
- refs/heads/srj-blacklist
- refs/heads/srj-bounds
- refs/heads/srj-bufcalltype
- refs/heads/srj-bufcallwrap
- refs/heads/srj-bufcallwrap2
- refs/heads/srj-buffer
- refs/heads/srj-bv
- refs/heads/srj-classic-autotune
- refs/heads/srj-clean
- refs/heads/srj-constcall
- refs/heads/srj-crosscompile
- refs/heads/srj-ctlz
- refs/heads/srj-cvec-patch
- refs/heads/srj-dag
- refs/heads/srj-debug-to-file
- refs/heads/srj-deir
- refs/heads/srj-f16
- refs/heads/srj-fp16
- refs/heads/srj-fsch
- refs/heads/srj-fthru
- refs/heads/srj-g2
- refs/heads/srj-g3
- refs/heads/srj-gha-test-fixes
- refs/heads/srj-hidden
- refs/heads/srj-hide2
- refs/heads/srj-hvx
- refs/heads/srj-hvx-bug
- refs/heads/srj-hvx-codegen-bug
- refs/heads/srj-hvx-nocopy
- refs/heads/srj-hvxshift
- refs/heads/srj-iib
- refs/heads/srj-initshape
- refs/heads/srj-inv
- refs/heads/srj-ir
- refs/heads/srj-irmut2
- refs/heads/srj-iwyu
- refs/heads/srj-iwyu3
- refs/heads/srj-javascript_work_in_progress
- refs/heads/srj-lensblur
- refs/heads/srj-lessinc
- refs/heads/srj-llvm-loop-opt
- refs/heads/srj-mak
- refs/heads/srj-maxthreads
- refs/heads/srj-mod
- refs/heads/srj-msan
- refs/heads/srj-msan-call
- refs/heads/srj-muldivmod
- refs/heads/srj-mut
- refs/heads/srj-outputs-2
- refs/heads/srj-parse
- refs/heads/srj-pch
- refs/heads/srj-printfunc
- refs/heads/srj-pygp
- refs/heads/srj-revertbits
- refs/heads/srj-schedule-storage
- refs/heads/srj-shl-shr-2
- refs/heads/srj-sio
- refs/heads/srj-static-const
- refs/heads/srj-strided-store
- refs/heads/srj-tidyh
- refs/heads/srj-tiff
- refs/heads/srj-trace
- refs/heads/srj-tutorial
- refs/heads/srj-using
- refs/heads/srj-wasmfix
- refs/heads/srj-xor2
- refs/heads/srj/abstract-gen-without-get-output-func-KEEP
- refs/heads/srj/aligned-alloc
- refs/heads/srj/aligned-alloc-2
- refs/heads/srj/aligned-malloc-with-aligned-alloc
- refs/heads/srj/all-explicit-ctor
- refs/heads/srj/anderson-thread-info-ptr
- refs/heads/srj/aot-perf
- refs/heads/srj/argv-signatures
- refs/heads/srj/argv-types
- refs/heads/srj/async-test
- refs/heads/srj/b2cpp-const-data
- refs/heads/srj/better-xt-dispatch
- refs/heads/srj/bfloat1
- refs/heads/srj/bp
- refs/heads/srj/build_halide_h
- refs/heads/srj/c-bool
- refs/heads/srj/cache-clear
- refs/heads/srj/clang-fmt-ignore
- refs/heads/srj/clang-tidy
- refs/heads/srj/clear-c-cache
- refs/heads/srj/cmake-asan
- refs/heads/srj/cmake-asan2
- refs/heads/srj/cmake-jit-generators
- refs/heads/srj/configure-cmake
- refs/heads/srj/cpp-generator-v2-experiment-KEEP
- refs/heads/srj/crosscompile
- refs/heads/srj/ctad
- refs/heads/srj/depr
- refs/heads/srj/deprecation
- refs/heads/srj/device-copy
- refs/heads/srj/example
- refs/heads/srj/experiment
- refs/heads/srj/experiment-6967
- refs/heads/srj/exporting
- refs/heads/srj/expr_t
- refs/heads/srj/external-tensors
- refs/heads/srj/fix-pytorch
- refs/heads/srj/fixed-rollback
- refs/heads/srj/fopen-fix
- refs/heads/srj/forward
- refs/heads/srj/forward-name
- refs/heads/srj/gen-func
- refs/heads/srj/gen-func-2
- refs/heads/srj/gen-func-3
- refs/heads/srj/gen2-1
- refs/heads/srj/gen_closure
- refs/heads/srj/generator_aot_gpu_multi_context_threaded
- refs/heads/srj/globals
- refs/heads/srj/halide-buffer-crop
- refs/heads/srj/halide-malloc-alignment
- refs/heads/srj/halide-must-use
- refs/heads/srj/halide-runtime-must-use-result
- refs/heads/srj/hang-repro
- refs/heads/srj/hannk
- refs/heads/srj/hannk-aliasing
- refs/heads/srj/hannk-error-checking
- refs/heads/srj/hannk-errors
- refs/heads/srj/hannk-inplace
- refs/heads/srj/hannk-mmap
- refs/heads/srj/hannk-tflite-27
- refs/heads/srj/hannk-verbosity
- refs/heads/srj/hdrs
- refs/heads/srj/html-becomes-viz
- refs/heads/srj/implicit-mult-widening
- refs/heads/srj/issue-7076
- refs/heads/srj/iwyu
- refs/heads/srj/iwyu-2
- refs/heads/srj/iwyu-6
- refs/heads/srj/libHANNK
- refs/heads/srj/llvm_type_of
- refs/heads/srj/maybe-unused
- refs/heads/srj/meanop
- refs/heads/srj/metadata-calling-convention
- refs/heads/srj/more-tidy
- refs/heads/srj/msan-dtf
- refs/heads/srj/multimeta
- refs/heads/srj/nanobind
- refs/heads/srj/new-rt-1
- refs/heads/srj/no-threadpool
- refs/heads/srj/no-timeout-thread
- refs/heads/srj/oglc-mutexed
- refs/heads/srj/param-map
- refs/heads/srj/pip-15.x
- refs/heads/srj/pip-cron
- refs/heads/srj/possible-uninited
- refs/heads/srj/pr-7566
- refs/heads/srj/printer-size
- refs/heads/srj/profiler-data-race
- refs/heads/srj/ptr-int-cast
- refs/heads/srj/pyapps
- refs/heads/srj/pyext-fix
- refs/heads/srj/pygen-class
- refs/heads/srj/pygen-deux
- refs/heads/srj/pygen-func
- refs/heads/srj/pygen-native-types
- refs/heads/srj/pyinstall
- refs/heads/srj/pypi-try
- refs/heads/srj/pystuff
- refs/heads/srj/python-buffer-unpack
- refs/heads/srj/python-tutorial
- refs/heads/srj/reshape
- refs/heads/srj/rt-error-smallify
- refs/heads/srj/rt-return-types
- refs/heads/srj/runtime-error-handling
- refs/heads/srj/sat-fixes-exp
- refs/heads/srj/sat-fixes-exp-2
- refs/heads/srj/shadow-field
- refs/heads/srj/snprintf
- refs/heads/srj/spirv-license
- refs/heads/srj/stat-buf-deprecations
- refs/heads/srj/static-buffer-generators
- refs/heads/srj/stmt-html
- refs/heads/srj/stringify
- refs/heads/srj/synth-gen-params
- refs/heads/srj/synth-params-python
- refs/heads/srj/test-arm_sve_redux
- refs/heads/srj/test-intrinsics-bounds
- refs/heads/srj/test8076
- refs/heads/srj/test8078
- refs/heads/srj/test8094
- refs/heads/srj/test8105a
- refs/heads/srj/test8115
- refs/heads/srj/test_tmpdir_fix
- refs/heads/srj/tidy
- refs/heads/srj/tidy-format-14
- refs/heads/srj/tidymore
- refs/heads/srj/tidymore2
- refs/heads/srj/tls
- refs/heads/srj/tls-3
- refs/heads/srj/tls-4
- refs/heads/srj/tls-ucon
- refs/heads/srj/tmp-unschedule-experiment
- refs/heads/srj/tot-fix
- refs/heads/srj/try-revert-sat
- refs/heads/srj/type-traits
- refs/heads/srj/typed-func
- refs/heads/srj/ucon-all-const
- refs/heads/srj/ucon-non-const
- refs/heads/srj/visit-warnings
- refs/heads/srj/wasm-atomic2
- refs/heads/srj/wasm-simd
- refs/heads/srj/wasm-stuff
- refs/heads/srj/wasm-threads
- refs/heads/srj/wasm-updates
- refs/heads/srj/wasm-work
- refs/heads/srj/wip
- refs/heads/srj/x-rounding
- refs/heads/srj/xbuf
- refs/heads/srj/xc+plus+size+tmp
- refs/heads/srj/xc-types
- refs/heads/srj/xt-uint-cast-test
- refs/heads/srj/xtensa-arch
- refs/heads/srj/xtensa-merge
- refs/heads/srj/xvc-experimetn
- refs/heads/srj/zlib-embed
- refs/heads/standalone_autoscheduler
- refs/heads/standalone_autoscheduler_arm_worker
- refs/heads/standalone_autoscheduler_arm_worker_amazon
- refs/heads/standalone_autoscheduler_gpu
- refs/heads/standalone_autoscheduler_hexagon
- refs/heads/sticky_task_assignments
- refs/heads/store_with
- refs/heads/store_with_solver_for_super_simplify
- refs/heads/strict_float_cse_fix
- refs/heads/super_simplify
- refs/heads/super_simplify_v2
- refs/heads/super_simplify_v3
- refs/heads/transitive_wrapper
- refs/heads/trigger-release-v16
- refs/heads/tzumao-autodiff-boundarycond
- refs/heads/tzumao-gradient-autoscheduler-bug
- refs/heads/tzumao-predicate-store-load
- refs/heads/tzumao-python-buffer
- refs/heads/tzumao_autodiff_unbounded
- refs/heads/tzumao_improve_gradient_autoscheduler
- refs/heads/tzumao_issue_4297
- refs/heads/tzumao_licm_before_BI
- refs/heads/unbounded_bugs
- refs/heads/undo_async_copy_chain_black_list
- refs/heads/use_string_literals_for_blobs
- refs/heads/users/lukas/python-pip
- refs/heads/validate_sched_error_msg
- refs/heads/var_ir_fix
- refs/heads/vksnk/async-experiment
- refs/heads/vksnk/async-multiple-producers
- refs/heads/vksnk/async-order
- refs/heads/vksnk/better-loop-carry
- refs/heads/vksnk/better-message
- refs/heads/vksnk/bound-storage
- refs/heads/vksnk/bounds-widen-right
- refs/heads/vksnk/c-print-type
- refs/heads/vksnk/c-round
- refs/heads/vksnk/check-return-result
- refs/heads/vksnk/compute-with-bug
- refs/heads/vksnk/compute_with_async
- refs/heads/vksnk/dma-limit-channels
- refs/heads/vksnk/dma-min-max
- refs/heads/vksnk/expr-match-shuffle
- refs/heads/vksnk/extract-from-scalar
- refs/heads/vksnk/f16-load
- refs/heads/vksnk/fix-packvr
- refs/heads/vksnk/fix_halide_xtensa_narrow_with_rounding_shift_i16
- refs/heads/vksnk/fused-compute-with
- refs/heads/vksnk/hoist-storage-bug
- refs/heads/vksnk/lerp-intrinsics
- refs/heads/vksnk/lower-signed-shifts
- refs/heads/vksnk/missing-exception
- refs/heads/vksnk/non-widening-halves
- refs/heads/vksnk/optimize-shuffles
- refs/heads/vksnk/replace-all
- refs/heads/vksnk/restrict
- refs/heads/vksnk/roll-buffer
- refs/heads/vksnk/roundeven-arm
- refs/heads/vksnk/rvar-bounds
- refs/heads/vksnk/simplify-slice
- refs/heads/vksnk/skip-semaphores
- refs/heads/vksnk/storage-folding
- refs/heads/vksnk/strided-load-of-4_2
- refs/heads/vksnk/typed-scope
- refs/heads/vksnk/update-simd-driver
- refs/heads/vksnk/vectorize-bug
- refs/heads/vksnk/vectorize-scalarize
- refs/heads/vksnk/widening_absd
- refs/heads/vksnk/xtensa-codegen-fp16
- refs/heads/vksnk/xtensa-dma-improvements
- refs/heads/vksnk/xtensa-regroup-pass
- refs/heads/vksnk/xtensa/lift-allocs
- refs/heads/vulkan
- refs/heads/vulkan-diagnose-alloc-failures
- refs/heads/vulkan-phase0-adts
- refs/heads/vulkan-phase1-spirv
- refs/heads/vulkan-phase2-runtime
- refs/heads/vulkan2
- refs/heads/vulkan_fix_gpu_dynamic_shared_test
- refs/heads/vulkan_fix_subregion_memory_offsets
- refs/heads/webassembly-old
- refs/heads/winograd
- refs/heads/wording_fix
- refs/heads/xtensa-codegen
- refs/heads/xtensa-codegen-parallel
- refs/heads/xuanda/fix-serialize-bad-partition-always
- refs/remotes/origin/rootjalex/add_autosched_caching
- refs/tags/release_2018_02_15
- refs/tags/release_2019_08_27
- refs/tags/release_8.0.0
- refs/tags/v10.0.0
- refs/tags/v10.0.1
- refs/tags/v11.0.0
- refs/tags/v11.0.1
- refs/tags/v12.0.0
- refs/tags/v12.0.1
- refs/tags/v13.0.0
- refs/tags/v13.0.1
- refs/tags/v13.0.2
- refs/tags/v13.0.3
- refs/tags/v13.0.4
- refs/tags/v14.0.0
- refs/tags/v15.0.0
- refs/tags/v15.0.1
- refs/tags/v16.0.0
- refs/tags/v17.0.0
- refs/tags/v17.0.1
- refs/tags/v8.0.0
Revision | Author | Date | Message | Commit Date |
---|---|---|---|---|
5db62a0 | Andrew Adams | 21 August 2023, 21:26:29 UTC | Add test | 21 August 2023, 21:26:29 UTC |
21fa8b5 | Andrew Adams | 17 August 2023, 16:15:27 UTC | slice IRMatcher should only match on slices Fixes #7768 | 17 August 2023, 16:15:27 UTC |
df4c981 | Xuanda Yang | 27 July 2023, 15:13:02 UTC | Throw an erorr if split is called with the same older and inner var name (#7715) * throw an erorr if split is called with the same older and inner name * update * fix naming * rewording * add test --------- Co-authored-by: Steven Johnson <srj@google.com> | 27 July 2023, 15:13:02 UTC |
09c5d1d | Steven Johnson | 26 July 2023, 22:25:43 UTC | Default WITH_TEST_FUZZ to OFF (#7695) * Fix for top-of-tree LLVM * Default WITH_TEST_FUZZ to OFF Just because our compiler supports fuzzing doesn't mean we want to build the fuzz tests, because they won't really build properly without the right preset specified. (This will be followed up with a change to the buildbot to set WITH_TEST_FUZZ to ON for fuzz tests) | 26 July 2023, 22:25:43 UTC |
bfc26cc | Martijn Courteaux | 26 July 2023, 22:03:52 UTC | Improved profiler result printing. (#7709) * Fixed the regularization for BGU. * Improved profiler result printing. * Clang-format ain't liking pretty code. * Clang-tidy ain't liking pretty code. --------- Co-authored-by: Steven Johnson <srj@google.com> | 26 July 2023, 22:03:52 UTC |
5749d8c | Steven Johnson | 26 July 2023, 20:51:48 UTC | Upgrade Halide main branch for LLVM18 (#7710) LLVM just added `release/17.x` branch and now trunk is 18 -- update our build files and docs accordingly (see also https://github.com/halide/build_bot/pull/248, which needs to land first) | 26 July 2023, 20:51:48 UTC |
c9bf3b1 | Andrew Adams | 25 July 2023, 20:29:57 UTC | Fix float16 warning for older clangs (#7701) | 25 July 2023, 20:29:57 UTC |
f41c392 | Andrew Adams | 25 July 2023, 20:25:15 UTC | Fix leaks caused by self-referential parameter constraints (#7700) * Fix leaks caused by self-referential parameter constraints * Add comment * Add missing overrides * Use const refs for non-mutated args | 25 July 2023, 20:25:15 UTC |
ab3ff3a | Steven Johnson | 25 July 2023, 20:24:20 UTC | Mark all single-arg ctors in src/runtime as explicit (#7707) Minor code hygiene fix, done as byproduct of #7704 | 25 July 2023, 20:24:20 UTC |
df902e7 | Steven Johnson | 25 July 2023, 19:14:12 UTC | Mark all single-arg ctors in autoscheduler code as `explicit` (#7704) explicit ctors | 25 July 2023, 19:14:12 UTC |
fd9bfc8 | Xuanda Yang | 24 July 2023, 21:45:47 UTC | Fix clang and llvm versions in scripts (#7702) * fix clangng+llvm versions in files * more fixes | 24 July 2023, 21:45:47 UTC |
ce16f91 | Martijn Courteaux | 24 July 2023, 18:22:51 UTC | Fixed the regularization for BGU. (#7684) Co-authored-by: Steven Johnson <srj@google.com> | 24 July 2023, 18:22:51 UTC |
943bc5f | Steven Johnson | 24 July 2023, 18:19:50 UTC | Convert error to warning (#7698) Accidentally checked in #7697 with the failure mode as error, not warning | 24 July 2023, 18:19:50 UTC |
128bcdf | Steven Johnson | 24 July 2023, 17:44:03 UTC | Add a warning if a Generator declares any Outputs before the final Input (Fixes #7669) (#7697) * Add a warning if a Generator declares any Outputs before the final Input (Fixes #7669) See https://github.com/halide/Halide/issues/7669 for details * Update abstractgeneratortest_generator.cpp * Add note about allow_out_of_order_inputs_and_outputs() to warning | 24 July 2023, 17:44:03 UTC |
71eb4ee | Steven Johnson | 21 July 2023, 00:56:11 UTC | Fix for top-of-tree LLVM (#7694) | 21 July 2023, 00:56:11 UTC |
475b774 | Steven Johnson | 19 July 2023, 19:13:02 UTC | Fix float16 under asan, attempt #2 (#7691) * Fix float16 under asan, attempt #2 Some sneakiness going on. * Update float16_t.cpp | 19 July 2023, 19:13:02 UTC |
0112da4 | Andrew Adams | 19 July 2023, 18:27:35 UTC | Fix quadratic algorithm in simplify_correlated_differences (#7686) This pass called expr_uses_var in a loop while building up a potentially long let chain. This does a quadratic amount of work in the size of the let chain, which stalled compilation for a particular pathological pipeline I encountered. This changes it to an eager algorithm that tracks the set of free variables and incrementally grows it instead of revisiting the entire expr for each new let added. It is n log(n) in the number of lets instead of n^2 Co-authored-by: Steven Johnson <srj@google.com> | 19 July 2023, 18:27:35 UTC |
18fbc15 | Steven Johnson | 18 July 2023, 18:17:27 UTC | Add Sanitizer details to README_cmake.md (#7688) | 18 July 2023, 18:17:27 UTC |
5f56e64 | Andrew Adams | 18 July 2023, 16:05:59 UTC | Add a select overload for tuples (#7672) * Add a select overload for tuples * Add missing overload * deprecate tuple_select * Fix Python bindings for deprecation of tuple_select() * Update PyIROperator.cpp --------- Co-authored-by: Steven Johnson <srj@google.com> | 18 July 2023, 16:05:59 UTC |
4ba0d8b | Steven Johnson | 18 July 2023, 16:03:37 UTC | Fix correctness_float16_t for ASAN builds (#7687) This appears to be a glitch that has to do with changing ABI for float16 across versions of GCC; we build LLVM with gcc-9 on Linux, but the float16 ABI got changed (and unified in gcc12); since ASAN builds use Clang even on linux, there is a hiccup here. This is an ugly monkey-patch to work around this issue. | 18 July 2023, 16:03:37 UTC |
601b5c5 | Steven Johnson | 11 July 2023, 19:37:55 UTC | Remove ParamMap (#7675) ParamMap was deprecated in Halide 16; per https://github.com/halide/Halide/pull/7357, we should go ahead and remove it for Halide 17, in favor of `compile_to_callable()`. | 11 July 2023, 19:37:55 UTC |
41d6d94 | Andrew Adams | 11 July 2023, 16:52:18 UTC | Update onnx app to Adams2019 autoscheduler and new autoscheduler API (#7673) * Update onnx app to Adams2019 autoscheduler and new autoscheduler API Fixes #7670 * Add model test too * Remove use of tmpnam * Don't test onnx app in a 32-bit build | 11 July 2023, 16:52:18 UTC |
9755e3d | Steven Johnson | 29 June 2023, 17:15:03 UTC | Attempt to fix intermittent PCH "modified" errors (#7666) * Attempt to fix intermittent PCH "modified" errors * Update CMakeLists.txt * Update CMakeLists.txt Co-authored-by: Alex Reinking <alex.reinking@gmail.com> --------- Co-authored-by: Alex Reinking <alex.reinking@gmail.com> | 29 June 2023, 17:15:03 UTC |
6f2cae6 | Alex Reinking | 28 June 2023, 16:38:47 UTC | Dependency wrangling part 0/N: standard CMake modules (#7658) * Hoist Threads::Threads to the top level * Remove global OpenGL dependency This is added by the helpers as-needed. Removing it here lets one build just libHalide without searching for OpenGL. * Narrow scope of OpenMP to tutorial Only the tutorial targets actually use OpenMP. Don't search for OpenMP if WITH_TUTORIALS is off. * Move JPEG and PNG deps to tools Only the Halide::ImageIO library uses these directly, so limiting the scope protects against unintended use. * Work around CMake bug The CMake $<TARGET_NAME_IF_EXISTS:...> genex uses dynamic scoping w.r.t. the target environment, rather than the usual static scoping. This means we need to move the PNG and JPEG dependencies higher up. * Add link to CMake issue in comments. | 28 June 2023, 16:38:47 UTC |
470f43c | Derek Gerstmann | 27 June 2023, 17:34:55 UTC | Bump Halide version to 17.0.0 in main (#7636) * Bump Halide version to 17.0.0 in main * Bump compatible LLVM version requirements to 17, 16, 15. Update build instructions to use newer LLVM version. * Bump clang-format/tidy LLVM version to 15 (minimum required to build Halide) * trigger buildbots * Revert LLVM requirements for run_clang_format/tidy. Do this in a separate PR. --------- Co-authored-by: Derek Gerstmann <dgerstmann@adobe.com> Co-authored-by: Steven Johnson <srj@google.com> | 27 June 2023, 17:34:55 UTC |
c7ca15f | Steven Johnson | 26 June 2023, 22:22:27 UTC | Enable clang-tidy's modernize-use-default-member-init check (#7662) * Upgrade clang-format and clang-tidy to use v16 (Skipping over 15 entirely in favor of the newest stable version) * Update presubmit.yml * Update .clang-tidy * Update .clang-tidy * fixes * Update run-clang-tidy.sh * Update .clang-tidy * Update .clang-tidy * fixes * Update .clang-tidy * Update PyHalide.cpp * Update run-clang-tidy.sh * Update CodeGen_Vulkan_Dev.cpp * Update .clang-tidy * fix * format | 26 June 2023, 22:22:27 UTC |
c28a00f | Steven Johnson | 26 June 2023, 20:10:27 UTC | Update for top-of-tree LLVM changes (#7663) | 26 June 2023, 20:10:27 UTC |
1e3431c | Steven Johnson | 24 June 2023, 01:35:17 UTC | Enable the misc-use-anonymous-namespace clang-tidy check (#7661) * Upgrade clang-format and clang-tidy to use v16 (Skipping over 15 entirely in favor of the newest stable version) * Update presubmit.yml * Update .clang-tidy * Update .clang-tidy * fixes * Update run-clang-tidy.sh * Update .clang-tidy * Update .clang-tidy * fixes * Update .clang-tidy * Update PyHalide.cpp * Update run-clang-tidy.sh * Update CodeGen_Vulkan_Dev.cpp * Enable the misc-use-anonymous-namespace clang-tidy check Basically just says "don't use static" * Update Generator.h * Update Util.cpp * Update JITModule.cpp | 24 June 2023, 01:35:17 UTC |
c2e4f6d | Steven Johnson | 24 June 2023, 01:33:46 UTC | Upgrade clang-format and clang-tidy to use v16 (#7660) * Upgrade clang-format and clang-tidy to use v16 (Skipping over 15 entirely in favor of the newest stable version) * Update presubmit.yml * Update .clang-tidy * Update .clang-tidy * fixes * Update run-clang-tidy.sh * Update .clang-tidy * Update .clang-tidy * fixes * Update .clang-tidy * Update PyHalide.cpp * Update run-clang-tidy.sh * Update CodeGen_Vulkan_Dev.cpp | 24 June 2023, 01:33:46 UTC |
2a93cb0 | Steven Johnson | 23 June 2023, 20:53:14 UTC | Get the ASAN toolchain working again (#7604) * Get the ASAN toolchain working again Various fixes to enable ASAN to finally work (linux x64 only). Note that this found several ASAN failures in the Anderson2021 autoscheduler tests, which are *not* fixed yet; I'll fix these in a subsequent PR. * Remove stuff that I didn't mean to check in * Configure cuda-specific tests properly too * trigger buildbots * Update CodeGen_LLVM.cpp * Update CodeGen_LLVM.cpp * Fix sloppiness? * Update CMakeLists.txt * trigger buildbots * Use Halide_PYTHON_LAUNCHER to implement ASAN toolchain fixes (#7657) * Use new Halide_PYTHON_LAUNCHER to set env vars * Update CMake docs for Halide_SANITIZER_ENV_VARS --------- Co-authored-by: Alex Reinking <areinkin@qti.qualcomm.com> --------- Co-authored-by: Alex Reinking <alex.reinking@gmail.com> Co-authored-by: Alex Reinking <areinkin@qti.qualcomm.com> | 23 June 2023, 20:53:14 UTC |
0de9eb2 | Steven Johnson | 23 June 2023, 17:47:03 UTC | Fix incorrect name-mangling for llvm.experimental.vp.strided.load (#7654) These ops are only used for RISCV codegen at present, and this one tended to only happen for complex patterns that we don't test in our very limited crosscompilation tests. | 23 June 2023, 17:47:03 UTC |
0218c9e | Andrew Adams | 23 June 2023, 15:21:38 UTC | Add a compositing example app (#7646) * Initial version of a compositing demo app * Improve schedule; add GPU version * Better mux codegen * Consider all definition exprs in mullapudi autoscheduler * Add Tuple mux to IROperator * clang-format, better comments * Remove pointless blank line * Add some fixed-point intrinsics to RegionCosts.cpp to suppress warnings * Add perf numbers * Hopefully fix cmake build * clang-format * clang-format * Fix muxing FuncRefs * More comments * Update process.cpp * Include cmath to hopefully get M_PI * Revert inclusion of cmath --------- Co-authored-by: Steven Johnson <srj@google.com> | 23 June 2023, 15:21:38 UTC |
1e963ff | Steven Johnson | 22 June 2023, 21:45:22 UTC | Default RISCV backend to OFF for LLVM < 17 (#7650) LLVM17 is doing a lot of work on the RISCV backend, and the amount of testing done on Halide's LLVM16-based RISCV codegen is very light. It's been suggested that we should default to not enabling the RISCV backend for LLVM16 and earlier because of this (so that people attempting to use Halide for RISCV won't encounter a possible footgun). This PR just adds the relevant mechanism; whether or not this is the correct decision is not clear. Discussion welcome. | 22 June 2023, 21:45:22 UTC |
9232218 | Steven Johnson | 22 June 2023, 18:20:20 UTC | Fix RISCV codegen for top-of-tree LLVM (#7648) * Fix RISCV codegen for top-of-tree LLVM Also add a warning if you try to codegen with older versions of LLVM: many intrinsics have changed in ways that are hard to deal with both ways, and trying to support both would be painful and of dubious value. * Make LLVM16 work too * Update CodeGen_RISCV.cpp | 22 June 2023, 18:20:20 UTC |
bd42076 | Steven Johnson | 21 June 2023, 22:48:44 UTC | Add user_assert for zero vector width in CodegenRISCV (#7647) * Add user_assert for zero vector width in CodegenRISCV If you forget to add `-rvv-vector_bits_N` to your Target string, we try to codegen with a vector width of 0, which (unsurprisingly) craters in many places which assume a nonzero value. It's pretty unlikely anyone wants to use Halide to codegen to a RISCV core that lacks SIMD, so let's add a more helpful failure message for this easy-to-make error (we can revisit this later if it actually is desirable for some reason.) (I looked briefly at trying to clean up all the places in CodegenLLVM, etc, that make that assumption, but it quickly turned into a rat's nest; it's definitely fixable if we want to support this in the future, but, again, I suspect we don't.) * Update CodeGen_RISCV.cpp | 21 June 2023, 22:48:44 UTC |
8acdc46 | Andrew Adams | 20 June 2023, 22:40:21 UTC | Be more careful about overflow in trim_bounds_using_alignment (#7645) * Be more careful about overflow in trim_bounds_using_alignment Fixes #7575 * trigger buildbots --------- Co-authored-by: Steven Johnson <srj@google.com> | 20 June 2023, 22:40:21 UTC |
3b7e83a | Andrew Adams | 17 June 2023, 00:21:19 UTC | Alternative fix for #4211 (#7628) * Alternative fix for #4211 Call::Prefetch evaluates to a currently-unspecified value of the prefetched type. Let's just make it zero. * Fix prefetch_2d * Fix CodeGen_C * Fix CodeGen_C * trigger buildbots --------- Co-authored-by: Steven Johnson <srj@google.com> | 17 June 2023, 00:21:19 UTC |
2149734 | Steven Johnson | 15 June 2023, 00:48:09 UTC | Revise LLVM fix to work when no V8 or WABT available (#7635) * Revise LLVM fix to work when no V8 or WABT available * Update WasmExecutor.cpp * Update WasmExecutor.cpp * Update WasmExecutor.cpp | 15 June 2023, 00:48:09 UTC |
932ad0b | Shoaib Kamil | 14 June 2023, 17:15:18 UTC | Deprecate OpenGLCompute for Halide 16 (#7627) * Deprecate OpenGLCompute for Halide 16 * clang-format | 14 June 2023, 17:15:18 UTC |
1f5b207 | Steven Johnson | 13 June 2023, 23:55:36 UTC | Fix wasm linker for top-of-tree LLVM (#7634) | 13 June 2023, 23:55:36 UTC |
37fd8c4 | Derek Gerstmann | 13 June 2023, 20:34:48 UTC | Bump HALIDE_VERSION_MAJOR to 16 in makefile in prep for release (#7632) Co-authored-by: Derek Gerstmann <dgerstmann@adobe.com> | 13 June 2023, 20:34:48 UTC |
fa3d87c | Andrew Adams | 13 June 2023, 16:31:34 UTC | Fix inverted may_subtile checks (#7626) | 13 June 2023, 16:31:34 UTC |
bd62a35 | Zalman Stern | 08 June 2023, 01:18:19 UTC | Significant change to RISC V and scalable vector code generation. (#7616) * Completely rework how RISC-V vector intrinsics are called to avoid issues with single element vectors being confused with scalars and other conversions that can happen via using call_intrin. Allows using any size vector. Only downside is splitting large vectors no longer happens, but RISC-V allows an LMUL of 8, meaning a vector of up to 8 times the vector register size will compile so this is much less of an issue. Splitting larger vectors can be added. Should also allow fractional LMUL in all cases, but this is not verified. * Significant refactor/rewrite of RISC V vector intrinsics support. Should handle many more cases and be well on the way to handling arbitrary vector widths within the LMUL range. More tests added to simd_op_check_riscv. Likely well set up to move SVE2 to a similar approach, perhaps without the full generality on vector lengths. (I.e. they may need to be quantized to vscale, or offer better performance in that case.) * Formatting fixes. * More formatting. * Don't try to convert void types to match expected vector type. * Backout comment change that is no longer relevant. * Fix failure in camera_pipe app. (Code to make sure vector types match was being presented with a scalar only mismatch. Changed it to ignore scalar to scalar cases.) Address review feedback. * One more review comment. * Comment fix. --------- Co-authored-by: Steven Johnson <srj@google.com> | 08 June 2023, 01:18:19 UTC |
67eaff3 | Steven Johnson | 07 June 2023, 00:30:45 UTC | Upgrade our PyBind11 version to 2.10.4 (#7617) (#7618) * Upgrade our PyBind11 version to 2.10.4 (#7617) * Forgot to save | 07 June 2023, 00:30:45 UTC |
123d855 | Steven Johnson | 06 June 2023, 16:28:34 UTC | Fix PCH build failures (#7613) * Fix PCH build failures (Harvested from #7604 to land separately) * Update CMakeLists.txt | 06 June 2023, 16:28:34 UTC |
ffd20c9 | Steven Johnson | 05 June 2023, 20:33:21 UTC | Revert "[Hexagon] Fix compilation failures hexagon_remote" (#7614) Revert "[Hexagon] Fix compilation failures hexagon_remote (#7601)" This reverts commit bd33a629adfd89129d21fe82e68fc7d20f935283. | 05 June 2023, 20:33:21 UTC |
2304dd8 | Steven Johnson | 05 June 2023, 19:16:59 UTC | Add missing deps for some autoscheduler tests (#7605) * Add missing deps for some autoscheduler tests Autoscheduler tests that rely on the relevant shared library being available at runtime need to add a dependency to ensure this is the case. * Update CMakeLists.txt * Update test.cpp * Update CMakeLists.txt * Update CMakeLists.txt * Update test/autoschedulers/li2018/CMakeLists.txt Co-authored-by: Alex Reinking <alex.reinking@gmail.com> --------- Co-authored-by: Alex Reinking <alex.reinking@gmail.com> | 05 June 2023, 19:16:59 UTC |
51e4e04 | Zalman Stern | 05 June 2023, 19:16:21 UTC | Add target triple setup for RISC V Android. (#7612) Add target triple setup for RISC V Android. More guess based than test based but I'm 80% confident these are good choices. Data layout seems to be the same as well. | 05 June 2023, 19:16:21 UTC |
7e57438 | Steven Johnson | 05 June 2023, 16:59:48 UTC | Silence `psabi` warnings when compiling C++ generated code (#7603) * Silence `psabi` warnings when compiling C++ generated code Some versions of GCC/Clang emit many of these warnings when compiling in some Intel configurations, and they are useless in this context. Make them go away. * Update cmake/HalideGeneratorHelpers.cmake Co-authored-by: Alex Reinking <alex.reinking@gmail.com> --------- Co-authored-by: Alex Reinking <alex.reinking@gmail.com> | 05 June 2023, 16:59:48 UTC |
9ee2d0c | Steven Johnson | 03 June 2023, 00:34:54 UTC | Update Compiler/OS versions in README (#7610) | 03 June 2023, 00:34:54 UTC |
f3e1829 | Nathaniel Brough | 01 June 2023, 20:46:49 UTC | Adds fuzzing preset (#7566) * Adds fuzzing preset Partial fix for #7552 * Adds documentation on fuzz testing Closes: #7552 * Fixes spelling/grammar in fuzzing readme Co-authored-by: Alex Reinking <alex.reinking@gmail.com> * Remove asan flags from fuzzer * Add build directory in cmake/fuzzing documentation * Configure the fuzz tests to run for a finite amount of time * Update README * Update README_fuzz_testing.md * trigger buildbots * trigger buildbots * trigger buildbots * Update CMakeLists.txt --------- Co-authored-by: Steven Johnson <srj@google.com> Co-authored-by: Alex Reinking <alex.reinking@gmail.com> | 01 June 2023, 20:46:49 UTC |
4991231 | Steven Johnson | 01 June 2023, 19:52:02 UTC | Disable fuzzer when using ASAN (#7602) | 01 June 2023, 19:52:02 UTC |
bd33a62 | aankit-ca | 01 June 2023, 18:55:31 UTC | [Hexagon] Fix compilation failures hexagon_remote (#7601) Fix for: 1. Include directory for pthread.h 2. Function signature for qurt_hvx_lock Co-authored-by: Ankit Aggarwal <aankit@quicinc.com> | 01 June 2023, 18:55:31 UTC |
3e6e2a5 | Andrew Adams | 01 June 2023, 17:36:56 UTC | Fix operator/ on ModulusRemainder (#7597) It wasn't reducing the remainder modulo the modulus, which confused trim_bounds_using_alignment in the simplifier. | 01 June 2023, 17:36:56 UTC |
eb9b946 | Steven Johnson | 30 May 2023, 22:39:41 UTC | Apply fix from #7564 to fuzz/bounds (#7596) (Avoids infinite loop for some fuzzing inputs) | 30 May 2023, 22:39:41 UTC |
b450647 | Pranav Bhandarkar | 25 May 2023, 21:07:38 UTC | [Fix for #7524] Skip tests for anderson2021 if PTX is not enabled (#7593) Skip tests for anderson2021 if PTX is not enabled | 25 May 2023, 21:07:38 UTC |
ca8ca00 | Steven Johnson | 24 May 2023, 20:14:29 UTC | Pacify clang-tidy by removing unused constant (#7590) | 24 May 2023, 20:14:29 UTC |
6a98655 | Nathaniel Brough | 22 May 2023, 17:16:28 UTC | fuzz: Add libfuzzer compatible bounds fuzzer (#7549) * fuzz: Add libfuzzer compatible bounds fuzzer * Remove unused constant * Style fix * Fix handling of binary ops * Handle casting to vector-of-bool properly * fuzz: Alphabetically sort targets in CMake --------- Co-authored-by: Steven Johnson <srj@google.com> | 22 May 2023, 17:16:28 UTC |
d234143 | Steven Johnson | 18 May 2023, 19:04:34 UTC | Check for slightly different error msg in AppleClang 14.0.3 (#7582) * Check for slightly different error msg in AppleClang 14.0.3 * Update Makefile | 18 May 2023, 19:04:34 UTC |
02768ef | Steven Johnson | 18 May 2023, 17:43:49 UTC | In fuzz/simplify, output errors to cerr, not cout (#7583) * In fuzz/simplify, output errors to cerr, not cout This makes it easier to capture error output in downstream test harnesses * Also add some more helpful text | 18 May 2023, 17:43:49 UTC |
4282a5d | Steven Johnson | 18 May 2023, 17:05:36 UTC | Fix #7579 (#7580) Fix per @jrprice. (He comments that we should probably regenerate all of mini_webgpu.h, and document how to do that; this is a band-aid to unbreak testing.) | 18 May 2023, 17:05:36 UTC |
2ed955e | Steven Johnson | 18 May 2023, 00:49:25 UTC | Fix various compilation errors with AppleClang 14.0.3 (#7578) * Change & -> && usage Newer versions of Xcode trigger `-Wbitwise-instead-of-logical` for this usage, which we treat as an error * Also fix `error: variable 'i' set but not used [-Werror,-Wunused-but-set-variable]` * Also fix `retrain_cost_model.cpp:419:17: error: variable 'counter' set but not used [-Werror,-Wunused-but-set-variable]` | 18 May 2023, 00:49:25 UTC |
6c8f7aa | Derek Gerstmann | 17 May 2023, 23:05:00 UTC | [vulkan] Fix subregion memory offsets to respect buffer alignment (#7576) * Fix buffer alignment constraints for subregion allocations (some drivers report a minimum alignment for the buffer that is larger than the storage or uniform storage offset alignment) Cleanup region offset and size constraints * Clang tidy/format pass --------- Co-authored-by: Derek Gerstmann <dgerstmann@adobe.com> | 17 May 2023, 23:05:00 UTC |
30d309e | Derek Gerstmann | 17 May 2023, 23:04:45 UTC | [vulkan] Change the feature version requirement to v1.3 for correctness_gpu_dynamic_shared (#7577) Change the feature version requirement to v1.3 (since v1.2 lacks the necessary support). Co-authored-by: Derek Gerstmann <dgerstmann@adobe.com> | 17 May 2023, 23:04:45 UTC |
2fd90bf | Derek Gerstmann | 17 May 2023, 18:47:36 UTC | [vulkan] Disable generator acquire_release test for Vulkan (#7565) Disable test for Vulkan Co-authored-by: Derek Gerstmann <dgerstmann@adobe.com> | 17 May 2023, 18:47:36 UTC |
968e52c | Steven Johnson | 17 May 2023, 17:09:47 UTC | Upgrade WABT to 1.0.33 (#7570) * Upgrade WABT to 1.0.33 * Update CMakeLists.txt * Update CMakeLists.txt | 17 May 2023, 17:09:47 UTC |
76bb84d | Steven Johnson | 16 May 2023, 18:25:08 UTC | Allow autoconversion from `Buffer<T>` -> `Buffer<const T>&` and to `Buffer<void>&` (#7571) * Allow autoconversion from `Buffer<T>` -> `Buffer<const T>&` When you are intermixing CPU and GPU calls in a single piece of code, it's preferable to pass `Buffer<>` by nonconst reference, so that lazy host<->device copies are done efficiently. However, many callers prefer to define input Buffers as `Buffer<const T>` (as they should), but the fact that this form didn't easily allow autoconversion from caller (which may well have constructed the buffer as non-const) to callee (due to incompatible type references) led some users to just pass by a copy, since these autoconverted. This had a couple of undesirable effects: - Making a copy cost a small but nonzero amount of code (managing refcounts, etc) - More importantly, lazy copies in the callee got 'lost' to the caller, since the `halide_buffer_t` in the callee was a copy, thus any added `device` value or change in dirty bits was never seen. This could previously be worked around by adding explicit calls to `.as_const()`, but that is ugly and awkward. This change adds an ugly-but-safe implicit-conversion overload, to allow converting `Buffer<T>&` to `Buffer<const T>&`, iff T isn't already const. This will allow cleaning up downstream code to pass by references more consistently, without needing to add `.as_const()` warts. * Also add convenience conversions for Buffer<void>& | 16 May 2023, 18:25:08 UTC |
f121abf | philboske | 15 May 2023, 23:47:02 UTC | Fix save_tiff() PlanarConfig assignment for monochrome inputs (#7568) Fixes #7567. | 15 May 2023, 23:47:02 UTC |
ae53d9b | Steven Johnson | 12 May 2023, 18:18:20 UTC | Avoid potentially infinite loop in fuzz/simplify.cpp (#7564) FuzzedDataProvider is *not* a RNG; there's no guarantee that it won't return the same data to you forever. This means that the loop to find a new subtype may never terminate (eg if the 'random' type returned always matches the input type). This "fixes" it by just adding a count to break out of the loop, in which case we just use the original type. Not sure if there's a more elegant fix? | 12 May 2023, 18:18:20 UTC |
e0ef57a | Steven Johnson | 12 May 2023, 16:37:40 UTC | Remove unique_name() usage from fuzz/cse (#7563) | 12 May 2023, 16:37:40 UTC |
252c4b8 | Steven Johnson | 11 May 2023, 17:07:30 UTC | Add/augment some runtime debug output (#7561) - in `halide_buffer_to_string()`, print the `halide_buffer_t*` pointer value as well - in `debug_log_and_validate_buf()`, do debug logging for some failure modes that return errors | 11 May 2023, 17:07:30 UTC |
afea893 | Derek Gerstmann | 10 May 2023, 15:55:38 UTC | [vulkan] Disable performance_wrap test for Vulkan ... results don't match (#7560) * Fix missing initializer for vulkan memory config that got munged in a previous merge. This gets the correctness_multiple_outputs test to pass. * Disable test for Vulkan since shared memory results are incorrect (see issue #7559) * Clang tidy/format pass --------- Co-authored-by: Derek Gerstmann <dgerstmann@adobe.com> | 10 May 2023, 15:55:38 UTC |
53de4ce | Steven Johnson | 09 May 2023, 18:24:07 UTC | Fix #7556 (#7557) * Fix #7556 * Update cast.cpp * Add user_assert that type lanes match * Revert "Add user_assert that type lanes match" This reverts commit e1f34e0c3098a4952af64ae88632bb2ada9763b1. | 09 May 2023, 18:24:07 UTC |
8f22013 | Steven Johnson | 09 May 2023, 17:00:40 UTC | Followup to #7551 for bool vectors (#7555) Need to cast to a type that is bool-with-lanes, not scalar bool | 09 May 2023, 17:00:40 UTC |
acde515 | Derek Gerstmann | 09 May 2023, 17:00:12 UTC | [vulkan] Fix missing initializer for vulkan memory config (#7554) Fix missing initializer for vulkan memory config that got munged in a previous merge. This gets the correctness_multiple_outputs test to pass. Co-authored-by: Derek Gerstmann <dgerstmann@adobe.com> | 09 May 2023, 17:00:12 UTC |
763d207 | Steven Johnson | 08 May 2023, 23:16:57 UTC | Fix fuzz/cse to avoid signed_integer_overflow() results (#7553) * Fix fuzz/cse to avoid signed_integer_overflow() results * Update cse.cpp | 08 May 2023, 23:16:57 UTC |
7afb343 | Steven Johnson | 08 May 2023, 20:41:31 UTC | Fix errors in fuzz/simplify.cpp (#7551) * Style Fix: don't use uppercase-T for non-template arguments * Boolean ops need extra type coercion | 08 May 2023, 20:41:31 UTC |
fb71862 | Steven Johnson | 08 May 2023, 01:39:41 UTC | Fix unused-thing warnings in fuzz/simplify.cpp (#7548) * Fix unused-thing warnings in fuzz/simplify.cpp * Update simplify.cpp | 08 May 2023, 01:39:41 UTC |
c86d418 | Nathaniel Brough | 05 May 2023, 01:33:25 UTC | fix(fuzz): Refactor fuzzers to fix off by 1 errors (#7547) Cleanup the fuzzers making them more readable and fix off by one errors caused by incorrect usage of FuzzedDataProvider::ConsumeIntegralInRange. Closes: #7546 | 05 May 2023, 01:33:25 UTC |
dff1e38 | Dmitry Babokin | 02 May 2023, 20:38:17 UTC | Remove workaround for GCC 4.x.x in cpuid() (#7545) * Remove workaround for GCC 4.x.x | 02 May 2023, 20:38:17 UTC |
96acbc6 | Steven Johnson | 02 May 2023, 20:37:47 UTC | Workaround for Issue #7539 (#7540) * Workaround for Issue #7539 Partial fix for now * trigger buildbots | 02 May 2023, 20:37:47 UTC |
05316af | Marcos Slomp | 02 May 2023, 19:01:29 UTC | metal : replacing spinlock by mutex (#7532) replacing spinlock by mutex | 02 May 2023, 19:01:29 UTC |
2945c71 | Nathaniel Brough | 02 May 2023, 16:22:26 UTC | fuzz: Port correctness/cse fuzzer over to libfuzzer (#7543) | 02 May 2023, 16:22:26 UTC |
7cdbc71 | Steven Johnson | 02 May 2023, 04:00:16 UTC | Rework CMake interface for Dawn/Node bindings (#7422) AOT pipelines that rely on Dawn/WebGPU now depend on a new Halide_WebGPU find-module. This module honors the make-ish HL_WEBGPU_NATIVE_LIB variable as a means of initializing the Halide_WebGPU_NATIVE_LIB cache variable. This is automatically handled by add_halide_generator and add_halide_runtime and is available to downstreams. The JIT tests no longer read the HL_WEBGPU_NODE_BINDINGS environment variable during the CMake configure or build phase. Instead, a test launcher reads it at CTest runtime. Co-authored-by: Alex Reinking <quic_areinkin@quicinc.com> | 02 May 2023, 04:00:16 UTC |
6db47d3 | Nathaniel Brough | 01 May 2023, 23:46:52 UTC | Fix flag check for fuzzers (#7542) On some systems size_t isn't available under <cstdint>, however it is guaranteed to be available under <cstddef> for all systems. | 01 May 2023, 23:46:52 UTC |
38ed15d | Steven Johnson | 01 May 2023, 17:08:54 UTC | Fix some autoscheduler build errors (#7538) - Remove inadvertent duplicate of PerfectHashMap.h from adams2019 - add some missing #includes - never pass negative values to exit() | 01 May 2023, 17:08:54 UTC |
044a8cf | Nathaniel Brough | 01 May 2023, 14:00:36 UTC | Add libfuzzer compatible fuzz harness (#7512) | 01 May 2023, 14:00:36 UTC |
244e72c | Steven Johnson | 26 April 2023, 22:37:36 UTC | Avoid endless loop in msan + zero-extent buffer (#7536) With MSAN enabled, we use `make_buffer_copy()` to build an efficient way to check the poison bits on buffers; unfortunately, if you are checking a buffer that has at least one dimension with zero-extent but nonzero-stride, the final while loop will never terminate. Add a trivial check so that it exits. | 26 April 2023, 22:37:36 UTC |
4d86539 | Derek Gerstmann | 25 April 2023, 00:21:15 UTC | [vulkan phase2] Vulkan Runtime (#6924) * Import Vulkan runtime changes from personal branch * Fix build to work with latest changes in main * Hookup Vulkan into Target, DeviceInterface and OffloadGPULoops * Add Vulkan runtime to Makefile * Add Vulkan target to Python bindings * Add runtime linker support to target Vulkan CodeGen * Add Vulkan windows decorator to runtime targets * Wrap debug messages for internal runtime classes with DEBUG_INTERNAL Error on failed string termination * Silence clang-tidy warnings for redundant expressions on Vulkan enum values * Clang tidy & format pass * Fix formatting for single line statements * Move Vulkan option to top-level CMakeLists.txt and enable SPIR-V as needed * Fix Vulkan & SPIRV dependencies for makefile * Add Halide version info to Makefile Add HALIDE_VERSION compiler definitions to compilation * Add HL_VERSION_FLAGS to RUNTIME_CXX_FLAGS * Finish refactoring of Vulkan CodeGen to use SpirV-IR. Added splitmix64 based hashing scheme for types and constants. Numerous fixes to instruction packing. Added debug symbols to all variables. * Clang tidy/format pass. * Fix formatting * Remove leftover ifdef * Fix build error for clang OSX for mismatched type comparison * Refactor loops and conditionals to use blocks * Clang tidy/format pass * Add detailed comments for acquire context parameters * Add comments describing loader method exports and dynamically resolved function pointers Other minor cleanups * Change aborts to debug asserts for context parameters. Add error handling to acquire context. * Cache Vulkan descriptor sets and other shader module objects in compilation cache for reuse * Replace platform specific strncpy for grabbing Extension strings with StringUtils::copy_upto * Enable device features for selected device * Fix alignment constraints for to match Vulkan buffer memory requirements. Add env vars to control Vulkan Memory Allocator config. 
* Add Vulkan to list of supported APIs in README.md Add Vulkan specific README_vulkan.md * Clang tidy/format pass * Fix conform_alignment to handle zero values * Fix declaration of custom_allocation_callbacks to be static. Change to constexpr for invalid values * Whitespace change to trigger build. * Handle Vulkan kernels that don't require storage buffers. Updated test status. Fixes 7 test cases. * Add src/mini_vulkan.h Apache 2.0 license requirements to License file * Add descriptor set binding info as pre-amble to SPIR-V code module Fix shared memory allocation to use global variables in workgroup storage space Add extern calls for spirv and glsl builtins Add memory fence call to gpu thread barrier Add missing visitors to Vulkan CodeGen Add scalar index & vector index methods for load/store * Clang tidy & format pass * Update test results for Vulkan docs. Passing: 326 Failing: 39 * Fix formatting * Remove extraneous parentheses for is_array_type() * Add Vulkan library to linkage fo Halide generator helpers * Add SPIR-V formatted output (for debugging) * Only declare SIMT intrinics that are actually used. Cleanup & refactor add_kernel method. * Add Vulkan handler to test targets * Clang format/tidy pass * Add doc-strings to SPIR-V interface * Adjust runtime array to widest vector width based on alignment and dense vector loads/stores Fix scalar and vector load/stores Fix casts for vectors Add missing nan, inf, neg_inf, is_finite builtins * Add missing bitwise and logical and methods. Cleanups. * Add comments about necessary packages on Ubuntu v22.04 vs earlier versions * Clang tidy & format pass. * Update Vulkan test results. 
Pass: 329 Fail: 36 * Remove unused Produce/Consume visitor method * Fix Molten VK initialization to work with v1.3+ loader Add support for direct casts for same-size types Add missing mux, mix, lerp, sinh, tanh, etc intrinsics Add explicit storage access for variables Add a macro to enable debug messages in Vulkan Memory Allocator * Disable dynamic shared memory portion of test for Vulkan (since its not supported yet) * Disable uncached portion of test for Vulkan (since it may OOM) * Disable float64 support in Type::supports_type() for Vulkan target since it's not widely supported * Fix Shuffle to handle all known cases Hookup VulkanMemoryAllocator to gpu allocation cache. Fix if_then_else to allow calls and statements to be used Fix loop counter comparison, and don't allow dynamic loops to be unrolled. Fix scalarize to use CompositeInsert instead of VectorInsertDynamic Fix FMod to use FRem (cause SPIR-V's FMod doesn't do what you'd expect ... but FRem does?!) Use exact same sematics for barriers as GLSL Compute ... still not passing everything Fix SPIR-V block termination checks, keys for null constants, and other cleanups * Clang tidy & format pass * Update correctness test results. PASS: 338, FAIL: 27 * Move counter inside debug #define to fix build * Relax tolerance for newton's method to match other GPU APIs Skip gpu dynamic shared testfor Vulkan (since dynamic shared allocations aren't supported yet) Update correctness test status. PASS: 340, FAIL: 25 * Clang format/tidy pass * Skip Vulkan for float64 for correctness test round (since f64 is optional) * Skip Vulkan for tests that rely upon device crop, and slice. 
* Only test small vector widths for Vulkan (since widths >=8 are optional) * Canonicalize gpu vars for Vulkan * Fix loop initialization, and increments Add all explicit types, and fix constant declarations Add missing fast intrinsics Convert results of logical ops into expected types (instead of bools) * Add SpvInstruction::add_operands(), add_immediates() and template based append() Make integer logical operations explicit. Better handling of constant data. * Clang format & tidy pass * Fix windows build ... refactor convert_to_bool to use std::vectors rather than dynamic fixed sized arrays * Skip async_device_copy, device_buffer_copy, device_crop, and device_slice tests for Vulkan (for now). * Don't test large vector widths for Vulkan (since they are optionally supported) * Clear Vulkan buffer allocations prior to use (tbd if this is necessary) * Skip Vulkan for async copy chain test * Skip Vulkan for interpreter test * Clang tidy/format pass * Fix formatting * Fix build ... use error messages for errors * Separate shared memory resources by element type for Vulkan. * Add Vulkan to conditional for fusing gpu loops * Reorder reset method to match declaration ordering. * Cleanup debug log messages for Vulkan resources * Assert alignment is power of two * Only split regions that have already been freed. Add more debug messages to log * Explicitly cleanup Vulkan command buffers after they are used Avoid recreating descriptor sets Tidy up Vulkan debug messages * Fix Div, Mod, and div_round_to_zero for integer cases Cleanup reset method * Skip Vulkan for async_copy_chain * Skip 64-bit values on Vulkan since they are optionally supported * Skip interleave_rgb for Vulkan (which doesn't support cropping) * Skip interpreter for Vulkan (which doesn't support dynamic allocation of shared mem). 
* Clang Tidy/Format pass * Handle calls to pow with negative values for Vulkan Add integer and float constant helpers to SPIRV * Only test real numbers for pow with Vulkan * Clang tidy/format pass * Fix logic so a region request of an entire block matches if exactly the same size as an empty block * Create a zero size buffer to check for alignment Return null handles after freeing * Add more verbose debug output for malloc * Fix UConvert logic to avoid narrowing an integer type less than 8 bits Remove optimization path for division which seems to fail worse than DIV Cleanup DIV and MOD operators * Clang format/tidy pass * Fix SConvert & UConvert ops * Add retain semantics to block allocator interface Update test to validate retain/release/reclaim functionality * Implement device_crop, device_slice and release_crop for Vulkan. Re-enable device_crop, device_slice and interleave_rgb tests. * Clang format/tidy pass * Implement device copy for Vulkan. Enable device copy test. * Clang format/tidy pass * Fix signed mod operator and use euclidean identity (just like glsl) * Clang format/tidy pass * Fix to handle Mod on vectors (use vector constant for bitwise and) * Fix pow operator for Vulkan, and re-enable math test to full range. * Add error checking for return types for conditionals Use bool types for ops that require them, and adapt to expected return types * Handle deallocation for existing regions prior to coalescing. Cleanup region allocator logic for availability. Augment block_allocator test to cover allocation reuse. 
* Clang tidy/format pass * Fix reserved accounting for regions * Add more details to Windows specific Vulkan build config * Update SPIR-V headers to v1.6 * Add support for dynamic shared memory allocations for Vulkan Add dynamic workgroup dispatching to Vulkan Add optional feature flags for Vulkan capabilities Add Vulkan API version flags for target features Enable v1.3 path if requested Re-enable tests for added features Update Vulkan docs with status updates and feature flags * Enable Vulkan async_device_copy test. * Disable Vulkan performance test for async gpu (for now). * Disable Vulkan from python AOT tests and tutorials (since it requires linkage against the vulkan loader system library). * Update Vulkan readme with latest status. Everything works! More or less. =) * Clang format pass * Cleanup formatting for Halide version info in Makefile * Fix typos and address review comments for Vulkan readme * Change value casts to match Halide conventions * Fix typos in comments * Add static_assert to rotl to make compilation errors clearer (instead of using enable_if) Fix debug(3) formatting to avoid super long messages Use lookup table for SPIR-V op code names * Fix typos and logic for Vulkan capabilities * Remove leftover debug ifdef * Fix typo in comments * Rename copy_upto(...) method to be copy_up_to(...) 
* Handle error case for uninitialized buffer allocation (rather than abort) Fix typos in comments * Support any arbitrary number of devices and queues for context creation Fix typos in comments * Add get/set alloc_config methods and API hooks for configuring the VulkanMemoryAllocator * Remove leftover debug ifdef * Hookup API methods for get/set alloc_config when initializing the VulkanMemoryAllocator * Remove empty lines in main * Add required capability flags for 8-bit and 16-bit uniform and storage buffer access Handle casts for GLSL ops (spec requires all args to be the same type as the return type) * Add VkPhysicalDevice8BitStorageFeaturesKHR and related constants * Query for 8-bit and 16-bit uniform and storage access support. Enable these as part of the device feature query chain. * Use VK_WHOLE_SIZE for setting buffer (to pass validation ... otherwise size has to be a multiple of alignment) Remove useless debug asserts for static variables Fix debug logging messages for allocations of scalars (which may not have a dim array) * Query for device limits to enforce min alignment constraints for storage and uniform buffers * Fix shutdown sequence to iterate over descriptor sets Avoid bug in validation layer by reordering destruction sequence * Clang format & tidy pass * Fix logic for locating entry point shader binding ... 
assume exact match for entry point name Cleanup entry point binding variables and clarify usage * Remove accidentally uncommented debug statements * Cleanup debug output for buffer related updates * Fix split and allocate methods in region allocator to fix issues with alignment constraints - discovered a hang if requested size couldn't be fulfilled after adjusting to aligned sizes - cause was incorrect splitting of existing regions Cleanup region allocator iteration, cleanup and shutdown Added maximum_pool_size configuration option to Vulkan Memory Allocator to restrict pool sizes * Added notes about TARGET_VULKAN=ON being the default now Added links to LunarG MoltenVK SDK installer, and brew packages * Fix markdown formatting * Fix error code handling in Vulkan runtime and internal datastructures. Refactor all (well nearly all) return values to use halide error codes. Reduce the usage of abort_if() for recoverable errors. * Fix typo in error message * Fix typo in readme * Skip GPU allocation cache test on MacOSX since MoltenVK only supports 30 buffers to be allocated * Skip widening reduction test on Vulkan for Mac OSX/IOS since MoltenVK fails to translate calls with vector types for builtins like min/max. etc * Skip doubles in vector cast test on Vulkan for Mac OSX/IOS since Molten doesn't support them * Skip gpu_dynamic_shared and gpu_specialize test for Vulkan on Mac OSX/IOS since MoltenVK doesn't support the dynamic shared memory allocation or dynamic grid size. * Clang format / tidy pass * Resolve conflicts for mini_webgpu.h ... 
revert to main * Use unique intrinsic var names for each kernel Cleanup constant value declarations with template helper methods Add comments on workgroup size usage * Wrap debug output under ifdef DEBUG_RUNTIME_INTERNAL macro guard Add nearest_multiple constraint to block/region allocator * Add vk_clear_device_buffer utility method Add nearest_multiple constraint to vulkan memory allocator + fixes correctness/multiple_outputs test Add vkCreateBuffer/vkDestroyBuffer debug output + for gpu_object_lifetime_tracker Cleanup shutdown for shader_module destruction * Add note about nearest_multiple constraint for vulkan memory allocator * Hookup gpu_object_lifetime_tracker with Vulkan debug statements * Skip dynamic shared memory portion of test for Vulkan on iOS/OSX. * Fix stale comment for float type support. Fix incorrect lowering for intrinsic. --------- Co-authored-by: Derek Gerstmann <dgerstmann@adobe.com> Co-authored-by: Steven Johnson <srj@google.com> | 25 April 2023, 00:21:15 UTC |
fcddcf8 | Marcos Slomp | 24 April 2023, 13:13:21 UTC | metal : replacing `arg_sizes` by `arg_types` in kernel run interface (#7505) * replacing arg_sizes by arg_types * build fix * allocating and computing arg_sizes[] on the stack * clang-format * zero termination oopsie! * special case when argument is a buffer * telling runtime to pass argument types instead of argument sizes to the kernel run call * args[i] could well be 0! * removing arg_sizes[] * addressing code review comments --------- Co-authored-by: Marcos Slomp <slomp@adobe.com> Co-authored-by: Steven Johnson <srj@google.com> | 24 April 2023, 13:13:21 UTC |
e55834b | Steven Johnson | 24 April 2023, 01:06:59 UTC | Fix Anderson2021 tests to avoid spurious failures on non-Cuda systems (#7518) * Fix Anderson2021 tests to avoid spurious failures on non-Cuda systems The Anderson2021 autoscheduler is pretty Cuda-specific, so some tests assume it is present; this is pretty much never true on macOS, and annoying spurious failures are annoying. This adds a new flag and capability to RunGenMain to try to sniff out the necessary runtime setup and make it a quiet [SKIP] failure when testing. * Use set instead of strstr() * Update LoopNest.cpp * Update RunGenMain.cpp * Update RunGenMain.cpp * Update RunGenMain.cpp * Update RunGenMain.cpp * trigger buildbots * Update RunGenMain.cpp | 24 April 2023, 01:06:59 UTC |
93a5887 | Steven Johnson | 20 April 2023, 17:58:33 UTC | Make stmt_html generation work correctly for submodules (#7522) * Don't erase stmt_html before resolving submodules * Fix stmt_html for submodules | 20 April 2023, 17:58:33 UTC |
294f80c | Andrew Adams | 19 April 2023, 23:26:50 UTC | Forbid assigning to Buffer(Expr) by introducing an intermediate type. (#7517) * Forbid assigning to Buffer(Expr) by introducing an intermediate type. Fixes #7514 * Simpler solution * Silence clang-tidy | 19 April 2023, 23:26:50 UTC |
8670a25 | Steven Johnson | 19 April 2023, 18:28:11 UTC | Fix for top-of-tree LLVM (#7523) | 19 April 2023, 18:28:11 UTC |
2527c35 | Steven Johnson | 18 April 2023, 21:31:56 UTC | Don't accidentally embed .s files in .a files when emitting stmt_html (#7520) * Don't accidentally embed .s files in .a files when emitting stmt_html Followup fix for #7516 * format | 18 April 2023, 21:31:56 UTC |
42e71f2 | Steven Johnson | 18 April 2023, 16:28:26 UTC | Convert stmt_html output to use stmt_viz output (#7516) * Allow emitting `stmt_viz` without specifying `assembly` TL;DR: if we request `stmt_viz` without `assembly`, just generate the latter to a temp file that we dispose of later; this wasn't feasible before since we were previously requiring the assembly output to be generated with the same directory and basename as stmt_viz, but that was fixed. * Convert stmt_html output to use stmt_viz output Per discussion on #7507, this entirely removes the "classic" stmt_html output and replaces it with the "new" StmtToViz output. Using `compile_to_lowered_stmt` or requesting `stmt_html` will now always output the new output, and requesting `stmt_viz` output is no longer legal. (Note that this builds on top of #7515, which must be submitted first.) It's not clear to me whether https://github.com/halide/Halide/issues/7507#issuecomment-1511761706 is a blocker for this change, or a request to add back already-lost functionality. * Update Makefile * Update Generator.cpp | 18 April 2023, 16:28:26 UTC |
8efc688 | Steven Johnson | 17 April 2023, 22:22:44 UTC | Allow emitting `stmt_viz` without specifying `assembly` (#7515) TL;DR: if we request `stmt_viz` without `assembly`, just generate the latter to a temp file that we dispose of later; this wasn't feasible before since we were previously requiring the assembly output to be generated with the same directory and basename as stmt_viz, but that was fixed. | 17 April 2023, 22:22:44 UTC |
c9c85dc | Steven Johnson | 16 April 2023, 00:43:43 UTC | Improve assembly-file finding logic in StmtToViz (#7513) (1) Avoid having to guess at location by just passing in the location, since we usually already know it. (2) If we don't know it, be more cautious when constructing it: the output html filename might not match our expectations, and all file extensions must use get_output_info() to work correctly on all platforms. | 16 April 2023, 00:43:43 UTC |
04f09d4 | Andrew Adams | 14 April 2023, 16:58:23 UTC | Add error message when casting multi-element Realization to Buffer (#7506) * Add error message when casting multi-element Realization to Buffer Fixes #7504 * Add missing test | 14 April 2023, 16:58:23 UTC |
e20d798 | Svenn-Arne Dragly | 13 April 2023, 19:17:43 UTC | Add build number to Python wheel before uploading (#7500) * Add build number to Python wheel before uploading This change adds a build number based on GitHub Actions' `github.run_id` to the Python wheel before uploading. This should work around the issue that causes the uploads to fail currently. Fixes #7293 * fixup! Add build number to Python wheel before uploading | 13 April 2023, 19:17:43 UTC |