https://github.com/halide/Halide
Revision | Author | Author Date | Message | Commit Date |
---|---|---|---|---|
5254117 | Steven Johnson | 20 February 2024, 17:13:06 UTC | HALIDE_VERSION_PATCH -> 1 (for 17.0.1) (#8113) * HALIDE_VERSION_PATCH -> 1 (for 17.0.1) * Update CMakeLists.txt | 20 February 2024, 17:13:06 UTC |
d15325e | Andrew Adams | 19 February 2024, 21:11:15 UTC | Cherry-pick some recent bug-fixes into 17.0.1 (#8107) * Fix rfactor adding too many pure loops (#8086) When you rfactor an update definition, the new update definition must use all the pure vars of the Func, even though the one you're rfactoring may not have used them all. We also want to preserve any scheduling already done to the pure vars, so we want to preserve the dims list and splits list from the original definition. The code accounted for this by checking the dims list for any missing pure vars and adding them at the end (just before Var::outermost()), but this didn't account for the fact that they may no longer exist in the dims list due to splits that didn't reuse the outer name. In these circumstances we could end up with too many pure loops. E.g. if x has been split into xo and xi, then the code was adding a loop for x even though there were already loops for xo and xi, which of course produces garbage output. This PR instead just checks which pure vars are actually used in the update definition up front, and then uses that to tell which ones should be added. Fixes #7890 * Forward the partition methods from generator outputs (#8090) * Fix reduce_expr_modulo of vector in Solve.cpp (#8089) * Fix reduce_expr_modulo of vector in Solve.cpp * Fix test | 19 February 2024, 21:11:15 UTC |
8f424e5 | Steven Johnson | 19 February 2024, 18:49:45 UTC | Cherry-pick picks for recent WebGPU API changes for a Halide 17.0.1 release (#8106) * Don't require Halide_WebGPU when using wasm (#8063) (#8065) * Don't require Halide_WebGPU when using wasm (#8063) * trigger buildbots * [WebGPU] Update to latest native headers (#8081) * [WebGPU] Update to latest native headers * Remove #ifdef for `requiredFeature[s]Count` * Pass nullptr to wgpuCreateInstance * Emscripten currently requires this * Dawn accepts it too * Use nullptr for another wgpuCreateInstance call --------- Co-authored-by: James Price <jrprice@google.com> | 19 February 2024, 18:49:45 UTC |
3577f88 | Andrew Adams | 01 February 2024, 17:46:10 UTC | Fix type error in VectorizeLoops (#8055) | 01 February 2024, 17:47:24 UTC |
2111594 | Andrew Adams | 26 January 2024, 20:01:41 UTC | Track whether or not let expressions failed to solve in solver (#7982) * Track whether or not let expressions failed to solve in solver After mutating an expression, the solver needs to know two things: 1) Did the expression contain the variable we're solving for 2) Was the expression successfully "solved" for the variable. I.e. the variable only appears once in the leftmost position. We need to know this to know property 1 of any subexpressions (i.e. does the right child of the expression contain the variable). This drives what transformations we do in ways that are guaranteed to terminate and not take exponential time. We were tracking property 1 through lets but not property 2, and this meant we were doing unhelpful transformations in some cases. I found a case in the wild where this made a pipeline take > 1 hour to compile (I killed it after an hour). It may have been in an infinite transformation loop, or it might have just been exponential. Not sure. * Remove surplus comma * Fix use of uninitialized value that could cause bad transformation | 01 February 2024, 17:47:24 UTC |
be6d6c6 | Andrew Adams | 26 January 2024, 17:26:12 UTC | Fix bounds_of_nested_lanes (#8039) * Fix bounds_of_nested_lanes bounds_of_nested_lanes assumed that one layer of nested vectorization could be removed at a time. When faced with the expression: min(ramp(x8(a), x8(b), 5), x40(27)) It panicked, because on the left hand side it reduced the bounds to x8(a) ... x8(a) + x8(b) * 4, and on the right hand side it reduced the bounds to 27. It then attempted to take a min of mismatched types. In general we can't assume that binary operators on nested vectors have the same nesting structure on both sides, so I just rewrote it to reduce directly to a scalar. Fixes #8038 | 01 February 2024, 17:47:24 UTC |
6d29ad5 | Steven Johnson | 13 December 2023, 17:02:37 UTC | Add missing Python bindings for various recent additions to Func and Stage (#8002) * Add missing Python bindings for various recent additions to Func and Stage We have been sloppy about maintaining these. Also added a bit of testing. * Update PyEnums.cpp | 13 December 2023, 17:02:37 UTC |
3d5cf40 | Martijn Courteaux | 12 December 2023, 17:50:56 UTC | Inject profiling for function calls to 'halide_copy_to_host' and 'halide_copy_to_device'. (#7913) * Inject profiling for function calls to 'halide_copy_to_host' and 'halide_copy_to_device'. * WIP: I get segfaults. The device_interface pointer is bogus. * Figured it out... * Allow global sync on d3d12. * Cleanly time all buffer copies as well. * Cleanup old comment. * Following Andrew's suggestion for suffixing buffer copies in the profiler. * Sort the profiler report lines into three sections: funcs, buffer copy to device, and buffer copy to host. * Attempt to fix output parsing. * Fix crash for copy_to_device * halide_device_sync_global(NULL) -> success * Fixed the buffer copy bug. Added a new test that will cause buffer copies in two directions within the compiled pipeline. This will catch this better in the future. Tweaked the profile report section header printing. * Clang-format, my dear friend... | 12 December 2023, 17:50:56 UTC |
357e646 | Steven Johnson | 08 December 2023, 19:17:30 UTC | Do some basic validation of Target Features (#7986) (#7987) * Do some basic validation of Target Features (#7986) * Update Target.cpp * Update Target.cpp * Fixes * Update Target.cpp * Improve error messaging. * format * Update Target.cpp | 08 December 2023, 19:17:30 UTC |
9c099c2 | Andrew Adams | 08 December 2023, 17:53:04 UTC | Teach unrolling to exploit conditions in enclosing ifs (#7969) * Teach unrolling to exploit conditions in enclosing ifs Fixes #7968 * Handle vectorization as well * Remove unused usings * Add missing print | 08 December 2023, 17:53:04 UTC |
9643518 | Steven Johnson | 08 December 2023, 17:50:32 UTC | Add join_strings() call and use it from mattrs() (#7997) * Add join_strings() call and use it from mattrs() This is a super-nit kind of fix, but the fact that we had rerolled a join-strings algo in a half-dozen places made my teeth hurt, so I decided to fix it: - Add join_strings() to Util.h - revise the mattrs() calls to use it instead of the janky mess they used This doesn't move the needle on code size or speed but it is less weird. Probably other places we could/should use this too. (Does C++20 have join/split strings in the std library yet? If not, why not?) * Update Util.h * Update Util.h * clang-tidy | 08 December 2023, 17:50:32 UTC |
19c1c81 | Steven Johnson | 08 December 2023, 16:50:01 UTC | Make wasm +sign-ext and +nontrapping-fptoint the default (#7995) * Make wasm +sign-ext and +nontrapping-fptoint the default These have been supported in ~all wasm runtimes for a while now, and +nontrapping-fptoint in particular can make a big performance difference. We should enable these by default, and add a new backdoor (wasm_mvponly) for code paths that need to use the original wasm Minimum Viable Product spec only. * Update simd_op_check_wasm.cpp | 08 December 2023, 16:50:01 UTC |
5aa891a | Steven Johnson | 07 December 2023, 18:03:06 UTC | Silence useless 'Outer dim vectorization of var' warning in Mullapudi… (#7992) Silence useless 'Outer dim vectorization of var' warning in Mullapudi scheduler | 07 December 2023, 18:03:06 UTC |
df36139 | Steven Johnson | 07 December 2023, 18:02:42 UTC | Fix all "unscheduled update()" warnings in our code (#7991) * Fix all "unscheduled update()" warnings in our code And also fix the Mullapudi scheduler to explicitly touch all update stages. This allows us to mark this warning as an error if we so choose. * fixes * fixes * Update recursive_box_filters.cpp | 07 December 2023, 18:02:42 UTC |
83febb0 | Andrew Adams | 07 December 2023, 17:46:27 UTC | Fix handling of assert statements whose conditions get vectorized (#7989) * Fix handling of assert statements whose conditions get vectorized * Fix test name | 07 December 2023, 17:46:27 UTC |
d1ecc1f | Andrew Adams | 07 December 2023, 16:06:57 UTC | Make narrowing float->int casts on wasm go via wider ints (#7973) Fixes #7972 | 07 December 2023, 16:06:57 UTC |
6e57d6c | Volodymyr Kysenko | 07 December 2023, 16:06:31 UTC | Add a notebook with a visualization of the approx_* functions and their errors (#7974) * Add a notebook with a visualization of the approx_* functions and their errors * Fix spelling error | 07 December 2023, 16:06:31 UTC |
9f6ec17 | Steven Johnson | 07 December 2023, 00:59:53 UTC | Silence useless "Insufficient parallelism" autoscheduler warning (#7990) | 07 December 2023, 00:59:53 UTC |
17b7366 | Steven Johnson | 06 December 2023, 23:03:14 UTC | Move canonical version numbers into source, not build system (#7980) (#7981) * Move canonical version numbers into source, not build system (#7980) * Fixes | 06 December 2023, 23:03:14 UTC |
209ec02 | Andrew Adams | 05 December 2023, 22:15:23 UTC | Add appropriate mattrs for arm-32 extensions (#7978) * Add appropriate mattrs for arm-32 extensions Fixes #7976 * Pull clauses out of if | 05 December 2023, 22:15:23 UTC |
17578a1 | Andrew Adams | 05 December 2023, 18:08:08 UTC | Add two new tail strategies for update definitions (#7949) * Add two new tail strategies for update definitions * Stop printing asm * Update expected number of partitions for Partition::Always * Add a comment explaining why the blend safety check is per dimension * Add serialization support for the new tail strategies * trigger buildbots * Add comment --------- Co-authored-by: Steven Johnson <srj@google.com> | 05 December 2023, 18:08:08 UTC |
dea2cf7 | Steven Johnson | 03 December 2023, 21:34:02 UTC | complete_x86_target() should enable F16C and FMA when AVX2 is present (#7971) All known AVX2-enabled architectures definitely have these features. | 03 December 2023, 21:34:02 UTC |
674e6cc | Andrew Adams | 01 December 2023, 21:18:20 UTC | Disallow async nestings that violate read after write dependencies (#7868) * Disallow async nestings that violate read after write dependencies Fixes #7867 * Add test * Add another failure case, and improve error message * Add some more tests * Update test * Add new test to cmakelists * Fix for llvm trunk * Always acquire the folding semaphore, even if unused * Skip async_order test under wasm * trigger buildbots --------- Co-authored-by: Volodymyr Kysenko <vksnk@google.com> Co-authored-by: Steven Johnson <srj@google.com> | 01 December 2023, 21:18:20 UTC |
4fc2a7d | Steven Johnson | 01 December 2023, 00:31:48 UTC | Handle many more intrinsics in Bounds.cpp (#7823) * Handle many more intrinsics in Bounds.cpp This addresses many (but not all) of the `signed integer overflow` issues we're seeing in Google due to https://github.com/halide/Halide/pull/7814 -- a lot of the issues seem to be in code that uses intrinsics that had no handling in value bounds checking, so the bounds were naively large and overflowed. - Most of the intrinsics from FindIntrinsics.h weren't handled; now they all are (most by lowering to other IR, though the halving_add variants were modeled directly because the bitwise ops don't mesh well) - strict_float() is just a pass-through - round() is a best guess (basically, if bounds exist, expand by one as a worst-case) There are definitely others we should handle here... trunc/floor/ceil probably? * Fix round() and strict_float() handling * Update Bounds.cpp * Fixes? * trigger buildbots * Revert saturating_cast handling * Update Bounds.cpp --------- Co-authored-by: Andrew Adams <andrew.b.adams@gmail.com> | 01 December 2023, 00:31:48 UTC |
3136819 | Xuanda Yang | 30 November 2023, 17:59:30 UTC | [serialization] Add Halide version and serialization version in serialization format (#7905) * halide version * serialization version * format * Fix Makefile * trigger buildbots --------- Co-authored-by: Andrew Adams <andrew.b.adams@gmail.com> Co-authored-by: Steven Johnson <srj@google.com> | 30 November 2023, 17:59:30 UTC |
ad5dd20 | antonysigma | 29 November 2023, 17:31:12 UTC | Update instructions to include generated schedules (#7928) The generated schedule from the auto-scheduler can no longer be copy-n-pasted to the Generator source code. Update the tutorial to show how the generated schedules can be applied and included into Generator. Use case: version control and fine tuning of schedules. Resolves: #7148 See also: #7900 Co-authored-by: Steven Johnson <srj@google.com> | 29 November 2023, 17:31:12 UTC |
bf5f206 | Steven Johnson | 29 November 2023, 17:19:03 UTC | Remove inadvertently added generated file (#7966) | 29 November 2023, 17:19:03 UTC |
68f2bbd | Steven Johnson | 29 November 2023, 17:06:51 UTC | Revise Flatbuffers codegen style (#7964) * Rename the generated Flatbuffer headers The Blaze/Bazel rules for Flatbuffers are inflexible and require this naming pattern :-/ * Also update the flags to flatc * Fix lots of stuff * exclude from clang-format * ignore again | 29 November 2023, 17:06:51 UTC |
b7468af | Andrew Adams | 29 November 2023, 16:39:41 UTC | Attempt to fix nested vectorization gemm performance on new build bot (#7959) * Better (simpler) schedules for nested vectorization gemm * Remove early return * Empty-Commit --------- Co-authored-by: Steven Johnson <srj@google.com> | 29 November 2023, 16:39:41 UTC |
5175d16 | Andrew Adams | 28 November 2023, 21:59:21 UTC | Make the fast inverse test throughput-limited rather than latency-limited (#7958) Co-authored-by: Steven Johnson <srj@google.com> | 28 November 2023, 21:59:21 UTC |
2b23e07 | Steven Johnson | 28 November 2023, 16:05:52 UTC | Return values from stub functions in Deserialization (#7963) Needed to prevent "error: non-void function does not return a value" | 28 November 2023, 16:05:52 UTC |
9ce5fd6 | James Price | 28 November 2023, 14:54:03 UTC | [WebGPU] Update to latest native headers (#7932) * [WebGPU] Update to latest native headers * Update mini_webgpu.h with latest version from Dawn * Document this process * Remove an argument from wgpuQueueOnSubmittedWorkDone Fixes #7581 * [WebGPU] Note that wgpu is not yet supported * [WebGPU] Add https:// to external links in README * update to commit b5d38fc7dc2a20081312c95e379c4a918df8b7d4 * Update mini_webgpu.h --------- Co-authored-by: Steven Johnson <srj@google.com> | 28 November 2023, 14:54:03 UTC |
976ea0b | Derek Gerstmann | 28 November 2023, 00:55:41 UTC | [serialization] Serialize stub definitions of external parameters. (#7926) * Serialize stub definitions of external parameters. Add deserialize_parameter methods to allow the user to only deserialize the mapping of external parameters (and remap them to their own user parameters) prior to deserializing the full pipeline definition. * Clang tidy/format pass --------- Co-authored-by: Derek Gerstmann <dgerstmann@adobe.com> | 28 November 2023, 00:55:41 UTC |
8c28a73 | Andrew Adams | 21 November 2023, 23:27:21 UTC | Improve code size and compile time for local laplacian app (#7927) Improve code size and compile time for local laplacian and interpolate apps This reduces compile time for the manual local laplacian schedule from 4.9s to 2.2s, and reduces code size from 126k to 82k Most of the reduction comes from avoiding a pointless boundary condition in the output Func. A smaller amount comes from avoiding loop partitioning using RoundUp and Partition::Never. The Partition::Never calls are responsible for a 3% reduction in code size and compile times by themselves. This has basically no effect on runtime. It seems to reduce it very slightly, but it's in the noise. | 21 November 2023, 23:27:21 UTC |
04c21bf | Volodymyr Kysenko | 21 November 2023, 21:56:45 UTC | Always call lower_round_to_nearest_ties_to_even on arm32 (#7957) | 21 November 2023, 21:56:45 UTC |
f5a4e49 | Andrew Adams | 21 November 2023, 19:23:44 UTC | Add missing condition to if renesting rule (#7952) * Add missing condition to if renesting rule * Add test * clang-format | 21 November 2023, 19:23:44 UTC |
ad0f24e | Andrew Adams | 16 November 2023, 00:49:35 UTC | Track likely values through lets in loop partitioning (#7930) * Track likely values through lets in loop partitioning Fixes #7929 Improves runtime of lens_blur app by ~20% * Add uncaptured likely tags to selects in boundary condition helpers Now that we look through lets, we end up in more situations where both sides have a captured likely. * Better comments | 16 November 2023, 00:49:35 UTC |
0f65435 | Andrew Adams | 14 November 2023, 19:48:34 UTC | More targeted fix for gather instructions being slow on intel processors (#7945) See https://github.com/llvm/llvm-project/issues/70259 | 14 November 2023, 19:48:34 UTC |
f0cdd50 | Andrew Adams | 14 November 2023, 18:23:14 UTC | Delete unused function (#7925) | 14 November 2023, 18:23:14 UTC |
f25af7f | Haojian Wu | 09 November 2023, 05:27:20 UTC | Remove the deprecated API `llvm::Type::getInt8PtrTy` usage. (#7937) This API is removed in LLVM trunk now https://github.com/llvm/llvm-project/commit/7b9d73c2f90c0ed8497339a16fc39785349d9610. | 09 November 2023, 05:27:20 UTC |
3b4dc33 | Zalman Stern | 07 November 2023, 21:23:31 UTC | Make sure all Halide arithmetic scalar types can be named from the Generator interface. (#7934) * Make sure all Halide arithmetic scalar types can be named from the Generator interface. Specifically adding 64-bit signed and unsigned integers and making sure float16 and bfloat16 are fully supported and documented. Add a simple test for all the type names. (Don't use float16 and bfloat16 in the arithmetic as they do not compile with the C++ backend. The name mapping should still be tested but the types passed do not seem to be checked as the values are not used.) | 07 November 2023, 21:23:31 UTC |
256c2f2 | Xuanda Yang | 07 November 2023, 17:57:21 UTC | Add missing serialization of Dim::partition_policy (#7935) add missing serialization of Dim::partition_policy | 07 November 2023, 17:57:21 UTC |
e5bf7ab | Xuanda Yang | 06 November 2023, 23:36:56 UTC | Add special build for testing serialization via a serialization roundtrip in JIT compilation and fix serialization leaks (#7763) * add back JIT testing, enclosed in #ifdef blocks * fix typo * nits * WITH_SERIALIZATION_JIT->WITH_SERIALIZATION_JIT_ROUNDTRIP_TESTING * fix self-reference leaks: now uses weak function ptr in reverse function mappings * Move clang-tidy checks back to Linux Recent changes in the GHA runners for macOS don't play well with clang-tidy; rather than sink any more time into debugging it, I'm going to revert the relevant parts of #7746 so that it runs on the less-finicky Linux runners instead. * bogus * Update Generator.cpp * Update Generator.cpp * call copy_to_host before serializing buffers * throw an error if we serialize on-device buffer * Skip specialize_to_gpu * Update Pipeline.cpp * Skip two more tests * use serialize to memory during jit testing * makefile update * makefile fix * skip the tutorial if flatc is not there * fix * fix signature * fix makefile * trigger buildbot --------- Co-authored-by: Steven Johnson <srj@google.com> | 06 November 2023, 23:36:56 UTC |
e5ee753 | Zalman Stern | 03 November 2023, 00:27:03 UTC | Remove use of dynamic_cast. (#7931) Remove use of dynamic_cast to preserve compiling the Halide compiler without RTTI. | 03 November 2023, 00:27:03 UTC |
1865101 | Martijn Courteaux | 31 October 2023, 17:38:55 UTC | Loop Partitioning Policy through Stage::partition(VarOrRVar, LoopPartitionPolicy) (#7914) * Loop Partitioning Policy through Stage::partition(VarOrRVar, LoopPartitionPolicy) * Renamed LoopPartitionPolicy to Partition. Added tests in boundary_conditions to verify correctness of the code with and without loop partitioning. Added tests that validates that disabling loop partitioning works. * Include error-test for when partitioning is always requested, but none was performed. | 31 October 2023, 17:38:55 UTC |
0134c40 | Volodymyr Kysenko | 30 October 2023, 21:39:17 UTC | Improve the error message if you store_at without a compute_at (#7923) * Improve an error message * Clean up * Update messages | 30 October 2023, 21:39:17 UTC |
97573c6 | Volodymyr Kysenko | 27 October 2023, 21:21:26 UTC | Scheduling directive to hoist the storage of the function (#7915) * Minimal hoist_storage plumbing * HoistedStorage placeholder IR node * Basic hoist_storage test * Fully plumb through the HoistedStorage node * IRPrinter for HoistedStorage * Insert hoisted storage at the correct loop level * Progress * Formatted * Move out common code for creating Allocate node * Format * Emit Allocate at the HoistedStorage site * Collect all dependent vars * Basic test working * Progress * Substitute lets into allocation extents instead of lifting stuff * Infer bounds for the extents dependent on loop variables * Update tests * Remove old code * Remove old code * Better tests * More tests * Validate schedules with hoist_storage * Error test * Fix stupid mistake * More tests * Remove debug prints * Better errors * Add missing handler for inlined functions * Format * Comments * Format * Add some missing visit handlers * New line * Fix comment * Luckily we only have two build systems * Adds hoist_storage_root * Comment for IR node * Serialization support for HoistedStorage * Handle hoist_storage for tuples * Handle multiple realize nodes * Move assert up * Better error message * Better loop bounds * Format * Updated error message * Happy clang-tidy happy me * An error message when compute is inlined, but store is not inlined * Only mutate lets which are needed * Update apps to use hoist_storage Some very minor performance gains, but mostly in the noise. Also switched the apps makefiles to emit stmt html by default instead of stmt, to take advantage of the new and improved stmt html. 
* Switch to stack of hoisted storages * Limit scope of lets for expansion * Break early * Skip substitute_in_all_lets * Re-use expanded min/extents * WebAssembly JIT does not support custom allocators * Change debug level to get more info about segfault * More debugging prints * Let's try aligned malloc * Revert "Change debug level to get more info about segfault" This reverts commit a5a689be8c6ad351674f3ced3bbf542335f91d75. * Revert "More debugging prints" This reverts commit bb6b8c1313cbdb9f355df20fd203ee02d485042e. --------- Co-authored-by: Andrew Adams <andrew.b.adams@gmail.com> | 27 October 2023, 21:21:26 UTC |
ed357c2 | Martijn Courteaux | 27 October 2023, 17:23:42 UTC | Fix bug mentioned by @antonysigma. (#7916) | 27 October 2023, 17:23:42 UTC |
cf01e97 | Andrew Adams | 27 October 2023, 17:22:31 UTC | Turn off SLP vectorization for avx512 only (#7918) Fixes #7917 | 27 October 2023, 17:22:31 UTC |
fffb8bd | Andrew Adams | 24 October 2023, 17:23:49 UTC | Fix read-after-write hazard analysis in storage folding (#7910) Explicitly mark which loops get loop-carry-dependencies inserted by sliding window to assist storage folding. Storage folding needs to know about this so it doesn't try to fold in a way that invalidates these read-after-write dependencies. It currently tries to prove the absence of hazards with box_contains(box_provided, box_required), but this is sometimes incorrect because box_provided could be conservatively large, and the code it analyses might not actually provide (store to) all the required (loaded from) values. It's simpler for sliding window to just tell storage folding when it inserts loop-carry-dependencies, and this is most simply done directly in the IR itself. Fixes #7909 | 24 October 2023, 17:23:49 UTC |
d023065 | Martijn Courteaux | 22 October 2023, 19:20:47 UTC | Hotfix reinterpret HTML (#7912) Hotfix reinterpret | 22 October 2023, 19:20:47 UTC |
739053d | Volodymyr Kysenko | 22 October 2023, 19:11:00 UTC | Check returned result in the test (#7911) * Check returned result of Callable * Format | 22 October 2023, 19:11:00 UTC |
872264c | Marcos Slomp | 20 October 2023, 21:33:13 UTC | Static analysis (MSVC) fixes for device_buffer_utils.h (#7904) * Static analysis (MSVC) fixes for device_buffer_utils.h * clang-format happiness * signed integer cast | 20 October 2023, 21:33:13 UTC |
2918854 | Martijn Courteaux | 20 October 2023, 17:37:38 UTC | Highlight groups for the HTML Stmt file and tooltips to reveal types. (#7887) * Highlight groups for the HTML Stmt file and tooltips to reveal types. * Cleaned up JS using eslint. * Remove commented code. | 20 October 2023, 17:37:38 UTC |
bd1d4df | Andrew Adams | 20 October 2023, 17:21:50 UTC | Stop interleaver from expanding the scope of letstmts (#7908) In the following code, where the two lets are siblings: let a = b in X let a = c in Y If Stmt X successfully had stores interleaved, it was re-nesting it so that the second let sat inside the body of the first: let a = b in X let a = c in Y This introduces a shadowed variable 'a', which is illegal at this stage of lowering. Fixes #7906 Also some drive-by fixes to earlier tests that had debugging code left in. | 20 October 2023, 17:21:50 UTC |
eb66c06 | Andrew Adams | 18 October 2023, 21:45:47 UTC | Don't lift loop vars outside of their loops in sliding window (#7896) Sliding window, when operating in the mode that shifts the consumer's loop min backwards a few iterations to cover the warmup, was capable of inappropriately lifting for loop vars inside that loop but outside the produce node of the slid Func. Fixes #7891 | 18 October 2023, 21:45:47 UTC |
5c97c3c | Andrew Adams | 17 October 2023, 16:17:58 UTC | Assignment is not associative (#7894) * Assignment is not associative * Fix internal tests | 17 October 2023, 16:17:58 UTC |
f9b90cb | Andrew Adams | 17 October 2023, 16:17:20 UTC | Disable warning for mismatched new/delete (#7897) | 17 October 2023, 16:17:20 UTC |
db207b9 | Andrew Adams | 17 October 2023, 16:16:33 UTC | Mutating if branches in isolation can break reachability analysis (#7895) Fixes #7892 | 17 October 2023, 16:16:33 UTC |
667d6ed | Andrew Adams | 16 October 2023, 17:12:50 UTC | Check for overflow in Type constructor (#7889) * Check for overflow in Type constructor * Don't try to construct illegal types | 16 October 2023, 17:12:50 UTC |
7e35494 | Andrew Adams | 14 October 2023, 11:29:33 UTC | Generate simpler LLVM IR for shuffles that recursively become broadcasts (#7902) * Generate simpler LLVM IR for shuffles that recursively become broadcasts * Don't re-codegen arg | 14 October 2023, 11:29:33 UTC |
51ad730 | Andrew Adams | 13 October 2023, 21:32:01 UTC | Attempted fixed datalayouts for llvm trunk (#7898) * Attempted fixed datalayouts for llvm trunk * Missed a few i128:128s | 13 October 2023, 21:32:01 UTC |
a3911bb | Martijn Courteaux | 12 October 2023, 18:38:44 UTC | Explicitly name the allocgroups on GPU schedules "allocgroup__..." (#7883) * 50cents readability improvement to allocgroups on GPU schedules. * Improve allocation group prefix: only if the alloc group cluster contains more than 1 allocation prepend the prefix. | 12 October 2023, 18:38:44 UTC |
509140a | antonysigma | 09 October 2023, 19:08:06 UTC | Implement elementwise complex value division (#7848) Implement the logic: (a + bj) / (c + dj) as an inline operator/() function. Use case: direct FFT method to solve linear least square problem, namely: ```math \begin{align} f(x) &=\Vert F^T D F x - b \Vert_2^2 \\ \arg \min_{x \in \mathbb{R}} f(x) &= F^T \left[ D^{-1} F b \right] \end{align} ``` where `D` is a diagonal complex-valued matrix representing image blur kernel, `b` is an ordinary image in vectorized form. | 09 October 2023, 19:08:06 UTC |
9293655 | Andrew Adams | 09 October 2023, 19:07:28 UTC | Update README.md to include RISCV in llvm build instructions (#7878) | 09 October 2023, 19:07:28 UTC |
b607129 | Martijn Courteaux | 06 October 2023, 19:40:35 UTC | HTML Stmt IR with conceptual code and device code. (#7843) * WIP: Conceptual Stmt IR and HTML cleanup. * WIP: Lots of progress on Stmt HTML. Cleanup almost complete. * Support scrolling to device code. * Resizing works decent enough for me. Fix cost-model allocate block costs. * Print better vector_reduce calls. * Optionally enable VizTree through an env variable. * Fix the device code tab for non-PTX. * StmtHTML: Tabs renamed to panes. Fix linter warnings. Cut trailing 0 byte from device code buffers. * Fix clang-format. * Fixed typos and copy paste error. * Fix HL_EXTRA_OUTPUTS behaviour to respect the defaults. * Nuked VizTree * Finalize StmtToViz nuke and rename StmtToHTML. * Improved HTML correctness by running output through an online validator. Quite some bugs fixed. * Cost model visualization improvement. Fix button not being allowed in the checkbox/label combination. * Fix collapsing being triggered by jump-to-xxx buttons. * How did this work? * Process Andrew's feedback. * Process Andrew's feedback. * Process Andrew's feedback, part 3. * Improve color palette. Few minor improvements. * Clang-format... | 06 October 2023, 19:40:35 UTC |
24a64f8 | Andrew Adams | 06 October 2023, 18:40:46 UTC | Update onnx app to work with newer versions of protobuf (#7879) and to work on mac | 06 October 2023, 18:40:46 UTC |
120e5fd | Andrew Adams | 05 October 2023, 23:52:58 UTC | Consider all dimensions before deciding to slide over a new dimension (#7875) * Don't deduce unreachability from predicated out of bounds stores Fixes #7873 * Consider all dimensions before deciding to slide over a new dimension Even ones we've already slid over. The previous version of this code could try to slide over a loop where multiple dimensions depend on the loop var, because it ignored dimensions that had already been slid over. Moving a check resolves the issue. Fixes #7872 | 05 October 2023, 23:52:58 UTC |
51ab364 | Andrew Adams | 05 October 2023, 17:42:23 UTC | Validate for types when fusing Vars with RVars (#7877) * Fix for llvm trunk * Validate for types when fusing Vars with RVars Fixes #7871 * Commit test | 05 October 2023, 17:42:23 UTC |
39f12a7 | Andrew Adams | 05 October 2023, 16:15:09 UTC | Fix for llvm trunk (#7876) | 05 October 2023, 16:15:09 UTC |
c31e8f7 | Andrew Adams | 04 October 2023, 18:12:09 UTC | Don't deduce unreachability from predicated out of bounds stores (#7874) Fixes #7873 | 04 October 2023, 18:12:09 UTC |
a24071c | Derek Gerstmann | 28 September 2023, 21:30:44 UTC | [serialization] Add support to serialize to memory, and a basic serialization tutorial (#7760) * Add in-memory buffer serialize/deserialize support. * Add basic serialization tutorial * Clang format pass * Update doc strings to use Doxygen formatted args * Clear out data buffer during serialization * Update serialization tutorial to use simple blur example with ImageParam * Make parameter map optional for serialize #7849 Add error messages to deserializer for missing params Update tutorial * Clang format pass --------- Co-authored-by: Derek Gerstmann <dgerstmann@adobe.com> Co-authored-by: Steven Johnson <srj@google.com> | 28 September 2023, 21:30:44 UTC |
76ac233 | Andrew Adams | 27 September 2023, 16:42:07 UTC | Handle unreachable code in bounds inference (#7866) * Handle unreachable code in bounds inference * Avoid ambiguous constructor * IRVisitor -> IRGraphVisitor * Add success print | 27 September 2023, 16:42:07 UTC |
9f96b25 | Steven Johnson | 27 September 2023, 01:55:12 UTC | Prevent use of uninitialized scalar Parameters in JIT code (#7847, partial) (#7853) * Prevent use of uninitialized scalar Parameters in JIT code (#7847, partial) * Fix broken tests * Update Parameter.h * Update func_clone.cpp * Fix Generators too * Fixes * Update InferArguments.cpp * Fixes * pacify clang-tidy * fixes | 27 September 2023, 01:55:12 UTC |
3926b02 | Andrew Adams | 26 September 2023, 01:58:20 UTC | Respect input buffer constraints in root-level bounds inference exprs (#7865) * Respect input buffer constraints in bounds inference lets Fixes #7761 * Add test | 26 September 2023, 01:58:20 UTC |
05d5efa | Andrew Adams | 25 September 2023, 19:14:25 UTC | Handle nested vectorization in store predicates (#7864) Fix #7851 In one place in PartitionLoops and in another place in the simplifier we were neglecting to consider nested vectorization. I added the fuzzer output as a new test, because I have no idea how I'd generate this error with human-readable code. It stems from an interaction of several tail strategies. | 25 September 2023, 19:14:25 UTC |
26619d2 | Pranav Bhandarkar | 18 September 2023, 19:48:34 UTC | [Hexagon] - Fix 8-bit unsigned saturating downcasts for HVX (Fixes #7806) (#7825) * Dump the IR more frequently in HexagonOptimize.cpp * Fix 8bit unsigned saturating downcasts for HVX We do not have a way of reliably lowering the following expression to LLVM bitcode for HVX. u8_sat(uint16x) where uint16x is a vector (preferably a HVX double vector) with element type uint16. Since there is no native HVX instruction to do this, this patch introduces two helper functions in hvx_128.ll to perform this operation. One function interleaves its input (trunc_satub.vuh) and the other does not (pack_satub.vuh). This patch also removes declarations of some intrinsics no longer used in hvx_128.ll * Make IR dump messages in HexagonOptimize.cpp consistent with those in CodeGen_Hexagon.cpp * fix clang-format complaints --------- Co-authored-by: Steven Johnson <srj@google.com> | 18 September 2023, 19:48:34 UTC |
68a0341 | Derek Gerstmann | 18 September 2023, 17:09:11 UTC | [api] Promote Internal::Parameter to Halide::Parameter (#7829) * Promote Internal::Parameter to Halide::Parameter (to support Serialization API refactoring) * Make raw_buffer(), scalar_address(), and scalar_raw_value() methods protected. Make Pipeline and Serializer protected friend classes. * Add Parameter public interface to python bindings. Remove old stub internal interface from PyParam. * Remove blank line at start of function --------- Co-authored-by: Derek Gerstmann <dgerstmann@adobe.com> Co-authored-by: Steven Johnson <srj@google.com> | 18 September 2023, 17:09:11 UTC |
ab4067f | Steven Johnson | 15 September 2023, 19:45:10 UTC | Fixes for top-of-tree Halide (#7850) * Fixes for top-of-tree Halide * I am a bonehead | 15 September 2023, 19:45:10 UTC |
d7760f5 | Derek Gerstmann | 15 September 2023, 01:05:12 UTC | [tutorials] Add tutorial on JIT compile/execute performance (#7838) * Add tutorial on JIT compile/execute performance * Addressing comments from review. Fix punctuation and comment nits. Add timing estimates as comments. Add std::function example. Enable advanced scheduling directives. * Addressing comments from review. Added cases that match real usage patterns: 1. Defining and compiling the whole pipeline every time you want to run it (i.e. in the benchmarking loop) 2. Defining the pipeline outside the benchmarking loop, and realizing it repeatedly. 3. (optional) Same as 2), but calling compile_jit() outside the loop, saying what it does, and saying why the time isn't actually different to case 2 (benchmark() runs multiple times and takes a min, and realize only compiles on the first run) 4. Compiling to a callable outside the benchmarking loop and showing that it has lower overhead than case 3 (if indeed it does. If not we may need to change the example so that it does, e.g. by adding a real input buffer.) * Addressing comments from review for style nits, and typos in comments. --------- Co-authored-by: Derek Gerstmann <dgerstmann@adobe.com> Co-authored-by: Steven Johnson <srj@google.com> | 15 September 2023, 01:05:12 UTC |
8797287 | Volodymyr Kysenko | 12 September 2023, 20:35:23 UTC | Update arguments in driver.cpp to match what correctness/simd_op_check has (#7842) | 12 September 2023, 20:35:23 UTC |
6569a83 | Andrew Adams | 11 September 2023, 17:40:29 UTC | Zen4 support (#7840) * Enable emission of float16/32 casts on x86 Fixes #7836 Fixes #4166 * Add support for zen4 * Add avx512_Zen4 target flag It's a superset of cannon lake, and a subset of sapphire rapids * Fix runtime detection, sapphire rapids CPUID bits * Fix comment * Don't catch bfloat casts * Fix Zen4 model number * Use llvm BFloat type for bfloat intrinsics * Give up on native bfloat16 conversion for now * Don't use llvm's bfloat type at all * Add missing enum * Fix constant in comment * clang-format | 11 September 2023, 17:40:29 UTC |
b704abd | Volodymyr Kysenko | 06 September 2023, 21:32:44 UTC | Iterate over lets in the correct order in VectorizeLoops (#7830) * Iterate over lets in correct order * Comments * Comments * Comments | 06 September 2023, 21:32:44 UTC |
836879e | Andrew Adams | 06 September 2023, 21:29:27 UTC | Enable emission of float16/32 casts on x86 (#7837) * Enable emission of float16/32 casts on x86 Fixes #7836 Fixes #4166 * Fix comment * Don't catch bfloat casts * Fix missing word in comment | 06 September 2023, 21:29:27 UTC |
02865e2 | Xuanda Yang | 05 September 2023, 20:28:11 UTC | Add a check that PredicateLoads must be used in the outermost split of a dimension (#7788) * add a check that PredicateLoads must be used in the outermost split of a dimension * newline * use the repro example * fix * avoid check for every other tail strategy * update error message to point out what's not allowed --------- Co-authored-by: Steven Johnson <srj@google.com> | 05 September 2023, 20:28:11 UTC |
8188b42 | Andrew Adams | 01 September 2023, 17:38:19 UTC | Avoid generating name collisions in CSE (#7821) * Avoid generating name collisions in CSE Alternative to #7801 (See the discussion there) Fixes #4124 * Add missing test * Minor cleanup * clang-format | 01 September 2023, 17:38:19 UTC |
ddfb1dc | Andrew Adams | 01 September 2023, 17:37:50 UTC | Don't return an undefined Stmt() from IfThenElse visitor (#7816) Fixes #7815 | 01 September 2023, 17:37:50 UTC |
24d846c | Steven Johnson | 30 August 2023, 23:54:48 UTC | Remove dead `auto-schedule` label in CMake (#7818) These were replaced by more granular labels. Also, drive-by fix to comment that needed plurals. | 30 August 2023, 23:54:48 UTC |
afc61b2 | Steven Johnson | 30 August 2023, 23:54:09 UTC | Update 'Check CMake file lists' action (#7809) * Update 'Check CMake file lists' action Several subcategories were missing -- let's add them and see if they should be there or not * bogus change * Add missing comments * Revert "bogus change" This reverts commit 80454b1313e1c06b5432d15287fa1f51185f70b6. | 30 August 2023, 23:54:09 UTC |
3a1dffe | Steven Johnson | 29 August 2023, 16:23:44 UTC | Move clang-tidy checks back to Linux (#7817) * Move clang-tidy checks back to Linux Recent changes in the GHA runners for macOS don't play well with clang-tidy; rather than sink any more time into debugging it, I'm going to revert the relevant parts of #7746 so that it runs on the less-finicky Linux runners instead. * bogus * Update Generator.cpp * Update Generator.cpp | 29 August 2023, 16:23:44 UTC |
fa136cb | Steven Johnson | 29 August 2023, 16:21:59 UTC | Ensure that multitarget AOT builds have consistent random sequence (#7717) * Fix CMake test for generator_aot_multitarget * Ensure that multitarget AOT builds have consistent random numbers If a Generator uses random_float() (or the int or uint versions), and is used in a multitarget build, we weren't resetting the counters for random generation between each subtarget... meaning that each subtarget would get a different random sequence, leading to some very hard-to-debug test failures when running on different hardware variants. This PR ensures that the relevant counters are all reset before each subtarget is generated, so that each should see the same sequence of random number generation. * Update CMakeLists.txt * Update multitarget_aottest.cpp * Combine float/uint counters | 29 August 2023, 16:21:59 UTC |
fe9f0b7 | Derek Gerstmann | 28 August 2023, 18:13:53 UTC | [serialization] Add serialization support to generator interface (#7792) * Add serialization support to Generator interface * Clang format pass * Make target required when emitting a serialized pipeline (since schedule may be target dependent). Apply auto-scheduler before serialization so that schedules can be serialized. * Fix enum ordering for hlpipe. Fix hlpipe comments. Add missing hlpipe enum to pyenums. * Remove unused Serialization build_mode * Fix formatting * Remove unused serializable flag. Remove redundant cpp_stub check. Fix comments. * Safeguard emit_hlpipe calls with #ifdef WITH_SERIALIZATION --------- Co-authored-by: Derek Gerstmann <dgerstmann@adobe.com> Co-authored-by: Steven Johnson <srj@google.com> | 28 August 2023, 18:13:53 UTC |
79d2be3 | Steven Johnson | 28 August 2023, 17:21:38 UTC | Update clang-tidy action to stop breaking (#7808) * Switch clang-tidy action from macos-13 to macos-latest `macos-latest` is actually macos-12 (macos13 is considered "beta" on the GHA runners). Hopefully this will fix the recent install snafus that are breaking clang-tidy. * Bogus change to trigger check * Update presubmit.yml * Update presubmit.yml * Update presubmit.yml * Revert "Bogus change to trigger check" This reverts commit a70f9ed8e6032d4b7799ff0cf6c009a7d2f92b3a. * Update presubmit.yml | 28 August 2023, 17:21:38 UTC |
8ac1e1c | Martijn Courteaux | 28 August 2023, 16:46:45 UTC | Add jump-buttons to get from Stmt directly to Assembly (#7793) Co-authored-by: Steven Johnson <srj@google.com> | 28 August 2023, 16:46:45 UTC |
69c75b3 | Steven Johnson | 24 August 2023, 23:12:19 UTC | Update WebGPU to latest Emscripten/Dawn API (#7804) * Update WebGPU to latest Emscripten/Dawn API - Updated mini_webgpu.h to be in sync with Dawn as of commit ded6610f45a8826db37b52d73121a66b74d8aa61 - Updated the use of SetDeviceLost callbacks to be in the DeviceDescriptor instead of a separate call - Updated a couple of fields that got renamed - Update webgpu.cpp and gpu_context.h to always use wgpuCreateInstance() and wgpuInstanceRelease(), since the Dawn node bindings now support & require them * clang-tidy | 24 August 2023, 23:12:19 UTC |
84faa68 | Derek Gerstmann | 24 August 2023, 22:22:22 UTC | [wasm] Enable PIC for WebAssembly on LLVM v18.x (#7803) * Enable PIC code generation for WebAssembly for LLVM >18. Enable +mutable-globals to support dynamic linking * Fix LLVM v18 interface changes for writeArchive() Add RelLookupTableConverterPass for PIC (in LLVM v18) * Resolve conflict for writeArchive interface changes. * Clang format pass --------- Co-authored-by: Derek Gerstmann <dgerstmann@adobe.com> | 24 August 2023, 22:22:22 UTC |
84af2cd | Andrew Adams | 24 August 2023, 22:18:09 UTC | Add support to the makefile for serialization (#7762) * Add support to the makefile for serialization * Fix deps * Fix for no flatc, and for homebrew --------- Co-authored-by: Steven Johnson <srj@google.com> | 24 August 2023, 22:18:09 UTC |
f56b9ad | Andrew Adams | 24 August 2023, 21:48:26 UTC | Remove some unused includes (#7799) | 24 August 2023, 21:48:26 UTC |
678ea32 | Alexander Root | 24 August 2023, 19:49:57 UTC | [ARM] support new udot/sdot patterns (#7800) | 24 August 2023, 19:49:57 UTC |
88c75ec | Alexander Root | 24 August 2023, 17:31:26 UTC | [ARM] Distribute shifts as muls (#7790) * [ARM] distribute shifts as muls This reverts commit eba8f325edfaaa7b11c52a19435200f6b28e539a. --------- Co-authored-by: Steven Johnson <srj@google.com> | 24 August 2023, 17:31:26 UTC |