https://github.com/halide/Halide
- HEAD
- refs/heads/Halide_unsharp
- refs/heads/abadams/align_strided_const_loads
- refs/heads/abadams/alloca
- refs/heads/abadams/atomic_parallel_compiled_in
- refs/heads/abadams/atomic_vector_non_recursive
- refs/heads/abadams/averaging_tree
- refs/heads/abadams/avoid_name_mangling_in_cross_module_dependencies
- refs/heads/abadams/better_absd
- refs/heads/abadams/better_codegen_for_non_const_ramps
- refs/heads/abadams/bgu_cholesky
- refs/heads/abadams/braces_around_statements
- refs/heads/abadams/cache_tighten_producer_consumer_nodes
- refs/heads/abadams/check_reorder_dups
- refs/heads/abadams/clarify_broadcast_shuffle
- refs/heads/abadams/compositing_app
- refs/heads/abadams/cond_wait_spin
- refs/heads/abadams/cse_in_unroll_split_tuples
- refs/heads/abadams/custom_cuda_context
- refs/heads/abadams/custom_cuda_context_2
- refs/heads/abadams/custom_cuda_context_3
- refs/heads/abadams/d3d12abi
- refs/heads/abadams/deflake_mullapudi_reorder
- refs/heads/abadams/delete_prepare_for_early_exit
- refs/heads/abadams/depthwise_separable_conv
- refs/heads/abadams/diagnose_boundary_condition_failure
- refs/heads/abadams/disable_onnx_app_on_mac
- refs/heads/abadams/divide_using_pavgw
- refs/heads/abadams/dont_link_to_cudart
- refs/heads/abadams/dont_reinterpret_concat
- refs/heads/abadams/early_out
- refs/heads/abadams/enable_f16c
- refs/heads/abadams/extract_concat_bits
- refs/heads/abadams/fast_integer_divide_round_to_zero
- refs/heads/abadams/faster_runtime_integer_division
- refs/heads/abadams/faster_unroll
- refs/heads/abadams/fix-arm-seg2
- refs/heads/abadams/fix_4211
- refs/heads/abadams/fix_5323
- refs/heads/abadams/fix_5329
- refs/heads/abadams/fix_5889
- refs/heads/abadams/fix_6984
- refs/heads/abadams/fix_7229
- refs/heads/abadams/fix_7260
- refs/heads/abadams/fix_7365
- refs/heads/abadams/fix_7374
- refs/heads/abadams/fix_7504
- refs/heads/abadams/fix_7514
- refs/heads/abadams/fix_7531
- refs/heads/abadams/fix_7584
- refs/heads/abadams/fix_7584_v2
- refs/heads/abadams/fix_7742
- refs/heads/abadams/fix_7756
- refs/heads/abadams/fix_7761
- refs/heads/abadams/fix_7768
- refs/heads/abadams/fix_7786
- refs/heads/abadams/fix_7810
- refs/heads/abadams/fix_7811
- refs/heads/abadams/fix_7815
- refs/heads/abadams/fix_7867
- refs/heads/abadams/fix_7871
- refs/heads/abadams/fix_7872
- refs/heads/abadams/fix_7873
- refs/heads/abadams/fix_7888
- refs/heads/abadams/fix_7890
- refs/heads/abadams/fix_7891
- refs/heads/abadams/fix_7892
- refs/heads/abadams/fix_7893
- refs/heads/abadams/fix_7906
- refs/heads/abadams/fix_7909
- refs/heads/abadams/fix_7968
- refs/heads/abadams/fix_8038
- refs/heads/abadams/fix_8054
- refs/heads/abadams/fix_arm_fcvtmp
- refs/heads/abadams/fix_autoschedule_feature_transposition
- refs/heads/abadams/fix_cse_name_collisions
- refs/heads/abadams/fix_cuda_mat_mul_assert
- refs/heads/abadams/fix_deinterleave_bug
- refs/heads/abadams/fix_deinterleave_for_reinterpret
- refs/heads/abadams/fix_div_round_to_zero
- refs/heads/abadams/fix_fft_compile_time_regression
- refs/heads/abadams/fix_generate_output_snippets
- refs/heads/abadams/fix_if_nesting_condition
- refs/heads/abadams/fix_leaks_in_memoize_test
- refs/heads/abadams/fix_lgtm_warnings
- refs/heads/abadams/fix_links_to_master
- refs/heads/abadams/fix_load_of_broadcast
- refs/heads/abadams/fix_lossless_cast_of_sub
- refs/heads/abadams/fix_onnx_app
- refs/heads/abadams/fix_pointless_lower_condition
- refs/heads/abadams/fix_potential_gpu_deadlock
- refs/heads/abadams/fix_realize_condition_depends_on_tuple
- refs/heads/abadams/fix_reduce_expr_modulo_of_vector
- refs/heads/abadams/fix_riscv_vx_vi
- refs/heads/abadams/fix_round
- refs/heads/abadams/fix_stencil_chain_gpu_schedule
- refs/heads/abadams/fix_track_bounds_intervals
- refs/heads/abadams/fix_tutorial_2
- refs/heads/abadams/forward_partition_methods
- refs/heads/abadams/fully_fused_depthwise_separable_conv
- refs/heads/abadams/fuzz_sliding_window
- refs/heads/abadams/gaussian_blur_app
- refs/heads/abadams/generator_infinite_default_timeout
- refs/heads/abadams/gpu_autoscheduler_parallel_random_probes
- refs/heads/abadams/include_riscv_in_readme
- refs/heads/abadams/interleave_nested_vector
- refs/heads/abadams/ir_match_by_ref
- refs/heads/abadams/lerp_plus_cast
- refs/heads/abadams/local_laplacian_code_size
- refs/heads/abadams/lower_halving_sub
- refs/heads/abadams/lower_rounding_shift_right
- refs/heads/abadams/mac-arm-fixes
- refs/heads/abadams/make_fast_inverse_test_throughput_limited
- refs/heads/abadams/makefile_serialization_support
- refs/heads/abadams/mismatched_new_delete
- refs/heads/abadams/mixed_sign_mul_shift_right
- refs/heads/abadams/mixed_width_mul_shift_right
- refs/heads/abadams/multiple_scatter
- refs/heads/abadams/mux_intrinsic
- refs/heads/abadams/name_helpers
- refs/heads/abadams/narrow_predicates
- refs/heads/abadams/nested_vectorization_compile_time_regression_fix
- refs/heads/abadams/nested_vectorization_tweaks
- refs/heads/abadams/parallel_simd_op_check
- refs/heads/abadams/per_instance_profiling
- refs/heads/abadams/precompute_shared_mem_size
- refs/heads/abadams/prefer_no_gather
- refs/heads/abadams/print_uncaught_exception
- refs/heads/abadams/promote_fixed_point_intrinsics
- refs/heads/abadams/psabdw
- refs/heads/abadams/random_pipelines
- refs/heads/abadams/rationalize_gpu_for_loop_names
- refs/heads/abadams/reenable_unscheduled_stage_warning
- refs/heads/abadams/reinterpret_vector
- refs/heads/abadams/remove_arch_os_for_shaders
- refs/heads/abadams/remove_bad_pruning
- refs/heads/abadams/remove_parameter_self_references
- refs/heads/abadams/remove_readnone_on_functions
- refs/heads/abadams/remove_use_of_python_config_in_onnx_makefile
- refs/heads/abadams/reschedule_bgu
- refs/heads/abadams/reschedule_bilateral_grid
- refs/heads/abadams/rewrite_atomic_pass
- refs/heads/abadams/rounding_shift_right_use_average
- refs/heads/abadams/rungenmain_error
- refs/heads/abadams/sampling_profiler_overhead_v2
- refs/heads/abadams/scope_improvements
- refs/heads/abadams/simpler_broadcasts
- refs/heads/abadams/simplify_correlated_pyramid
- refs/heads/abadams/siotas_20
- refs/heads/abadams/sioutas_20
- refs/heads/abadams/slide_over_split_loop
- refs/heads/abadams/sorting_network_working_branch
- refs/heads/abadams/stable_topological_order
- refs/heads/abadams/string_view
- refs/heads/abadams/strip_asserts_last
- refs/heads/abadams/switch_stmt
- refs/heads/abadams/target_specific_lerp
- refs/heads/abadams/time_lowering_passes
- refs/heads/abadams/track_failedness_through_solver_lets
- refs/heads/abadams/turn_off_slp_vectorization_for_avx512
- refs/heads/abadams/tweak_unpack_buffers
- refs/heads/abadams/undo_pointless_widening
- refs/heads/abadams/unordered_blocks
- refs/heads/abadams/unsigned_demosaic
- refs/heads/abadams/update_makefile_for_llvm_19
- refs/heads/abadams/use_arm_for_runtime_triple
- refs/heads/abadams/use_pmaddubsw_for_downsample
- refs/heads/abadams/validate_gpu_schedules
- refs/heads/abadams/vector_reduce_hexagon_predicate
- refs/heads/abadams/vector_scan
- refs/heads/abadams/vst_type_fix
- refs/heads/abadams/widening_let_bug
- refs/heads/abadams/x86_avg
- refs/heads/abadams/zen4
- refs/heads/adadams/profile_allocator
- refs/heads/add_image_checks_after_bounds_inference_plus_new_rules
- refs/heads/add_outermost_to_extern
- refs/heads/add_vectorization_to_search_space
- refs/heads/aelphy/feature_cadence_changes
- refs/heads/aelphy/float_extracts
- refs/heads/align_loads_comment_fix
- refs/heads/alina-strided-store
- refs/heads/another_buffer_copy_fix
- refs/heads/arm_sve_redux
- refs/heads/ataei-block_asserts-codegen
- refs/heads/ataei-debug_info
- refs/heads/ataei-fix-pow
- refs/heads/ataei-gen_str_param
- refs/heads/ataei-implicit_lhs_vars
- refs/heads/ataei-onnx
- refs/heads/ataei-onnx_converter_update
- refs/heads/ataei-onnx_pybind
- refs/heads/ataei-resnet50_benchmarks
- refs/heads/ataei-standalone_autoscheduler
- refs/heads/ataei_lots_of_inputs
- refs/heads/auto_sched_benchmarks
- refs/heads/auto_sched_estimates
- refs/heads/auto_sched_inline
- refs/heads/auto_sched_test_notparallel
- refs/heads/autoschedule_top_down
- refs/heads/autoschedule_with_convnet
- refs/heads/autoscheduler_scalar_imageparam_fix
- refs/heads/backports/10.x
- refs/heads/backports/11.x
- refs/heads/backports/12.x
- refs/heads/backports/13.x
- refs/heads/balance_expressions
- refs/heads/bazel
- refs/heads/benchmarks
- refs/heads/blaze
- refs/heads/bounds_buffer_lets_fix
- refs/heads/bounds_correct_vs_bounds_loaded_reduced
- refs/heads/buffer_device_api_target
- refs/heads/bug_device_free
- refs/heads/bug_inline_unbounded
- refs/heads/build/fix-xcode-2
- refs/heads/build/manylinux-fixes
- refs/heads/circ_buffer
- refs/heads/cmake-no-runtime-debug-symbols
- refs/heads/cmake/asan
- refs/heads/cmake/deps-cleanup
- refs/heads/cmake/find-modules
- refs/heads/cmake/spirv
- refs/heads/cmake_wasm_features
- refs/heads/compute_at_guard_with_if_goes_on_stack
- refs/heads/compute_with_at
- refs/heads/compute_with_check
- refs/heads/compute_with_excessive_bounds
- refs/heads/compute_with_inlined
- refs/heads/compute_with_remove_is_right_level
- refs/heads/cpack/nuget
- refs/heads/ctest/wrappers
- refs/heads/cuda-constant
- refs/heads/d3d12-allocation-cache
- refs/heads/deferred_cse_after_inlining
- refs/heads/destructor_calls_deinit
- refs/heads/dg/deserialize_unmapped_objects
- refs/heads/dg/fix_vulkan_codegen_bool_conversion
- refs/heads/dg/vulkan_conform_api
- refs/heads/dg/vulkan_region_allocator_fixes
- refs/heads/dgerstmann/fix-vulkan-memory-config-init
- refs/heads/disable_acquire_release_test_vulkan
- refs/heads/distinct_wrapper_names
- refs/heads/dkg/6863_asan_fixes
- refs/heads/dkg/vulkan
- refs/heads/dpalermo_dmabuf
- refs/heads/dpalermo_dmabuf_libion
- refs/heads/dpalermo_hexagon_remote_202003
- refs/heads/dpalermo_sdk4_2_0_2
- refs/heads/ds/buffer-get-pure
- refs/heads/ds/opt-tile-size
- refs/heads/ds/tail-none
- refs/heads/ds/while
- refs/heads/dsharletg/bitwise-intrinsics
- refs/heads/dsharletg/find-vector-reduce
- refs/heads/dsharletg/jit-optimization
- refs/heads/dsharletg/memcpy-copy_from
- refs/heads/dsharletg/pattern-headroom
- refs/heads/dsharletg/refactor-host-alignment
- refs/heads/dsharletg/runtime-size
- refs/heads/dsharletg/simplify-abs
- refs/heads/dsharletg/simplify-type-bounds
- refs/heads/dsharletg/specialize-bounds
- refs/heads/dsharletg/upsample-channels
- refs/heads/empty_prefetch
- refs/heads/emscripten_vector_fix
- refs/heads/export_all-wsmoses
- refs/heads/expr_auto_sched
- refs/heads/extern_bugs
- refs/heads/extern_host_alloc
- refs/heads/factor_parallel_codegen_hack
- refs/heads/fast_sync_tsan
- refs/heads/faster_integer_division
- refs/heads/feature/apps-external
- refs/heads/feature/cmake-presets
- refs/heads/feature/convert
- refs/heads/feature/f16_interleave
- refs/heads/feature/gather_load_q7
- refs/heads/feature/llvm-codemodel
- refs/heads/feature/load_predicated
- refs/heads/feature/luma_regression
- refs/heads/feature/maintanence
- refs/heads/feature/reinterprets
- refs/heads/feature/tcm_bump_allocator
- refs/heads/feature/xtensa_fix_interleave_q8
- refs/heads/feature/xtensa_q8_tests
- refs/heads/find_intrinsics_issue
- refs/heads/find_intrinsics_widening_lets
- refs/heads/fix-floated-pure-stage
- refs/heads/fix-race-condition
- refs/heads/fix_hexagon_alignment
- refs/heads/fix_hvx_intrinsics
- refs/heads/fix_prefetch_test
- refs/heads/fix_windows_vs15_build
- refs/heads/fixed_length_vectors
- refs/heads/fixed_point_local_laplac
- refs/heads/gemmlowp
- refs/heads/generate
- refs/heads/gha/pip
- refs/heads/gpu_canon_fix
- refs/heads/halide_ir_flatbuffer
- refs/heads/hex_dma2_async
- refs/heads/hexagon_le_runtime
- refs/heads/hexagon_priority
- refs/heads/hexagon_setpriority
- refs/heads/hexagon_strided_pred_load
- refs/heads/hexagon_sysmon_markers
- refs/heads/imaging-synthesis
- refs/heads/includes_fix
- refs/heads/ios_fast_sync_fix
- refs/heads/jia-kai-fix-runtime-cuda-init
- refs/heads/kamil-openglcompute-infinity
- refs/heads/kamil/name_pthread_workers
- refs/heads/kp_bit_shift
- refs/heads/line_buffer
- refs/heads/loop_carry_not_working
- refs/heads/lower_on_huge_stack
- refs/heads/main
- refs/heads/master
- refs/heads/memoize_with_extents
- refs/heads/metal_float16
- refs/heads/metaprogrammed_simplifier_mod
- refs/heads/mohamedadaly-vmlal
- refs/heads/more_powerful_sliding
- refs/heads/new_autoschedule_with_new_simplifier_arm_worker_branch
- refs/heads/new_autoscheduler
- refs/heads/new_simplifier_rule_testing
- refs/heads/newer_ion_ioctl
- refs/heads/no_bounds_query_when_bounds_used
- refs/heads/opengl_compute_buffer_types_fix
- refs/heads/openglcompute_reuse_shared_allocations
- refs/heads/optmize_reorder
- refs/heads/par_for_opt
- refs/heads/pdb/fix_7806
- refs/heads/pdb/hexagon_remote_cmake
- refs/heads/pdb_add_libcpp_makefile_inc
- refs/heads/pdb_eliminate_interleaves_test
- refs/heads/pdb_fix_clang_build
- refs/heads/pdb_fix_install_qc
- refs/heads/pdb_fix_loop_carry
- refs/heads/pdb_fix_simd_op_check_hvx
- refs/heads/pdb_mul_div_mod_multi_thread
- refs/heads/pdb_remove_hvx_v64
- refs/heads/perform_inline_with_order
- refs/heads/pr/2572
- refs/heads/pr/2676
- refs/heads/pr/2975
- refs/heads/pr/3017
- refs/heads/pr/3081
- refs/heads/pr/3387
- refs/heads/pr/3939
- refs/heads/pr/3960
- refs/heads/pr/4380
- refs/heads/pr/4414
- refs/heads/pr/5331
- refs/heads/pr/5438
- refs/heads/pr/5455
- refs/heads/pr/5758_2
- refs/heads/predicated_vector
- refs/heads/prefetch_specialize
- refs/heads/print_schedule
- refs/heads/profile_hardware_counters
- refs/heads/random-pipelines
- refs/heads/rdom_with_pure_vars
- refs/heads/readme-fix-gcd
- refs/heads/realization_order
- refs/heads/refactor_module
- refs/heads/register_promotion
- refs/heads/release/10.x
- refs/heads/release/11.x
- refs/heads/release/12.x
- refs/heads/release/13.x
- refs/heads/release/14.x
- refs/heads/release/15.x
- refs/heads/release/16.x
- refs/heads/release/17.x
- refs/heads/release/8.x
- refs/heads/remove_max_on_fuse_factor
- refs/heads/reorder_rvar
- refs/heads/reset_unique_counter
- refs/heads/revert-3612-ataei-speedup_compiletime
- refs/heads/revert-7009-rootjalex/distribute-w_shl
- refs/heads/revert-7601-compile_hexagon_remote
- refs/heads/riscv_update
- refs/heads/rl_simplifier_rules
- refs/heads/rootjalex/add_simpl_rules
- refs/heads/rootjalex/arm-optimize
- refs/heads/rootjalex/autoscheduler_mcts
- refs/heads/rootjalex/bounds-rewriter
- refs/heads/rootjalex/bounds_synthesis
- refs/heads/rootjalex/cbounds
- refs/heads/rootjalex/cbounds_predicated
- refs/heads/rootjalex/fix-sat-overflow
- refs/heads/rootjalex/fix_estimate_issue
- refs/heads/rootjalex/fix_failed_unrolls
- refs/heads/rootjalex/gsoc_codegen
- refs/heads/rootjalex/improve_cbounds_fixed
- refs/heads/rootjalex/improve_constant_bounds
- refs/heads/rootjalex/pitchfork-arm
- refs/heads/rootjalex/reinterpret-simplify
- refs/heads/rootjalex/rts
- refs/heads/rootjalex/super_simplify_bounds
- refs/heads/rootjalex/test_cbounds_fixed
- refs/heads/rootjalex/test_constant_bounds
- refs/heads/rootjalex/trs-codegen
- refs/heads/rootjalex/trs-codegen-cross
- refs/heads/rootjalex/trs-merge
- refs/heads/rootjalex/uint32-int32-cast
- refs/heads/rootjalex/x86-hadds
- refs/heads/rootjalex/x86-optimize
- refs/heads/rootjalex/x86-optimize-test
- refs/heads/rootjalex/x86-sat
- refs/heads/rootjalex/x86-test
- refs/heads/rule_removal_experiments
- refs/heads/schedule-output-storage
- refs/heads/separate_bounds_query_entrypoint
- refs/heads/shallow
- refs/heads/shift_amount_type_change
- refs/heads/shoaibkamil/cmake-without-arm
- refs/heads/shoaibkamil/correct_memory_fences
- refs/heads/shoaibkamil/d3d-fixes
- refs/heads/shoaibkamil/deprecate_openglcompute
- refs/heads/shoaibkamil/json
- refs/heads/shoaibkamil/llvm_clone_tag
- refs/heads/shoaibkamil/minor-vcpkg-doc-change
- refs/heads/shoaibkamil/opengl_compute_tests
- refs/heads/shoaibkamil/performance_tests_as_generators
- refs/heads/shoaibkamil/rule_removal_experiments
- refs/heads/shoaibkamil/super_simplify_with_interpreter
- refs/heads/shoaibkamil/windows-arm-fix-attributes
- refs/heads/sim_shlib_addr_print
- refs/heads/simplify-nested-broadcasts
- refs/heads/simplify-vectorreduce-shuffles2
- refs/heads/simplify_mod
- refs/heads/sioutas_2020
- refs/heads/sioutas_2020_autoscheduler
- refs/heads/slomp/gpu-codegen-profiling
- refs/heads/slomp/msvc-static-analysis
- refs/heads/solve_div
- refs/heads/solve_div_master
- refs/heads/solve_div_simplifier_test
- refs/heads/sr/python-late-binding-defaults
- refs/heads/srj-aaa
- refs/heads/srj-alloc
- refs/heads/srj-alloca
- refs/heads/srj-appmake2
- refs/heads/srj-armv83a
- refs/heads/srj-aslog
- refs/heads/srj-assert
- refs/heads/srj-assoc
- refs/heads/srj-auto-multi
- refs/heads/srj-auto-multi2
- refs/heads/srj-auto_schedule_mat_mul
- refs/heads/srj-autosched
- refs/heads/srj-b2cpphide
- refs/heads/srj-barr
- refs/heads/srj-bits
- refs/heads/srj-blacklist
- refs/heads/srj-bounds
- refs/heads/srj-bufcalltype
- refs/heads/srj-bufcallwrap
- refs/heads/srj-bufcallwrap2
- refs/heads/srj-buffer
- refs/heads/srj-bv
- refs/heads/srj-classic-autotune
- refs/heads/srj-clean
- refs/heads/srj-constcall
- refs/heads/srj-crosscompile
- refs/heads/srj-ctlz
- refs/heads/srj-cvec-patch
- refs/heads/srj-dag
- refs/heads/srj-debug-to-file
- refs/heads/srj-deir
- refs/heads/srj-f16
- refs/heads/srj-fp16
- refs/heads/srj-fsch
- refs/heads/srj-fthru
- refs/heads/srj-g2
- refs/heads/srj-g3
- refs/heads/srj-gha-test-fixes
- refs/heads/srj-hidden
- refs/heads/srj-hide2
- refs/heads/srj-hvx
- refs/heads/srj-hvx-bug
- refs/heads/srj-hvx-codegen-bug
- refs/heads/srj-hvx-nocopy
- refs/heads/srj-hvxshift
- refs/heads/srj-iib
- refs/heads/srj-initshape
- refs/heads/srj-inv
- refs/heads/srj-ir
- refs/heads/srj-irmut2
- refs/heads/srj-iwyu
- refs/heads/srj-iwyu3
- refs/heads/srj-javascript_work_in_progress
- refs/heads/srj-lensblur
- refs/heads/srj-lessinc
- refs/heads/srj-llvm-loop-opt
- refs/heads/srj-mak
- refs/heads/srj-maxthreads
- refs/heads/srj-mod
- refs/heads/srj-msan
- refs/heads/srj-msan-call
- refs/heads/srj-muldivmod
- refs/heads/srj-mut
- refs/heads/srj-outputs-2
- refs/heads/srj-parse
- refs/heads/srj-pch
- refs/heads/srj-printfunc
- refs/heads/srj-pygp
- refs/heads/srj-revertbits
- refs/heads/srj-schedule-storage
- refs/heads/srj-shl-shr-2
- refs/heads/srj-sio
- refs/heads/srj-static-const
- refs/heads/srj-strided-store
- refs/heads/srj-tidyh
- refs/heads/srj-tiff
- refs/heads/srj-trace
- refs/heads/srj-tutorial
- refs/heads/srj-using
- refs/heads/srj-wasmfix
- refs/heads/srj-xor2
- refs/heads/srj/abstract-gen-without-get-output-func-KEEP
- refs/heads/srj/aligned-alloc
- refs/heads/srj/aligned-alloc-2
- refs/heads/srj/aligned-malloc-with-aligned-alloc
- refs/heads/srj/all-explicit-ctor
- refs/heads/srj/anderson-thread-info-ptr
- refs/heads/srj/aot-perf
- refs/heads/srj/argv-signatures
- refs/heads/srj/argv-types
- refs/heads/srj/async-test
- refs/heads/srj/b2cpp-const-data
- refs/heads/srj/better-xt-dispatch
- refs/heads/srj/bfloat1
- refs/heads/srj/bp
- refs/heads/srj/build_halide_h
- refs/heads/srj/c-bool
- refs/heads/srj/cache-clear
- refs/heads/srj/clang-fmt-ignore
- refs/heads/srj/clang-tidy
- refs/heads/srj/clear-c-cache
- refs/heads/srj/cmake-asan
- refs/heads/srj/cmake-asan2
- refs/heads/srj/cmake-jit-generators
- refs/heads/srj/configure-cmake
- refs/heads/srj/cpp-generator-v2-experiment-KEEP
- refs/heads/srj/crosscompile
- refs/heads/srj/ctad
- refs/heads/srj/depr
- refs/heads/srj/deprecation
- refs/heads/srj/device-copy
- refs/heads/srj/example
- refs/heads/srj/experiment
- refs/heads/srj/experiment-6967
- refs/heads/srj/exporting
- refs/heads/srj/expr_t
- refs/heads/srj/external-tensors
- refs/heads/srj/fix-pytorch
- refs/heads/srj/fixed-rollback
- refs/heads/srj/fopen-fix
- refs/heads/srj/forward
- refs/heads/srj/forward-name
- refs/heads/srj/gen-func
- refs/heads/srj/gen-func-2
- refs/heads/srj/gen-func-3
- refs/heads/srj/gen2-1
- refs/heads/srj/gen_closure
- refs/heads/srj/generator_aot_gpu_multi_context_threaded
- refs/heads/srj/globals
- refs/heads/srj/halide-buffer-crop
- refs/heads/srj/halide-malloc-alignment
- refs/heads/srj/halide-must-use
- refs/heads/srj/halide-runtime-must-use-result
- refs/heads/srj/hang-repro
- refs/heads/srj/hannk
- refs/heads/srj/hannk-aliasing
- refs/heads/srj/hannk-error-checking
- refs/heads/srj/hannk-errors
- refs/heads/srj/hannk-inplace
- refs/heads/srj/hannk-mmap
- refs/heads/srj/hannk-tflite-27
- refs/heads/srj/hannk-verbosity
- refs/heads/srj/hdrs
- refs/heads/srj/html-becomes-viz
- refs/heads/srj/implicit-mult-widening
- refs/heads/srj/issue-7076
- refs/heads/srj/iwyu
- refs/heads/srj/iwyu-2
- refs/heads/srj/iwyu-6
- refs/heads/srj/libHANNK
- refs/heads/srj/llvm_type_of
- refs/heads/srj/maybe-unused
- refs/heads/srj/meanop
- refs/heads/srj/metadata-calling-convention
- refs/heads/srj/more-tidy
- refs/heads/srj/msan-dtf
- refs/heads/srj/multimeta
- refs/heads/srj/nanobind
- refs/heads/srj/new-rt-1
- refs/heads/srj/no-threadpool
- refs/heads/srj/no-timeout-thread
- refs/heads/srj/oglc-mutexed
- refs/heads/srj/param-map
- refs/heads/srj/pip-15.x
- refs/heads/srj/pip-cron
- refs/heads/srj/possible-uninited
- refs/heads/srj/pr-7566
- refs/heads/srj/printer-size
- refs/heads/srj/profiler-data-race
- refs/heads/srj/ptr-int-cast
- refs/heads/srj/pyapps
- refs/heads/srj/pyext-fix
- refs/heads/srj/pygen-class
- refs/heads/srj/pygen-deux
- refs/heads/srj/pygen-func
- refs/heads/srj/pygen-native-types
- refs/heads/srj/pyinstall
- refs/heads/srj/pypi-try
- refs/heads/srj/pystuff
- refs/heads/srj/python-buffer-unpack
- refs/heads/srj/python-tutorial
- refs/heads/srj/reshape
- refs/heads/srj/rt-error-smallify
- refs/heads/srj/rt-return-types
- refs/heads/srj/runtime-error-handling
- refs/heads/srj/sat-fixes-exp
- refs/heads/srj/sat-fixes-exp-2
- refs/heads/srj/shadow-field
- refs/heads/srj/snprintf
- refs/heads/srj/spirv-license
- refs/heads/srj/stat-buf-deprecations
- refs/heads/srj/static-buffer-generators
- refs/heads/srj/stmt-html
- refs/heads/srj/stringify
- refs/heads/srj/synth-gen-params
- refs/heads/srj/synth-params-python
- refs/heads/srj/test-arm_sve_redux
- refs/heads/srj/test-intrinsics-bounds
- refs/heads/srj/test8076
- refs/heads/srj/test8078
- refs/heads/srj/test8094
- refs/heads/srj/test8105a
- refs/heads/srj/test8115
- refs/heads/srj/test_tmpdir_fix
- refs/heads/srj/tidy
- refs/heads/srj/tidy-format-14
- refs/heads/srj/tidymore
- refs/heads/srj/tidymore2
- refs/heads/srj/tls
- refs/heads/srj/tls-3
- refs/heads/srj/tls-4
- refs/heads/srj/tls-ucon
- refs/heads/srj/tmp-unschedule-experiment
- refs/heads/srj/tot-fix
- refs/heads/srj/try-revert-sat
- refs/heads/srj/type-traits
- refs/heads/srj/typed-func
- refs/heads/srj/ucon-all-const
- refs/heads/srj/ucon-non-const
- refs/heads/srj/visit-warnings
- refs/heads/srj/wasm-atomic2
- refs/heads/srj/wasm-simd
- refs/heads/srj/wasm-stuff
- refs/heads/srj/wasm-threads
- refs/heads/srj/wasm-updates
- refs/heads/srj/wasm-work
- refs/heads/srj/wip
- refs/heads/srj/x-rounding
- refs/heads/srj/xbuf
- refs/heads/srj/xc+plus+size+tmp
- refs/heads/srj/xc-types
- refs/heads/srj/xt-uint-cast-test
- refs/heads/srj/xtensa-arch
- refs/heads/srj/xtensa-merge
- refs/heads/srj/xvc-experimetn
- refs/heads/srj/zlib-embed
- refs/heads/standalone_autoscheduler
- refs/heads/standalone_autoscheduler_arm_worker
- refs/heads/standalone_autoscheduler_arm_worker_amazon
- refs/heads/standalone_autoscheduler_gpu
- refs/heads/standalone_autoscheduler_hexagon
- refs/heads/sticky_task_assignments
- refs/heads/store_with
- refs/heads/store_with_solver_for_super_simplify
- refs/heads/strict_float_cse_fix
- refs/heads/super_simplify
- refs/heads/super_simplify_v2
- refs/heads/super_simplify_v3
- refs/heads/transitive_wrapper
- refs/heads/trigger-release-v16
- refs/heads/tzumao-autodiff-boundarycond
- refs/heads/tzumao-gradient-autoscheduler-bug
- refs/heads/tzumao-predicate-store-load
- refs/heads/tzumao-python-buffer
- refs/heads/tzumao_autodiff_unbounded
- refs/heads/tzumao_improve_gradient_autoscheduler
- refs/heads/tzumao_issue_4297
- refs/heads/tzumao_licm_before_BI
- refs/heads/unbounded_bugs
- refs/heads/undo_async_copy_chain_black_list
- refs/heads/use_string_literals_for_blobs
- refs/heads/users/lukas/python-pip
- refs/heads/validate_sched_error_msg
- refs/heads/var_ir_fix
- refs/heads/vksnk/async-experiment
- refs/heads/vksnk/async-multiple-producers
- refs/heads/vksnk/async-order
- refs/heads/vksnk/better-loop-carry
- refs/heads/vksnk/better-message
- refs/heads/vksnk/bound-storage
- refs/heads/vksnk/bounds-widen-right
- refs/heads/vksnk/c-print-type
- refs/heads/vksnk/c-round
- refs/heads/vksnk/check-return-result
- refs/heads/vksnk/compute-with-bug
- refs/heads/vksnk/compute_with_async
- refs/heads/vksnk/dma-limit-channels
- refs/heads/vksnk/dma-min-max
- refs/heads/vksnk/expr-match-shuffle
- refs/heads/vksnk/extract-from-scalar
- refs/heads/vksnk/f16-load
- refs/heads/vksnk/fix-packvr
- refs/heads/vksnk/fix_halide_xtensa_narrow_with_rounding_shift_i16
- refs/heads/vksnk/fused-compute-with
- refs/heads/vksnk/hoist-storage-bug
- refs/heads/vksnk/lerp-intrinsics
- refs/heads/vksnk/lower-signed-shifts
- refs/heads/vksnk/missing-exception
- refs/heads/vksnk/non-widening-halves
- refs/heads/vksnk/optimize-shuffles
- refs/heads/vksnk/replace-all
- refs/heads/vksnk/restrict
- refs/heads/vksnk/roll-buffer
- refs/heads/vksnk/roundeven-arm
- refs/heads/vksnk/rvar-bounds
- refs/heads/vksnk/simplify-slice
- refs/heads/vksnk/skip-semaphores
- refs/heads/vksnk/storage-folding
- refs/heads/vksnk/strided-load-of-4_2
- refs/heads/vksnk/typed-scope
- refs/heads/vksnk/update-simd-driver
- refs/heads/vksnk/vectorize-bug
- refs/heads/vksnk/vectorize-scalarize
- refs/heads/vksnk/widening_absd
- refs/heads/vksnk/xtensa-codegen-fp16
- refs/heads/vksnk/xtensa-dma-improvements
- refs/heads/vksnk/xtensa-regroup-pass
- refs/heads/vksnk/xtensa/lift-allocs
- refs/heads/vulkan
- refs/heads/vulkan-diagnose-alloc-failures
- refs/heads/vulkan-phase0-adts
- refs/heads/vulkan-phase1-spirv
- refs/heads/vulkan-phase2-runtime
- refs/heads/vulkan2
- refs/heads/vulkan_fix_gpu_dynamic_shared_test
- refs/heads/vulkan_fix_subregion_memory_offsets
- refs/heads/webassembly-old
- refs/heads/winograd
- refs/heads/wording_fix
- refs/heads/xtensa-codegen
- refs/heads/xtensa-codegen-parallel
- refs/heads/xuanda/fix-serialize-bad-partition-always
- refs/remotes/origin/rootjalex/add_autosched_caching
- refs/tags/release_2018_02_15
- refs/tags/release_2019_08_27
- refs/tags/release_8.0.0
- refs/tags/v10.0.0
- refs/tags/v10.0.1
- refs/tags/v11.0.0
- refs/tags/v11.0.1
- refs/tags/v12.0.0
- refs/tags/v12.0.1
- refs/tags/v13.0.0
- refs/tags/v13.0.1
- refs/tags/v13.0.2
- refs/tags/v13.0.3
- refs/tags/v13.0.4
- refs/tags/v14.0.0
- refs/tags/v15.0.0
- refs/tags/v15.0.1
- refs/tags/v16.0.0
- refs/tags/v17.0.0
- refs/tags/v17.0.1
- refs/tags/v8.0.0
Revision | Author | Author Date | Message | Commit Date |
---|---|---|---|---|
5999356 | Steven Johnson | 06 August 2021, 16:58:01 UTC | Merge branch 'master' into srj-llvm-loop-opt | 06 August 2021, 16:58:01 UTC |
451cfa8 | Steven Johnson | 06 August 2021, 15:58:33 UTC | Add argv and metadata support to C++ backend (Issue #2071) (#6179) * Add argv and metadata support to C++ backend (Issue #2071) * legalize_name-> c_print_name * Fix user_context handling | 06 August 2021, 15:58:33 UTC |
657cb56 | Steven Johnson | 05 August 2021, 17:08:21 UTC | Merge branch 'master' into srj-llvm-loop-opt | 05 August 2021, 17:08:21 UTC |
2e229f5 | Evan Lee | 04 August 2021, 04:54:30 UTC | Rewrite Rules Evaluation Project - Merging Relevant Synthesized Rewrite Rules (#6174) Conducted experiments to analyze the performance effects of adding 4000+ synthesized rewrite rules to Halide. Narrowed down the rules to 11 rewrite rules whose associative & commutative variants are added in this PR. With these rewrite rules, Halide achieves >10% peak memory reductions in 192 cases in apps including camera_pipe, harris, nl_means, and stencil_chain, which is similar to the results (with all 4000+ rules) from this paper - https://dl.acm.org/doi/pdf/10.1145/3428234 | 04 August 2021, 04:54:30 UTC |
8b26454 | Steven Johnson | 03 August 2021, 20:06:53 UTC | Add more fine-grained prefetch() directive (Issue #3735) (#6155) Add more fine-grained prefetch() directive (Issue #3735) | 03 August 2021, 20:06:53 UTC |
4f8629c | Steven Johnson | 03 August 2021, 00:49:54 UTC | Fix broken wasm-simd extmul instructions due to changes from https://reviews.llvm.org/D106724 (#6177) | 03 August 2021, 00:49:54 UTC |
0a09bfb | Steven Johnson | 02 August 2021, 21:14:12 UTC | Fix for trunk LLVM (#6176) * Fix for trunk LLVM * More Fixes | 02 August 2021, 21:14:12 UTC |
e52d6ca | Alex Reinking | 31 July 2021, 04:43:06 UTC | Fix Xcode issue that requires at least one source file when building a library from objects. (#6175) * Fix Xcode issue that requires at least one source file when building a library from objects. Fixes #6167 * add newline to end of file | 31 July 2021, 04:43:06 UTC |
a7e8c43 | Dillon Sharlet | 29 July 2021, 15:53:11 UTC | Partial revert of 8f849ae6514e83f8bf94d05e452a467df352f74c (only reverting halide_remote.cpp) (#6173) | 29 July 2021, 15:53:11 UTC |
36f6b8c | Alex Reinking | 28 July 2021, 17:53:33 UTC | Use generic build command instead of make. Fixes #6163 (#6169) | 28 July 2021, 17:53:33 UTC |
2b8ec44 | Steven Johnson | 27 July 2021, 14:57:41 UTC | Remove deprecated realize() Python wrappers (#6162) The C++ versions were removed in #6122, but the Python equivalents were overlooked. | 27 July 2021, 14:57:41 UTC |
a5585cb | Alexander Root | 27 July 2021, 02:07:40 UTC | Add various bounds-related simplifier rules (#6160) * add simplifier rules | 27 July 2021, 02:07:40 UTC |
2ab9a56 | Shoaib Kamil | 24 July 2021, 13:55:32 UTC | De-predicate loads and stores in Metal/OpenCL/D3D12 backend (#6158) * Depredicate loads and stores in Metal backend * Fix typo. * Mark override, add additional using * float_t -> float * Update CMakeLists.txt * clang-format * Also scalarize in D3D12 and OpenCL * use const_true() helper | 24 July 2021, 13:55:32 UTC |
b68393c | Steven Johnson | 21 July 2021, 23:08:12 UTC | [hannk] Add a --csv flag to compare_vs_tflite (#6149) * [hannk] Add optional taskset support to the run_on_device scripts * [hannk] Add a --csv flag to compare_vs_tflite This lets us output results in CSV format for easy copy/paste into (e.g.) spreadsheets. | 21 July 2021, 23:08:12 UTC |
025a9b9 | Dillon Sharlet | 21 July 2021, 22:11:55 UTC | Handle depth_multiplier != 1 in a separate op (#6154) * Implement depth_multiplier != 1 in a separate op. * Fix build on GCC * Remove stale comment * clang-format * Add more comments to inv_depth_multiplier | 21 July 2021, 22:11:55 UTC |
9d7284b | Dillon Sharlet | 20 July 2021, 20:50:14 UTC | Move quantization to a helper function depending on the target (#6150) * Move quantization + relu to a helper function depending on the target. * clang-format * x86 has these too actually * Fix typo | 20 July 2021, 20:50:14 UTC |
5ca8cdf | Dillon Sharlet | 20 July 2021, 16:43:26 UTC | Generalize Conv2D to be a Conv of any dimensionality (#6146) * Generalize Conv2D to be a Conv of any dimensionality. * clang-format | 20 July 2021, 16:43:26 UTC |
5812f33 | Volodymyr Kysenko | 20 July 2021, 15:45:49 UTC | Configurable minimum size for alignment in align_loads (#6143) Co-authored-by: Steven Johnson <srj@google.com> | 20 July 2021, 15:45:49 UTC |
b457d3c | Steven Johnson | 20 July 2021, 02:16:25 UTC | Add support for int16 output in Conv2D (#6145) This allows us to convert all (currently supported) FC ops into Conv2D ops. Remove all the FC-specific Halide and Op code. | 20 July 2021, 02:16:25 UTC |
9d1e1e3 | Steven Johnson | 20 July 2021, 00:33:52 UTC | [hannk] Rewrite FC in terms of Conv2D (#6144) * [hannk] Rewrite FC in terms of Conv2D FullyConnected is very similar to Conv2D, so rather than maintaining multiple similar implementations, let's translate a FullyConnected node into a Conv2D node (with some Reshape nodes as necessary). Note that we keep the old FC logic for int16 outputs, as Conv2D doesn't support those yet; if this PR is landed, a followup PR will add that ability to Conv2D, and the existing FC support will be removed entirely. | 20 July 2021, 00:33:52 UTC |
bd7ebf5 | Steven Johnson | 19 July 2021, 19:27:44 UTC | Fix for top-of-tree LLVM (#6142) * Fix for top-of-tree LLVM | 19 July 2021, 19:27:44 UTC |
557c8e4 | Dillon Sharlet | 16 July 2021, 17:56:05 UTC | Fix Hexagon vrmpy with 16-bit results (#4248) (#6137) * Fix #4248 * clang-format | 16 July 2021, 17:56:05 UTC |
769b855 | Dillon Sharlet | 15 July 2021, 16:22:28 UTC | Add optimization for corner case in conv (#6139) * Add silly optimization for weird cases. * Use transpose | 15 July 2021, 16:22:28 UTC |
42e1d45 | Steven Johnson | 15 July 2021, 16:02:00 UTC | [hannk] Allow aliasing of Reshape tensors (#6138) * Allow aliasing of Reshape tensors Previously we didn't allow this because aliased tensors had to have the same rank, which is ~never the case for Reshape. Aliasing for Reshape is a huge win because it essentially becomes a no-op rather than a memcpy. Running against standard set of models shows no regression in differences vs. tflite. | 15 July 2021, 16:02:00 UTC |
19f2bc7 | Dillon Sharlet | 14 July 2021, 00:20:25 UTC | Reduce verbosity of compare_vs_tflite further (#6136) | 14 July 2021, 00:20:25 UTC |
802c22a | Andrew Adams | 13 July 2021, 23:13:02 UTC | Don't reinterpret cast when codegenning vector concat (#6125) It confuses the HVX LLVM backend, and shouldn't be necessary anyway. | 13 July 2021, 23:13:02 UTC |
77207a5 | Dillon Sharlet | 13 July 2021, 23:04:21 UTC | Optimize shallow depthwise convolutions (#6134) * Add TailStrategy::PredicateLoads and TailStrategy::PredicateStores * Different compilers * PredicateStores is faster than specialize + ShiftInwards * Update comments. * Allow PredicateStores for RVars * Fix test to avoid realize bounds query issues. * Add comments. * clang-format * predicate* is not pure * Fix documentation bugs * Don't allow PredicateStores for reductions. * Substitute more strongly around Provide * Change these back to pure for now to satisfy some logic in ScheduleFunctions * Fix use after free of pred. * Update comments. * Refactor implementation of predication * Visit predicates * Partition loops with predicated loads/stores. * Clean up ApplySplit * Fix inappropriate predicated vectorization of VectorReduce * De-dup GuardWithIf and Predicate * These also handle scalar predicated loads/stores. * Print provide predicates * Don't allow predicated non-innermost splits. * Remove debugging code * Forgot to add new file * Add test to CMake build * Fix bug in simplification of extract_element * Fix issue with mixing uses of guarded expressions inside and outside calls. * Don't lift impure exprs. * clang-format * clang-format again * Add "shallow" version of depthwise for small numbers of channels. * Better name for input_stride_x * Fix performance regression in deep case. * Update performance * Missed rename * Enable tiling of shallow case. * Require x be a dummy dim for shallow depthwise * Small cleanup to avoid ternary * clang-format * Can't use shallow depthwise when stride_x != 1 | 13 July 2021, 23:04:21 UTC |
a762c34 | Dillon Sharlet | 13 July 2021, 21:54:11 UTC | Add TailStrategy::PredicateLoads and TailStrategy::PredicateStores (#6126) * Add TailStrategy::PredicateLoads and TailStrategy::PredicateStores * Different compilers * PredicateStores is faster than specialize + ShiftInwards * Update comments. * Allow PredicateStores for RVars * Fix test to avoid realize bounds query issues. * Add comments. * clang-format * predicate* is not pure * Fix documentation bugs * Don't allow PredicateStores for reductions. * Substitute more strongly around Provide * Change these back to pure for now to satisfy some logic in ScheduleFunctions * Fix use after free of pred. * Update comments. * Refactor implementation of predication * Visit predicates * Partition loops with predicated loads/stores. * Clean up ApplySplit * Fix inappropriate predicated vectorization of VectorReduce * De-dup GuardWithIf and Predicate * These also handle scalar predicated loads/stores. * Print provide predicates * Don't allow predicated non-innermost splits. * Remove debugging code * Forgot to add new file * Add test to CMake build * Fix bug in simplification of extract_element * Fix issue with mixing uses of guarded expressions inside and outside calls. * Don't lift impure exprs. * clang-format * clang-format again | 13 July 2021, 21:54:11 UTC |
867b6c8 | Steven Johnson | 13 July 2021, 20:01:21 UTC | [hannk] Make compare_vs_tflite with --verbose 0 less noisy (#6135) Minor fixes to eliminate noise. | 13 July 2021, 20:01:21 UTC |
e705253 | Steven Johnson | 13 July 2021, 01:17:41 UTC | [hannk] Implement greedy algorithm in AllocationPlanner (#6117) * [hannk] Rework most of hannk's Tensor storage to be arena-based. * Update interpreter.cpp * Restore get_tensor * clang-format * Add missing include * Fix arena alignment issues * Remove redundant assert * Rework AllocationPlanner API a bit * [hannk] Implement greedy algorithm in AllocationPlanner This uses a basic greedy approach to doing an allocation plan for tensors in hannk. Initial testing shows exact result matches between old and new code. Drive-by changes: - Change Interpreter's `verbose` -> `verbosity` to allow more output granularity, and update callers as needed. - Fix two places in ModelRunner that should have called the function hooks rather than the functions directly. * clang-format * Add missing includes * Add missing includes * trigger buildbots * Minor fixes and comments in AllocationPlanner * Suggested fixes | 13 July 2021, 01:17:41 UTC |
3e9cb4f | Steven Johnson | 12 July 2021, 23:22:05 UTC | Fix wasm regression at ToT LLVM (#6132) llvm.wasm.promote.low was removed. Calling fpext directly is the preferred approach now. | 12 July 2021, 23:22:05 UTC |
a2c47a9 | aankit-ca | 08 July 2021, 19:22:22 UTC | [Hexagon] Use LLVM masked stores. (#6129) * [Hexagon] Use LLVM masked stores. Letting CodeGen_LLVM handle predicated stores for Hexagon allows us to generate HVX predicated stores instead of scalar predicated stores. * Corrections to run hannk on hexagon-sim Co-authored-by: Ankit Aggarwal <aankit@quicinc.com> | 08 July 2021, 19:22:22 UTC |
f48a8da | Zalman Stern | 07 July 2021, 23:33:53 UTC | Adding padding byte size to outermost header byte count in MATLAB5 file format. (#6128) Adding padding byte size to outermost header byte count when writing MATLAB5 file format matrix. This ensures SciPy will successfully read files written by this routine. | 07 July 2021, 23:33:53 UTC |
27a2348 | Zalman Stern | 07 July 2021, 23:31:06 UTC | Track argument change to LLVM's CreateMaskedLoad. (#6130) | 07 July 2021, 23:31:06 UTC |
a914574 | aankit-ca | 02 July 2021, 04:56:36 UTC | [Hexagon] Makefile changes for hannk on Hexagon (#6066) * [Hexagon] Makefile changes for hannk on Hexagon Initial commit to get hannk app. Works on device. Qurt crash on sim. * Add missing file stubs.c * Run clang-format * correction * clang-format * clang-format * address comments * Run all tests * sim-constants in separate file * add file * Changes * changes Co-authored-by: Ankit Aggarwal <aankit@quicinc.com> Co-authored-by: Steven Johnson <srj@google.com> | 02 July 2021, 04:56:36 UTC |
240f6a3 | Alexander Root | 01 July 2021, 14:55:28 UTC | Use bound_correlated_differences in find_constant_bounds (#6059) * move PartiallyCancelDifferences to outside of SimplifyCorrelatedDifferences * add a rule to address #6044 correlation * use bound_correlated_differences in find_constant_bounds | 01 July 2021, 14:55:28 UTC |
f7aa53b | Steven Johnson | 30 June 2021, 19:24:19 UTC | Remove deprecated realize() variants from Func and Pipeline (#6122) These were deprecated in Halide 12. Let's remove them for Halide 13. | 30 June 2021, 19:24:19 UTC |
d1d7359 | Andrew Adams | 29 June 2021, 23:15:12 UTC | Relax overzealous pruning rule (#6115) We don't allow schedules that fuse to the extent that we can no longer vectorize. This was implemented incorrectly though. The check assumed that something was going to be compute_at inside the innermost loop, and neglected the possibility that we were about to tile that loop. | 29 June 2021, 23:15:12 UTC |
21eef77 | Steven Johnson | 29 June 2021, 23:01:50 UTC | Merge branch 'master' into srj-llvm-loop-opt | 29 June 2021, 23:01:50 UTC |
84b78da | Svenn-Arne Dragly | 29 June 2021, 22:27:00 UTC | Fix potential undefined behavior in `set_flag` (#6118) Previously, the Clang UndefinedBehaviorSanitizer (UBSan) complained about potential undefined behavior in `halide_buffer_t::set_flag` because the enum `halide_buffer_flags` is interpreted as an int32_t and implicitly converted to a uint64_t: ``` runtime error: implicit conversion from type 'int' of value -2 (32-bit, signed) to type 'unsigned long' changed the value to 18446744073709551614 (64-bit, unsigned) ``` On most compilers and hardware, this causes no issues, since the conversion and implementation of `set_flag` together produce the expected behavior still. However, it is better to be on the safe side and make the explicit conversion to a uint64_t before doing the bitwise negation. This change makes sure the conversion from int32_t is made before the bitwise negation, which fixes the potential undefined behavior and keeps UBSan from complaining. | 29 June 2021, 22:27:00 UTC |
1b7f369 | Steven Johnson | 29 June 2021, 16:30:51 UTC | [hannk] Rework most of hannk's Tensor storage to be arena-based. (#6104) * [hannk] Rework most of hannk's Tensor storage to be arena-based. | 29 June 2021, 16:30:51 UTC |
408a277 | Steve Suzuki | 28 June 2021, 17:57:29 UTC | Float16 support in CodeGen_ARM (#6102) * Add definition of Target::ARMFp16 Add the definition of the feature for ARMv8.2-a half-precision floating point data processing * Added test to generate 'float16' neon assembly; * Add check for data type in float16 NEON test The test simd_op_check doesn't check the suffix of operand which indicates the data type in case of AArch64 NEON instruction. e.g. FADD V0.4S, V0.4S, V0.4S In order to distinguish instruction of fp16 from fp32, the suffix such as ".4S" in the above needs to be checked. * Generate float16 Arm aarch64 LLVM-IR Armv8-a extension of Half-precision floating point data processing is supported by CodeGen_ARM. The target needs to be set as 64-bit with "arm_fp16" feature. 32-bit is not supported in this commit. Upgrading fp16 to fp32 with emulated conversion is replaced with either fp16 native instruction or fp32 operation with native type conversion of fp16-fp32 * Fix format and comments for arm_fp16 feature Co-authored-by: Liam O'Neil <liam.oneil@arm.com> | 28 June 2021, 17:57:29 UTC |
bfd9cea | Kai Wolf | 28 June 2021, 17:23:32 UTC | Update LoopNest.cpp (#6086) Remove obsolete assert for output accessing other outputs | 28 June 2021, 17:23:32 UTC |
2816567 | Alex Reinking | 26 June 2021, 00:39:42 UTC | Enable ubuntu packaging (#6113) * Revert "Disable Ubuntu Packaging Action (Issue #6111) (#6112)" This reverts commit 3f3dd702 * Explicitly update to avoid out of date package lists. Fixes #6111 | 26 June 2021, 00:39:42 UTC |
0da1354 | Steven Johnson | 25 June 2021, 23:24:33 UTC | Avoid pathological cases in halide_benchmark() (#6110) In the variant that tries to compute a good samples/iters value based on min_time, there's a pathological case if the environment's timer is relatively coarse, and the op being profiled is relatively fast; in this case, you can end up with timings that are very close to zero (or *literally* zero), and our attempt to calculate the number of iterations can explode into the billions, making the benchmark appear to hang (as it may take an absurd length of time to run). To fix this, add a maximum value for iters_per_sample, and smarten the calculation for when the measured time is tiny. | 25 June 2021, 23:24:33 UTC |
3f3dd70 | Steven Johnson | 25 June 2021, 21:38:28 UTC | Disable Ubuntu Packaging Action (Issue #6111) (#6112) * Disable Ubuntu Packaging Action (Issue #6111) | 25 June 2021, 21:38:28 UTC |
7791e84 | Dillon Sharlet | 24 June 2021, 18:23:48 UTC | Remove floats from extern_producer (#6109) * Don't rely on floats/trig unnecessarily * Use different period | 24 June 2021, 18:23:48 UTC |
a987222 | Dillon Sharlet | 23 June 2021, 17:34:37 UTC | Remove likelies and promises before trying to check for monotonicity. (#6105) | 23 June 2021, 17:34:37 UTC |
2da7ca5 | Alexander Root | 23 June 2021, 06:12:06 UTC | Call simplify and remove_likelies for find_constant_bounds (#6099) | 23 June 2021, 06:12:06 UTC |
f285f08 | Steven Johnson | 22 June 2021, 22:09:33 UTC | [hannk] Minor cleanups (#6103) * [hannk] Minor cleanups * Restore get_tensor | 22 June 2021, 22:09:33 UTC |
93292a2 | Steven Johnson | 22 June 2021, 20:51:20 UTC | [hannk] Add --keep_going flag to ModelRunner (#6101) This allows you to run a compare operation against a bunch of graphs without exiting at the first one that is out-of-spec for comparison. (Useful when you want to verify that no *new* differences are introduced by a change.) | 22 June 2021, 20:51:20 UTC |
d82fec4 | Steven Johnson | 22 June 2021, 16:16:37 UTC | [hannk] Fix various build glitches for Bazel/Blaze (#6098) - Make small_vector.h standalone-compilable - Move Tensor::replace_all_consumers_with() to a local function near PadForOps to dodge a circular include dep between Tensor and Model | 22 June 2021, 16:16:37 UTC |
b94a526 | Steven Johnson | 21 June 2021, 23:11:55 UTC | [hannk] Replace Tensor::set_external_host with set_external_buffer (#6100) * Replace Tensor::set_external_host with set_external_buffer * Also remove stale comment | 21 June 2021, 23:11:55 UTC |
45f31f7 | Dillon Sharlet | 19 June 2021, 00:49:52 UTC | Fix is_monotonic issue (#6081) (#6083) * Fix #6081 * Slightly less bizarre implementation of select visitor. Co-authored-by: Steven Johnson <srj@google.com> | 19 June 2021, 00:49:52 UTC |
5aeb8db | Steven Johnson | 17 June 2021, 23:04:26 UTC | [hannk] Don't mark Tensors as input or output (#6094) * Refactor transforms.cpp, no functional change * Use Op::is_input(), Op::is_output * Update configure_cmake.sh | 17 June 2021, 23:04:26 UTC |
d81f5c3 | Volodymyr Kysenko | 17 June 2021, 16:10:33 UTC | Provide bounds of rvars for all functions in the fused group (#6078) * Provide bounds of rvars for all functions in the fused group * Just use constant * Comments + rename variable | 17 June 2021, 16:10:33 UTC |
27ae113 | Steven Johnson | 17 June 2021, 00:57:38 UTC | [hannk] More Hygiene (#6093) * [hannk] More Hygiene - TensorStorage takes a more sensible set of args for ctor - Tensors don't need to be movable or copyable - Since we are now using C++17, we can use std::make_unique instead of make_op * Restore make_op * clang-format * Remove unnecessary TensorStorage methods | 17 June 2021, 00:57:38 UTC |
66ff71f | Steven Johnson | 16 June 2021, 23:33:35 UTC | [hannk] Cleanup: move SmallVector, Tensor to their own source files (#6091) * Move SmallVector, Tensor to their own files * cleanup | 16 June 2021, 23:33:35 UTC |
a590c17 | Steven Johnson | 16 June 2021, 23:30:47 UTC | [hannk] Remove unused Op::clone() methods (#6092) We don't call these anymore, so remove them and the related TensorMap code. | 16 June 2021, 23:30:47 UTC |
4fda2c6 | Volodymyr Kysenko | 16 June 2021, 17:28:21 UTC | Handle negative shifts in CodeGen_C (#6087) * Handle negative shifts in CodeGen_C * trigger buildbots * Emit code directly if shift was casted to signed int Co-authored-by: Steven Johnson <srj@google.com> | 16 June 2021, 17:28:21 UTC |
292a35a | Evan Lee | 15 June 2021, 20:06:31 UTC | Added fixes to issues regarding using HALIDE_DEBUG_MATCHED_RULES (#6088) * added operator<< for IsMaxValue, IsMinValue, and moved build_replacement(after) to be called before debug matched rules Co-authored-by: Steven Johnson <srj@google.com> | 15 June 2021, 20:06:31 UTC |
ebae3cd | dpalermo | 14 June 2021, 21:51:22 UTC | Changes for building with Hexagon SDK 4.3.0.0 & android-ndk-r19c (#6072) * Changes for building with Hexagon SDK 4.2.0.2 & android-ndk-r19c * Drop libsim_qurt_vtcm.a (now part of libsim_qurt.a) * Fix for clang-format-lint * Update to use SDK 4.3.0.0 / HEXAGON_Tools 8.4.11 * Updated binaries & README.md * trigger buildbots * Updated binaries after merge of master * Update SDK comment for >sm8350 Co-authored-by: Steven Johnson <srj@google.com> | 14 June 2021, 21:51:22 UTC |
98fdd9a | dpalermo | 10 June 2021, 01:44:16 UTC | Add more ways for DMA-BUF to fallback to libion.so (#6085) - Try to access libdmabufheap.so, if it succeeds try using DMA-BUF - If there are any errors seen with DMA-BUF, fallback to libion.so | 10 June 2021, 01:44:16 UTC |
f468bcd | Steven Johnson | 09 June 2021, 18:13:13 UTC | [hannk] Move flag-parsing code into ModelRunner (#6082) * [hannk] Move flag-parsing code into ModelRunner This allows compare_vs_tflite's main function to be very thin, so we don't have to replicate logic for different main() functions elsewhere * Update model_runner.h | 09 June 2021, 18:13:13 UTC |
ea1aabd | Steven Johnson | 09 June 2021, 15:58:41 UTC | Fix for upstream LLVM (Fixes #6079) (#6080) | 09 June 2021, 15:58:41 UTC |
cdac77b | Steven Johnson | 08 June 2021, 17:05:16 UTC | Require C++17 for Halide. (#5282) Require C++17 for Halide. | 08 June 2021, 17:05:16 UTC |
3b046ea | Steven Johnson | 07 June 2021, 20:57:24 UTC | Convert some Intrinsic calls to PureIntrinsic (#6070) * Convert some Intrinsic calls to PureIntrinsic * Fixes | 07 June 2021, 20:57:24 UTC |
d2ca93a | Dillon Sharlet | 07 June 2021, 19:33:31 UTC | Remove large_buffers flag (#6077) | 07 June 2021, 19:33:31 UTC |
af628a7 | Steven Johnson | 03 June 2021, 23:39:03 UTC | Fix dubious "gather" intrinsic for hvx (#6069) I'm not sure if this is a bug (per se) or not, but: We define an intrinsic for `Call::hvx_gather`, and at several points check for `is_intrinsic(Call::hvx_gather)`, but we never actually create such a Call. Instead, `make_gather` just uses the naked string `"gather"`, which is not the same thing. How is this working? (Is it working?) Opening this as a PR to gather input (no pun intended) about what's going on here. | 03 June 2021, 23:39:03 UTC |
2ca7e4e | Steven Johnson | 03 June 2021, 23:16:01 UTC | Merge branch 'master' into srj-llvm-loop-opt | 03 June 2021, 23:16:01 UTC |
00bfad7 | Alex Reinking | 02 June 2021, 22:33:45 UTC | Rework cross-compiling integration test to use simpler two-stage build (#6068) | 02 June 2021, 22:33:45 UTC |
8f849ae | dpalermo | 01 June 2021, 17:40:32 UTC | Add DMA-BUF support to host_malloc (#6042) * Add DMA-BUF support to host_malloc - use DMA-BUF if libdmabufheap.so is present - fallback to ION for older devices/OS - ION APIs are no longer supported on Android-S/12 - added context to the HAP_power_get/set calls - NULL power context not allowed on newer devices * Fixes for clang-format check * Fix more clang-format checks * Replace int with ion_user_handle_t and update comments * Sync changes to be closer to upstream * Updates from review * Remove unused attribute((weak)) protos * Add HAP_power_destroy/atexit * Add HAP_power_destroy/HAP_power_destroy_client for SDK 3.3.3 * Replace power_context malloc/free with address of global * Remove free_HAP_power_context * Update skel (since they snuck in anyway) * Just return address of global as power_context, don't store it * Updates from review | 01 June 2021, 17:40:32 UTC |
445ddd0 | Steven Johnson | 27 May 2021, 23:14:21 UTC | Allow hannk-delegate input and output tensors to share memory with tflite (#6030) * Allow hannk-delegate input and output tensors to share memory with tflite We previously allocated duplicate memory buffers for the input and output tensors, and just copied them back and forth as needed, which wastes memory and cycles. Now, we declare the input and output tensors to be 'external', and update the host pointer before every interpreter run. Note that dynamic tensors are still a bit of a special case: we still do duplicate allocations there (plus memcpy), because in the most general case we can't know the final size needed until we run the pipeline, but we need to allocate that output to run the pipeline. There are ways we could finesse this -- e.g., give dynamic tensors a lambda callback to allow them to resize the TFLite tensor and then use that storage -- but since dynamic tensors don't seem to be common or large in our test cases, I'm doing it this way for now. (I may circle back and try the lambda approach later.) Note that this moves the implicit init of `is_constant_` out of the Tensor ctor, and instead always requires an explicit call to `set_constant()`, which I think makes the situation much clearer. * Update hannk_delegate.cpp | 27 May 2021, 23:14:21 UTC |
a500607 | Dillon Sharlet | 27 May 2021, 18:03:48 UTC | Strengthen constant upper bound logic slightly (#6062) * Relax requirements for MemoryType::Register * Add comment | 27 May 2021, 18:03:48 UTC |
7b6be78 | Steven Johnson | 26 May 2021, 21:14:23 UTC | Revert nonsense | 26 May 2021, 21:14:23 UTC |
d878aea | Steven Johnson | 26 May 2021, 01:38:07 UTC | Update cuda.cpp | 26 May 2021, 01:38:07 UTC |
dee41ca | Steven Johnson | 26 May 2021, 01:23:47 UTC | Update cuda.cpp | 26 May 2021, 01:23:47 UTC |
beb795f | Steven Johnson | 26 May 2021, 01:10:56 UTC | Update cuda.cpp | 26 May 2021, 01:10:56 UTC |
36eae97 | Steven Johnson | 26 May 2021, 00:55:39 UTC | Update CMakeLists.txt | 26 May 2021, 00:55:39 UTC |
de4d4d7 | Steven Johnson | 26 May 2021, 00:55:07 UTC | Update CMakeLists.txt | 26 May 2021, 00:55:07 UTC |
43178e3 | Steven Johnson | 26 May 2021, 00:53:35 UTC | still more | 26 May 2021, 00:53:35 UTC |
0b1cb0d | Steven Johnson | 26 May 2021, 00:30:55 UTC | yep more | 26 May 2021, 00:30:55 UTC |
0b1eb79 | Steven Johnson | 26 May 2021, 00:05:26 UTC | still more debugging | 26 May 2021, 00:05:26 UTC |
e287112 | Steven Johnson | 25 May 2021, 23:10:42 UTC | more debugging | 25 May 2021, 23:10:42 UTC |
5b5f418 | Steven Johnson | 25 May 2021, 22:14:54 UTC | Add debugging info | 25 May 2021, 22:14:54 UTC |
f73bd8a | Steven Johnson | 25 May 2021, 18:22:25 UTC | Merge branch 'master' into srj-llvm-loop-opt | 25 May 2021, 18:22:25 UTC |
037d7ed | Steven Johnson | 24 May 2021, 20:47:23 UTC | Upgrade d8 version for wasm testing (#6055) We were using a variant of v8.9, but various late-breaking variations in the final spec implementation didn't make it into V8 until v9.1; using top-of-tree LLVM and EMCC require v9.1+ to avoid obscure errors. | 24 May 2021, 20:47:23 UTC |
d64b713 | Alexander Root | 23 May 2021, 15:06:01 UTC | Fix bounds information in ExprInfo for overflow in simplifier (#6012) * fix bounds in ExprInfo for overflow in simplifier * move bounds clearance to visitor on signed_integer_overflow * add test for bad Let behavior on overflow | 23 May 2021, 15:06:01 UTC |
677bf30 | Volodymyr Kysenko | 21 May 2021, 03:53:55 UTC | Scalar loads/stores shouldn't invalidate predicate vectorization (#6041) * Scalar loads/stores shouldn't invalidate predicate vectorization * Allow only scalar vars which don't depend on the vectorized var. * Add test * Fix build * Change test values and remove TODO, because now predicated load/stores are generated again | 21 May 2021, 03:53:55 UTC |
626c34a | Andrew Adams | 20 May 2021, 19:11:24 UTC | Don't emit aligned loads to unaligned addresses (#6047) * Don't emit aligned loads to unaligned addresses Fixes #6046 | 20 May 2021, 19:11:24 UTC |
c87976e | Dillon Sharlet | 20 May 2021, 19:00:07 UTC | Use signed exponents for conv too, clean up bounds query (#6050) | 20 May 2021, 19:00:07 UTC |
9ac150f | Alex Reinking | 20 May 2021, 17:26:51 UTC | Fix tgz/package.sh (#6048) * Use `realpath` instead of `readlink` * Pipe in toolchain file to tgz/package.sh | 20 May 2021, 17:26:51 UTC |
bfd5416 | Alex Reinking | 19 May 2021, 21:44:18 UTC | Bump Halide version to 13.0.0 (#6040) | 19 May 2021, 21:44:18 UTC |
4027dc4 | Steven Johnson | 19 May 2021, 21:20:43 UTC | Allow negative output shifts in depthwise_conv (#6039) At least one tflite model found in the wild uses this, so we must support it. | 19 May 2021, 21:20:43 UTC |
62eb147 | Dillon Sharlet | 19 May 2021, 21:20:26 UTC | Fix bug with tensor aliasing. (#6038) | 19 May 2021, 21:20:26 UTC |
b5a34c3 | Alex Reinking | 19 May 2021, 20:47:20 UTC | Update README for Halide 12 release. (#6034) | 19 May 2021, 20:47:20 UTC |
1c0ff0f | Alex Reinking | 19 May 2021, 20:11:38 UTC | Fix Windows ZIP package script. (#6035) | 19 May 2021, 20:11:38 UTC |
dfe0f97 | Steven Johnson | 19 May 2021, 19:57:54 UTC | Enable a wasm-simd op in simd_op_check that is now generated in LLVM13 (#6024) | 19 May 2021, 19:57:54 UTC |
6a1e529 | Steven Johnson | 19 May 2021, 17:21:37 UTC | Remove duplicate -e argument in bilateral_grid (#6008) (#6033) | 19 May 2021, 17:21:37 UTC |
5bd5a04 | Dillon Sharlet | 19 May 2021, 05:17:07 UTC | Also invalidate alignment if the type can't represent it. (#6032) | 19 May 2021, 05:17:07 UTC |