https://github.com/halide/Halide
- HEAD
- refs/heads/Halide_unsharp
- refs/heads/abadams/align_strided_const_loads
- refs/heads/abadams/alloca
- refs/heads/abadams/atomic_parallel_compiled_in
- refs/heads/abadams/atomic_vector_non_recursive
- refs/heads/abadams/averaging_tree
- refs/heads/abadams/avoid_name_mangling_in_cross_module_dependencies
- refs/heads/abadams/better_absd
- refs/heads/abadams/better_codegen_for_non_const_ramps
- refs/heads/abadams/bgu_cholesky
- refs/heads/abadams/braces_around_statements
- refs/heads/abadams/cache_tighten_producer_consumer_nodes
- refs/heads/abadams/check_reorder_dups
- refs/heads/abadams/clarify_broadcast_shuffle
- refs/heads/abadams/compositing_app
- refs/heads/abadams/cond_wait_spin
- refs/heads/abadams/cse_in_unroll_split_tuples
- refs/heads/abadams/custom_cuda_context
- refs/heads/abadams/custom_cuda_context_2
- refs/heads/abadams/custom_cuda_context_3
- refs/heads/abadams/d3d12abi
- refs/heads/abadams/deflake_mullapudi_reorder
- refs/heads/abadams/delete_prepare_for_early_exit
- refs/heads/abadams/depthwise_separable_conv
- refs/heads/abadams/diagnose_boundary_condition_failure
- refs/heads/abadams/disable_onnx_app_on_mac
- refs/heads/abadams/divide_using_pavgw
- refs/heads/abadams/dont_link_to_cudart
- refs/heads/abadams/dont_reinterpret_concat
- refs/heads/abadams/early_out
- refs/heads/abadams/enable_f16c
- refs/heads/abadams/extract_concat_bits
- refs/heads/abadams/fast_integer_divide_round_to_zero
- refs/heads/abadams/faster_runtime_integer_division
- refs/heads/abadams/faster_unroll
- refs/heads/abadams/fix-arm-seg2
- refs/heads/abadams/fix_4211
- refs/heads/abadams/fix_5323
- refs/heads/abadams/fix_5329
- refs/heads/abadams/fix_5889
- refs/heads/abadams/fix_6984
- refs/heads/abadams/fix_7229
- refs/heads/abadams/fix_7260
- refs/heads/abadams/fix_7365
- refs/heads/abadams/fix_7374
- refs/heads/abadams/fix_7504
- refs/heads/abadams/fix_7514
- refs/heads/abadams/fix_7531
- refs/heads/abadams/fix_7584
- refs/heads/abadams/fix_7584_v2
- refs/heads/abadams/fix_7742
- refs/heads/abadams/fix_7756
- refs/heads/abadams/fix_7761
- refs/heads/abadams/fix_7768
- refs/heads/abadams/fix_7786
- refs/heads/abadams/fix_7810
- refs/heads/abadams/fix_7811
- refs/heads/abadams/fix_7815
- refs/heads/abadams/fix_7867
- refs/heads/abadams/fix_7871
- refs/heads/abadams/fix_7872
- refs/heads/abadams/fix_7873
- refs/heads/abadams/fix_7888
- refs/heads/abadams/fix_7890
- refs/heads/abadams/fix_7891
- refs/heads/abadams/fix_7892
- refs/heads/abadams/fix_7893
- refs/heads/abadams/fix_7906
- refs/heads/abadams/fix_7909
- refs/heads/abadams/fix_7968
- refs/heads/abadams/fix_8038
- refs/heads/abadams/fix_8054
- refs/heads/abadams/fix_arm_fcvtmp
- refs/heads/abadams/fix_autoschedule_feature_transposition
- refs/heads/abadams/fix_cse_name_collisions
- refs/heads/abadams/fix_cuda_mat_mul_assert
- refs/heads/abadams/fix_deinterleave_bug
- refs/heads/abadams/fix_deinterleave_for_reinterpret
- refs/heads/abadams/fix_div_round_to_zero
- refs/heads/abadams/fix_fft_compile_time_regression
- refs/heads/abadams/fix_generate_output_snippets
- refs/heads/abadams/fix_if_nesting_condition
- refs/heads/abadams/fix_leaks_in_memoize_test
- refs/heads/abadams/fix_lgtm_warnings
- refs/heads/abadams/fix_links_to_master
- refs/heads/abadams/fix_load_of_broadcast
- refs/heads/abadams/fix_lossless_cast_of_sub
- refs/heads/abadams/fix_onnx_app
- refs/heads/abadams/fix_pointless_lower_condition
- refs/heads/abadams/fix_potential_gpu_deadlock
- refs/heads/abadams/fix_realize_condition_depends_on_tuple
- refs/heads/abadams/fix_reduce_expr_modulo_of_vector
- refs/heads/abadams/fix_riscv_vx_vi
- refs/heads/abadams/fix_round
- refs/heads/abadams/fix_stencil_chain_gpu_schedule
- refs/heads/abadams/fix_track_bounds_intervals
- refs/heads/abadams/fix_tutorial_2
- refs/heads/abadams/forward_partition_methods
- refs/heads/abadams/fully_fused_depthwise_separable_conv
- refs/heads/abadams/fuzz_sliding_window
- refs/heads/abadams/gaussian_blur_app
- refs/heads/abadams/generator_infinite_default_timeout
- refs/heads/abadams/gpu_autoscheduler_parallel_random_probes
- refs/heads/abadams/include_riscv_in_readme
- refs/heads/abadams/interleave_nested_vector
- refs/heads/abadams/ir_match_by_ref
- refs/heads/abadams/lerp_plus_cast
- refs/heads/abadams/local_laplacian_code_size
- refs/heads/abadams/lower_halving_sub
- refs/heads/abadams/lower_rounding_shift_right
- refs/heads/abadams/mac-arm-fixes
- refs/heads/abadams/make_fast_inverse_test_throughput_limited
- refs/heads/abadams/makefile_serialization_support
- refs/heads/abadams/mismatched_new_delete
- refs/heads/abadams/mixed_sign_mul_shift_right
- refs/heads/abadams/mixed_width_mul_shift_right
- refs/heads/abadams/multiple_scatter
- refs/heads/abadams/mux_intrinsic
- refs/heads/abadams/name_helpers
- refs/heads/abadams/narrow_predicates
- refs/heads/abadams/nested_vectorization_compile_time_regression_fix
- refs/heads/abadams/nested_vectorization_tweaks
- refs/heads/abadams/parallel_simd_op_check
- refs/heads/abadams/per_instance_profiling
- refs/heads/abadams/precompute_shared_mem_size
- refs/heads/abadams/prefer_no_gather
- refs/heads/abadams/print_uncaught_exception
- refs/heads/abadams/promote_fixed_point_intrinsics
- refs/heads/abadams/psabdw
- refs/heads/abadams/random_pipelines
- refs/heads/abadams/rationalize_gpu_for_loop_names
- refs/heads/abadams/reenable_unscheduled_stage_warning
- refs/heads/abadams/reinterpret_vector
- refs/heads/abadams/remove_arch_os_for_shaders
- refs/heads/abadams/remove_bad_pruning
- refs/heads/abadams/remove_parameter_self_references
- refs/heads/abadams/remove_readnone_on_functions
- refs/heads/abadams/remove_use_of_python_config_in_onnx_makefile
- refs/heads/abadams/reschedule_bgu
- refs/heads/abadams/reschedule_bilateral_grid
- refs/heads/abadams/rewrite_atomic_pass
- refs/heads/abadams/rounding_shift_right_use_average
- refs/heads/abadams/rungenmain_error
- refs/heads/abadams/sampling_profiler_overhead_v2
- refs/heads/abadams/scope_improvements
- refs/heads/abadams/simpler_broadcasts
- refs/heads/abadams/simplify_correlated_pyramid
- refs/heads/abadams/siotas_20
- refs/heads/abadams/sioutas_20
- refs/heads/abadams/slide_over_split_loop
- refs/heads/abadams/sorting_network_working_branch
- refs/heads/abadams/stable_topological_order
- refs/heads/abadams/string_view
- refs/heads/abadams/strip_asserts_last
- refs/heads/abadams/switch_stmt
- refs/heads/abadams/target_specific_lerp
- refs/heads/abadams/time_lowering_passes
- refs/heads/abadams/track_failedness_through_solver_lets
- refs/heads/abadams/turn_off_slp_vectorization_for_avx512
- refs/heads/abadams/tweak_unpack_buffers
- refs/heads/abadams/undo_pointless_widening
- refs/heads/abadams/unordered_blocks
- refs/heads/abadams/unsigned_demosaic
- refs/heads/abadams/update_makefile_for_llvm_19
- refs/heads/abadams/use_arm_for_runtime_triple
- refs/heads/abadams/use_pmaddubsw_for_downsample
- refs/heads/abadams/validate_gpu_schedules
- refs/heads/abadams/vector_reduce_hexagon_predicate
- refs/heads/abadams/vector_scan
- refs/heads/abadams/vst_type_fix
- refs/heads/abadams/widening_let_bug
- refs/heads/abadams/x86_avg
- refs/heads/abadams/zen4
- refs/heads/adadams/profile_allocator
- refs/heads/add_image_checks_after_bounds_inference_plus_new_rules
- refs/heads/add_outermost_to_extern
- refs/heads/add_vectorization_to_search_space
- refs/heads/aelphy/feature_cadence_changes
- refs/heads/aelphy/float_extracts
- refs/heads/align_loads_comment_fix
- refs/heads/alina-strided-store
- refs/heads/another_buffer_copy_fix
- refs/heads/arm_sve_redux
- refs/heads/ataei-block_asserts-codegen
- refs/heads/ataei-debug_info
- refs/heads/ataei-fix-pow
- refs/heads/ataei-gen_str_param
- refs/heads/ataei-implicit_lhs_vars
- refs/heads/ataei-onnx
- refs/heads/ataei-onnx_converter_update
- refs/heads/ataei-onnx_pybind
- refs/heads/ataei-resnet50_benchmarks
- refs/heads/ataei-standalone_autoscheduler
- refs/heads/ataei_lots_of_inputs
- refs/heads/auto_sched_benchmarks
- refs/heads/auto_sched_estimates
- refs/heads/auto_sched_inline
- refs/heads/auto_sched_test_notparallel
- refs/heads/autoschedule_top_down
- refs/heads/autoschedule_with_convnet
- refs/heads/autoscheduler_scalar_imageparam_fix
- refs/heads/backports/10.x
- refs/heads/backports/11.x
- refs/heads/backports/12.x
- refs/heads/backports/13.x
- refs/heads/balance_expressions
- refs/heads/bazel
- refs/heads/benchmarks
- refs/heads/blaze
- refs/heads/bounds_buffer_lets_fix
- refs/heads/bounds_correct_vs_bounds_loaded_reduced
- refs/heads/buffer_device_api_target
- refs/heads/bug_device_free
- refs/heads/bug_inline_unbounded
- refs/heads/build/fix-xcode-2
- refs/heads/build/manylinux-fixes
- refs/heads/circ_buffer
- refs/heads/cmake-no-runtime-debug-symbols
- refs/heads/cmake/asan
- refs/heads/cmake/deps-cleanup
- refs/heads/cmake/find-modules
- refs/heads/cmake/spirv
- refs/heads/cmake_wasm_features
- refs/heads/compute_at_guard_with_if_goes_on_stack
- refs/heads/compute_with_at
- refs/heads/compute_with_check
- refs/heads/compute_with_excessive_bounds
- refs/heads/compute_with_inlined
- refs/heads/compute_with_remove_is_right_level
- refs/heads/cpack/nuget
- refs/heads/ctest/wrappers
- refs/heads/cuda-constant
- refs/heads/d3d12-allocation-cache
- refs/heads/deferred_cse_after_inlining
- refs/heads/destructor_calls_deinit
- refs/heads/dg/deserialize_unmapped_objects
- refs/heads/dg/fix_vulkan_codegen_bool_conversion
- refs/heads/dg/vulkan_conform_api
- refs/heads/dg/vulkan_region_allocator_fixes
- refs/heads/dgerstmann/fix-vulkan-memory-config-init
- refs/heads/disable_acquire_release_test_vulkan
- refs/heads/distinct_wrapper_names
- refs/heads/dkg/6863_asan_fixes
- refs/heads/dkg/vulkan
- refs/heads/dpalermo_dmabuf
- refs/heads/dpalermo_dmabuf_libion
- refs/heads/dpalermo_hexagon_remote_202003
- refs/heads/dpalermo_sdk4_2_0_2
- refs/heads/ds/buffer-get-pure
- refs/heads/ds/opt-tile-size
- refs/heads/ds/tail-none
- refs/heads/ds/while
- refs/heads/dsharletg/bitwise-intrinsics
- refs/heads/dsharletg/find-vector-reduce
- refs/heads/dsharletg/jit-optimization
- refs/heads/dsharletg/memcpy-copy_from
- refs/heads/dsharletg/pattern-headroom
- refs/heads/dsharletg/refactor-host-alignment
- refs/heads/dsharletg/runtime-size
- refs/heads/dsharletg/simplify-abs
- refs/heads/dsharletg/simplify-type-bounds
- refs/heads/dsharletg/specialize-bounds
- refs/heads/dsharletg/upsample-channels
- refs/heads/empty_prefetch
- refs/heads/emscripten_vector_fix
- refs/heads/export_all-wsmoses
- refs/heads/expr_auto_sched
- refs/heads/extern_bugs
- refs/heads/extern_host_alloc
- refs/heads/factor_parallel_codegen_hack
- refs/heads/fast_sync_tsan
- refs/heads/faster_integer_division
- refs/heads/feature/apps-external
- refs/heads/feature/cmake-presets
- refs/heads/feature/convert
- refs/heads/feature/f16_interleave
- refs/heads/feature/gather_load_q7
- refs/heads/feature/llvm-codemodel
- refs/heads/feature/load_predicated
- refs/heads/feature/luma_regression
- refs/heads/feature/maintanence
- refs/heads/feature/reinterprets
- refs/heads/feature/tcm_bump_allocator
- refs/heads/feature/xtensa_fix_interleave_q8
- refs/heads/feature/xtensa_q8_tests
- refs/heads/find_intrinsics_issue
- refs/heads/find_intrinsics_widening_lets
- refs/heads/fix-floated-pure-stage
- refs/heads/fix-race-condition
- refs/heads/fix_hexagon_alignment
- refs/heads/fix_hvx_intrinsics
- refs/heads/fix_prefetch_test
- refs/heads/fix_windows_vs15_build
- refs/heads/fixed_length_vectors
- refs/heads/fixed_point_local_laplac
- refs/heads/gemmlowp
- refs/heads/generate
- refs/heads/gha/pip
- refs/heads/gpu_canon_fix
- refs/heads/halide_ir_flatbuffer
- refs/heads/hex_dma2_async
- refs/heads/hexagon_le_runtime
- refs/heads/hexagon_priority
- refs/heads/hexagon_setpriority
- refs/heads/hexagon_strided_pred_load
- refs/heads/hexagon_sysmon_markers
- refs/heads/imaging-synthesis
- refs/heads/includes_fix
- refs/heads/ios_fast_sync_fix
- refs/heads/jia-kai-fix-runtime-cuda-init
- refs/heads/kamil-openglcompute-infinity
- refs/heads/kamil/name_pthread_workers
- refs/heads/kp_bit_shift
- refs/heads/line_buffer
- refs/heads/loop_carry_not_working
- refs/heads/lower_on_huge_stack
- refs/heads/main
- refs/heads/master
- refs/heads/memoize_with_extents
- refs/heads/metal_float16
- refs/heads/metaprogrammed_simplifier_mod
- refs/heads/mohamedadaly-vmlal
- refs/heads/more_powerful_sliding
- refs/heads/new_autoschedule_with_new_simplifier_arm_worker_branch
- refs/heads/new_autoscheduler
- refs/heads/new_simplifier_rule_testing
- refs/heads/newer_ion_ioctl
- refs/heads/no_bounds_query_when_bounds_used
- refs/heads/opengl_compute_buffer_types_fix
- refs/heads/openglcompute_reuse_shared_allocations
- refs/heads/optmize_reorder
- refs/heads/par_for_opt
- refs/heads/pdb/fix_7806
- refs/heads/pdb/hexagon_remote_cmake
- refs/heads/pdb_add_libcpp_makefile_inc
- refs/heads/pdb_eliminate_interleaves_test
- refs/heads/pdb_fix_clang_build
- refs/heads/pdb_fix_install_qc
- refs/heads/pdb_fix_loop_carry
- refs/heads/pdb_fix_simd_op_check_hvx
- refs/heads/pdb_mul_div_mod_multi_thread
- refs/heads/pdb_remove_hvx_v64
- refs/heads/perform_inline_with_order
- refs/heads/pr/2572
- refs/heads/pr/2676
- refs/heads/pr/2975
- refs/heads/pr/3017
- refs/heads/pr/3081
- refs/heads/pr/3387
- refs/heads/pr/3939
- refs/heads/pr/3960
- refs/heads/pr/4380
- refs/heads/pr/4414
- refs/heads/pr/5331
- refs/heads/pr/5438
- refs/heads/pr/5455
- refs/heads/pr/5758_2
- refs/heads/predicated_vector
- refs/heads/prefetch_specialize
- refs/heads/print_schedule
- refs/heads/profile_hardware_counters
- refs/heads/random-pipelines
- refs/heads/rdom_with_pure_vars
- refs/heads/readme-fix-gcd
- refs/heads/realization_order
- refs/heads/refactor_module
- refs/heads/register_promotion
- refs/heads/release/10.x
- refs/heads/release/11.x
- refs/heads/release/12.x
- refs/heads/release/13.x
- refs/heads/release/14.x
- refs/heads/release/15.x
- refs/heads/release/16.x
- refs/heads/release/17.x
- refs/heads/release/8.x
- refs/heads/remove_max_on_fuse_factor
- refs/heads/reorder_rvar
- refs/heads/reset_unique_counter
- refs/heads/revert-3612-ataei-speedup_compiletime
- refs/heads/revert-7009-rootjalex/distribute-w_shl
- refs/heads/revert-7601-compile_hexagon_remote
- refs/heads/riscv_update
- refs/heads/rl_simplifier_rules
- refs/heads/rootjalex/add_simpl_rules
- refs/heads/rootjalex/arm-optimize
- refs/heads/rootjalex/autoscheduler_mcts
- refs/heads/rootjalex/bounds-rewriter
- refs/heads/rootjalex/bounds_synthesis
- refs/heads/rootjalex/cbounds
- refs/heads/rootjalex/cbounds_predicated
- refs/heads/rootjalex/fix-sat-overflow
- refs/heads/rootjalex/fix_estimate_issue
- refs/heads/rootjalex/fix_failed_unrolls
- refs/heads/rootjalex/gsoc_codegen
- refs/heads/rootjalex/improve_cbounds_fixed
- refs/heads/rootjalex/improve_constant_bounds
- refs/heads/rootjalex/pitchfork-arm
- refs/heads/rootjalex/reinterpret-simplify
- refs/heads/rootjalex/rts
- refs/heads/rootjalex/super_simplify_bounds
- refs/heads/rootjalex/test_cbounds_fixed
- refs/heads/rootjalex/test_constant_bounds
- refs/heads/rootjalex/trs-codegen
- refs/heads/rootjalex/trs-codegen-cross
- refs/heads/rootjalex/trs-merge
- refs/heads/rootjalex/uint32-int32-cast
- refs/heads/rootjalex/x86-hadds
- refs/heads/rootjalex/x86-optimize
- refs/heads/rootjalex/x86-optimize-test
- refs/heads/rootjalex/x86-sat
- refs/heads/rootjalex/x86-test
- refs/heads/rule_removal_experiments
- refs/heads/schedule-output-storage
- refs/heads/separate_bounds_query_entrypoint
- refs/heads/shallow
- refs/heads/shift_amount_type_change
- refs/heads/shoaibkamil/cmake-without-arm
- refs/heads/shoaibkamil/correct_memory_fences
- refs/heads/shoaibkamil/d3d-fixes
- refs/heads/shoaibkamil/deprecate_openglcompute
- refs/heads/shoaibkamil/json
- refs/heads/shoaibkamil/llvm_clone_tag
- refs/heads/shoaibkamil/minor-vcpkg-doc-change
- refs/heads/shoaibkamil/opengl_compute_tests
- refs/heads/shoaibkamil/performance_tests_as_generators
- refs/heads/shoaibkamil/rule_removal_experiments
- refs/heads/shoaibkamil/super_simplify_with_interpreter
- refs/heads/shoaibkamil/windows-arm-fix-attributes
- refs/heads/sim_shlib_addr_print
- refs/heads/simplify-nested-broadcasts
- refs/heads/simplify-vectorreduce-shuffles2
- refs/heads/simplify_mod
- refs/heads/sioutas_2020
- refs/heads/sioutas_2020_autoscheduler
- refs/heads/slomp/gpu-codegen-profiling
- refs/heads/slomp/msvc-static-analysis
- refs/heads/solve_div
- refs/heads/solve_div_master
- refs/heads/solve_div_simplifier_test
- refs/heads/sr/python-late-binding-defaults
- refs/heads/srj-aaa
- refs/heads/srj-alloc
- refs/heads/srj-alloca
- refs/heads/srj-appmake2
- refs/heads/srj-armv83a
- refs/heads/srj-aslog
- refs/heads/srj-assert
- refs/heads/srj-assoc
- refs/heads/srj-auto-multi
- refs/heads/srj-auto-multi2
- refs/heads/srj-auto_schedule_mat_mul
- refs/heads/srj-autosched
- refs/heads/srj-b2cpphide
- refs/heads/srj-barr
- refs/heads/srj-bits
- refs/heads/srj-blacklist
- refs/heads/srj-bounds
- refs/heads/srj-bufcalltype
- refs/heads/srj-bufcallwrap
- refs/heads/srj-bufcallwrap2
- refs/heads/srj-buffer
- refs/heads/srj-bv
- refs/heads/srj-classic-autotune
- refs/heads/srj-clean
- refs/heads/srj-constcall
- refs/heads/srj-crosscompile
- refs/heads/srj-ctlz
- refs/heads/srj-cvec-patch
- refs/heads/srj-dag
- refs/heads/srj-debug-to-file
- refs/heads/srj-deir
- refs/heads/srj-f16
- refs/heads/srj-fp16
- refs/heads/srj-fsch
- refs/heads/srj-fthru
- refs/heads/srj-g2
- refs/heads/srj-g3
- refs/heads/srj-gha-test-fixes
- refs/heads/srj-hidden
- refs/heads/srj-hide2
- refs/heads/srj-hvx
- refs/heads/srj-hvx-bug
- refs/heads/srj-hvx-codegen-bug
- refs/heads/srj-hvx-nocopy
- refs/heads/srj-hvxshift
- refs/heads/srj-iib
- refs/heads/srj-initshape
- refs/heads/srj-inv
- refs/heads/srj-ir
- refs/heads/srj-irmut2
- refs/heads/srj-iwyu
- refs/heads/srj-iwyu3
- refs/heads/srj-javascript_work_in_progress
- refs/heads/srj-lensblur
- refs/heads/srj-lessinc
- refs/heads/srj-llvm-loop-opt
- refs/heads/srj-mak
- refs/heads/srj-maxthreads
- refs/heads/srj-mod
- refs/heads/srj-msan
- refs/heads/srj-msan-call
- refs/heads/srj-muldivmod
- refs/heads/srj-mut
- refs/heads/srj-outputs-2
- refs/heads/srj-parse
- refs/heads/srj-pch
- refs/heads/srj-printfunc
- refs/heads/srj-pygp
- refs/heads/srj-revertbits
- refs/heads/srj-schedule-storage
- refs/heads/srj-shl-shr-2
- refs/heads/srj-sio
- refs/heads/srj-static-const
- refs/heads/srj-strided-store
- refs/heads/srj-tidyh
- refs/heads/srj-tiff
- refs/heads/srj-trace
- refs/heads/srj-tutorial
- refs/heads/srj-using
- refs/heads/srj-wasmfix
- refs/heads/srj-xor2
- refs/heads/srj/abstract-gen-without-get-output-func-KEEP
- refs/heads/srj/aligned-alloc
- refs/heads/srj/aligned-alloc-2
- refs/heads/srj/aligned-malloc-with-aligned-alloc
- refs/heads/srj/all-explicit-ctor
- refs/heads/srj/anderson-thread-info-ptr
- refs/heads/srj/aot-perf
- refs/heads/srj/argv-signatures
- refs/heads/srj/argv-types
- refs/heads/srj/async-test
- refs/heads/srj/b2cpp-const-data
- refs/heads/srj/better-xt-dispatch
- refs/heads/srj/bfloat1
- refs/heads/srj/bp
- refs/heads/srj/build_halide_h
- refs/heads/srj/c-bool
- refs/heads/srj/cache-clear
- refs/heads/srj/clang-fmt-ignore
- refs/heads/srj/clang-tidy
- refs/heads/srj/clear-c-cache
- refs/heads/srj/cmake-asan
- refs/heads/srj/cmake-asan2
- refs/heads/srj/cmake-jit-generators
- refs/heads/srj/configure-cmake
- refs/heads/srj/cpp-generator-v2-experiment-KEEP
- refs/heads/srj/crosscompile
- refs/heads/srj/ctad
- refs/heads/srj/depr
- refs/heads/srj/deprecation
- refs/heads/srj/device-copy
- refs/heads/srj/example
- refs/heads/srj/experiment
- refs/heads/srj/experiment-6967
- refs/heads/srj/exporting
- refs/heads/srj/expr_t
- refs/heads/srj/external-tensors
- refs/heads/srj/fix-pytorch
- refs/heads/srj/fixed-rollback
- refs/heads/srj/fopen-fix
- refs/heads/srj/forward
- refs/heads/srj/forward-name
- refs/heads/srj/gen-func
- refs/heads/srj/gen-func-2
- refs/heads/srj/gen-func-3
- refs/heads/srj/gen2-1
- refs/heads/srj/gen_closure
- refs/heads/srj/generator_aot_gpu_multi_context_threaded
- refs/heads/srj/globals
- refs/heads/srj/halide-buffer-crop
- refs/heads/srj/halide-malloc-alignment
- refs/heads/srj/halide-must-use
- refs/heads/srj/halide-runtime-must-use-result
- refs/heads/srj/hang-repro
- refs/heads/srj/hannk
- refs/heads/srj/hannk-aliasing
- refs/heads/srj/hannk-error-checking
- refs/heads/srj/hannk-errors
- refs/heads/srj/hannk-inplace
- refs/heads/srj/hannk-mmap
- refs/heads/srj/hannk-tflite-27
- refs/heads/srj/hannk-verbosity
- refs/heads/srj/hdrs
- refs/heads/srj/html-becomes-viz
- refs/heads/srj/implicit-mult-widening
- refs/heads/srj/issue-7076
- refs/heads/srj/iwyu
- refs/heads/srj/iwyu-2
- refs/heads/srj/iwyu-6
- refs/heads/srj/libHANNK
- refs/heads/srj/llvm_type_of
- refs/heads/srj/maybe-unused
- refs/heads/srj/meanop
- refs/heads/srj/metadata-calling-convention
- refs/heads/srj/more-tidy
- refs/heads/srj/msan-dtf
- refs/heads/srj/multimeta
- refs/heads/srj/nanobind
- refs/heads/srj/new-rt-1
- refs/heads/srj/no-threadpool
- refs/heads/srj/no-timeout-thread
- refs/heads/srj/oglc-mutexed
- refs/heads/srj/param-map
- refs/heads/srj/pip-15.x
- refs/heads/srj/pip-cron
- refs/heads/srj/possible-uninited
- refs/heads/srj/pr-7566
- refs/heads/srj/printer-size
- refs/heads/srj/profiler-data-race
- refs/heads/srj/ptr-int-cast
- refs/heads/srj/pyapps
- refs/heads/srj/pyext-fix
- refs/heads/srj/pygen-class
- refs/heads/srj/pygen-deux
- refs/heads/srj/pygen-func
- refs/heads/srj/pygen-native-types
- refs/heads/srj/pyinstall
- refs/heads/srj/pypi-try
- refs/heads/srj/pystuff
- refs/heads/srj/python-buffer-unpack
- refs/heads/srj/python-tutorial
- refs/heads/srj/reshape
- refs/heads/srj/rt-error-smallify
- refs/heads/srj/rt-return-types
- refs/heads/srj/runtime-error-handling
- refs/heads/srj/sat-fixes-exp
- refs/heads/srj/sat-fixes-exp-2
- refs/heads/srj/shadow-field
- refs/heads/srj/snprintf
- refs/heads/srj/spirv-license
- refs/heads/srj/stat-buf-deprecations
- refs/heads/srj/static-buffer-generators
- refs/heads/srj/stmt-html
- refs/heads/srj/stringify
- refs/heads/srj/synth-gen-params
- refs/heads/srj/synth-params-python
- refs/heads/srj/test-arm_sve_redux
- refs/heads/srj/test-intrinsics-bounds
- refs/heads/srj/test8076
- refs/heads/srj/test8078
- refs/heads/srj/test8094
- refs/heads/srj/test8105a
- refs/heads/srj/test8115
- refs/heads/srj/test_tmpdir_fix
- refs/heads/srj/tidy
- refs/heads/srj/tidy-format-14
- refs/heads/srj/tidymore
- refs/heads/srj/tidymore2
- refs/heads/srj/tls
- refs/heads/srj/tls-3
- refs/heads/srj/tls-4
- refs/heads/srj/tls-ucon
- refs/heads/srj/tmp-unschedule-experiment
- refs/heads/srj/tot-fix
- refs/heads/srj/try-revert-sat
- refs/heads/srj/type-traits
- refs/heads/srj/typed-func
- refs/heads/srj/ucon-all-const
- refs/heads/srj/ucon-non-const
- refs/heads/srj/visit-warnings
- refs/heads/srj/wasm-atomic2
- refs/heads/srj/wasm-simd
- refs/heads/srj/wasm-stuff
- refs/heads/srj/wasm-threads
- refs/heads/srj/wasm-updates
- refs/heads/srj/wasm-work
- refs/heads/srj/wip
- refs/heads/srj/x-rounding
- refs/heads/srj/xbuf
- refs/heads/srj/xc+plus+size+tmp
- refs/heads/srj/xc-types
- refs/heads/srj/xt-uint-cast-test
- refs/heads/srj/xtensa-arch
- refs/heads/srj/xtensa-merge
- refs/heads/srj/xvc-experimetn
- refs/heads/srj/zlib-embed
- refs/heads/standalone_autoscheduler
- refs/heads/standalone_autoscheduler_arm_worker
- refs/heads/standalone_autoscheduler_arm_worker_amazon
- refs/heads/standalone_autoscheduler_gpu
- refs/heads/standalone_autoscheduler_hexagon
- refs/heads/sticky_task_assignments
- refs/heads/store_with
- refs/heads/store_with_solver_for_super_simplify
- refs/heads/strict_float_cse_fix
- refs/heads/super_simplify
- refs/heads/super_simplify_v2
- refs/heads/super_simplify_v3
- refs/heads/transitive_wrapper
- refs/heads/trigger-release-v16
- refs/heads/tzumao-autodiff-boundarycond
- refs/heads/tzumao-gradient-autoscheduler-bug
- refs/heads/tzumao-predicate-store-load
- refs/heads/tzumao-python-buffer
- refs/heads/tzumao_autodiff_unbounded
- refs/heads/tzumao_improve_gradient_autoscheduler
- refs/heads/tzumao_issue_4297
- refs/heads/tzumao_licm_before_BI
- refs/heads/unbounded_bugs
- refs/heads/undo_async_copy_chain_black_list
- refs/heads/use_string_literals_for_blobs
- refs/heads/users/lukas/python-pip
- refs/heads/validate_sched_error_msg
- refs/heads/var_ir_fix
- refs/heads/vksnk/async-experiment
- refs/heads/vksnk/async-multiple-producers
- refs/heads/vksnk/async-order
- refs/heads/vksnk/better-loop-carry
- refs/heads/vksnk/better-message
- refs/heads/vksnk/bound-storage
- refs/heads/vksnk/bounds-widen-right
- refs/heads/vksnk/c-print-type
- refs/heads/vksnk/c-round
- refs/heads/vksnk/check-return-result
- refs/heads/vksnk/compute-with-bug
- refs/heads/vksnk/compute_with_async
- refs/heads/vksnk/dma-limit-channels
- refs/heads/vksnk/dma-min-max
- refs/heads/vksnk/expr-match-shuffle
- refs/heads/vksnk/extract-from-scalar
- refs/heads/vksnk/f16-load
- refs/heads/vksnk/fix-packvr
- refs/heads/vksnk/fix_halide_xtensa_narrow_with_rounding_shift_i16
- refs/heads/vksnk/fused-compute-with
- refs/heads/vksnk/hoist-storage-bug
- refs/heads/vksnk/lerp-intrinsics
- refs/heads/vksnk/lower-signed-shifts
- refs/heads/vksnk/missing-exception
- refs/heads/vksnk/non-widening-halves
- refs/heads/vksnk/optimize-shuffles
- refs/heads/vksnk/replace-all
- refs/heads/vksnk/restrict
- refs/heads/vksnk/roll-buffer
- refs/heads/vksnk/roundeven-arm
- refs/heads/vksnk/rvar-bounds
- refs/heads/vksnk/simplify-slice
- refs/heads/vksnk/skip-semaphores
- refs/heads/vksnk/storage-folding
- refs/heads/vksnk/strided-load-of-4_2
- refs/heads/vksnk/typed-scope
- refs/heads/vksnk/update-simd-driver
- refs/heads/vksnk/vectorize-bug
- refs/heads/vksnk/vectorize-scalarize
- refs/heads/vksnk/widening_absd
- refs/heads/vksnk/xtensa-codegen-fp16
- refs/heads/vksnk/xtensa-dma-improvements
- refs/heads/vksnk/xtensa-regroup-pass
- refs/heads/vksnk/xtensa/lift-allocs
- refs/heads/vulkan
- refs/heads/vulkan-diagnose-alloc-failures
- refs/heads/vulkan-phase0-adts
- refs/heads/vulkan-phase1-spirv
- refs/heads/vulkan-phase2-runtime
- refs/heads/vulkan2
- refs/heads/vulkan_fix_gpu_dynamic_shared_test
- refs/heads/vulkan_fix_subregion_memory_offsets
- refs/heads/webassembly-old
- refs/heads/winograd
- refs/heads/wording_fix
- refs/heads/xtensa-codegen
- refs/heads/xtensa-codegen-parallel
- refs/heads/xuanda/fix-serialize-bad-partition-always
- refs/remotes/origin/rootjalex/add_autosched_caching
- refs/tags/release_2018_02_15
- refs/tags/release_2019_08_27
- refs/tags/release_8.0.0
- refs/tags/v10.0.0
- refs/tags/v10.0.1
- refs/tags/v11.0.0
- refs/tags/v11.0.1
- refs/tags/v12.0.0
- refs/tags/v12.0.1
- refs/tags/v13.0.0
- refs/tags/v13.0.1
- refs/tags/v13.0.2
- refs/tags/v13.0.3
- refs/tags/v13.0.4
- refs/tags/v14.0.0
- refs/tags/v15.0.0
- refs/tags/v15.0.1
- refs/tags/v16.0.0
- refs/tags/v17.0.0
- refs/tags/v17.0.1
- refs/tags/v8.0.0
Revision | Author | Date | Message | Commit Date |
---|---|---|---|---|
94d7f01 | Andrew Adams | 02 July 2021, 17:23:21 UTC | Don't reinterpret cast when codegenning vector concat It confuses the HVX LLVM backend, and shouldn't be necessary anyway. | 02 July 2021, 17:23:21 UTC |
5aeb8db | Steven Johnson | 17 June 2021, 23:04:26 UTC | [hannk] Don't mark Tensors as input or output (#6094) * Refactor transforms.cpp, no functional change * Use Op::is_input(), Op::is_output * Update configure_cmake.sh | 17 June 2021, 23:04:26 UTC |
d81f5c3 | Volodymyr Kysenko | 17 June 2021, 16:10:33 UTC | Provide bounds of rvars for all functions in the fused group (#6078) * Provide bounds of rvars for all functions in the fused group * Just use constant * Comments + rename variable | 17 June 2021, 16:10:33 UTC |
27ae113 | Steven Johnson | 17 June 2021, 00:57:38 UTC | [hannk] More Hygiene (#6093) * [hannk] More Hygiene - TensorStorage takes a more sensible set of args for ctor - Tensors don't need to be movable or copyable - Since we are now using C++17, we can use std::make_unique instead of make_op * Restore make_op * clang-format * Remove unnecessary TensorStorage methods | 17 June 2021, 00:57:38 UTC |
66ff71f | Steven Johnson | 16 June 2021, 23:33:35 UTC | [hannk] Cleanup: move SmallVector, Tensor to their own source files (#6091) * Move SmallVector, Tensor to their own files * cleanup | 16 June 2021, 23:33:35 UTC |
a590c17 | Steven Johnson | 16 June 2021, 23:30:47 UTC | [hannk] Remove unused Op::clone() methods (#6092) We don't call these anymore, so remove them and the related TensorMap code. | 16 June 2021, 23:30:47 UTC |
4fda2c6 | Volodymyr Kysenko | 16 June 2021, 17:28:21 UTC | Handle negative shifts in CodeGen_C (#6087) * Handle negative shifts in CodeGen_C * trigger buildbots * Emit code directly if shift was casted to signed int Co-authored-by: Steven Johnson <srj@google.com> | 16 June 2021, 17:28:21 UTC |
292a35a | Evan Lee | 15 June 2021, 20:06:31 UTC | Added fixes to issues regarding using HALIDE_DEBUG_MATCHED_RULES (#6088) * added operator<< for IsMaxValue, IsMinValue, and moved build_replacement(after) to be called before debug matched rules Co-authored-by: Steven Johnson <srj@google.com> | 15 June 2021, 20:06:31 UTC |
ebae3cd | dpalermo | 14 June 2021, 21:51:22 UTC | Changes for building with Hexagon SDK 4.3.0.0 & android-ndk-r19c (#6072) * Changes for building with Hexagon SDK 4.2.0.2 & android-ndk-r19c * Drop libsim_qurt_vtcm.a (now part of libsim_qurt.a) * Fix for clang-format-lint * Update to use SDK 4.3.0.0 / HEXAGON_Tools 8.4.11 * Updated binaries & README.md * trigger buildbots * Updated binaries after merge of master * Update SDK comment for >sm8350 Co-authored-by: Steven Johnson <srj@google.com> | 14 June 2021, 21:51:22 UTC |
98fdd9a | dpalermo | 10 June 2021, 01:44:16 UTC | Add more ways for DMA-BUF to fallback to libion.so (#6085) - Try to access libdmabufheap.so, if it succeeds try using DMA-BUF - If there are any errors seen with DMA-BUF, fallback to libion.so | 10 June 2021, 01:44:16 UTC |
f468bcd | Steven Johnson | 09 June 2021, 18:13:13 UTC | [hannk] Move flag-parsing code into ModelRunner (#6082) * [hannk] Move flag-parsing code into ModelRunner This allows compare_vs_tflite's main function to be very thin, so we don't have to replicate logic for different main() functions elsewhere * Update model_runner.h | 09 June 2021, 18:13:13 UTC |
ea1aabd | Steven Johnson | 09 June 2021, 15:58:41 UTC | Fix for upstream LLVM (Fixes #6079) (#6080) | 09 June 2021, 15:58:41 UTC |
cdac77b | Steven Johnson | 08 June 2021, 17:05:16 UTC | Require C++17 for Halide. (#5282) Require C++17 for Halide. | 08 June 2021, 17:05:16 UTC |
3b046ea | Steven Johnson | 07 June 2021, 20:57:24 UTC | Convert some Intrinsic calls to PureIntrinsic (#6070) * Convert some Intrinsic calls to PureIntrinsic * Fixes | 07 June 2021, 20:57:24 UTC |
d2ca93a | Dillon Sharlet | 07 June 2021, 19:33:31 UTC | Remove large_buffers flag (#6077) | 07 June 2021, 19:33:31 UTC |
af628a7 | Steven Johnson | 03 June 2021, 23:39:03 UTC | Fix dubious "gather" intrinsic for hvx (#6069) I'm not sure if this is a bug (per se) or not, but: We define an intrinsic for `Call::hvx_gather`, and at several points check for `is_intrinsic(Call::hvx_gather)`, but we never actually create such a Call. Instead, `make_gather` just uses the naked string `"gather"`, which is not the same thing. How is this working? (Is it working?) Opening this as a PR to gather input (no pun intended) about what's going on here. | 03 June 2021, 23:39:03 UTC |
00bfad7 | Alex Reinking | 02 June 2021, 22:33:45 UTC | Rework cross-compiling integration test to use simpler two-stage build (#6068) | 02 June 2021, 22:33:45 UTC |
8f849ae | dpalermo | 01 June 2021, 17:40:32 UTC | Add DMA-BUF support to host_malloc (#6042) * Add DMA-BUF support to host_malloc - use DMA-BUF if libdmabufheap.so is present - fallback to ION for older devices/OS - ION APIs are no longer supported on Android-S/12 - added context to the HAP_power_get/set calls - NULL power context not allowed on newer devices * Fixes for clang-format check * Fix more clang-format checks * Replace int with ion_user_handle_t and update comments * Sync changes to be closer to upstream * Updates from review * Remove unused attribute((weak)) protos * Add HAP_power_destroy/atexit * Add HAP_power_destroy/HAP_power_destroy_client for SDK 3.3.3 * Replace power_context malloc/free with address of global * Remove free_HAP_power_context * Update skel (since they snuck in anyway) * Just return address of global as power_context, don't store it * Updates from review | 01 June 2021, 17:40:32 UTC |
445ddd0 | Steven Johnson | 27 May 2021, 23:14:21 UTC | Allow hannk-delegate input and output tensors to share memory with tflite (#6030) * Allow hannk-delegate input and output tensors to share memory with tflite We previously allocated duplicate memory buffers for the input and output tensors, and just copied them back and forth as needed, which wastes memory and cycles. Now, we declare the input and output tensors to be 'external', and update the host pointer before every interpreter run. Note that dynamic tensors are still a bit of a special case: we still do duplicate allocations there (plus memcpy), because in the most general case we can't know the final size needed until we run the pipeline, but we need to allocate that output to run the pipeline. There are ways we could finesse this -- e.g., give dynamic tensors a lambda callback to allow them to resize the TFLite tensor and then use that storage -- but since dynamic tensors don't seem to be common or large in our test cases, I'm doing it this way for now. (I may circle back and try the lambda approach later.) Note that this moves the implicit init of `is_constant_` out of the Tensor ctor, and instead always requires an explicit call to `set_constant()`, which I think makes the situation much clearer. * Update hannk_delegate.cpp | 27 May 2021, 23:14:21 UTC |
a500607 | Dillon Sharlet | 27 May 2021, 18:03:48 UTC | Strengthen constant upper bound logic slightly (#6062) * Relax requirements for MemoryType::Register * Add comment | 27 May 2021, 18:03:48 UTC |
037d7ed | Steven Johnson | 24 May 2021, 20:47:23 UTC | Upgrade d8 version for wasm testing (#6055) We were using a variant of v8.9, but various late-breaking variations in the final spec implementation didn't make it into V8 until v9.1; using top-of-tree LLVM and EMCC require v9.1+ to avoid obscure errors. | 24 May 2021, 20:47:23 UTC |
d64b713 | Alexander Root | 23 May 2021, 15:06:01 UTC | Fix bounds information in ExprInfo for overflow in simplifier (#6012) * fix bounds in ExprInfo for overflow in simplifier * move bounds clearance to visitor on signed_integer_overflow * add test for bad Let behavior on overflow | 23 May 2021, 15:06:01 UTC |
677bf30 | Volodymyr Kysenko | 21 May 2021, 03:53:55 UTC | Scalar loads/stores shouldn't invalidate predicate vectorization (#6041) * Scalar loads/stores shouldn't invalidate predicate vectorization * Allow only scalar vars which don't depend on the vectorized var. * Add test * Fix build * Change test values and remove TODO, because now predicated load/stores are generated again | 21 May 2021, 03:53:55 UTC |
626c34a | Andrew Adams | 20 May 2021, 19:11:24 UTC | Don't emit aligned loads to unaligned addresses (#6047) * Don't emit aligned loads to unaligned addresses Fixes #6046 | 20 May 2021, 19:11:24 UTC |
c87976e | Dillon Sharlet | 20 May 2021, 19:00:07 UTC | Use signed exponents for conv too, clean up bounds query (#6050) | 20 May 2021, 19:00:07 UTC |
9ac150f | Alex Reinking | 20 May 2021, 17:26:51 UTC | Fix tgz/package.sh (#6048) * Use `realpath` instead of `readlink` * Pipe in toolchain file to tgz/package.sh | 20 May 2021, 17:26:51 UTC |
bfd5416 | Alex Reinking | 19 May 2021, 21:44:18 UTC | Bump Halide version to 13.0.0 (#6040) | 19 May 2021, 21:44:18 UTC |
4027dc4 | Steven Johnson | 19 May 2021, 21:20:43 UTC | Allow negative output shifts in depthwise_conv (#6039) At least one tflite model found in the wild uses this, so we must support it. | 19 May 2021, 21:20:43 UTC |
62eb147 | Dillon Sharlet | 19 May 2021, 21:20:26 UTC | Fix bug with tensor aliasing. (#6038) | 19 May 2021, 21:20:26 UTC |
b5a34c3 | Alex Reinking | 19 May 2021, 20:47:20 UTC | Update README for Halide 12 release. (#6034) | 19 May 2021, 20:47:20 UTC |
1c0ff0f | Alex Reinking | 19 May 2021, 20:11:38 UTC | Fix Windows ZIP package script. (#6035) | 19 May 2021, 20:11:38 UTC |
dfe0f97 | Steven Johnson | 19 May 2021, 19:57:54 UTC | Enable a wasm-simd op in simd_op_check that is now generated in LLVM13 (#6024) | 19 May 2021, 19:57:54 UTC |
6a1e529 | Steven Johnson | 19 May 2021, 17:21:37 UTC | Remove duplicate -e argument in bilateral_grid (#6008) (#6033) | 19 May 2021, 17:21:37 UTC |
5bd5a04 | Dillon Sharlet | 19 May 2021, 05:17:07 UTC | Also invalidate alignment if the type can't represent it. (#6032) | 19 May 2021, 05:17:07 UTC |
cdc0223 | Dillon Sharlet | 19 May 2021, 00:46:20 UTC | More simplifier rules (#6017) * More simplifier rules. * More simplifier rules. * More variations of these rules * Remove rules that try to pull negative out of multiply, add quantized ramp rules * Add is_const(x, value) predicate * These might be useful too. | 19 May 2021, 00:46:20 UTC |
622164e | Dillon Sharlet | 18 May 2021, 17:53:16 UTC | Various simplifier improvements (#5993) * Redo hoisting if statements * Track bounds through casts (fixes #5905). * Improve and add rules to simplify TailStrategy::Predicate/TailStrategy::GuardWithIf * Replace out of bounds loads/stores with undef. * Fix min * Replace rules with generated ones. * Replace rules with synthesized rules * Remove unnecessary predicates. * Fix out of bounds load/store removal for loads/stores that are a different type than the allocation * Fix no-op else cases. * Update test for new behavior. * Use unreachable instead of undef for out of bounds loads/stores * Update IROperator.h * Fix unsafe evaluate tests * Learn from (x * a) / b == c * Don't let initial leaves depend on the variable itself. * Remove sketchy rules, only learn from constants. * clang-format * Support any constant equation when learning facts. * clang-format * trigger buildbots * Don't need this with abadams/dont_substitute_complex_constraints * Don't treat unreachable as pure * Don't track possibly overflowing min/max * Remove unreachable in the simplifier. * More aggressive removal of unreachable for ifs * Remove for loops with unreachable bodies * Remove bad rules. * Fix both branches unreachable. * Also make adjacent code to unreachable loops unreachable. Co-authored-by: Steven Johnson <srj@google.com> | 18 May 2021, 17:53:16 UTC |
253e93d | Steven Johnson | 18 May 2021, 16:37:18 UTC | Add an ErrorReporter hook to TfLiteModelRunner (#6021) This allows us to silence some (but not all) of the noise that TfLite logs when we disable our own verbosity. | 18 May 2021, 16:37:18 UTC |
15f51f3 | Steven Johnson | 18 May 2021, 01:03:14 UTC | Fix apps/hannk configure_cmake.sh script (#6018) * Fix apps/hannk configure_cmake.sh script * trigger buildbots | 18 May 2021, 01:03:14 UTC |
b45caa8 | Alex Reinking | 17 May 2021, 23:48:48 UTC | Add Ubuntu packaging scripts and GHA testing (#5754) * Fix CMake & packaging glitches for Ubuntu package. * Add Ubuntu packaging scripts and presets. * Add GHA workflow to test packaging and usage on Ubuntu * Address review comments. | 17 May 2021, 23:48:48 UTC |
e711235 | Dillon Sharlet | 17 May 2021, 22:48:10 UTC | Small HVX fixes (#5990) * Small HVX fixes * Simpler align_up implementation. * Simpler fix for small vector loads. * Fix non-native deinterleaving of arguments to patterns * Tweak register count for Hexagon. * I don't know what happened but this works now. * Bits and bytes r hard * Avoid bias wrapper when not needed * I think these deinterleaves are safe. * Align the input of depthwise with depth multiplier 1. * Revert bad merge due to weird GitHub UI Co-authored-by: Steven Johnson <srj@google.com> | 17 May 2021, 22:48:10 UTC |
ed6eccc | Alex Reinking | 17 May 2021, 16:12:30 UTC | fix package.sh perms (#6016) | 17 May 2021, 16:12:30 UTC |
adb1e05 | Andrew Adams | 16 May 2021, 22:01:00 UTC | Fix local laplacian upsample (#6011) | 16 May 2021, 22:01:00 UTC |
deea5ec | Andrew Adams | 15 May 2021, 22:50:08 UTC | Substituting complex expressions for constrained scalar inputs makes … (#6014) * Substituting complex expressions for constrained scalar inputs makes a mess Substituting in constants and variables is probably fine. * Remove extra loop Co-authored-by: Dillon Sharlet <dsharlet@google.com> | 15 May 2021, 22:50:08 UTC |
50d7640 | Andrew Adams | 15 May 2021, 21:55:49 UTC | Permit "safe" parallel scatters, even when they race (#4841) lets the atomic() scheduling directive also apply to simple assignments in addition to associative commutative operators, e.g. hist(f(r.x), x) = g(x) is safe to parallelize over r.x if the stores are atomic, because the RHS doesn't depend on the hist or r.x | 15 May 2021, 21:55:49 UTC |
878c3ec | Volodymyr Kysenko | 15 May 2021, 16:14:45 UTC | Fix CodeGen_C::print_scalarized_expr (#6006) * Fix CodeGen_C::print_scalarized_expr * CppVector/NativeVector object doesn't have .replace() anymore. * Initialize vector with zero to avoid warning. * Actually, can't assign to CppVector (only to NativeVector), so do ::broadcast instead * Leave it uninitialized | 15 May 2021, 16:14:45 UTC |
b829d12 | Volodymyr Kysenko | 15 May 2021, 16:13:54 UTC | Support Shuffle::extract_element from list of scalars in CodeGen_C (#6007) | 15 May 2021, 16:13:54 UTC |
69eed6e | sksarda | 14 May 2021, 16:51:32 UTC | Add -fpic option to debug version on non-windows .ll file (#6000) Else, it eliminates reference to global offset table for symbols to be resolved from remote library causing runtime crash with -debug option. Co-authored-by: Suyog Sarda <ssarda@codeaurora.org> | 14 May 2021, 16:51:32 UTC |
e0ac07b | Steven Johnson | 14 May 2021, 16:49:01 UTC | Upgrade hannk TFLite version to 2.5.0 (#6009) https://github.com/tensorflow/tensorflow/releases/tag/v2.5.0 | 14 May 2021, 16:49:01 UTC |
75a079a | Alex Reinking | 08 April 2021, 00:17:40 UTC | Use presets in zip/package.bat | 13 May 2021, 18:36:59 UTC |
3f3ce62 | Alex Reinking | 18 February 2021, 21:44:50 UTC | Use presets in tgz/package.sh | 13 May 2021, 18:36:59 UTC |
a81b86b | Alex Reinking | 07 April 2021, 23:17:26 UTC | Move packaging scripts from tools/ to packaging/<type>/ | 13 May 2021, 18:36:59 UTC |
bf32585 | Alex Reinking | 18 February 2021, 21:41:59 UTC | Move packaging support files into common directory. | 13 May 2021, 18:36:59 UTC |
f0db1c6 | Alex Reinking | 14 April 2021, 06:51:16 UTC | Split Halide CMake helpers into separate package. | 13 May 2021, 18:36:59 UTC |
02a8f87 | Alex Reinking | 12 March 2021, 11:27:27 UTC | Remove same-directory shared/static mixing | 13 May 2021, 18:36:59 UTC |
5cdbcb0 | Alex Reinking | 18 February 2021, 21:40:19 UTC | Re-work packaging to support complex formats (like DEB) | 13 May 2021, 18:36:59 UTC |
568d18c | Alex Reinking | 12 May 2021, 22:18:35 UTC | Fix dependencies in FFT app. | 13 May 2021, 18:36:59 UTC |
c61b8cb | Alexander Root | 13 May 2021, 17:42:34 UTC | Guard against overflow in constant folding for EQ rewrite rules (Fixes #5998) (#6002) * guard against constant folding overflow in EQ rewrite rules * trigger buildbots Co-authored-by: Steven Johnson <srj@google.com> | 13 May 2021, 17:42:34 UTC |
b2947e9 | Alex Reinking | 12 May 2021, 20:55:49 UTC | Fix Windows apps (#5999) * Place DLLs on Windows by copying. * Disable Hannk on Windows by default | 12 May 2021, 20:55:49 UTC |
6bb87cf | Andrew Adams | 12 May 2021, 16:07:25 UTC | Stop interleaving stores from generating too-large vectors (#5996) * Stop interleaving stores from generating too-large vectors * Remove integer constant * Use mul_would_overflow helper instead | 12 May 2021, 16:07:25 UTC |
33308d9 | Dillon Sharlet | 12 May 2021, 16:06:49 UTC | Add pmaddubsw support (#5997) * Add pmaddubsw support * Move pmaddubsw checks to ssse3 * These patterns are a bit finicky | 12 May 2021, 16:06:49 UTC |
3dce2d5 | Dillon Sharlet | 11 May 2021, 19:18:20 UTC | Small H::R::B cleanups and improvements (#5957) * Reuse helpers from halide_buffer_t * Combine decref and decrev_dev to hopefully reduce overhead. * Remove redundant public * This old logic was necessary Co-authored-by: Steven Johnson <srj@google.com> | 11 May 2021, 19:18:20 UTC |
257b2f5 | Dillon Sharlet | 11 May 2021, 00:10:01 UTC | Small performance portability tweaks (#5989) Co-authored-by: Steven Johnson <srj@google.com> | 11 May 2021, 00:10:01 UTC |
6b2732a | Dillon Sharlet | 10 May 2021, 22:51:37 UTC | Fix build with asserts enabled (#5987) * Minor cleanups after #5983 * Work around linker breakage!? * This doesn't need to be a constant. * Mark power_of_two constructor explicit Co-authored-by: Steven Johnson <srj@google.com> | 10 May 2021, 22:51:37 UTC |
fcf9046 | Dillon Sharlet | 10 May 2021, 21:47:22 UTC | Fix specializing on stride issue (fixes #5907) (#5950) * Fix specializing on stride issue (fixes #5907) * Remove stale comment * Add broadcasting test to CMakeLists.txt * remove_dead_lets -> remove_dead_code * Add test for specializing only on stride. * Remove broadcasting performance test. * Also remove from CMake | 10 May 2021, 21:47:22 UTC |
6e23346 | Steven Johnson | 10 May 2021, 20:12:06 UTC | Refactor hannk's compare_vs_tflite code to be mostly library (#5991) * Refactor compare_vs_tflite into library+shell Also, drive-by change to the test names to keep them matching filenames more closely * wip * Update compare_vs_tflite.cpp * wip * Fixes * clang-format * Fix Makefile * trigger buildbots | 10 May 2021, 20:12:06 UTC |
e33438a | Dillon Sharlet | 10 May 2021, 19:23:11 UTC | Revert "Stack input and filter to reduce generated code in FFT app (#5985)" (#5992) This reverts commit d2539287fe4c0c51128a78dc51c2c6d1812cd694. | 10 May 2021, 19:23:11 UTC |
d253928 | Dillon Sharlet | 10 May 2021, 17:57:07 UTC | Stack input and filter to reduce generated code in FFT app (#5985) * Stack input and filter to reduce generated code. * Change comments. | 10 May 2021, 17:57:07 UTC |
9eeade3 | Steven Johnson | 10 May 2021, 17:24:10 UTC | Rename CHECK->HCHECK, LOG->HLOG in hannk (#5986) Quick-n-dirty rename to avoid conflicts with Abseil/google3. Longer term fix will be forthcoming. | 10 May 2021, 17:24:10 UTC |
2e47968 | Steven Johnson | 10 May 2021, 17:23:24 UTC | Fix for upstream LLVM (#5988) * Fix for upstream LLVM * Fixes | 10 May 2021, 17:23:24 UTC |
5550f96 | Dillon Sharlet | 07 May 2021, 22:44:40 UTC | Don't hardcode depthwise padding. (#5984) | 07 May 2021, 22:44:40 UTC |
e0b7d8a | Dillon Sharlet | 07 May 2021, 22:44:03 UTC | Refactor quantized multiplications (#5983) * Refactor quantized multiplications * Move comment. * clang-format * base -> mantissa | 07 May 2021, 22:44:03 UTC |
e980b27 | Steven Johnson | 07 May 2021, 18:43:54 UTC | advance_ptrs() should use refs, not ptrs (#5981) * advance_ptrs() should use refs, not ptrs Examination of compiled output (x86-64, clang w/ optimizer) shows slightly better codegen. * Update HalideBuffer.h | 07 May 2021, 18:43:54 UTC |
aac383f | Steven Johnson | 07 May 2021, 18:32:39 UTC | Add dynamically-typed scalar inputs to Generator (#5953) (#5965) * Add dynamically-typed scalar inputs to Generator (#5953) * Update stubuser_generator.cpp * clang-format | 07 May 2021, 18:32:39 UTC |
aba3a80 | Andrew Adams | 07 May 2021, 02:47:08 UTC | Use a VectorReduce not to determine if any lanes are true in Hexagon backend (#5978) | 07 May 2021, 02:47:08 UTC |
9f7a459 | Steven Johnson | 06 May 2021, 21:57:47 UTC | Add missing #include in buffer_util.h (#5979) * Add missing #include in buffer_util.h * Update buffer_util.h | 06 May 2021, 21:57:47 UTC |
3f799ff | Dillon Sharlet | 06 May 2021, 21:35:55 UTC | Optimize copy_from a little (#5977) | 06 May 2021, 21:35:55 UTC |
3f83f5a | Dillon Sharlet | 06 May 2021, 20:25:57 UTC | Add transpose op (#5968) * Add transpose op. * Fix type requirements * Add tests for some misc ops. * Fix type checks | 06 May 2021, 20:25:57 UTC |
93c878e | Dillon Sharlet | 06 May 2021, 20:00:46 UTC | Optimize add generator (#5972) * Optimize add generator. * Update benchmarks * Vectorize wider for more ILP * Better add implementation. * More tweaks. ARM is really sensitive to exactly how these shifts are done. * More performance portable implementation of add. * Put signs back * Add comment. | 06 May 2021, 20:00:46 UTC |
42b5f79 | Dillon Sharlet | 06 May 2021, 19:54:35 UTC | Optimize fully connected when there are more than 4 batches (#5969) * Optimize fully connected when there are more than 4 batches * Fix crazy working typo * Do batches inside channels. * Fix missed constant | 06 May 2021, 19:54:35 UTC |
2ebaedd | Alex Reinking | 06 May 2021, 17:06:32 UTC | Don't add Halide DLL to PATH on Windows. (#5973) This conflicts with vcpkg's own binary copying on Windows and makes cross compiling more difficult. It also runs into issues with excessively long commands when the user's PATH variable is very long. | 06 May 2021, 17:06:32 UTC |
ed989db | Dillon Sharlet | 06 May 2021, 16:52:19 UTC | Remove some more old codegen workarounds and cleanups (#5932) * Remove old codegen workarounds * Pre-AVX2 codegen still needs this :( * trigger buildbots Co-authored-by: Steven Johnson <srj@google.com> | 06 May 2021, 16:52:19 UTC |
813180b | Steven Johnson | 06 May 2021, 16:38:22 UTC | HalideBuffer should use D=max_rank instead of D=4 (#5971) This prevents mallocs for some degenerate cases where we need buffers with > 4 dimensions. | 06 May 2021, 16:38:22 UTC |
ab3670d | Alex Reinking | 16 March 2021, 11:37:38 UTC | Clean up CMake helpers. 1. Add HEADER output to add_halide_library. 2. Use $<BUILD_INTERFACE:...> in generated target include paths. 3. Clean up logic (reduce nesting). 4. Use lower-case names for local variables. 5. Print paths to detected Clang and LLD config scripts. 6. Honor normal variable overrides for Halide_TARGET. | 06 May 2021, 05:52:34 UTC |
0b77168 | Alex Reinking | 08 April 2021, 18:20:37 UTC | Consistently use Halide_* prefixes in CMake. | 06 May 2021, 05:52:34 UTC |
0c0b117 | Steven Johnson | 05 May 2021, 23:41:34 UTC | Upgrade hannk's TFLite version to 2.5.0-rc3 (#5970) * Upgrade hannk's TFLite version to 2.5.0-rc3 * Drive-by cleanup of test names | 05 May 2021, 23:41:34 UTC |
438ddf8 | Dillon Sharlet | 05 May 2021, 23:39:44 UTC | Optimize depthwise convolution (#5964) * Try factoring depthwise reduction. * Revert unnecessary change. Co-authored-by: Steven Johnson <srj@google.com> | 05 May 2021, 23:39:44 UTC |
93dad62 | Alex Reinking | 05 May 2021, 20:23:56 UTC | Fix tutorial 15 test (#5966) | 05 May 2021, 20:23:56 UTC |
4a489e6 | Steven Johnson | 05 May 2021, 19:55:01 UTC | Ensure our local flatbuffers.h is included in preference to system variants (#5962) * Ensure our local flatbuffers.h is included before system variants The local version might be too old for TFLite * Update CMakeLists.txt * Update CMakeLists.txt * Silence the noise noise noise noise NOISE * fix policy | 05 May 2021, 19:55:01 UTC |
95047ca | Dillon Sharlet | 05 May 2021, 16:55:59 UTC | Optimize pooling ops (#5963) * Avoid padding for pool ops. * Use reciprocal to implement division. | 05 May 2021, 16:55:59 UTC |
115e597 | Steven Johnson | 05 May 2021, 01:28:21 UTC | Fix FetchContent for hannk (#5960) Apparently using SOURCE_SUBDIR + FetchContent_MakeAvailable() doesn't work on the buildbots. This is equivalent and does work. ¯\_(ツ)_/¯ | 05 May 2021, 01:28:21 UTC |
675748a | Steven Johnson | 05 May 2021, 00:02:02 UTC | Add apps/hannk to apps/CMakeLists.txt (#5958) ...but with an option for disabling it, for the buildbots | 05 May 2021, 00:02:02 UTC |
abefa3c | Dillon Sharlet | 04 May 2021, 23:45:21 UTC | Remove nn_ops app in favor of hannk app. (#5893) | 04 May 2021, 23:45:21 UTC |
7b79dca | Dillon Sharlet | 04 May 2021, 23:44:53 UTC | Add HANNK app (#5891) * More accurate approx_log2/exp2. * Add tests from inception_v4 * Improve precision of log2/exp2 related functions. * Add tanh and clean up generators. * Add version-checking to compare_vs_tflite and issue a warning if major and minor versions mismatch * Restore inadvertent @ removal * Add build_hannk/test_hannk targets to Makefile, to make specialized testing on select buildbots easier for now * More hacky padding for depthwise. * Add TODO * trigger buildbots * Add mean op, enable resnet50 to work. * Fix build failure on ARM. * Grammar * Remove stale TODO. * Make tensors shared_ptr. * Fuse double paddings. * Reduce padding for ARM. * Enable DimMap to express alignment. * Remove crops from execute. * Model -> OpGroup refactor. * Add DimMap::align. * Add proper alignment to DimMap. * Recursively transform. * Use cubic polynomials to approximate log2 and exp2. * Add --use_hannk option * Add mul op support. * Add TODO * Add disambiguating parens * Fix boneheaded broadcasting bug. * Less aggressive broadcasting. * Inline basic arithmetic. * Remove unnecessary using directives. * Fix stray unique_ptr<Tensor> * Implement space to depth and depth to space. * Enable scalar boolean comparison ops. * Support ReLUx as unary ops. * CHECK(false) -> LOG(FATAL) * Naming consistency. * More precise mul * Add some easy ops (NEG and SQUARE) * Fix asserts. * Don't segfault if interpreter can't be created * Add comment. * Remove dead file. * Fix excessive precision in softmax. * Lazy-init seeds in compare_vs_tflite, in case use_hannk=0 * Add TODO * Remove scalpel left in patient * Update model.h * Allow broadcasting of c of input 2 * Remove now-pointless specialization helper. * Put the common case specialization first. * Move pooling ops to the same generator file. * Fix softmax correctness issues * Don't benchmark when testing. * Rearrange input parameters. * Remove multiply_quantized helper. 
* kTfLiteError -> kTfLiteDelegateError * Remove unnecessary check for log2(0) * Fix details of ReshapeOp to match tflite's impl * Add Shape op. * Generically handle elementwise operations of any rank. * Some of these aren't elementwise. * Minor cleanups * Minor cleanup in ReshapeOp::execute() * Remove unused functions * Add Greater, GreaterEqual to delegate * clang-format * Update normalizations_generator.cpp * Avoid horrific clang-format suggestion. * clang-format * Fix common_halide test. * Fix typo. * Fix asserts. * clang-format * Save compare_vs_tflite outputs from first run (not post-benchmark) * Enable approx_exp2 for int16 results without overflow. * Clean up precision of transcendentals * Fix accidental widening of shift by a constant. * Move elementwise generators to the same file. * Report profiler after each test * Optimize fully connected a lot * Add elementwise program interpreter * Add elementwise program interpreter * clang-format * WIP LSTM * Fix Interpreter::inputs and outputs. * Fix some precision and scheduling issues of LSTM * Fix LSTM op * Fix build breakage. * Fix comments. * clang-format * Add wrapper for constructing elementwise programs. * clang-format * Use ElementwiseProgram to implement LstmElementwise * Compress programs and instructions by storing them in int16 and more CISC * Reduce verbose repetitive declarations. * Optimize constant zeros. * Use a named constant for the size of each instruction. * Remove unnecessary const instruction. * Optimize and clean up elementwise programs * Reduce overhead from H::R::B * More H::R::B overhead cleanup. * Add missing include. * Various fixes and improvements. 
* Add support for LSTM to the hannk delegate (#5943) * Add support for LSTM to the hannk delegate * clang-format * Add support for dynamic tensors to hannk (#5942) * Initial support for Dynamic Tensors in hannk * Update hannk_delegate.cpp * Fixes * Smarten Tensor::resize() * More H::R::B overhead cleanup * Refactor IsNodeSupported * Minor fixes * Fix member name style * Fix is_alias * Add is_no_op for some ops * Log reasons for node rejection if verbosity >= 1 * clang-format * Fix Concat handling for delegate * Properly parse split op * Add SPLIT_V * Fix regression in PadForOps * Add asserts * Revise ReshapeOp to just use a shape tensor * Update ops.cpp * clang-format * Scale tolerance with the data type. * Compress disassembly a bit * Fix bug with aliasing. * Add --use_tflite flag to compare_vs_tflite Allows disabling the reference run, for running just our delegate alone * Clean up hannk makefile * Regularize all of hannk's own include paths to be relative to apps/hannk; this simplifies things and will allow removing some hacks in Blaze/Bazel and also the upcoming CMake support * Remove some gratuitous uses of std::vector. * Remove unused function. * Remove more instances of std::vector * More SmallVector usage. * Avoid vector in can_use_elementwise_program. * Minor drive by fixes. 
* Remove some more H::R::B copies * Add broadcast support to elementwise programs * Also tweak the generated schema file include path * clang-format * clang-format * Stale TODO * Upgrade hannk to tf2.5 + more (#5948) * Upgrade hannk to tf2.5 + more - upgrade default TFLite to 2.5.0-rc2 - Revise build instructions & assumptions for TFlite (use CMake for it now instead of Bazel) - Revised Android build instructions (now assumes that tflite is built locally rather than pulled from a prebuilt) - Remove the need for flatc/flatbuffers - Minor fixes to the run-on-device scripts * Update Makefile * Update Makefile * Fix some harmless errors related to input slots * TFlite is too sloppy with dimensions. * Intervals are min, max, not min, extent. * Refactor compare_vs_tflite Lots of internal code motion to clean things up and prepare for adding an internal-delegate code path. Immediate change is just `--enable [h][t][x]` instead of the old "use" flags, and reducing the default max-num-of-diffs to 8 instead of 32. * Add an alias for OpPtr * Don't alias inputs that might be used elsewhere. * Use the right tensors when parsing. * Don't schedule sum_filter separately, and avoid 8-bit multiplies for x86 * Improve aliasing logic * Small optimizations. * Add no_bounds_query to elementwise pipelines. * Clean up std::shared_ptr/H::R::B overhead. * More cvt refactoring, plus clang-format * Fix asserts. * Remove unused trace_ member * Remove unnecessary argument. * Fix x86 * Update compare_vs_tflite.cpp * Add internal-delegate option to CVT * Fix run_compare_on_device for recent changes * Revert possibly broken space to depth optimization. * Add gather and more binary op support. * Update compare_vs_tflite.cpp * Default external-delegate to disabled * clang-format * Better implementation of SpaceToDepth/DepthToSpace. * clang-format * Reduce buffer copy overhead. * Remove unnecessary types. * Consistent multiplication order. 
* Pad to at least FnRank * clang-format * De-inline two operator<<()'s * compare_vs_tflite error handling - If any of the comparisons fail, exit with a nonzero error code - add `--tolerance` flag to allow tweaking allowable tolerance on a per-pipeline basis * Add CMake build rules to apps/hannk (#5955) * Add CMake build rules to hannk * Update ops.cpp * Fix features * Update CMakeLists.txt * Add tests * Fix cmake issues * Use CMAKE_GENERATOR * Update configure_cmake.sh * Delegate Fixes * Update flag handling in compare_vs_tflite This is gratuitous but was bugging me: - Flags now understand both `--flag value` or `--flag=value` - Unknown flags now fail hard instead of being ignored Co-authored-by: Steven Johnson <srj@google.com> | 04 May 2021, 23:44:53 UTC |
f45d323 | Andrew Adams | 04 May 2021, 16:32:56 UTC | Non-widening lowering of rounding shifts (#5956) This version lowers it without needing to widen, which is a large win on x86 for 16 and 32-bit types (3.8x faster and 2.8x faster respectively). It's a very slight slowdown for 8-bit because x86 doesn't have 8-bit shift instructions. Also drive-by typo fix. | 04 May 2021, 16:32:56 UTC |
94c0eca | Dillon Sharlet | 04 May 2021, 00:17:39 UTC | Use dot products for sums. (#5954) | 04 May 2021, 00:17:39 UTC |
5a0d1e5 | Volodymyr Kysenko | 03 May 2021, 16:34:37 UTC | Support VectorReduce in CodeGen_C (#5952) | 03 May 2021, 16:34:37 UTC |
8b9deea | Dillon Sharlet | 30 April 2021, 20:56:30 UTC | Fix bugs when D != 4 (#5951) * Fix bugs when D != 4 * clang-format | 30 April 2021, 20:56:30 UTC |
093e8df | Fangrui Song | 29 April 2021, 22:58:43 UTC | Replace llvm::sys::fs::F_None with llvm::sys::fs::OF_None (#5946) The former is deprecated. | 29 April 2021, 22:58:43 UTC |
fcbd2ee | Dillon Sharlet | 27 April 2021, 23:52:57 UTC | Fix build issue in runtime. (#5944) | 27 April 2021, 23:52:57 UTC |
a391e9a | AbdouTlili | 27 April 2021, 23:13:06 UTC | adding a note in the README.md to use -j option in make --build (#5938) * adding a note in the README.md to use -j option in make --build * wrapped the added section to 80 column | 27 April 2021, 23:13:06 UTC |