https://github.com/halide/Halide
Revision | Author | Date | Message | Commit Date |
---|---|---|---|---|
23c5307 | Volodymyr Kysenko | 13 May 2021, 18:54:15 UTC | Leave it uninitialized | 13 May 2021, 18:54:15 UTC |
e80c0a9 | Volodymyr Kysenko | 13 May 2021, 15:11:52 UTC | Actually, can't assign to CppVector (only to NativeVector), so do ::broadcast instead | 13 May 2021, 15:11:52 UTC |
10a85ec | Volodymyr Kysenko | 13 May 2021, 02:21:07 UTC | Fix CodeGen_C::print_scalarized_expr * CppVector/NativeVector object doesn't have .replace() anymore. * Initialize vector with zero to avoid warning. | 13 May 2021, 02:21:07 UTC |
b2947e9 | Alex Reinking | 12 May 2021, 20:55:49 UTC | Fix Windows apps (#5999) * Place DLLs on Windows by copying. * Disable Hannk on Windows by default | 12 May 2021, 20:55:49 UTC |
6bb87cf | Andrew Adams | 12 May 2021, 16:07:25 UTC | Stop interleaving stores from generating too-large vectors (#5996) * Stop interleaving stores from generating too-large vectors * Remove integer constant * Use mul_would_overflow helper instead | 12 May 2021, 16:07:25 UTC |
33308d9 | Dillon Sharlet | 12 May 2021, 16:06:49 UTC | Add pmaddubsw support (#5997) * Add pmaddubsw support * Move pmaddubsw checks to ssse3 * These patterns are a bit finicky | 12 May 2021, 16:06:49 UTC |
3dce2d5 | Dillon Sharlet | 11 May 2021, 19:18:20 UTC | Small H::R::B cleanups and improvements (#5957) * Reuse helpers from halide_buffer_t * Combine decref and decrev_dev to hopefully reduce overhead. * Remove redundant public * This old logic was necessary Co-authored-by: Steven Johnson <srj@google.com> | 11 May 2021, 19:18:20 UTC |
257b2f5 | Dillon Sharlet | 11 May 2021, 00:10:01 UTC | Small performance portability tweaks (#5989) Co-authored-by: Steven Johnson <srj@google.com> | 11 May 2021, 00:10:01 UTC |
6b2732a | Dillon Sharlet | 10 May 2021, 22:51:37 UTC | Fix build with asserts enabled (#5987) * Minor cleanups after #5983 * Work around linker breakage!? * This doesn't need to be a constant. * Mark power_of_two constructor explicit Co-authored-by: Steven Johnson <srj@google.com> | 10 May 2021, 22:51:37 UTC |
fcf9046 | Dillon Sharlet | 10 May 2021, 21:47:22 UTC | Fix specializing on stride issue (fixes #5907) (#5950) * Fix specializing on stride issue (fixes #5907) * Remove stale comment * Add broadcasting test to CMakeLists.txt * remove_dead_lets -> remove_dead_code * Add test for specializing only on stride. * Remove broadcasting performance test. * Also remove from CMake | 10 May 2021, 21:47:22 UTC |
6e23346 | Steven Johnson | 10 May 2021, 20:12:06 UTC | Refactor hannk's compare_vs_tflite code to be mostly library (#5991) * Refactor compare_vs_tflite into library+shell Also, drive-by change to the test names to keep them matching filenames more closely * wip * Update compare_vs_tflite.cpp * wip * Fixes * clang-format * Fix Makefile * trigger buildbots | 10 May 2021, 20:12:06 UTC |
e33438a | Dillon Sharlet | 10 May 2021, 19:23:11 UTC | Revert "Stack input and filter to reduce generated code in FFT app (#5985)" (#5992) This reverts commit d2539287fe4c0c51128a78dc51c2c6d1812cd694. | 10 May 2021, 19:23:11 UTC |
d253928 | Dillon Sharlet | 10 May 2021, 17:57:07 UTC | Stack input and filter to reduce generated code in FFT app (#5985) * Stack input and filter to reduce generated code. * Change comments. | 10 May 2021, 17:57:07 UTC |
9eeade3 | Steven Johnson | 10 May 2021, 17:24:10 UTC | Rename CHECK->HCHECK, LOG->HLOG in hannk (#5986) Quick-n-dirty rename to avoid conflicts with Abseil/google3. Longer term fix will be forthcoming. | 10 May 2021, 17:24:10 UTC |
2e47968 | Steven Johnson | 10 May 2021, 17:23:24 UTC | Fix for upstream LLVM (#5988) * Fix for upstream LLVM * Fixes | 10 May 2021, 17:23:24 UTC |
5550f96 | Dillon Sharlet | 07 May 2021, 22:44:40 UTC | Don't hardcode depthwise padding. (#5984) | 07 May 2021, 22:44:40 UTC |
e0b7d8a | Dillon Sharlet | 07 May 2021, 22:44:03 UTC | Refactor quantized multiplications (#5983) * Refactor quantized multiplications * Move comment. * clang-format * base -> mantissa | 07 May 2021, 22:44:03 UTC |
e980b27 | Steven Johnson | 07 May 2021, 18:43:54 UTC | advance_ptrs() should use refs, not ptrs (#5981) * advance_ptrs() should use refs, not ptrs Examination of compiled output (x86-64, clang w/ optimizer) shows slightly better codegen. * Update HalideBuffer.h | 07 May 2021, 18:43:54 UTC |
aac383f | Steven Johnson | 07 May 2021, 18:32:39 UTC | Add dynamically-typed scalar inputs to Generator (#5953) (#5965) * Add dynamically-typed scalar inputs to Generator (#5953) * Update stubuser_generator.cpp * clang-format | 07 May 2021, 18:32:39 UTC |
aba3a80 | Andrew Adams | 07 May 2021, 02:47:08 UTC | Use a VectorReduce node to determine if any lanes are true in Hexagon backend (#5978) | 07 May 2021, 02:47:08 UTC |
9f7a459 | Steven Johnson | 06 May 2021, 21:57:47 UTC | Add missing #include in buffer_util.h (#5979) * Add missing #include in buffer_util.h * Update buffer_util.h | 06 May 2021, 21:57:47 UTC |
3f799ff | Dillon Sharlet | 06 May 2021, 21:35:55 UTC | Optimize copy_from a little (#5977) | 06 May 2021, 21:35:55 UTC |
3f83f5a | Dillon Sharlet | 06 May 2021, 20:25:57 UTC | Add transpose op (#5968) * Add transpose op. * Fix type requirements * Add tests for some misc ops. * Fix type checks | 06 May 2021, 20:25:57 UTC |
93c878e | Dillon Sharlet | 06 May 2021, 20:00:46 UTC | Optimize add generator (#5972) * Optimize add generator. * Update benchmarks * Vectorize wider for more ILP * Better add implementation. * More tweaks. ARM is really sensitive to exactly how these shifts are done. * More performance portable implementation of add. * Put signs back * Add comment. | 06 May 2021, 20:00:46 UTC |
42b5f79 | Dillon Sharlet | 06 May 2021, 19:54:35 UTC | Optimize fully connected when there are more than 4 batches (#5969) * Optimize fully connected when there are more than 4 batches * Fix crazy working typo * Do batches inside channels. * Fix missed constant | 06 May 2021, 19:54:35 UTC |
2ebaedd | Alex Reinking | 06 May 2021, 17:06:32 UTC | Don't add Halide DLL to PATH on Windows. (#5973) This conflicts with vcpkg's own binary copying on Windows and makes cross compiling more difficult. It also runs into issues with excessively long commands when the user's PATH variable is very long. | 06 May 2021, 17:06:32 UTC |
ed989db | Dillon Sharlet | 06 May 2021, 16:52:19 UTC | Remove some more old codegen workarounds and cleanups (#5932) * Remove old codegen workarounds * Pre-AVX2 codegen still needs this :( * trigger buildbots Co-authored-by: Steven Johnson <srj@google.com> | 06 May 2021, 16:52:19 UTC |
813180b | Steven Johnson | 06 May 2021, 16:38:22 UTC | HalideBuffer should use D=max_rank instead of D=4 (#5971) This prevents mallocs for some degenerate cases where we need buffers with > 4 dimensions. | 06 May 2021, 16:38:22 UTC |
ab3670d | Alex Reinking | 16 March 2021, 11:37:38 UTC | Clean up CMake helpers. 1. Add HEADER output to add_halide_library. 2. Use $<BUILD_INTERFACE:...> in generated target include paths. 3. Clean up logic (reduce nesting). 4. Use lower-case names for local variables. 5. Print paths to detected Clang and LLD config scripts. 6. Honor normal variable overrides for Halide_TARGET. | 06 May 2021, 05:52:34 UTC |
0b77168 | Alex Reinking | 08 April 2021, 18:20:37 UTC | Consistently use Halide_* prefixes in CMake. | 06 May 2021, 05:52:34 UTC |
0c0b117 | Steven Johnson | 05 May 2021, 23:41:34 UTC | Upgrade hannk's TFLite version to 2.5.0-rc3 (#5970) * Upgrade hannk's TFLite version to 2.5.0-rc3 * Drive-by cleanup of test names | 05 May 2021, 23:41:34 UTC |
438ddf8 | Dillon Sharlet | 05 May 2021, 23:39:44 UTC | Optimize depthwise convolution (#5964) * Try factoring depthwise reduction. * Revert unnecessary change. Co-authored-by: Steven Johnson <srj@google.com> | 05 May 2021, 23:39:44 UTC |
93dad62 | Alex Reinking | 05 May 2021, 20:23:56 UTC | Fix tutorial 15 test (#5966) | 05 May 2021, 20:23:56 UTC |
4a489e6 | Steven Johnson | 05 May 2021, 19:55:01 UTC | Ensure our local flatbuffers.h is included in preference to system variants (#5962) * Ensure our local flatbuffers.h is included before system variants The local version might be too old for TFLite * Update CMakeLists.txt * Update CMakeLists.txt * Silence the noise noise noise noise NOISE * fix policy | 05 May 2021, 19:55:01 UTC |
95047ca | Dillon Sharlet | 05 May 2021, 16:55:59 UTC | Optimize pooling ops (#5963) * Avoid padding for pool ops. * Use reciprocal to implement division. | 05 May 2021, 16:55:59 UTC |
115e597 | Steven Johnson | 05 May 2021, 01:28:21 UTC | Fix FetchContent for hannk (#5960) Apparently using SOURCE_SUBDIR + FetchContent_MakeAvailable() doesn't work on the buildbots. This is equivalent and does work. ¯\_(ツ)_/¯ | 05 May 2021, 01:28:21 UTC |
675748a | Steven Johnson | 05 May 2021, 00:02:02 UTC | Add apps/hannk to apps/CMakeLists.txt (#5958) ...but with an option for disabling it, for the buildbots | 05 May 2021, 00:02:02 UTC |
abefa3c | Dillon Sharlet | 04 May 2021, 23:45:21 UTC | Remove nn_ops app in favor of hannk app. (#5893) | 04 May 2021, 23:45:21 UTC |
7b79dca | Dillon Sharlet | 04 May 2021, 23:44:53 UTC | Add HANNK app (#5891) * More accurate approx_log2/exp2. * Add tests from inception_v4 * Improve precision of log2/exp2 related functions. * Add tanh and clean up generators. * Add version-checking to compare_vs_tflite and issue a warning if major and minor versions mismatch * Restore inadvertent @ removal * Add build_hannk/test_hannk targets to Makefile, to make specialized testing on select buildbots easier for now * More hacky padding for depthwise. * Add TODO * trigger buildbots * Add mean op, enable resnet50 to work. * Fix build failure on ARM. * Grammar * Remove stale TODO. * Make tensors shared_ptr. * Fuse double paddings. * Reduce padding for ARM. * Enable DimMap to express alignment. * Remove crops from execute. * Model -> OpGroup refactor. * Add DimMap::align. * Add proper alignment to DimMap. * Recursively transform. * Use cubic polynomials to approximate log2 and exp2. * Add --use_hannk option * Add mul op support. * Add TODO * Add disambiguating parens * Fix boneheaded broadcasting bug. * Less aggressive broadcasting. * Inline basic arithmetic. * Remove unnecessary using directives. * Fix stray unique_ptr<Tensor> * Implement space to depth and depth to space. * Enable scalar boolean comparison ops. * Support ReLUx as unary ops. * CHECK(false) -> LOG(FATAL) * Naming consistency. * More precise mul * Add some easy ops (NEG and SQUARE) * Fix asserts. * Don't segfault if interpreter can't be created * Add comment. * Remove dead file. * Fix excessive precision in softmax. * Lazy-init seeds in compare_vs_tflite, in case use_hannk=0 * Add TODO * Remove scalpel left in patient * Update model.h * Allow broadcasting of c of input 2 * Remove now-pointless specialization helper. * Put the common case specialization first. * Move pooling ops to the same generator file. * Fix softmax correctness issues * Don't benchmark when testing. * Rearrange input parameters. * Remove multiply_quantized helper. 
* kTfLiteError -> kTfLiteDelegateError * Remove unnecessary check for log2(0) * Fix details of ReshapeOp to match tflite's impl * Add Shape op. * Generically handle elementwise operations of any rank. * Some of these aren't elementwise. * Minor cleanups * Minor cleanup in ReshapeOp::execute() * Remove unused functions * Add Greater, GreaterEqual to delegate * clang-format * Update normalizations_generator.cpp * Avoid horrific clang-format suggestion. * clang-format * Fix common_halide test. * Fix typo. * Fix asserts. * clang-format * Save compare_vs_tflite outputs from first run (not post-benchmark) * Enable approx_exp2 for int16 results without overflow. * Clean up precision of transcendentals * Fix accidental widening of shift by a constant. * Move elementwise generators to the same file. * Report profiler after each test * Optimize fully connected a lot * Add elementwise program interpreter * Add elementwise program interpreter * clang-format * WIP LSTM * Fix Interpreter::inputs and outputs. * Fix some precision and scheduling issues of LSTM * Fix LSTM op * Fix build breakage. * Fix comments. * clang-format * Add wrapper for constructing elementwise programs. * clang-format * Use ElementwiseProgram to implement LstmElementwise * Compress programs and instructions by storing them in int16 and more CISC * Reduce verbose repetitive declarations. * Optimize constant zeros. * Use a named constant for the size of each instruction. * Remove unnecessary const instruction. * Optimize and clean up elementwise programs * Reduce overhead from H::R::B * More H::R::B overhead cleanup. * Add missing include. * Various fixes and improvements. 
* Add support for LSTM to the hannk delegate (#5943) * Add support for LSTM to the hannk delegate * clang-format * Add support for dynamic tensors to hannk (#5942) * Initial support for Dynamic Tensors in hannk * Update hannk_delegate.cpp * Fixes * Smarten Tensor::resize() * More H::R::B overhead cleanup * Refactor IsNodeSupported * Minor fixes * Fix member name style * Fix is_alias * Add is_no_op for some ops * Log reasons for node rejection if verbosity >= 1 * clang-format * Fix Concat handling for delegate * Properly parse split op * Add SPLIT_V * Fix regression in PadForOps * Add asserts * Revise ReshapeOp to just use a shape tensor * Update ops.cpp * clang-format * Scale tolerance with the data type. * Compress disassembly a bit * Fix bug with aliasing. * Add --use_tflite flag to compare_vs_tflite Allows disabling the reference run, for running just our delegate alone * Clean up hannk makefile * Regularize all of hannk's own include paths to be relative to apps/hannk; this simplifies things and will allow removing some hacks in Blaze/Bazel and also the upcoming CMake support * Remove some gratuitous uses of std::vector. * Remove unused function. * Remove more instances of std::vector * More SmallVector usage. * Avoid vector in can_use_elementwise_program. * Minor drive by fixes. 
* Remove some more H::R::B copies * Add broadcast support to elementwise programs * Also tweak the generated schema file include path * clang-format * clang-format * Stale TODO * Upgrade hannk to tf2.5 + more (#5948) * Upgrade hannk to tf2.5 + more - upgrade default TFLite to 2.5.0-rc2 - Revise build instructions & assumptions for TFlite (use CMake for it now instead of Bazel) - Revised Android build instructions (now assumes that tflite is built locally rather than pulled from a prebuilt) - Remove the need for flatc/flatbuffers - Minor fixes to the run-on-device scripts * Update Makefile * Update Makefile * Fix some harmless errors related to input slots * TFlite is too sloppy with dimensions. * Intervals are min, max, not min, extent. * Refactor compare_vs_tflite Lots of internal code motion to clean things up and prepare for adding an internal-delegate code path. Immediate change is just `--enable [h][t][x]` instead of the old "use" flags, and reducing the default max-num-of-diffs to 8 instead of 32. * Add an alias for OpPtr * Don't alias inputs that might be used elsewhere. * Use the right tensors when parsing. * Don't schedule sum_filter separately, and avoid 8-bit multiplies for x86 * Improve aliasing logic * Small optimizations. * Add no_bounds_query to elementwise pipelines. * Clean up std::shared_ptr/H::R::B overhead. * More cvt refactoring, plus clang-format * Fix asserts. * Remove unused trace_ member * Remove unnecessary argument. * Fix x86 * Update compare_vs_tflite.cpp * Add internal-delegate option to CVT * Fix run_compare_on_device for recent changes * Revert possibly broken space to depth optimization. * Add gather and more binary op support. * Update compare_vs_tflite.cpp * Default external-delegate to disabled * clang-format * Better implementation of SpaceToDepth/DepthToSpace. * clang-format * Reduce buffer copy overhead. * Remove unnecessary types. * Consistent multiplication order. 
* Pad to at least FnRank * clang-format * De-inline two operator<<()'s * compare_vs_tflite error handling - If any of the comparisons fail, exit with a nonzero error code - add `--tolerance` flag to allow tweaking allowable tolerance on a per-pipeline basis * Add CMake build rules to apps/hannk (#5955) * Add CMake build rules to hannk * Update ops.cpp * Fix features * Update CMakeLists.txt * Add tests * Fix cmake issues * Use CMAKE_GENERATOR * Update configure_cmake.sh * Delegate Fixes * Update flag handling in compare_vs_tflite This is gratuitous but was bugging me: - Flags now understand both `--flag value` or `--flag=value` - Unknown flags now fail hard instead of being ignored Co-authored-by: Steven Johnson <srj@google.com> | 04 May 2021, 23:44:53 UTC |
f45d323 | Andrew Adams | 04 May 2021, 16:32:56 UTC | Non-widening lowering of rounding shifts (#5956) This version lowers it without needing to widen, which is a large win on x86 for 16 and 32-bit types (3.8x faster and 2.8x faster respectively). It's a very slight slowdown for 8-bit because x86 doesn't have 8-bit shift instructions. Also drive-by typo fix. | 04 May 2021, 16:32:56 UTC |
94c0eca | Dillon Sharlet | 04 May 2021, 00:17:39 UTC | Use dot products for sums. (#5954) | 04 May 2021, 00:17:39 UTC |
5a0d1e5 | Volodymyr Kysenko | 03 May 2021, 16:34:37 UTC | Support VectorReduce in CodeGen_C (#5952) | 03 May 2021, 16:34:37 UTC |
8b9deea | Dillon Sharlet | 30 April 2021, 20:56:30 UTC | Fix bugs when D != 4 (#5951) * Fix bugs when D != 4 * clang-format | 30 April 2021, 20:56:30 UTC |
093e8df | Fangrui Song | 29 April 2021, 22:58:43 UTC | Replace llvm::sys::fs::F_None with llvm::sys::fs::OF_None (#5946) The former is deprecated. | 29 April 2021, 22:58:43 UTC |
fcbd2ee | Dillon Sharlet | 27 April 2021, 23:52:57 UTC | Fix build issue in runtime. (#5944) | 27 April 2021, 23:52:57 UTC |
a391e9a | AbdouTlili | 27 April 2021, 23:13:06 UTC | adding a note in the README.md to use -j option in make --build (#5938) * adding a note in the README.md to use -j option in make --build * wrapped the added section to 80 columns | 27 April 2021, 23:13:06 UTC |
5a69e9f | Dillon Sharlet | 26 April 2021, 21:12:53 UTC | Fix flattening of ramps involving 64-bit mins (#5940) * Fix flattening of ramps involving 64-bit mins. * Use make_const instead of cast. | 26 April 2021, 21:12:53 UTC |
91e42f4 | Steven Johnson | 26 April 2021, 20:10:21 UTC | Don't use as_const_int() on temporaries (#5939) Sometimes we get lucky and it's still valid, but it's always wrong. | 26 April 2021, 20:10:21 UTC |
1b3cbcb | aankit-ca | 26 April 2021, 17:55:12 UTC | [Hexagon] Try vdelta/vrdelta before vlut for some shuffles. (#5935) The patch tries to generate vdelta/vrdelta instructions for non-ramp shuffles. Eg: shuffle(lut_expr, < 0, 1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15, 16, 18, 19, 20, 21, 22, 23, 24, 25, 27, 28, 29, 30, 31, 32, 33, 34, 36, 37, 38, 39, 40, 41, 42, 43, 45, 46, 47, 48, 49, 50, 51, 52, 54, 55, 56, 57, 58, 59, 60, 61, 63, 64, 65, 66, 67, 68, 69, 70>) can be generated using vrdelta. The patch also fixes a bug where we bitcast vdelta/vrdelta with 16/32 bits elements to wrong type. User would see the below error: llvm-project/llvm/lib/IR/Instructions.cpp:2905: static llvm::CastInst *llvm::CastInst::Create(Instruction::CastOps, llvm::Value *, llvm::Type *, const llvm::Twine &, llvm::Instruction *): Assertion `castIsValid(op, S, Ty) && "Invalid cast!"' failed. Co-authored-by: Ankit Aggarwal <aankit@quicinc.com> | 26 April 2021, 17:55:12 UTC |
ba89623 | Shivam Gupta | 23 April 2021, 16:19:40 UTC | Small Typo fix in lesson 06 (#5936) Signed-off-by: xgupta <shivam98.tkg@rediffmail.com> | 23 April 2021, 16:19:40 UTC |
a407acd | Steven Johnson | 22 April 2021, 16:29:01 UTC | Revert "Temporarily disable hanging test (#5925)" (#5933) This reverts commit 62505857694ab8af2a88a22edf291e630c8c0cfd. | 22 April 2021, 16:29:01 UTC |
fb13fb0 | Dillon Sharlet | 21 April 2021, 22:10:32 UTC | Add mul_shift_right intrinsic and related improvements (#5916) * Add multiply_quantized intrinsic * clang-format * Fix build on some compilers. * Fix incorrect saturating_pmulhrs * multiply_quantized -> mul_shift_right * Remove workaround and just cast shift amounts. * Fix error message * Fix declaration of mul_shift_right. | 21 April 2021, 22:10:32 UTC |
6867005 | Shoaib Kamil | 21 April 2021, 19:06:50 UTC | Suppress Metal unused function warning (#5913) Co-authored-by: Steven Johnson <srj@google.com> | 21 April 2021, 19:06:50 UTC |
5dd85ae | Andrew Adams | 21 April 2021, 16:50:56 UTC | Let the user pass the Func to use to the reduction helpers (#5929) * Let the user pass the Func to use to the reduction helpers * Pass Funcs by const ref | 21 April 2021, 16:50:56 UTC |
17d4771 | Dillon Sharlet | 21 April 2021, 16:04:27 UTC | Update test to reflect behavior we expect. (#5928) | 21 April 2021, 16:04:27 UTC |
087567f | Dillon Sharlet | 21 April 2021, 16:04:09 UTC | Remove old codegen. LLVM rewrites this back to a multiply anyways. (#5930) | 21 April 2021, 16:04:09 UTC |
6250585 | Steven Johnson | 20 April 2021, 21:23:26 UTC | Temporarily disable hanging test (#5925) * Temporarily disable hanging test LLVM13 is causing vector_reductions to hang (https://reviews.llvm.org/D100099 appears to be the injection point). Disabling this test to unbreak the buildbots. * Update vector_reductions.cpp | 20 April 2021, 21:23:26 UTC |
c1de142 | Alexander Root | 20 April 2021, 21:21:33 UTC | [adams2019] Add caching to autoscheduler (#5697) * add feature caching and block caching to adams2019 autoscheduler * added caching verification for features * add caching docstrings | 20 April 2021, 21:21:33 UTC |
ac23987 | Dillon Sharlet | 20 April 2021, 15:02:14 UTC | Speed up simd_op_check by only compiling one pipeline per op (#5918) * Speed up simd_op_check and compute_with * Dense vector loads can be written many different ways. | 20 April 2021, 15:02:14 UTC |
6963673 | Dillon Sharlet | 20 April 2021, 00:24:06 UTC | Add Target::ARMv81a and improve shift instruction selection (#5917) * Add Target::ARMv81a and improve shift instruction selection. * Remove merge mistake. * Don't use ARM intrinsic on arm32, it seems to be missing sometimes. | 20 April 2021, 00:24:06 UTC |
493dbd4 | Steven Johnson | 17 April 2021, 17:46:20 UTC | Comment out specializations for f64x2.convert_low_i32x4_s/u (#5914) LLVM removed the primitives we need (so our code can't be used), but it also doesn't seem to be generating the expected instructions directly (as claimed). Commenting out to un-break tests; issue has been reported to wasm/llvm team. | 17 April 2021, 17:46:20 UTC |
9cdb4aa | Andrew Adams | 16 April 2021, 22:23:30 UTC | Simplify and improve cuda_mat_mul schedule (#5909) * Simplify and improve cuda_mat_mul schedule | 16 April 2021, 22:23:30 UTC |
a41cce7 | Volodymyr Kysenko | 16 April 2021, 20:47:16 UTC | Basic support of predicated loads/stores in C++ backend (#5908) * Basic support of predicated load/stores in C++ backend * Fix formatting and maybe build * Fix * trigger buildbots Co-authored-by: Steven Johnson <srj@google.com> | 16 April 2021, 20:47:16 UTC |
3531167 | Steven Johnson | 15 April 2021, 18:34:37 UTC | Drop LLVM10 support from master (#5740) * Drop LLVM10 support from master Update build files to require LLVM11+ in master branch. (Since we only regularly test master with 12 and 13 this is conservative.) Remove all code that is specialized for LLVM < 11.0. * Update CodeGen_ARM.cpp * Update CodeGen_LLVM.cpp | 15 April 2021, 18:34:37 UTC |
780ebd2 | Zalman Stern | 15 April 2021, 16:19:21 UTC | Add an error for realize with a different number of outputs than defined for pipeline. (#5906) * Add an error for calling realize with a different number of outputs than the pipeline was compiled with. * Forgot to add test. * A readability sacrifice to the clang deity. * Add CMake file. * Minor change to error text. * Fix logic to handle Funcs returning Tuples. * Formatting. | 15 April 2021, 16:19:21 UTC |
da02c0d | Jiawen (Kevin) Chen | 14 April 2021, 22:17:01 UTC | Add missing "struct" before halide_type_t. (#5904) This allows it to compile as pure C instead of C++. Co-authored-by: Jiawen Chen <jiawen@adobe.com> | 14 April 2021, 22:17:01 UTC |
ccde965 | Steven Johnson | 14 April 2021, 21:32:29 UTC | Enable some more wasm simd tests that are now working with top-of-tree LLVM. (#5903) | 14 April 2021, 21:32:29 UTC |
3ac277b | Dillon Sharlet | 14 April 2021, 17:23:02 UTC | Rewrite double and triple narrowing on ARM (#5896) * Rewrite double and triple narrowing on ARM. * clang-format. Co-authored-by: Steven Johnson <srj@google.com> | 14 April 2021, 17:23:02 UTC |
ce9b324 | Steven Johnson | 14 April 2021, 16:18:45 UTC | Fix UB in halide_buffer_t::size_in_bytes (#5898) Just a port of https://github.com/halide/Halide/pull/4389 to the equivalent methods in HalideRuntime.h, since offset-from-a-null-pointer is UB in C++. | 14 April 2021, 16:18:45 UTC |
1ff3e3f | Mario Emmenlauer | 12 April 2021, 20:50:02 UTC | CMake build: Add more user control (#5859) * packaging/CMakeLists.txt: Allow users to override RPATH (i.e. for packaging Halide) * CMakeLists.txt: Allow users to override the C++ standard | 12 April 2021, 20:50:02 UTC |
9cc17b4 | Alexander Root | 12 April 2021, 17:23:57 UTC | Add fuzzer to bounds_of_expr_in_scope + fix discovered overflow bugs (#5895) * add interval bounds fuzzer * correct overflow checks in bounds inference * catch uint32->int32 overflow in simplifier and revert bounds change | 12 April 2021, 17:23:57 UTC |
687c7d8 | Andrew Adams | 09 April 2021, 05:04:11 UTC | Use guarded versions of vars if they exist in bounds inference (#5890) | 09 April 2021, 05:04:11 UTC |
cf40bc8 | Steven Johnson | 08 April 2021, 17:14:26 UTC | Improve wasm_threads documentation (#5843) * Improve wasm_threads documentation * Update HalideRuntime.h | 08 April 2021, 17:14:26 UTC |
71b895e | Alex Reinking | 18 February 2021, 21:44:02 UTC | Fix existing presets (remove -O2 stuff, typos) | 07 April 2021, 22:51:53 UTC |
bd16b37 | Alex Reinking | 18 February 2021, 21:42:47 UTC | Add shebang line to autotune_loop.sh | 07 April 2021, 22:51:53 UTC |
efea7a2 | Alex Reinking | 18 February 2021, 21:42:22 UTC | Remove WITH_APPS from README_cmake.md | 07 April 2021, 22:51:53 UTC |
69011bc | Alex Reinking | 17 February 2021, 06:56:20 UTC | Fix spelling mistakes and Doxygen references | 07 April 2021, 22:51:53 UTC |
b9cd9f2 | Alex Reinking | 07 April 2021, 22:37:12 UTC | Require LLD_DIR in zip/package.bat (#5887) | 07 April 2021, 22:37:12 UTC |
79fd0c9 | Alex Reinking | 07 April 2021, 19:12:23 UTC | [cmake] Fix and reorganize warnings for building Halide (#5885) | 07 April 2021, 19:12:23 UTC |
7473402 | Steven Johnson | 07 April 2021, 02:23:25 UTC | error_run_with_large_stack_throws should compile without exceptions (#5884) (1) Some downstream environments compile C++ without exceptions by default; this won't compile on those. (2) We should check that assert-fail also errors out as expected. | 07 April 2021, 02:23:25 UTC |
ea76214 | Alex Reinking | 07 April 2021, 00:53:11 UTC | Improve ClangCL support by disabling, fixing warnings (#5876) Co-authored-by: Mario Emmenlauer <memmenlauer@biodataanalysis.de> | 07 April 2021, 00:53:11 UTC |
85816e4 | Andrew Adams | 06 April 2021, 20:56:47 UTC | Add explicit cast to remove ambiguous operator== (Fixes #5329) (#5879) | 06 April 2021, 20:56:47 UTC |
e877e5b | Steven Johnson | 06 April 2021, 16:27:09 UTC | Fix natural_vector_size for wasm 64-bit types (#5880) In the original spec, wasm-simd128 didn't have int64 or float64; the final spec adds these types, so this bit of code is outdated and incorrect. | 06 April 2021, 16:27:09 UTC |
7825d48 | Steven Johnson | 06 April 2021, 16:25:21 UTC | Enable i64x2 comparisons in simd_op_check (#5881) * Enable i64x2 comparisons in simd_op_check * More drive-by fixes | 06 April 2021, 16:25:21 UTC |
9944dda | Ming Yan | 06 April 2021, 16:21:21 UTC | Fix typos in tutorial lesson_08 (#5875) | 06 April 2021, 16:21:21 UTC |
525e246 | Volodymyr Kysenko | 03 April 2021, 01:29:55 UTC | Try to vectorize inner statement of else branch of likely (#5874) * Try to vectorize inner statement of scalarize * Extend test to check for other scalarized loop * Add more details to the comment * make format * Remove note | 03 April 2021, 01:29:55 UTC |
59a04e4 | Alex Reinking | 02 April 2021, 17:53:49 UTC | Use fibers to guarantee stack size on Windows (#5873) * Use fibers for lowering. * Move fibers to Util * Wrap compile_func call in call_with_stack_requirement * Rename call_with_stack_requirement -> run_with_large_stack * Appease clang_format * Add exception handling to run_with_large_stack * clang-format * Fix 32-bit? * Fix error wording for Makefile * Improve naming in run_with_large_stack | 02 April 2021, 17:53:49 UTC |
42092e3 | Zalman Stern | 01 April 2021, 19:13:01 UTC | Fix an issue in Halide's float16 compilation support. Add tests. (#5872) In EmulateFloat16Math.cpp, conversion from 32-bit float to 16-bit float could produce a NaN value when an infinity is correct. This is because for numbers larger than the exact infinity value, the mantissa could be non zero. Add tests to cover this case, and float16 infinities in general. Couple small style/comment cleanups. | 01 April 2021, 19:13:01 UTC |
cb78a6b | Steven Johnson | 31 March 2021, 22:30:55 UTC | Don't strip strict_float() from lets (#5871) * Don't strip strict_float() from lets Bug injected in #5856: the change in Simplify_Let.cpp was inadvertently stripping `strict_float()` calls that wrapped the RHS of a Let-expr, which can change results nontrivially in some cases. I don't think a new test for this fix is practical -- it would be a little fragile, as it would rely on the specifics of simplification that could change over time. As a drive-by, also added an explicit rule to Simplify_Call to ensure that strict_float(strict_float(x)) -> strict_float(x) in *all* cases. (The existing rule didn't do this in all cases.) | 31 March 2021, 22:30:55 UTC |
896b260 | Dillon Sharlet | 31 March 2021, 15:08:33 UTC | Add some not rules. (#5870) | 31 March 2021, 15:08:33 UTC |
3e59294 | Dillon Sharlet | 30 March 2021, 23:26:22 UTC | Add TailStrategy::Predicate (#5856) * Add TailStrategy::Predicate * Add some tests for TailStrategy::Predicate. * Fix missing override. * Fix comment. * Tweak target behavior. * Remove all heuristics * clang-format. * clang-tidy. * TailStrategy::GuardWithIf isn't always faster than scalar code :( * Use TailStrategy::Predicate in the predicated store/load test. * What is this test * Fix test bug. * Revert x86 behavior. * Move predicate to Internal namespace. * Recursively strip tags. * trigger buildbots * strip_tags -> unwrap_tags * Fix comment. Co-authored-by: Steven Johnson <srj@google.com> | 30 March 2021, 23:26:22 UTC |
7bbe2fd | Steven Johnson | 30 March 2021, 23:00:23 UTC | Add wasm support for int32->f64 and f32->f64 simd ops (#5863) * Add wasm support for int32->f64 and f32->f64 simd ops At top-of-tree LLVM, the wasm backend never seems to emit the vector version of these ops; pattern-match to target them specifically. | 30 March 2021, 23:00:23 UTC |
e7eec5c | Steven Johnson | 30 March 2021, 22:45:32 UTC | Add support for wasm dot-product instruction (#5861) * Add support for wasm dot-product instruction | 30 March 2021, 22:45:32 UTC |
f2143bf | Steven Johnson | 30 March 2021, 19:29:32 UTC | Add a way to set a GeneratorInput's type in code (#5868) * Add a way to set a GeneratorInput's type in code Currently, if you want to vary the type of a Generator's inputs or outputs, you have to specify the types in the makefile. This can be awkward for things with complex logic. This PR proposes adding a way to do this: a new `set_type()` method which can only be called from the rarely-used Generator::configure() method. It only allows setting the type for an input or output that has no type specified. I'm not 100% sure if this is a good idea, but for certain rare corner cases, it may be quite handy. (Note that extending this to allow specifying dimensions and/or array size in the same way might be handy, but is omitted from this PR.) * Update Generator.h * Also add set_dimensions, set_array_size | 30 March 2021, 19:29:32 UTC |
2dd7a6b | Thales Sabino | 30 March 2021, 19:00:21 UTC | Add support for AVX-512 VNNI saturating dot products (#5807) * Add support for AVX-512 VNNI saturating dot products This commit adds support to Intel VNNI saturating dot product instructions vpdpbusds and vpdpwssd This was accomplished by adding a new VectorReduce operation to perform the saturating_add and exposing a new inline reduction saturating_sum. Users can then write RDom r(0, 4); f(x) = saturating_sum(i32(0), i16(i8(g(x + r)) * u8(h(x + r)))); bool override_associativity_test = true; int vector_width = 4; Var xo, xi; f.update() .split(x, xo, xi, vector_width) .atomic(override_associativity_test) .vectorize(r) .vectorize(xi); To lower the expression into a call to vpdpbusds. Note that override_associativity_test is set to true or Halide will fail to prove the associativity of the saturating_add operation Add support for VectorReduce::SaturatingAdd in CodeGen_LLVM Code is correctly generated when no intrinsic is available to perform a saturating dot product. Add vpdpbusds,vpdpwssd tests to simd_op_check Test if the saturating dot product instructions are being generated for AVX512_SapphireRapids targets * Improve code according to report from clang-tidy * Make init_val a const ref since it is only used that way inside saturating_sum * clang-format * Revert removal of clang-format tag in CodeGen_X86.cpp * Add SaturatingAdd case to Monotonic VectorReduce visit * Bail out in Bounds when dealing with a SaturatingAdd VectorReduce * Move saturating_mul to Simplify_Internal.h so it can be used in Simplify_Exprs.cpp * Remove init_val from the saturating_sum inline reduction * Unconditionally override the associativity test in the simd_op_check tests * Remove anonymous namespace from saturating_mul utility Co-authored-by: Thales Sabino <thales@codeplay.com> | 30 March 2021, 19:00:21 UTC |
5b238e7 | Steven Johnson | 30 March 2021, 17:18:45 UTC | Use a varying seed for random test data in simd_op_check (#5864) * Use a varying seed for random test data in simd_op_check We currently use `123` as a hardcoded seed, so we may sometimes be getting lucky with test patterns that happen to match scalar and vector. Let's vary the seed in the same way we do for (eg) fuzz_simplify to slightly broaden test coverage. | 30 March 2021, 17:18:45 UTC |
602cbac | Shivam Gupta | 30 March 2021, 16:56:13 UTC | [NFC] LLVM trunk is now called main (#5866) Reference - https://foundation.llvm.org/docs/branch-rename/ | 30 March 2021, 16:56:13 UTC |
b7bc8e2 | Steven Johnson | 30 March 2021, 16:48:14 UTC | Add support for wasm-simd saturating-narrow ops. (#5854) * Add support for wasm-simd saturating-narrow ops. | 30 March 2021, 16:48:14 UTC |
e0461e9 | Steven Johnson | 30 March 2021, 03:01:20 UTC | Add support for i16x8.q15mulr_sat_s in wasm (#5853) * Add support for i16x8.q15mulr_sat_s in wasm Also, some drive-by clarifications to other wasm-simd instructions in simd_op_check -- some of the yet-to-be-implemented ones are of dubious use in Halide and may not be worth implementing. | 30 March 2021, 03:01:20 UTC |
07f880e | Steven Johnson | 29 March 2021, 23:37:23 UTC | Add support for pairwise_widening_add in wasm (#5850) * Add support for widening_mul in wasm | 29 March 2021, 23:37:23 UTC |