https://github.com/halide/Halide
- HEAD
- refs/heads/Halide_unsharp
- refs/heads/abadams/align_strided_const_loads
- refs/heads/abadams/alloca
- refs/heads/abadams/atomic_parallel_compiled_in
- refs/heads/abadams/atomic_vector_non_recursive
- refs/heads/abadams/averaging_tree
- refs/heads/abadams/avoid_name_mangling_in_cross_module_dependencies
- refs/heads/abadams/better_absd
- refs/heads/abadams/better_codegen_for_non_const_ramps
- refs/heads/abadams/bgu_cholesky
- refs/heads/abadams/braces_around_statements
- refs/heads/abadams/cache_tighten_producer_consumer_nodes
- refs/heads/abadams/check_reorder_dups
- refs/heads/abadams/clarify_broadcast_shuffle
- refs/heads/abadams/compositing_app
- refs/heads/abadams/cond_wait_spin
- refs/heads/abadams/cse_in_unroll_split_tuples
- refs/heads/abadams/custom_cuda_context
- refs/heads/abadams/custom_cuda_context_2
- refs/heads/abadams/custom_cuda_context_3
- refs/heads/abadams/d3d12abi
- refs/heads/abadams/deflake_mullapudi_reorder
- refs/heads/abadams/delete_prepare_for_early_exit
- refs/heads/abadams/depthwise_separable_conv
- refs/heads/abadams/diagnose_boundary_condition_failure
- refs/heads/abadams/disable_onnx_app_on_mac
- refs/heads/abadams/divide_using_pavgw
- refs/heads/abadams/dont_link_to_cudart
- refs/heads/abadams/dont_reinterpret_concat
- refs/heads/abadams/early_out
- refs/heads/abadams/enable_f16c
- refs/heads/abadams/extract_concat_bits
- refs/heads/abadams/fast_integer_divide_round_to_zero
- refs/heads/abadams/faster_runtime_integer_division
- refs/heads/abadams/faster_unroll
- refs/heads/abadams/fix-arm-seg2
- refs/heads/abadams/fix_4211
- refs/heads/abadams/fix_5323
- refs/heads/abadams/fix_5329
- refs/heads/abadams/fix_5889
- refs/heads/abadams/fix_6984
- refs/heads/abadams/fix_7229
- refs/heads/abadams/fix_7260
- refs/heads/abadams/fix_7365
- refs/heads/abadams/fix_7374
- refs/heads/abadams/fix_7504
- refs/heads/abadams/fix_7514
- refs/heads/abadams/fix_7531
- refs/heads/abadams/fix_7584
- refs/heads/abadams/fix_7584_v2
- refs/heads/abadams/fix_7742
- refs/heads/abadams/fix_7756
- refs/heads/abadams/fix_7761
- refs/heads/abadams/fix_7768
- refs/heads/abadams/fix_7786
- refs/heads/abadams/fix_7810
- refs/heads/abadams/fix_7811
- refs/heads/abadams/fix_7815
- refs/heads/abadams/fix_7867
- refs/heads/abadams/fix_7871
- refs/heads/abadams/fix_7872
- refs/heads/abadams/fix_7873
- refs/heads/abadams/fix_7888
- refs/heads/abadams/fix_7890
- refs/heads/abadams/fix_7891
- refs/heads/abadams/fix_7892
- refs/heads/abadams/fix_7893
- refs/heads/abadams/fix_7906
- refs/heads/abadams/fix_7909
- refs/heads/abadams/fix_7968
- refs/heads/abadams/fix_8038
- refs/heads/abadams/fix_8054
- refs/heads/abadams/fix_arm_fcvtmp
- refs/heads/abadams/fix_autoschedule_feature_transposition
- refs/heads/abadams/fix_cse_name_collisions
- refs/heads/abadams/fix_cuda_mat_mul_assert
- refs/heads/abadams/fix_deinterleave_bug
- refs/heads/abadams/fix_deinterleave_for_reinterpret
- refs/heads/abadams/fix_div_round_to_zero
- refs/heads/abadams/fix_fft_compile_time_regression
- refs/heads/abadams/fix_generate_output_snippets
- refs/heads/abadams/fix_if_nesting_condition
- refs/heads/abadams/fix_leaks_in_memoize_test
- refs/heads/abadams/fix_lgtm_warnings
- refs/heads/abadams/fix_links_to_master
- refs/heads/abadams/fix_load_of_broadcast
- refs/heads/abadams/fix_lossless_cast_of_sub
- refs/heads/abadams/fix_onnx_app
- refs/heads/abadams/fix_pointless_lower_condition
- refs/heads/abadams/fix_potential_gpu_deadlock
- refs/heads/abadams/fix_realize_condition_depends_on_tuple
- refs/heads/abadams/fix_reduce_expr_modulo_of_vector
- refs/heads/abadams/fix_riscv_vx_vi
- refs/heads/abadams/fix_round
- refs/heads/abadams/fix_stencil_chain_gpu_schedule
- refs/heads/abadams/fix_track_bounds_intervals
- refs/heads/abadams/fix_tutorial_2
- refs/heads/abadams/forward_partition_methods
- refs/heads/abadams/fully_fused_depthwise_separable_conv
- refs/heads/abadams/fuzz_sliding_window
- refs/heads/abadams/gaussian_blur_app
- refs/heads/abadams/generator_infinite_default_timeout
- refs/heads/abadams/gpu_autoscheduler_parallel_random_probes
- refs/heads/abadams/include_riscv_in_readme
- refs/heads/abadams/interleave_nested_vector
- refs/heads/abadams/ir_match_by_ref
- refs/heads/abadams/lerp_plus_cast
- refs/heads/abadams/local_laplacian_code_size
- refs/heads/abadams/lower_halving_sub
- refs/heads/abadams/lower_rounding_shift_right
- refs/heads/abadams/mac-arm-fixes
- refs/heads/abadams/make_fast_inverse_test_throughput_limited
- refs/heads/abadams/makefile_serialization_support
- refs/heads/abadams/mismatched_new_delete
- refs/heads/abadams/mixed_sign_mul_shift_right
- refs/heads/abadams/mixed_width_mul_shift_right
- refs/heads/abadams/multiple_scatter
- refs/heads/abadams/mux_intrinsic
- refs/heads/abadams/name_helpers
- refs/heads/abadams/narrow_predicates
- refs/heads/abadams/nested_vectorization_compile_time_regression_fix
- refs/heads/abadams/nested_vectorization_tweaks
- refs/heads/abadams/parallel_simd_op_check
- refs/heads/abadams/per_instance_profiling
- refs/heads/abadams/precompute_shared_mem_size
- refs/heads/abadams/prefer_no_gather
- refs/heads/abadams/print_uncaught_exception
- refs/heads/abadams/promote_fixed_point_intrinsics
- refs/heads/abadams/psabdw
- refs/heads/abadams/random_pipelines
- refs/heads/abadams/rationalize_gpu_for_loop_names
- refs/heads/abadams/reenable_unscheduled_stage_warning
- refs/heads/abadams/reinterpret_vector
- refs/heads/abadams/remove_arch_os_for_shaders
- refs/heads/abadams/remove_bad_pruning
- refs/heads/abadams/remove_parameter_self_references
- refs/heads/abadams/remove_readnone_on_functions
- refs/heads/abadams/remove_use_of_python_config_in_onnx_makefile
- refs/heads/abadams/reschedule_bgu
- refs/heads/abadams/reschedule_bilateral_grid
- refs/heads/abadams/rewrite_atomic_pass
- refs/heads/abadams/rounding_shift_right_use_average
- refs/heads/abadams/rungenmain_error
- refs/heads/abadams/sampling_profiler_overhead_v2
- refs/heads/abadams/scope_improvements
- refs/heads/abadams/simpler_broadcasts
- refs/heads/abadams/simplify_correlated_pyramid
- refs/heads/abadams/siotas_20
- refs/heads/abadams/sioutas_20
- refs/heads/abadams/slide_over_split_loop
- refs/heads/abadams/sorting_network_working_branch
- refs/heads/abadams/stable_topological_order
- refs/heads/abadams/string_view
- refs/heads/abadams/strip_asserts_last
- refs/heads/abadams/switch_stmt
- refs/heads/abadams/target_specific_lerp
- refs/heads/abadams/time_lowering_passes
- refs/heads/abadams/track_failedness_through_solver_lets
- refs/heads/abadams/turn_off_slp_vectorization_for_avx512
- refs/heads/abadams/tweak_unpack_buffers
- refs/heads/abadams/undo_pointless_widening
- refs/heads/abadams/unordered_blocks
- refs/heads/abadams/unsigned_demosaic
- refs/heads/abadams/update_makefile_for_llvm_19
- refs/heads/abadams/use_arm_for_runtime_triple
- refs/heads/abadams/use_pmaddubsw_for_downsample
- refs/heads/abadams/validate_gpu_schedules
- refs/heads/abadams/vector_reduce_hexagon_predicate
- refs/heads/abadams/vector_scan
- refs/heads/abadams/vst_type_fix
- refs/heads/abadams/widening_let_bug
- refs/heads/abadams/x86_avg
- refs/heads/abadams/zen4
- refs/heads/adadams/profile_allocator
- refs/heads/add_image_checks_after_bounds_inference_plus_new_rules
- refs/heads/add_outermost_to_extern
- refs/heads/add_vectorization_to_search_space
- refs/heads/aelphy/feature_cadence_changes
- refs/heads/aelphy/float_extracts
- refs/heads/align_loads_comment_fix
- refs/heads/alina-strided-store
- refs/heads/another_buffer_copy_fix
- refs/heads/arm_sve_redux
- refs/heads/ataei-block_asserts-codegen
- refs/heads/ataei-debug_info
- refs/heads/ataei-fix-pow
- refs/heads/ataei-gen_str_param
- refs/heads/ataei-implicit_lhs_vars
- refs/heads/ataei-onnx
- refs/heads/ataei-onnx_converter_update
- refs/heads/ataei-onnx_pybind
- refs/heads/ataei-resnet50_benchmarks
- refs/heads/ataei-standalone_autoscheduler
- refs/heads/ataei_lots_of_inputs
- refs/heads/auto_sched_benchmarks
- refs/heads/auto_sched_estimates
- refs/heads/auto_sched_inline
- refs/heads/auto_sched_test_notparallel
- refs/heads/autoschedule_top_down
- refs/heads/autoschedule_with_convnet
- refs/heads/autoscheduler_scalar_imageparam_fix
- refs/heads/backports/10.x
- refs/heads/backports/11.x
- refs/heads/backports/12.x
- refs/heads/backports/13.x
- refs/heads/balance_expressions
- refs/heads/bazel
- refs/heads/benchmarks
- refs/heads/blaze
- refs/heads/bounds_buffer_lets_fix
- refs/heads/bounds_correct_vs_bounds_loaded_reduced
- refs/heads/buffer_device_api_target
- refs/heads/bug_device_free
- refs/heads/bug_inline_unbounded
- refs/heads/build/fix-xcode-2
- refs/heads/build/manylinux-fixes
- refs/heads/circ_buffer
- refs/heads/cmake-no-runtime-debug-symbols
- refs/heads/cmake/asan
- refs/heads/cmake/deps-cleanup
- refs/heads/cmake/find-modules
- refs/heads/cmake/spirv
- refs/heads/cmake_wasm_features
- refs/heads/compute_at_guard_with_if_goes_on_stack
- refs/heads/compute_with_at
- refs/heads/compute_with_check
- refs/heads/compute_with_excessive_bounds
- refs/heads/compute_with_inlined
- refs/heads/compute_with_remove_is_right_level
- refs/heads/cpack/nuget
- refs/heads/ctest/wrappers
- refs/heads/cuda-constant
- refs/heads/d3d12-allocation-cache
- refs/heads/deferred_cse_after_inlining
- refs/heads/destructor_calls_deinit
- refs/heads/dg/deserialize_unmapped_objects
- refs/heads/dg/fix_vulkan_codegen_bool_conversion
- refs/heads/dg/vulkan_conform_api
- refs/heads/dg/vulkan_region_allocator_fixes
- refs/heads/dgerstmann/fix-vulkan-memory-config-init
- refs/heads/disable_acquire_release_test_vulkan
- refs/heads/distinct_wrapper_names
- refs/heads/dkg/6863_asan_fixes
- refs/heads/dkg/vulkan
- refs/heads/dpalermo_dmabuf
- refs/heads/dpalermo_dmabuf_libion
- refs/heads/dpalermo_hexagon_remote_202003
- refs/heads/dpalermo_sdk4_2_0_2
- refs/heads/ds/buffer-get-pure
- refs/heads/ds/opt-tile-size
- refs/heads/ds/tail-none
- refs/heads/ds/while
- refs/heads/dsharletg/bitwise-intrinsics
- refs/heads/dsharletg/find-vector-reduce
- refs/heads/dsharletg/jit-optimization
- refs/heads/dsharletg/memcpy-copy_from
- refs/heads/dsharletg/pattern-headroom
- refs/heads/dsharletg/refactor-host-alignment
- refs/heads/dsharletg/runtime-size
- refs/heads/dsharletg/simplify-abs
- refs/heads/dsharletg/simplify-type-bounds
- refs/heads/dsharletg/specialize-bounds
- refs/heads/dsharletg/upsample-channels
- refs/heads/empty_prefetch
- refs/heads/emscripten_vector_fix
- refs/heads/export_all-wsmoses
- refs/heads/expr_auto_sched
- refs/heads/extern_bugs
- refs/heads/extern_host_alloc
- refs/heads/factor_parallel_codegen_hack
- refs/heads/fast_sync_tsan
- refs/heads/faster_integer_division
- refs/heads/feature/apps-external
- refs/heads/feature/cmake-presets
- refs/heads/feature/convert
- refs/heads/feature/f16_interleave
- refs/heads/feature/gather_load_q7
- refs/heads/feature/llvm-codemodel
- refs/heads/feature/load_predicated
- refs/heads/feature/luma_regression
- refs/heads/feature/maintanence
- refs/heads/feature/reinterprets
- refs/heads/feature/tcm_bump_allocator
- refs/heads/feature/xtensa_fix_interleave_q8
- refs/heads/feature/xtensa_q8_tests
- refs/heads/find_intrinsics_issue
- refs/heads/find_intrinsics_widening_lets
- refs/heads/fix-floated-pure-stage
- refs/heads/fix-race-condition
- refs/heads/fix_hexagon_alignment
- refs/heads/fix_hvx_intrinsics
- refs/heads/fix_prefetch_test
- refs/heads/fix_windows_vs15_build
- refs/heads/fixed_length_vectors
- refs/heads/fixed_point_local_laplac
- refs/heads/gemmlowp
- refs/heads/generate
- refs/heads/gha/pip
- refs/heads/gpu_canon_fix
- refs/heads/halide_ir_flatbuffer
- refs/heads/hex_dma2_async
- refs/heads/hexagon_le_runtime
- refs/heads/hexagon_priority
- refs/heads/hexagon_setpriority
- refs/heads/hexagon_strided_pred_load
- refs/heads/hexagon_sysmon_markers
- refs/heads/imaging-synthesis
- refs/heads/includes_fix
- refs/heads/ios_fast_sync_fix
- refs/heads/jia-kai-fix-runtime-cuda-init
- refs/heads/kamil-openglcompute-infinity
- refs/heads/kamil/name_pthread_workers
- refs/heads/kp_bit_shift
- refs/heads/line_buffer
- refs/heads/loop_carry_not_working
- refs/heads/lower_on_huge_stack
- refs/heads/main
- refs/heads/master
- refs/heads/memoize_with_extents
- refs/heads/metal_float16
- refs/heads/metaprogrammed_simplifier_mod
- refs/heads/mohamedadaly-vmlal
- refs/heads/more_powerful_sliding
- refs/heads/new_autoschedule_with_new_simplifier_arm_worker_branch
- refs/heads/new_autoscheduler
- refs/heads/new_simplifier_rule_testing
- refs/heads/newer_ion_ioctl
- refs/heads/no_bounds_query_when_bounds_used
- refs/heads/opengl_compute_buffer_types_fix
- refs/heads/openglcompute_reuse_shared_allocations
- refs/heads/optmize_reorder
- refs/heads/par_for_opt
- refs/heads/pdb/fix_7806
- refs/heads/pdb/hexagon_remote_cmake
- refs/heads/pdb_add_libcpp_makefile_inc
- refs/heads/pdb_eliminate_interleaves_test
- refs/heads/pdb_fix_clang_build
- refs/heads/pdb_fix_install_qc
- refs/heads/pdb_fix_loop_carry
- refs/heads/pdb_fix_simd_op_check_hvx
- refs/heads/pdb_mul_div_mod_multi_thread
- refs/heads/pdb_remove_hvx_v64
- refs/heads/perform_inline_with_order
- refs/heads/pr/2572
- refs/heads/pr/2676
- refs/heads/pr/2975
- refs/heads/pr/3017
- refs/heads/pr/3081
- refs/heads/pr/3387
- refs/heads/pr/3939
- refs/heads/pr/3960
- refs/heads/pr/4380
- refs/heads/pr/4414
- refs/heads/pr/5331
- refs/heads/pr/5438
- refs/heads/pr/5455
- refs/heads/pr/5758_2
- refs/heads/predicated_vector
- refs/heads/prefetch_specialize
- refs/heads/print_schedule
- refs/heads/profile_hardware_counters
- refs/heads/random-pipelines
- refs/heads/rdom_with_pure_vars
- refs/heads/readme-fix-gcd
- refs/heads/realization_order
- refs/heads/refactor_module
- refs/heads/register_promotion
- refs/heads/release/10.x
- refs/heads/release/11.x
- refs/heads/release/12.x
- refs/heads/release/13.x
- refs/heads/release/14.x
- refs/heads/release/15.x
- refs/heads/release/16.x
- refs/heads/release/17.x
- refs/heads/release/8.x
- refs/heads/remove_max_on_fuse_factor
- refs/heads/reorder_rvar
- refs/heads/reset_unique_counter
- refs/heads/revert-3612-ataei-speedup_compiletime
- refs/heads/revert-7009-rootjalex/distribute-w_shl
- refs/heads/revert-7601-compile_hexagon_remote
- refs/heads/riscv_update
- refs/heads/rl_simplifier_rules
- refs/heads/rootjalex/add_simpl_rules
- refs/heads/rootjalex/arm-optimize
- refs/heads/rootjalex/autoscheduler_mcts
- refs/heads/rootjalex/bounds-rewriter
- refs/heads/rootjalex/bounds_synthesis
- refs/heads/rootjalex/cbounds
- refs/heads/rootjalex/cbounds_predicated
- refs/heads/rootjalex/fix-sat-overflow
- refs/heads/rootjalex/fix_estimate_issue
- refs/heads/rootjalex/fix_failed_unrolls
- refs/heads/rootjalex/gsoc_codegen
- refs/heads/rootjalex/improve_cbounds_fixed
- refs/heads/rootjalex/improve_constant_bounds
- refs/heads/rootjalex/pitchfork-arm
- refs/heads/rootjalex/reinterpret-simplify
- refs/heads/rootjalex/rts
- refs/heads/rootjalex/super_simplify_bounds
- refs/heads/rootjalex/test_cbounds_fixed
- refs/heads/rootjalex/test_constant_bounds
- refs/heads/rootjalex/trs-codegen
- refs/heads/rootjalex/trs-codegen-cross
- refs/heads/rootjalex/trs-merge
- refs/heads/rootjalex/uint32-int32-cast
- refs/heads/rootjalex/x86-hadds
- refs/heads/rootjalex/x86-optimize
- refs/heads/rootjalex/x86-optimize-test
- refs/heads/rootjalex/x86-sat
- refs/heads/rootjalex/x86-test
- refs/heads/rule_removal_experiments
- refs/heads/schedule-output-storage
- refs/heads/separate_bounds_query_entrypoint
- refs/heads/shallow
- refs/heads/shift_amount_type_change
- refs/heads/shoaibkamil/cmake-without-arm
- refs/heads/shoaibkamil/correct_memory_fences
- refs/heads/shoaibkamil/d3d-fixes
- refs/heads/shoaibkamil/deprecate_openglcompute
- refs/heads/shoaibkamil/json
- refs/heads/shoaibkamil/llvm_clone_tag
- refs/heads/shoaibkamil/minor-vcpkg-doc-change
- refs/heads/shoaibkamil/opengl_compute_tests
- refs/heads/shoaibkamil/performance_tests_as_generators
- refs/heads/shoaibkamil/rule_removal_experiments
- refs/heads/shoaibkamil/super_simplify_with_interpreter
- refs/heads/shoaibkamil/windows-arm-fix-attributes
- refs/heads/sim_shlib_addr_print
- refs/heads/simplify-nested-broadcasts
- refs/heads/simplify-vectorreduce-shuffles2
- refs/heads/simplify_mod
- refs/heads/sioutas_2020
- refs/heads/sioutas_2020_autoscheduler
- refs/heads/slomp/gpu-codegen-profiling
- refs/heads/slomp/msvc-static-analysis
- refs/heads/solve_div
- refs/heads/solve_div_master
- refs/heads/solve_div_simplifier_test
- refs/heads/sr/python-late-binding-defaults
- refs/heads/srj-aaa
- refs/heads/srj-alloc
- refs/heads/srj-alloca
- refs/heads/srj-appmake2
- refs/heads/srj-armv83a
- refs/heads/srj-aslog
- refs/heads/srj-assert
- refs/heads/srj-assoc
- refs/heads/srj-auto-multi
- refs/heads/srj-auto-multi2
- refs/heads/srj-auto_schedule_mat_mul
- refs/heads/srj-autosched
- refs/heads/srj-b2cpphide
- refs/heads/srj-barr
- refs/heads/srj-bits
- refs/heads/srj-blacklist
- refs/heads/srj-bounds
- refs/heads/srj-bufcalltype
- refs/heads/srj-bufcallwrap
- refs/heads/srj-bufcallwrap2
- refs/heads/srj-buffer
- refs/heads/srj-bv
- refs/heads/srj-classic-autotune
- refs/heads/srj-clean
- refs/heads/srj-constcall
- refs/heads/srj-crosscompile
- refs/heads/srj-ctlz
- refs/heads/srj-cvec-patch
- refs/heads/srj-dag
- refs/heads/srj-debug-to-file
- refs/heads/srj-deir
- refs/heads/srj-f16
- refs/heads/srj-fp16
- refs/heads/srj-fsch
- refs/heads/srj-fthru
- refs/heads/srj-g2
- refs/heads/srj-g3
- refs/heads/srj-gha-test-fixes
- refs/heads/srj-hidden
- refs/heads/srj-hide2
- refs/heads/srj-hvx
- refs/heads/srj-hvx-bug
- refs/heads/srj-hvx-codegen-bug
- refs/heads/srj-hvx-nocopy
- refs/heads/srj-hvxshift
- refs/heads/srj-iib
- refs/heads/srj-initshape
- refs/heads/srj-inv
- refs/heads/srj-ir
- refs/heads/srj-irmut2
- refs/heads/srj-iwyu
- refs/heads/srj-iwyu3
- refs/heads/srj-javascript_work_in_progress
- refs/heads/srj-lensblur
- refs/heads/srj-lessinc
- refs/heads/srj-llvm-loop-opt
- refs/heads/srj-mak
- refs/heads/srj-maxthreads
- refs/heads/srj-mod
- refs/heads/srj-msan
- refs/heads/srj-msan-call
- refs/heads/srj-muldivmod
- refs/heads/srj-mut
- refs/heads/srj-outputs-2
- refs/heads/srj-parse
- refs/heads/srj-pch
- refs/heads/srj-printfunc
- refs/heads/srj-pygp
- refs/heads/srj-revertbits
- refs/heads/srj-schedule-storage
- refs/heads/srj-shl-shr-2
- refs/heads/srj-sio
- refs/heads/srj-static-const
- refs/heads/srj-strided-store
- refs/heads/srj-tidyh
- refs/heads/srj-tiff
- refs/heads/srj-trace
- refs/heads/srj-tutorial
- refs/heads/srj-using
- refs/heads/srj-wasmfix
- refs/heads/srj-xor2
- refs/heads/srj/abstract-gen-without-get-output-func-KEEP
- refs/heads/srj/aligned-alloc
- refs/heads/srj/aligned-alloc-2
- refs/heads/srj/aligned-malloc-with-aligned-alloc
- refs/heads/srj/all-explicit-ctor
- refs/heads/srj/anderson-thread-info-ptr
- refs/heads/srj/aot-perf
- refs/heads/srj/argv-signatures
- refs/heads/srj/argv-types
- refs/heads/srj/async-test
- refs/heads/srj/b2cpp-const-data
- refs/heads/srj/better-xt-dispatch
- refs/heads/srj/bfloat1
- refs/heads/srj/bp
- refs/heads/srj/build_halide_h
- refs/heads/srj/c-bool
- refs/heads/srj/cache-clear
- refs/heads/srj/clang-fmt-ignore
- refs/heads/srj/clang-tidy
- refs/heads/srj/clear-c-cache
- refs/heads/srj/cmake-asan
- refs/heads/srj/cmake-asan2
- refs/heads/srj/cmake-jit-generators
- refs/heads/srj/configure-cmake
- refs/heads/srj/cpp-generator-v2-experiment-KEEP
- refs/heads/srj/crosscompile
- refs/heads/srj/ctad
- refs/heads/srj/depr
- refs/heads/srj/deprecation
- refs/heads/srj/device-copy
- refs/heads/srj/example
- refs/heads/srj/experiment
- refs/heads/srj/experiment-6967
- refs/heads/srj/exporting
- refs/heads/srj/expr_t
- refs/heads/srj/external-tensors
- refs/heads/srj/fix-pytorch
- refs/heads/srj/fixed-rollback
- refs/heads/srj/fopen-fix
- refs/heads/srj/forward
- refs/heads/srj/forward-name
- refs/heads/srj/gen-func
- refs/heads/srj/gen-func-2
- refs/heads/srj/gen-func-3
- refs/heads/srj/gen2-1
- refs/heads/srj/gen_closure
- refs/heads/srj/generator_aot_gpu_multi_context_threaded
- refs/heads/srj/globals
- refs/heads/srj/halide-buffer-crop
- refs/heads/srj/halide-malloc-alignment
- refs/heads/srj/halide-must-use
- refs/heads/srj/halide-runtime-must-use-result
- refs/heads/srj/hang-repro
- refs/heads/srj/hannk
- refs/heads/srj/hannk-aliasing
- refs/heads/srj/hannk-error-checking
- refs/heads/srj/hannk-errors
- refs/heads/srj/hannk-inplace
- refs/heads/srj/hannk-mmap
- refs/heads/srj/hannk-tflite-27
- refs/heads/srj/hannk-verbosity
- refs/heads/srj/hdrs
- refs/heads/srj/html-becomes-viz
- refs/heads/srj/implicit-mult-widening
- refs/heads/srj/issue-7076
- refs/heads/srj/iwyu
- refs/heads/srj/iwyu-2
- refs/heads/srj/iwyu-6
- refs/heads/srj/libHANNK
- refs/heads/srj/llvm_type_of
- refs/heads/srj/maybe-unused
- refs/heads/srj/meanop
- refs/heads/srj/metadata-calling-convention
- refs/heads/srj/more-tidy
- refs/heads/srj/msan-dtf
- refs/heads/srj/multimeta
- refs/heads/srj/nanobind
- refs/heads/srj/new-rt-1
- refs/heads/srj/no-threadpool
- refs/heads/srj/no-timeout-thread
- refs/heads/srj/oglc-mutexed
- refs/heads/srj/param-map
- refs/heads/srj/pip-15.x
- refs/heads/srj/pip-cron
- refs/heads/srj/possible-uninited
- refs/heads/srj/pr-7566
- refs/heads/srj/printer-size
- refs/heads/srj/profiler-data-race
- refs/heads/srj/ptr-int-cast
- refs/heads/srj/pyapps
- refs/heads/srj/pyext-fix
- refs/heads/srj/pygen-class
- refs/heads/srj/pygen-deux
- refs/heads/srj/pygen-func
- refs/heads/srj/pygen-native-types
- refs/heads/srj/pyinstall
- refs/heads/srj/pypi-try
- refs/heads/srj/pystuff
- refs/heads/srj/python-buffer-unpack
- refs/heads/srj/python-tutorial
- refs/heads/srj/reshape
- refs/heads/srj/rt-error-smallify
- refs/heads/srj/rt-return-types
- refs/heads/srj/runtime-error-handling
- refs/heads/srj/sat-fixes-exp
- refs/heads/srj/sat-fixes-exp-2
- refs/heads/srj/shadow-field
- refs/heads/srj/snprintf
- refs/heads/srj/spirv-license
- refs/heads/srj/stat-buf-deprecations
- refs/heads/srj/static-buffer-generators
- refs/heads/srj/stmt-html
- refs/heads/srj/stringify
- refs/heads/srj/synth-gen-params
- refs/heads/srj/synth-params-python
- refs/heads/srj/test-arm_sve_redux
- refs/heads/srj/test-intrinsics-bounds
- refs/heads/srj/test8076
- refs/heads/srj/test8078
- refs/heads/srj/test8094
- refs/heads/srj/test8105a
- refs/heads/srj/test8115
- refs/heads/srj/test_tmpdir_fix
- refs/heads/srj/tidy
- refs/heads/srj/tidy-format-14
- refs/heads/srj/tidymore
- refs/heads/srj/tidymore2
- refs/heads/srj/tls
- refs/heads/srj/tls-3
- refs/heads/srj/tls-4
- refs/heads/srj/tls-ucon
- refs/heads/srj/tmp-unschedule-experiment
- refs/heads/srj/tot-fix
- refs/heads/srj/try-revert-sat
- refs/heads/srj/type-traits
- refs/heads/srj/typed-func
- refs/heads/srj/ucon-all-const
- refs/heads/srj/ucon-non-const
- refs/heads/srj/visit-warnings
- refs/heads/srj/wasm-atomic2
- refs/heads/srj/wasm-simd
- refs/heads/srj/wasm-stuff
- refs/heads/srj/wasm-threads
- refs/heads/srj/wasm-updates
- refs/heads/srj/wasm-work
- refs/heads/srj/wip
- refs/heads/srj/x-rounding
- refs/heads/srj/xbuf
- refs/heads/srj/xc+plus+size+tmp
- refs/heads/srj/xc-types
- refs/heads/srj/xt-uint-cast-test
- refs/heads/srj/xtensa-arch
- refs/heads/srj/xtensa-merge
- refs/heads/srj/xvc-experimetn
- refs/heads/srj/zlib-embed
- refs/heads/standalone_autoscheduler
- refs/heads/standalone_autoscheduler_arm_worker
- refs/heads/standalone_autoscheduler_arm_worker_amazon
- refs/heads/standalone_autoscheduler_gpu
- refs/heads/standalone_autoscheduler_hexagon
- refs/heads/sticky_task_assignments
- refs/heads/store_with
- refs/heads/store_with_solver_for_super_simplify
- refs/heads/strict_float_cse_fix
- refs/heads/super_simplify
- refs/heads/super_simplify_v2
- refs/heads/super_simplify_v3
- refs/heads/transitive_wrapper
- refs/heads/trigger-release-v16
- refs/heads/tzumao-autodiff-boundarycond
- refs/heads/tzumao-gradient-autoscheduler-bug
- refs/heads/tzumao-predicate-store-load
- refs/heads/tzumao-python-buffer
- refs/heads/tzumao_autodiff_unbounded
- refs/heads/tzumao_improve_gradient_autoscheduler
- refs/heads/tzumao_issue_4297
- refs/heads/tzumao_licm_before_BI
- refs/heads/unbounded_bugs
- refs/heads/undo_async_copy_chain_black_list
- refs/heads/use_string_literals_for_blobs
- refs/heads/users/lukas/python-pip
- refs/heads/validate_sched_error_msg
- refs/heads/var_ir_fix
- refs/heads/vksnk/async-experiment
- refs/heads/vksnk/async-multiple-producers
- refs/heads/vksnk/async-order
- refs/heads/vksnk/better-loop-carry
- refs/heads/vksnk/better-message
- refs/heads/vksnk/bound-storage
- refs/heads/vksnk/bounds-widen-right
- refs/heads/vksnk/c-print-type
- refs/heads/vksnk/c-round
- refs/heads/vksnk/check-return-result
- refs/heads/vksnk/compute-with-bug
- refs/heads/vksnk/compute_with_async
- refs/heads/vksnk/dma-limit-channels
- refs/heads/vksnk/dma-min-max
- refs/heads/vksnk/expr-match-shuffle
- refs/heads/vksnk/extract-from-scalar
- refs/heads/vksnk/f16-load
- refs/heads/vksnk/fix-packvr
- refs/heads/vksnk/fix_halide_xtensa_narrow_with_rounding_shift_i16
- refs/heads/vksnk/fused-compute-with
- refs/heads/vksnk/hoist-storage-bug
- refs/heads/vksnk/lerp-intrinsics
- refs/heads/vksnk/lower-signed-shifts
- refs/heads/vksnk/missing-exception
- refs/heads/vksnk/non-widening-halves
- refs/heads/vksnk/optimize-shuffles
- refs/heads/vksnk/replace-all
- refs/heads/vksnk/restrict
- refs/heads/vksnk/roll-buffer
- refs/heads/vksnk/roundeven-arm
- refs/heads/vksnk/rvar-bounds
- refs/heads/vksnk/simplify-slice
- refs/heads/vksnk/skip-semaphores
- refs/heads/vksnk/storage-folding
- refs/heads/vksnk/strided-load-of-4_2
- refs/heads/vksnk/typed-scope
- refs/heads/vksnk/update-simd-driver
- refs/heads/vksnk/vectorize-bug
- refs/heads/vksnk/vectorize-scalarize
- refs/heads/vksnk/widening_absd
- refs/heads/vksnk/xtensa-codegen-fp16
- refs/heads/vksnk/xtensa-dma-improvements
- refs/heads/vksnk/xtensa-regroup-pass
- refs/heads/vksnk/xtensa/lift-allocs
- refs/heads/vulkan
- refs/heads/vulkan-diagnose-alloc-failures
- refs/heads/vulkan-phase0-adts
- refs/heads/vulkan-phase1-spirv
- refs/heads/vulkan-phase2-runtime
- refs/heads/vulkan2
- refs/heads/vulkan_fix_gpu_dynamic_shared_test
- refs/heads/vulkan_fix_subregion_memory_offsets
- refs/heads/webassembly-old
- refs/heads/winograd
- refs/heads/wording_fix
- refs/heads/xtensa-codegen
- refs/heads/xtensa-codegen-parallel
- refs/heads/xuanda/fix-serialize-bad-partition-always
- refs/remotes/origin/rootjalex/add_autosched_caching
- refs/tags/release_2018_02_15
- refs/tags/release_2019_08_27
- refs/tags/release_8.0.0
- refs/tags/v10.0.0
- refs/tags/v10.0.1
- refs/tags/v11.0.0
- refs/tags/v11.0.1
- refs/tags/v12.0.0
- refs/tags/v12.0.1
- refs/tags/v13.0.0
- refs/tags/v13.0.1
- refs/tags/v13.0.2
- refs/tags/v13.0.3
- refs/tags/v13.0.4
- refs/tags/v14.0.0
- refs/tags/v15.0.0
- refs/tags/v15.0.1
- refs/tags/v16.0.0
- refs/tags/v17.0.0
- refs/tags/v17.0.1
- refs/tags/v8.0.0
Revision | Author | Date | Message | Commit Date |
---|---|---|---|---|
078254f | Dillon Sharlet | 04 May 2021, 21:36:50 UTC | Use memcpy to implement copy_from | 04 May 2021, 21:36:50 UTC |
f45d323 | Andrew Adams | 04 May 2021, 16:32:56 UTC | Non-widening lowering of rounding shifts (#5956) This version lowers it without needing to widen, which is a large win on x86 for 16 and 32-bit types (3.8x faster and 2.8x faster respectively). It's a very slight slowdown for 8-bit because x86 doesn't have 8-bit shift instructions. Also drive-by typo fix. | 04 May 2021, 16:32:56 UTC |
94c0eca | Dillon Sharlet | 04 May 2021, 00:17:39 UTC | Use dot products for sums. (#5954) | 04 May 2021, 00:17:39 UTC |
5a0d1e5 | Volodymyr Kysenko | 03 May 2021, 16:34:37 UTC | Support VectorReduce in CodeGen_C (#5952) | 03 May 2021, 16:34:37 UTC |
8b9deea | Dillon Sharlet | 30 April 2021, 20:56:30 UTC | Fix bugs when D != 4 (#5951) * Fix bugs when D != 4 * clang-format | 30 April 2021, 20:56:30 UTC |
093e8df | Fangrui Song | 29 April 2021, 22:58:43 UTC | Replace llvm::sys::fs::F_None with llvm::sys::fs::OF_None (#5946) The former is deprecated. | 29 April 2021, 22:58:43 UTC |
fcbd2ee | Dillon Sharlet | 27 April 2021, 23:52:57 UTC | Fix build issue in runtime. (#5944) | 27 April 2021, 23:52:57 UTC |
a391e9a | AbdouTlili | 27 April 2021, 23:13:06 UTC | adding a note in the README.md to use -j option in make --build (#5938) * adding a note in the README.md to use -j option in make --build * wrapped the added section to 80 column | 27 April 2021, 23:13:06 UTC |
5a69e9f | Dillon Sharlet | 26 April 2021, 21:12:53 UTC | Fix flattening of ramps involving 64-bit mins (#5940) * Fix flattening of ramps involving 64-bit mins. * Use make_const instead of cast. | 26 April 2021, 21:12:53 UTC |
91e42f4 | Steven Johnson | 26 April 2021, 20:10:21 UTC | Don't use as_const_int() on temporaries (#5939) Sometimes we get lucky and it's still valid, but it's always wrong. | 26 April 2021, 20:10:21 UTC |
1b3cbcb | aankit-ca | 26 April 2021, 17:55:12 UTC | [Hexagon] Try vdelta/vrdelta before vlut for some shuffles. (#5935) The patch tries to generate vdelta/vrdelta instructions for non-ramp shuffles. Eg: shuffle(lut_expr, < 0, 1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15, 16, 18, 19, 20, 21, 22, 23, 24, 25, 27, 28, 29, 30, 31, 32, 33, 34, 36, 37, 38, 39, 40, 41, 42, 43, 45, 46, 47, 48, 49, 50, 51, 52, 54, 55, 56, 57, 58, 59, 60, 61, 63, 64, 65, 66, 67, 68, 69, 70>) can be generated using vrdelta. The patch also fixes a bug where we bitcast vdelta/vrdelta with 16/32 bits elements to wrong type. User would see the below error: llvm-project/llvm/lib/IR/Instructions.cpp:2905: static llvm::CastInst *llvm::CastInst::Create(Instruction::CastOps, llvm::Value *, llvm::Type *, const llvm::Twine &, llvm::Instruction *): Assertion `castIsValid(op, S, Ty) && "Invalid cast!"' failed. Co-authored-by: Ankit Aggarwal <aankit@quicinc.com> | 26 April 2021, 17:55:12 UTC |
ba89623 | Shivam Gupta | 23 April 2021, 16:19:40 UTC | Small Typo fix in lesson 06 (#5936) Signed-off-by: xgupta <shivam98.tkg@rediffmail.com> | 23 April 2021, 16:19:40 UTC |
a407acd | Steven Johnson | 22 April 2021, 16:29:01 UTC | Revert "Temporarily disable hanging test (#5925)" (#5933) This reverts commit 62505857694ab8af2a88a22edf291e630c8c0cfd. | 22 April 2021, 16:29:01 UTC |
fb13fb0 | Dillon Sharlet | 21 April 2021, 22:10:32 UTC | Add mul_shift_right intrinsic and related improvements (#5916) * Add multiply_quantized intrinsic * clang-format * Fix build on some compilers. * Fix incorrect saturating_pmulhrs * multiply_quantized -> mul_shift_right * Remove workaround and just cast shift amounts. * Fix error message * Fix declaration of mul_shift_right. | 21 April 2021, 22:10:32 UTC |
6867005 | Shoaib Kamil | 21 April 2021, 19:06:50 UTC | Suppress Metal unused function warning (#5913) Co-authored-by: Steven Johnson <srj@google.com> | 21 April 2021, 19:06:50 UTC |
5dd85ae | Andrew Adams | 21 April 2021, 16:50:56 UTC | Let the user pass the Func to use to the reduction helpers (#5929) * Let the user pass the Func to use to the reduction helpers * Pass Funcs by const ref | 21 April 2021, 16:50:56 UTC |
17d4771 | Dillon Sharlet | 21 April 2021, 16:04:27 UTC | Update test to reflect behavior we expect. (#5928) | 21 April 2021, 16:04:27 UTC |
087567f | Dillon Sharlet | 21 April 2021, 16:04:09 UTC | Remove old codegen. LLVM rewrites this back to a multiply anyways. (#5930) | 21 April 2021, 16:04:09 UTC |
6250585 | Steven Johnson | 20 April 2021, 21:23:26 UTC | Temporarily disable hanging test (#5925) * Temporarily disable hanging test LLVM13 is causing vector_reductions to hang (https://reviews.llvm.org/D100099 appears to be the injection point). Disabling this test to unbreak the buildbots. * Update vector_reductions.cpp | 20 April 2021, 21:23:26 UTC |
c1de142 | Alexander Root | 20 April 2021, 21:21:33 UTC | [adams2019] Add caching to autoscheduler (#5697) * add feature caching and block caching to adams2019 autoscheduler * added caching verification for features * add caching docstrings | 20 April 2021, 21:21:33 UTC |
ac23987 | Dillon Sharlet | 20 April 2021, 15:02:14 UTC | Speed up simd_op_check by only compiling one pipeline per op (#5918) * Speed up simd_op_check and compute_with * Dense vector loads can be written many different ways. | 20 April 2021, 15:02:14 UTC |
6963673 | Dillon Sharlet | 20 April 2021, 00:24:06 UTC | Add Target::ARMv81a and improve shift instruction selection (#5917) * Add Target::ARMv81a and improve shift instruction selection. * Remove merge mistake. * Don't use ARM intrinsic on arm32, it seems to be missing sometimes. | 20 April 2021, 00:24:06 UTC |
493dbd4 | Steven Johnson | 17 April 2021, 17:46:20 UTC | Comment out specializations for f64x2.convert_low_i32x4_s/u (#5914) LLVM removed the primitives we need (so our code can't be used), but it also doesn't seem to be generating the expected instructions directly (as claimed). Commenting out to un-break tests; issue has been reported to wasm/llvm team. | 17 April 2021, 17:46:20 UTC |
9cdb4aa | Andrew Adams | 16 April 2021, 22:23:30 UTC | Simplify and improve cuda_mat_mul schedule (#5909) * Simplify and improve cuda_mat_mul schedule | 16 April 2021, 22:23:30 UTC |
a41cce7 | Volodymyr Kysenko | 16 April 2021, 20:47:16 UTC | Basic support of predicated loads/stores in C++ backend (#5908) * Basic support of predicated load/stores in C++ backend * Fix formatting and maybe build * Fix * trigger buildbots Co-authored-by: Steven Johnson <srj@google.com> | 16 April 2021, 20:47:16 UTC |
3531167 | Steven Johnson | 15 April 2021, 18:34:37 UTC | Drop LLVM10 support from master (#5740) * Drop LLVM10 support from master Update build files to require LLVM11+ in master branch. (Since we only regularly test master with 12 and 13 this is conservative.) Remove all code that is specialized for LLVM < 11.0. * Update CodeGen_ARM.cpp * Update CodeGen_LLVM.cpp | 15 April 2021, 18:34:37 UTC |
780ebd2 | Zalman Stern | 15 April 2021, 16:19:21 UTC | Add an error for realize with a different number of outputs than defined for pipeline. (#5906) * Add an error for calling realize with a different number of outputs than the pipeline was compiled with. * Forgot to add test. * A readability sacrifice to the clang deity. * Add CMake file. * Minor change to error text. * Fix logic to handle Funcs returning Tuples. * Formatting. | 15 April 2021, 16:19:21 UTC |
da02c0d | Jiawen (Kevin) Chen | 14 April 2021, 22:17:01 UTC | Add missing "struct" before halide_type_t. (#5904) This allows it to compile as pure C instead of C++. Co-authored-by: Jiawen Chen <jiawen@adobe.com> | 14 April 2021, 22:17:01 UTC |
ccde965 | Steven Johnson | 14 April 2021, 21:32:29 UTC | Enable some more wasm simd tests that are now working with top-of-tree LLVM. (#5903) | 14 April 2021, 21:32:29 UTC |
3ac277b | Dillon Sharlet | 14 April 2021, 17:23:02 UTC | Rewrite double and triple narrowing on ARM (#5896) * Rewrite double and triple narrowing on ARM. * clang-format. Co-authored-by: Steven Johnson <srj@google.com> | 14 April 2021, 17:23:02 UTC |
ce9b324 | Steven Johnson | 14 April 2021, 16:18:45 UTC | Fix UB in halide_buffer_t::size_in_bytes (#5898) Just a port of https://github.com/halide/Halide/pull/4389 to the equivalent methods in HalideRuntime.h, since offset-from-a-null-pointer is UB in C++. | 14 April 2021, 16:18:45 UTC |
1ff3e3f | Mario Emmenlauer | 12 April 2021, 20:50:02 UTC | CMake build: Add more user control (#5859) * packaging/CMakeLists.txt: Allow users to override RPATH (i.e. for packaging Halide) * CMakeLists.txt: Allow users to override the C++ standard | 12 April 2021, 20:50:02 UTC |
9cc17b4 | Alexander Root | 12 April 2021, 17:23:57 UTC | Add fuzzer to bounds_of_expr_in_scope + fix discovered overflow bugs (#5895) * add interval bounds fuzzer * correct overflow checks in bounds inference * catch uint32->int32 overflow in simplifier and revert bounds change | 12 April 2021, 17:23:57 UTC |
687c7d8 | Andrew Adams | 09 April 2021, 05:04:11 UTC | Use guarded versions of vars if they exist in bounds inference (#5890) | 09 April 2021, 05:04:11 UTC |
cf40bc8 | Steven Johnson | 08 April 2021, 17:14:26 UTC | Improve wasm_threads documentation (#5843) * Improve wasm_threads documentation * Update HalideRuntime.h | 08 April 2021, 17:14:26 UTC |
71b895e | Alex Reinking | 18 February 2021, 21:44:02 UTC | Fix existing presets (remove -O2 stuff, typos) | 07 April 2021, 22:51:53 UTC |
bd16b37 | Alex Reinking | 18 February 2021, 21:42:47 UTC | Add shebang line to autotune_loop.sh | 07 April 2021, 22:51:53 UTC |
efea7a2 | Alex Reinking | 18 February 2021, 21:42:22 UTC | Remove WITH_APPS from README_cmake.md | 07 April 2021, 22:51:53 UTC |
69011bc | Alex Reinking | 17 February 2021, 06:56:20 UTC | Fix spelling mistakes and Doxygen references | 07 April 2021, 22:51:53 UTC |
b9cd9f2 | Alex Reinking | 07 April 2021, 22:37:12 UTC | Require LLD_DIR in zip/package.bat (#5887) | 07 April 2021, 22:37:12 UTC |
79fd0c9 | Alex Reinking | 07 April 2021, 19:12:23 UTC | [cmake] Fix and reorganize warnings for building Halide (#5885) | 07 April 2021, 19:12:23 UTC |
7473402 | Steven Johnson | 07 April 2021, 02:23:25 UTC | error_run_with_large_stack_throws should compile without exceptions (#5884) (1) Some downstream environments compile C++ without exceptions by default; this won't compile on those. (2) We should check that assert-fail also errors out as expected. | 07 April 2021, 02:23:25 UTC |
ea76214 | Alex Reinking | 07 April 2021, 00:53:11 UTC | Improve ClangCL support by disabling, fixing warnings (#5876) Co-authored-by: Mario Emmenlauer <memmenlauer@biodataanalysis.de> | 07 April 2021, 00:53:11 UTC |
85816e4 | Andrew Adams | 06 April 2021, 20:56:47 UTC | Add explicit cast to remove ambiguous operator== (Fixes #5329) (#5879) | 06 April 2021, 20:56:47 UTC |
e877e5b | Steven Johnson | 06 April 2021, 16:27:09 UTC | Fix natural_vector_size for wasm 64-bit types (#5880) In the original spec, wasm-simd128 didn't have int64 or float64; the final spec adds these types, so this bit of code is outdated and incorrect. | 06 April 2021, 16:27:09 UTC |
7825d48 | Steven Johnson | 06 April 2021, 16:25:21 UTC | Enable i64x2 comparisons in simd_op_check (#5881) * Enable i64x2 comparisons in simd_op_check * More drive-by fixes | 06 April 2021, 16:25:21 UTC |
9944dda | Ming Yan | 06 April 2021, 16:21:21 UTC | Fix typos in tutorial lesson_08 (#5875) | 06 April 2021, 16:21:21 UTC |
525e246 | Volodymyr Kysenko | 03 April 2021, 01:29:55 UTC | Try to vectorize inner statement of else branch of likely (#5874) * Try to vectorize inner statement of scalarize * Extend test to check for other scalarized loop * Add more details to the comment * make format * Remove note | 03 April 2021, 01:29:55 UTC |
59a04e4 | Alex Reinking | 02 April 2021, 17:53:49 UTC | Use fibers to guarantee stack size on Windows (#5873) * Use fibers for lowering. * Move fibers to Util * Wrap compile_func call in call_with_stack_requirement * Rename call_with_stack_requirement -> run_with_large_stack * Appease clang_format * Add exception handling to run_with_large_stack * clang-format * Fix 32-bit? * Fix error wording for Makefile * Improve naming in run_with_large_stack | 02 April 2021, 17:53:49 UTC |
42092e3 | Zalman Stern | 01 April 2021, 19:13:01 UTC | Fix an issue in Halide's float16 compilation support. Add tests. (#5872) In EmulateFloat16Math.cpp, conversion from 32-bit float to 16-bit float could produce a NaN value when an infinity is correct. This is because for numbers larger than the exact infinity value, the mantissa could be nonzero. Add tests to cover this case, and float16 infinities in general. Couple small style/comment cleanups. | 01 April 2021, 19:13:01 UTC |
cb78a6b | Steven Johnson | 31 March 2021, 22:30:55 UTC | Don't strip strict_float() from lets (#5871) * Don't strip strict_float() from lets Bug injected in #5856: the change in Simplify_Let.cpp was inadvertently stripping `strict_float()` calls that wrapped the RHS of a Let-expr, which can change results nontrivially in some cases. I don't think a new test for this fix is practical -- it would be a little fragile, as it would rely on the specifics of simplification that could change over time. As a drive-by, also added an explicit rule to Simplify_Call to ensure that strict_float(strict_float(x)) -> strict_float(x) in *all* cases. (The existing rule didn't do this in all cases.) | 31 March 2021, 22:30:55 UTC |
896b260 | Dillon Sharlet | 31 March 2021, 15:08:33 UTC | Add some not rules. (#5870) | 31 March 2021, 15:08:33 UTC |
3e59294 | Dillon Sharlet | 30 March 2021, 23:26:22 UTC | Add TailStrategy::Predicate (#5856) * Add TailStrategy::Predicate * Add some tests for TailStrategy::Predicate. * Fix missing override. * Fix comment. * Tweak target behavior. * Remove all heuristics * clang-format. * clang-tidy. * TailStrategy::GuardWithIf isn't always faster than scalar code :( * Use TailStrategy::Predicate in the predicated store/load test. * What is this test * Fix test bug. * Revert x86 behavior. * Move predicate to Internal namespace. * Recursively strip tags. * trigger buildbots * strip_tags -> unwrap_tags * Fix comment. Co-authored-by: Steven Johnson <srj@google.com> | 30 March 2021, 23:26:22 UTC |
7bbe2fd | Steven Johnson | 30 March 2021, 23:00:23 UTC | Add wasm support for int32->f64 and f32->f64 simd ops (#5863) * Add wasm support for int32->f64 and f32->f64 simd ops At top-of-tree LLVM, the wasm backend never seems to emit the vector version of these ops; pattern-match to target them specifically. | 30 March 2021, 23:00:23 UTC |
e7eec5c | Steven Johnson | 30 March 2021, 22:45:32 UTC | Add support for wasm dot-product instruction (#5861) * Add support for wasm dot-product instruction | 30 March 2021, 22:45:32 UTC |
f2143bf | Steven Johnson | 30 March 2021, 19:29:32 UTC | Add a way to set a GeneratorInput's type in code (#5868) * Add a way to set a GeneratorInput's type in code Currently, if you want to vary the type of a Generator's inputs or outputs, you have to specify the types in the makefile. This can be awkward for things with complex logic. This PR proposes adding a way to do this: a new `set_type()` method which can only be called from the rarely-used Generator::configure() method. It only allows setting the type for an input or output that has no type specified. I'm not 100% sure if this is a good idea, but for certain rare corner cases, it may be quite handy. (Note that extending this to allow specifying dimensions and/or array size in the same way might be handy, but is omitted from this PR.) * Update Generator.h * Also add set_dimensions, set_array_size | 30 March 2021, 19:29:32 UTC |
2dd7a6b | Thales Sabino | 30 March 2021, 19:00:21 UTC | Add support for AVX-512 VNNI saturating dot products (#5807) * Add support for AVX-512 VNNI saturating dot products This commit adds support to Intel VNNI saturating dot product instructions vpdpbusds and vpdpwssd This was accomplished by adding a new VectorReduce operation to perform the saturating_add and exposing a new inline reduction saturating_sum. Users can then write RDom r(0, 4); f(x) = saturating_sum(i32(0), i16(i8(g(x + r)) * u8(h(x + r)))) bool override_associativity_test = true; int vector_width = 4; Var xo, xi; f.update() .split(x, xo, xi, vector_width) .atomic(override_associativity_test) .vectorize(r) .vectorize(xi); To lower the expression into a call to vpdpbusds. Note that override_associativity_test is set to true or halide will fail to prove the associativity of the saturating_add operation Add support for VectorReduce::SaturatingAdd in CodeGen_LLVM Code is correctly generated when no intrinsic is available to perform a saturating dot product. Add vpdpbusds,vpdpwssd tests to simd_op_check Test if the saturating dot product instructions are being generated for AVX512_SapphireRapids targets * Improve code according to report from clang-tidy * Make init_val a const ref since it only used that way inside saturating_sum * clang-format * Revert removal of clang-format tag in CodeGen_X86.cpp * Add SaturatingAdd case Monotonic VectorReduce visit * Bail out in Bounds when dealing with a SaturatingAdd VectorReduce * Move saturating_mul to Simplify_Internal.h so it can be used in Simplify_Exprs.cpp * Remove init_val from the saturating_sum inline reduction * Unconditionally override the associativity test in the simd_op_check tests * Remove anonymous namespace from saturating_mul utility Co-authored-by: Thales Sabino <thales@codeplay.com> | 30 March 2021, 19:00:21 UTC |
5b238e7 | Steven Johnson | 30 March 2021, 17:18:45 UTC | Use a varying seed for random test data in simd_op_check (#5864) * Use a varying seed for random test data in simd_op_check We currently use `123` as a hardcoded seed, so we may sometimes be getting lucky with test patterns that happen to match scalar and vector. Let's vary the seed in the same way we do for (eg) fuzz_simplify to slightly broaden test coverage. | 30 March 2021, 17:18:45 UTC |
602cbac | Shivam Gupta | 30 March 2021, 16:56:13 UTC | [NFC] LLVM trunk is now called main (#5866) Reference - https://foundation.llvm.org/docs/branch-rename/ | 30 March 2021, 16:56:13 UTC |
b7bc8e2 | Steven Johnson | 30 March 2021, 16:48:14 UTC | Add support for wasm-simd saturating-narrow ops. (#5854) * Add support for wasm-simd saturating-narrow ops. | 30 March 2021, 16:48:14 UTC |
e0461e9 | Steven Johnson | 30 March 2021, 03:01:20 UTC | Add support for i16x8.q15mulr_sat_s in wasm (#5853) * Add support for i16x8.q15mulr_sat_s in wasm Also, some drive-by clarifications to other wasm-simd instructions in simd_op_check -- some of the yet-to-be-implemented ones are of dubious use in Halide and may not be worth implementing. | 30 March 2021, 03:01:20 UTC |
07f880e | Steven Johnson | 29 March 2021, 23:37:23 UTC | Add support for pairwise_widening_add in wasm (#5850) * Add support for widening_mul in wasm | 29 March 2021, 23:37:23 UTC |
e8085bd | Dillon Sharlet | 29 March 2021, 17:09:15 UTC | Add simplifier rules helpful for specialization (#5836) * Add simplifier rules helpful for specialization. * clang-format * Revert sketchy select simplification. * Add different rules. | 29 March 2021, 17:09:15 UTC |
4f152f3 | Steven Johnson | 27 March 2021, 23:16:45 UTC | Add align_extent(), to align extent but not min (#5829) * Allow align_bounds() to align extent but not min This can be handy when you have an intermediate Func that is being tiled inside an outer Func and you want to ensure that it fits an exact multiple of tiles. * Add separate align_extent() method | 27 March 2021, 23:16:45 UTC |
bc42da9 | Steven Johnson | 25 March 2021, 16:40:28 UTC | Add support for widening_mul in wasm (#5849) * Add support for widening_mul in wasm | 25 March 2021, 16:40:28 UTC |
9a8ddf7 | Andrew Adams | 24 March 2021, 18:52:17 UTC | Remove buggy deinterleave misfeature (#5844) | 24 March 2021, 18:52:17 UTC |
92dfc82 | Dillon Sharlet | 24 March 2021, 17:56:58 UTC | Enable sliding window in registers (#5815) * Sliding in registers * Fix some failure cases. * Handle if_then_else in loop partitioning. * Add rebase_loops_to_zero pass. * Use select instead of if_then_else. * Add select comparison simplifications. * Don't rewrite lets * Rebase producer loops of register slides to 0, and don't overwrite realization bounds. * Add rules for ramp < broadcast * Put the likely on the old value instead of the new value. * New rules for comparing ramps and broadcasts * Switch back to if_then_else * Update comments. * Don't try to fold dimensions with a constant min or max. * More comments. * Make the vectorized register sliding window test tighter. * Remove debug helper. * Fix tests broken by loop rebasing. * Move rebasing after loop partitioning * clang-format * clang-tidy * Also put MemoryType::Register on the stack. * Expand arg before substitute. Co-authored-by: Andrew Adams <andrew.b.adams@gmail.com> Co-authored-by: Steven Johnson <srj@google.com> | 24 March 2021, 17:56:58 UTC |
28dd74f | Steven Johnson | 24 March 2021, 16:30:16 UTC | Upgrade WABT to 1.0.22 and do some minimal update to codegen and simd_op_check for LLVM13. (#5841) * Upgrade WABT to 1.0.22 and do some minimal update to codegen and simd_op_check for LLVM13. * tickle * trigger buildbots | 24 March 2021, 16:30:16 UTC |
602d15a | Dillon Sharlet | 22 March 2021, 21:39:49 UTC | Fix missed pattern matching due to cast optimization. (#5834) | 22 March 2021, 21:39:49 UTC |
8aa778c | Dillon Sharlet | 22 March 2021, 19:19:44 UTC | Enable targeting broadcasting dot products on ARM (#5833) * Enable targeting broadcasting dot products on ARM. * Add to comment. | 22 March 2021, 19:19:44 UTC |
80307ec | Andrew Adams | 21 March 2021, 23:07:23 UTC | Update README.md (#5831) | 21 March 2021, 23:07:23 UTC |
e92f05d | Steven Johnson | 18 March 2021, 16:58:27 UTC | Upgrade pybind11 to 2.6.2 (#5821) * Upgrade pybind11 to 2.6.2 Released January 2021 with various bug fixes, see https://github.com/pybind/pybind11/releases/tag/v2.6.2 * Update requirements.txt | 18 March 2021, 16:58:27 UTC |
8d5c7cd | Andrew Adams | 18 March 2021, 02:06:03 UTC | In fast_sine_cosine, fix message, relax threshold (#5820) | 18 March 2021, 02:06:03 UTC |
1a61c4b | Volodymyr Kysenko | 17 March 2021, 23:42:41 UTC | Use print_type function when printing vector type definitions, which can (#5819) be overridden in derived classes if needed. | 17 March 2021, 23:42:41 UTC |
b549d18 | Steven Johnson | 16 March 2021, 21:05:33 UTC | Grab-bag of minor fixes (#5813) * All TraceViz scripts should set pipefail (at least) * Use ".ldscript" for linker version scripts * More TraceViz fixes | 16 March 2021, 21:05:33 UTC |
e1c0fd3 | Alex Reinking | 16 March 2021, 20:19:05 UTC | Conform to CMP0116 when using CMake 3.20+ (#5810) | 16 March 2021, 20:19:05 UTC |
5aa1a65 | Steven Johnson | 16 March 2021, 03:16:33 UTC | Add Halide_CCACHE_BUILD option for CMake (#5804) * Add Halide_CCACHE_BUILD option for CMake This replicates the approach LLVM already has of baking a use-cache option directly into the CMake options, without having to futz with changing CC/CXX/etc. * Update CMakeLists.txt * Add CCACHE_SLOPPINESS=pch_defines,time_macros * Update CMakeLists.txt | 16 March 2021, 03:16:33 UTC |
b061a32 | Dillon Sharlet | 15 March 2021, 21:58:31 UTC | Avoid printing the statement when it is unchanged (#5809) * Only print statement if it changed. * Move simplification after unroll from lower to after unrolling a loop. * Avoid making new IR when not necessary. * Use a small helper class instead. * clang-format. Co-authored-by: Steven Johnson <srj@google.com> | 15 March 2021, 21:58:31 UTC |
c49b4af | Mike Woodworth | 15 March 2021, 20:12:57 UTC | support building for m1 with only AARCH target (#5802) move test for ARM + metal to AARCH64 Co-authored-by: Steven Johnson <srj@google.com> | 15 March 2021, 20:12:57 UTC |
28c742f | Steven Johnson | 15 March 2021, 19:21:36 UTC | Set CMAKE_CROSSCOMPILING_EMULATOR in toolchain.linux-arm32.cmake (#5808) Also add comments | 15 March 2021, 19:21:36 UTC |
d523b83 | Steven Johnson | 13 March 2021, 00:54:23 UTC | Fix deprecation warnings for trunk LLVM (#5803) (1) Both the 'deprecated' and new, non-deprecated variants existed back to at least LLVM10, and the deprecated variant was commented as deprecated at that point as well; the change in LLVM13 is that they are now annotated with LLVM_ATTRIBUTE_DEPRECATED so we get compiler warnings (and thus errors). (2) The fixes are simply replicating what the old, deprecated methods did internally. | 13 March 2021, 00:54:23 UTC |
c3882a5 | Dillon Sharlet | 11 March 2021, 23:14:36 UTC | Change warmup strategy for sliding window (#5755) * Implement sliding window warmups by backing up the loop min. * Fix indirect sliding windows. * Improve is_monotonic. * Small cleanups. * Avoid generating vector valued bounds. * Fix build error on some compilers. * Fix loop bounds. * Don't try to slide things that should just be compute_at the store_at location. * Print condition when printing boxes. * Less things broken. * Add/fix comments. * Comments * Fix async by moving if inside consume (and so inside acquires). * Fix division. * This doesn't work on master either. * Add TODO * Acquire is not a no-op. * Add comment about unfortunate simplification. * Remove debug(0) * Add simplification of for { acquire { noop } } * Fix folding factors finally! * Update storage_folding test. * Fix bug when cloning a semaphore used more than once. * Disable failing test. * Work around bad complexity in is_monotonic. * Fix sub bug * Significantly faster schedule for blur. * Update tracing test. * New simplifications that help with upsampled and downsampled sliding windows. * This doesn't need explicit folding any more. * Fix new simplifier rules. * Fix simplifier div rule * Remove ancient brittle test. * Fix simplify rule again * More LT -> EQ rules for mod * Fix nested sliding windows with upsamples. * Replace hack with better solution. * Add missing override * Don't rewrite loop variable if the min doesn't change. * Refactor sliding window lowering. * Fixed bounds growing redundantly for independent producers. * Don't take the union unless possibly needed. * Respect conditional provide/required. * Add missing overrides * Much better schedule. * Use a smaller image for blur benchmarking so that different schedules have different perf * Replace Interval with ConstantInterval for is_monotonic. * Don't try to handle unsigned deltas. * Add failing test. * Remove unused new code. * Remove weird debugging code. 
* Avoid expanding bounds of split producers * Remove stray likely_if_innermost. * Remove old autotune tests. * Update test for guarded producers. * Reenable test. * Update trace for guarding producers. * Don't overwrite required.used * Handle LE/LT in bounds of lanes in vectorize * Fix acquire and release of warmups * Earlier fix for multiply cloned acquires was wrong. * Handle nested vectorization. * clang-format * Remove autotune_bug_* tests * Fix shadowing error on some compilers. * Appease overzealous clang-tidy warning. * clang-format * Don't use silly hack. * clang-tidy... * It's no longer safe to assume monotonic means bounds_of_expr_in_scope is exact * Address review comments * Add comment * Add missing override. * Fix constant interval issues. * Revert and remove empty interval * Fix multiply!? * Reduce need for simplifications. * Simplifications from dsharletg/sliding-window branch * Don't learn likely(x) and x. * Add comment * Add some min/max rules. * Also substitute facts from asserts * Remove is_empty from header too. * More rules * Add double stairstep rule. * Disable rule that uncovers bugs. * Consider anded expressions as if they were independent nested ifs. * Add promise_clamped to producer guards. * Revert "Consider anded expressions as if they were independent nested ifs." This reverts commit 03efb3f784b3078b64961c98edde383f4de04fb4. * Don't combine ifs, split them instead. * Update trace * clang-tidy/clang-format * Remove splitting of ifs, it breaks brittle tests. * Safer check on old conditions. * Fix producer guard condition. * Interval fixes. * Handle sliding backwards * Handle transitive dependencies. * Backport abadams' fix from abadams/slide_over_split_loop * Fix select visitor. * More simplifier rules. * Bring back old logic as a fallback. * Avoid specializations corrupting sliding * Fix boneheaded rule errors. * Fix slightly conservative bounds at the max for split case. * This pattern is too sensitive to the simplifier. 
In a real use case, it's just a sum, and the result can be subtracted after doing a reduction. * Add missing clamp rule * Don't count unlikely loops as inner loops for likely_if_innermost * Use <= instead of == to solve for the new loop min Useful when the warmup is a partial vector or something * Verify simplifier changes and add variants as suggested by synthesizer * Make implicit assumption explicit, for clarity * Use find_constant_bounds * Guard against expanded bounds more effectively. * Update tracing test * Small cleanup. * Don't simplify/prove using lets that might change value. * Stronger solving without expanding lets. * New simplifier rule for alignment * Fix case where no warmup needed * Add some useful rules. * Add safety check on when we can use the new loop min. * Better proof to avoid hacky condition that is hard to prove. * Small cleanup and use the nice new folding factors. * Bring back unrolled producer test. * clang-format * Expand comment. * Fix sliding backwards condition. * min(new_loop_min, loop_min) isn't needed any more. * We need that min, but we can be more conservative about it. * Stronger handling of previous loop mins. * Remove unused is_monotonic_strong. * Remove ConstantInterval::make_intersection. * Avoid need to handle uint specially. * Add cache for depends_on. * Reduce unnecessarily large cache scope * The first part of the key is always the same Co-authored-by: Andrew Adams <andrew.b.adams@gmail.com> | 11 March 2021, 23:14:36 UTC |
c2a0db1 | Steven Johnson | 10 March 2021, 23:39:27 UTC | Fix correctness_memoize on arm32 (#5799) Conversion from a pointer to an integer is "implementation defined"; in general, conversion to `uintptr_t` is reliable, but other conversions aren't. It turns out that for our arm32 compiler, casting a pointer to a `uint64_t` sign-extends, rather than zero-extends, causing unexpected behavior in memoize cache eviction *if* the pointer is in the top half of memory. The fix here is simple (cast to `uintptr_t` directly), but it brings up the question of where else in our codebase we might be doing direct conversions without noticing the potential UB. | 10 March 2021, 23:39:27 UTC |
7a8c771 | Steven Johnson | 10 March 2021, 22:53:40 UTC | Fix wasm-interp ucon error (#5797) This is a subtle error that only shows up in builds with assertions enabled in wabt; the issue is that we ask for an int64 value from a wabt `Value` struct, but that struct only has an int32, so an assertion can fail. (Note that there is always storage for both, and the values are constrained and unused inside the JIT anyway, so this is mostly just a cosmetic fix.) Also added a .gitignore entry. | 10 March 2021, 22:53:40 UTC |
34c402f | Dillon Sharlet | 10 March 2021, 21:54:51 UTC | Fix bug found by asan from #5784 (#5798) | 10 March 2021, 21:54:51 UTC |
c67f486 | John Lawson | 10 March 2021, 18:51:33 UTC | Move where intrinsic function attributes are set (#5795) * Make declare_intrin_overload return LLVM function * Make names same as elsewhere * Remove unneeded enum name * Set moved attributes in Hexagon backend * Use declare_intrin_overload for ARM vabdl * Fix ARM vabdl intrinsic types * Format and clang-tidy * Rename intrinsic to widening_absd | 10 March 2021, 18:51:33 UTC |
78ff307 | Lars | 10 March 2021, 18:35:50 UTC | Display the GPU device code as a string in the C/C++ backend (#5757) * Added support for a OpenCL backend through the C backend No correctly handles assert of kernel call Fixed clang-formatter Clang-tidy fixes * Remove references to alternative OpenclHost code * Only initialize GPU context once (removes name conflicts) squash commit * (Pretty) Print the kernels in the C backend clang formatting remove virtual * Fixes after review fixes after review 2.0 fixes | 10 March 2021, 18:35:50 UTC |
e75d9fb | Dillon Sharlet | 09 March 2021, 22:16:21 UTC | Fix out of bounds reads in strided ARM loads (#5784) * Safer version of vldN code generation. * Only be more conservative with alignment for external buffers. * Add tolerance to allocation size tests. * Remove old comments. * Improve ARM alignment and vldN code generation. * Remove merge straggler * Fix alignment condition (again). * Fix alignment. * Avoid divide by zero. * Move CodeGen_ARM's logic for strided loads to CodeGen_LLVM. * Fix comment. * clang-format. * Remove sketchy alignment check. | 09 March 2021, 22:16:21 UTC |
a83ab23 | Dillon Sharlet | 08 March 2021, 19:24:07 UTC | Simplifier rules for nested broadcasts (#5794) * Handle some reassociation when simplifying nested broadcasts. * clang-format. | 08 March 2021, 19:24:07 UTC |
89f5ee7 | Andrew Adams | 05 March 2021, 20:00:57 UTC | Add missing min/max/+/- rules (#5788) These are almost all of the rules in <= four in terms of min/max/add/sub ops leaves that simplify to something with <= three leaves. I left out things of the form: max(max(x, -x), 0) -> max(x - x) While correct, that transformation actually hurts our ability to analyze that expression. | 05 March 2021, 20:00:57 UTC |
199b873 | Steven Johnson | 05 March 2021, 17:45:48 UTC | Upgrade WABT version to 1.0.21 (#5782) | 05 March 2021, 17:45:48 UTC |
06b208f | Dillon Sharlet | 04 March 2021, 18:11:57 UTC | Move codegen backends into anonymous namespaces in source files and don't build them if not enabled (#5776) * Remove unused vertex buffer parameters. * Offload GPU code in a lowering pass instead of via CodeGen_GPU_Host. Fixes #5650, fixes #2797, fixes #2084, now #1971 is more relevant. * clang-format. * clang-format sorting is case sensitive!? * clang-tidy * Move codegen backends into anonymous namespaces in source files. * clang-format * Pass type arguments correctly. * Update OffloadGPULoops.cpp * trigger buildbots * trigger buildbots * Hack around tests that rely on the IR for offloaded GPU loops. * Fix missing include. * Remove unused include. * clang-tidy * Use custom lowering pass to see code before GPU offloading * Speculative fix for segfault * Fix const correctness * Fix error on unused variables in generated code. Co-authored-by: Steven Johnson <srj@google.com> | 04 March 2021, 18:11:57 UTC |
abf0f69 | cimes-isi | 04 March 2021, 08:16:21 UTC | cmake: respect find_package QUIET option (#5785) Co-authored-by: Steven Johnson <srj@google.com> | 04 March 2021, 08:16:21 UTC |
a0c5380 | Andrew Adams | 04 March 2021, 06:29:04 UTC | Fixes for macos on arm (#5787) * Remove type checking on ARM-64 instructions in simd op check * Makefile fixes for M1 * Don't assume OSX is x86 * Change xml2 linking flag * M1 has a really fast div instruction | 04 March 2021, 06:29:04 UTC |
74f40fd | Andrew Adams | 03 March 2021, 20:26:03 UTC | Track time spent in malloc/free when profiling (#5763) * Track time spent in malloc/free when profiling * Appease clang tidy * Remove unnecessary asserts | 03 March 2021, 20:26:03 UTC |
acebd50 | Dillon Sharlet | 03 March 2021, 20:14:29 UTC | Various simplifier improvements from dsharletg/sliding-window (#5771) * Pull simplifier changes from dsharletg/sliding-window * Bring over test changes too. * Fix typo * Remove done TODO. * trigger buildbots * This pattern is too sensitive to the simplifier. In a real use case, it's just a sum, and the result can be subtracted after doing a reduction. Co-authored-by: Steven Johnson <srj@google.com> | 03 March 2021, 20:14:29 UTC |
5da8044 | Dillon Sharlet | 03 March 2021, 20:14:11 UTC | Various bug fixes and improvements from dsharletg/sliding-window (#5772) * Fix bug when a semaphore is cloned more than once. * (Originally by abadams) Don't count unlikely loops as inner loops for likely_if_innermost. * Ignore promise_clamped when solving. * Acquires are not no-ops. * Fix test name * Handle nested vectors in bounds_of_lanes and (by abadams) Handle LE/LT in bounds of lanes in vectorize * Fix test name. * Allow any level of nested vectorization. * trigger buildbots * Grammar Co-authored-by: Steven Johnson <srj@google.com> | 03 March 2021, 20:14:11 UTC |
7493c09 | Dillon Sharlet | 02 March 2021, 21:07:14 UTC | Move CodeGen_GPU_Host to a lowering pass (#5775) * Remove unused vertex buffer parameters. * Offload GPU code in a lowering pass instead of via CodeGen_GPU_Host. Fixes #5650, fixes #2797, fixes #2084, now #1971 is more relevant. * clang-format. * clang-format sorting is case sensitive!? * clang-tidy * Pass type arguments correctly. * Update OffloadGPULoops.cpp * trigger buildbots * Hack around tests that rely on the IR for offloaded GPU loops. * clang-tidy * Use custom lowering pass to see code before GPU offloading * Speculative fix for segfault * Fix const correctness * Fix error on unused variables in generated code. * Remove unnecessary space * Use helper. Co-authored-by: Steven Johnson <srj@google.com> | 02 March 2021, 21:07:14 UTC |
01f5e73 | Steven Johnson | 02 March 2021, 18:06:11 UTC | Update simd_op_check for the final wasm simd128 spec (#5779) The final revision of the wasm simd128 spec (https://github.com/WebAssembly/simd/blob/master/proposals/simd/SIMD.md) added some ops and tweaked some others. This just augments the wasm section of simd_op_check so that all the ops are referenced, with the new (or still-unimplemented) ops commented out with TODOs. Most of the new ops will likely be implemented via pattern matching in Codegen_WebAssembly, but before that can happen in any efficient way, WABT needs to be updated to recognize all the ops in the final spec (it doesn't currently, and so we can't even load code with those ops without cratering). See https://github.com/WebAssembly/wabt/issues/1617 for tracking bug. (Also: drive-by fix in CodegenLLVM to fix prettyprinting of some debug output) | 02 March 2021, 18:06:11 UTC |
47d0594 | Steven Johnson | 28 February 2021, 19:35:27 UTC | Fix for trunk LLVM (#5778) | 28 February 2021, 19:35:27 UTC |