Revision history - refs/heads/abadams/fix_7584 - origin: https://github.com/halide/Halide

Revision	Author	Date	Message	Commit Date
7baedca	Andrew Adams	31 May 2023, 21:40:28 UTC	Fix operator/ on ModulusRemainder It wasn't reducing the remainder modulo the modulus, which confused trim_bounds_using_alignment in the simplifier.	31 May 2023, 21:40:28 UTC
35740c5	Steven Johnson	31 May 2023, 18:56:56 UTC	Configure the fuzz tests to run for a finite amount of time	31 May 2023, 18:56:56 UTC
e5d5c93	Steven Johnson	31 May 2023, 17:16:39 UTC	Merge branch 'main' into pr/7566	31 May 2023, 17:16:39 UTC
eb9b946	Steven Johnson	30 May 2023, 22:39:41 UTC	Apply fix from #7564 to fuzz/bounds (#7596) (Avoids infinite loop for some fuzzing inputs)	30 May 2023, 22:39:41 UTC
b450647	Pranav Bhandarkar	25 May 2023, 21:07:38 UTC	[Fix for #7524] Skip tests for anderson2021 if PTX is not enabled (#7593) Skip tests for anderson2021 if PTX is not enabled	25 May 2023, 21:07:38 UTC
ca8ca00	Steven Johnson	24 May 2023, 20:14:29 UTC	Pacify clang-tidy by removing unused constant (#7590)	24 May 2023, 20:14:29 UTC
6a98655	Nathaniel Brough	22 May 2023, 17:16:28 UTC	fuzz: Add libfuzzer compatible bounds fuzzer (#7549) * fuzz: Add libfuzzer compatible bounds fuzzer * Remove unused constant * Style fix * Fix handling of binary ops * Handle casting to vector-of-bool properly * fuzz: Alphabetically sort targets in CMake --------- Co-authored-by: Steven Johnson <srj@google.com>	22 May 2023, 17:16:28 UTC
e892e7b	Nathaniel Brough	20 May 2023, 22:54:18 UTC	Add build directory in cmake/fuzzing documentation	20 May 2023, 22:54:18 UTC
9c50965	Nathaniel Brough	20 May 2023, 22:30:52 UTC	Remove asan flags from fuzzer	20 May 2023, 22:30:52 UTC
ac0de71	Nathaniel Brough	20 May 2023, 20:57:46 UTC	Fixes spelling/grammar in fuzzing readme Co-authored-by: Alex Reinking <alex.reinking@gmail.com>	20 May 2023, 21:00:00 UTC
d234143	Steven Johnson	18 May 2023, 19:04:34 UTC	Check for slightly different error msg in AppleClang 14.0.3 (#7582) * Check for slightly different error msg in AppleClang 14.0.3 * Update Makefile	18 May 2023, 19:04:34 UTC
02768ef	Steven Johnson	18 May 2023, 17:43:49 UTC	In fuzz/simplify, output errors to cerr, not cout (#7583) * In fuzz/simplify, output errors to cerr, not cout This makes it easier to capture error output in downstream test harnesses * Also add some more helpful text	18 May 2023, 17:43:49 UTC
4282a5d	Steven Johnson	18 May 2023, 17:05:36 UTC	Fix #7579 (#7580) Fix per @jrprice. (He comments that we should probably regenerate all of mini_webgpu.h, and document how to do that; this is a band-aid to unbreak testing.)	18 May 2023, 17:05:36 UTC
2ed955e	Steven Johnson	18 May 2023, 00:49:25 UTC	Fix various compilation errors with AppleClang 14.0.3 (#7578) * Change & -> && usage Newer versions of Xcode trigger `-Wbitwise-instead-of-logical` for this usage, which we treat as an error * Also fix `error: variable 'i' set but not used [-Werror,-Wunused-but-set-variable]` * Also fix `retrain_cost_model.cpp:419:17: error: variable 'counter' set but not used [-Werror,-Wunused-but-set-variable]`	18 May 2023, 00:49:25 UTC
6c8f7aa	Derek Gerstmann	17 May 2023, 23:05:00 UTC	[vulkan] Fix subregion memory offsets to respect buffer alignment (#7576) * Fix buffer alignment constraints for subregion allocations (some drivers report a minimum alignment for the buffer that is larger than the storage or uniform storage offset alignment) Cleanup region offset and size constraints * Clang tidy/format pass --------- Co-authored-by: Derek Gerstmann <dgerstmann@adobe.com>	17 May 2023, 23:05:00 UTC
30d309e	Derek Gerstmann	17 May 2023, 23:04:45 UTC	[vulkan] Change the feature version requirement to v1.3 for correctness_gpu_dynamic_shared (#7577) Change the feature version requirement to v1.3 (since v1.2 lacks the necessary support). Co-authored-by: Derek Gerstmann <dgerstmann@adobe.com>	17 May 2023, 23:04:45 UTC
2fd90bf	Derek Gerstmann	17 May 2023, 18:47:36 UTC	[vulkan] Disable generator acquire_release test for Vulkan (#7565) Disable test for Vulkan Co-authored-by: Derek Gerstmann <dgerstmann@adobe.com>	17 May 2023, 18:47:36 UTC
552bbe2	Steven Johnson	17 May 2023, 18:09:23 UTC	Merge branch 'main' into pr/7566	17 May 2023, 18:09:23 UTC
968e52c	Steven Johnson	17 May 2023, 17:09:47 UTC	Upgrade WABT to 1.0.33 (#7570) * Upgrade WABT to 1.0.33 * Update CMakeLists.txt * Update CMakeLists.txt	17 May 2023, 17:09:47 UTC
76bb84d	Steven Johnson	16 May 2023, 18:25:08 UTC	Allow autoconversion from `Buffer<T>` -> `Buffer<const T>&` and to `Buffer<void>&` (#7571) * Allow autoconversion from `Buffer<T>` -> `Buffer<const T>&` When you are intermixing CPU and GPU calls in a single piece of code, it's preferable to pass `Buffer<>` by nonconst reference, so that lazy host<->device copies are done efficiently. However, many callers prefer to define input Buffers as `Buffer<const T>` (as they should), but the fact that this form didn't easily allow autoconversion from caller (which may well have constructed the buffer as non-const) to callee (due to incompatible type references) led some users to just pass by a copy, since these autoconverted. This had a couple of undesirable effects: - Making a copy cost a small but nonzero amount of code (managing refcounts, etc) - More importantly, lazy copies in the callee got 'lost' to the caller, since the `halide_buffer_t` in the callee was a copy, thus any added `device` value or change in dirty bits was never seen. This could previously be worked around by adding explicit calls to `.as_const()`, but that is ugly and awkward. This change adds an ugly-but-safe implicit-conversion overload, to allow converting `Buffer<T>&` to `Buffer<const T>&`, iff T isn't already const. This will allow cleaning up downstream code to pass by references more consistently, without needing to add `.as_const()` warts. * Also add convenience conversions for Buffer<void>&	16 May 2023, 18:25:08 UTC
f121abf	philboske	15 May 2023, 23:47:02 UTC	Fix save_tiff() PlanarConfig assignment for monochrome inputs (#7568) Fixes #7567.	15 May 2023, 23:47:02 UTC
ae7a5bd	Nathaniel Brough	13 May 2023, 00:16:01 UTC	Adds documentation on fuzz testing Closes: #7552	13 May 2023, 00:18:43 UTC
a31bbe3	Nathaniel Brough	12 May 2023, 23:06:39 UTC	Adds fuzzing preset Partial fix for #7552	12 May 2023, 23:12:16 UTC
ae53d9b	Steven Johnson	12 May 2023, 18:18:20 UTC	Avoid potentially infinite loop in fuzz/simplify.cpp (#7564) FuzzedDataProvider is not a RNG; there's no guarantee that it won't return the same data to you forever. This means that the loop to find a new subtype may never terminate (eg if the 'random' type returned always matches the input type). This "fixes" it by just adding a count to break out of the loop, in which case we just use the original type. Not sure if there's a more elegant fix?	12 May 2023, 18:18:20 UTC
e0ef57a	Steven Johnson	12 May 2023, 16:37:40 UTC	Remove unique_name() usage from fuzz/cse (#7563)	12 May 2023, 16:37:40 UTC
252c4b8	Steven Johnson	11 May 2023, 17:07:30 UTC	Add/augment some runtime debug output (#7561) - in `halide_buffer_to_string()`, print the `halide_buffer_t*` pointer value as well - in `debug_log_and_validate_buf()`, do debug logging for some failure modes that return errors	11 May 2023, 17:07:30 UTC
afea893	Derek Gerstmann	10 May 2023, 15:55:38 UTC	[vulkan] Disable performance_wrap test for Vulkan ... results don't match (#7560) * Fix missing initializer for vulkan memory config that got munged in a previous merge. This gets the correctness_multiple_outputs test to pass. * Disable test for Vulkan since shared memory results are incorrect (see issue #7559) * Clang tidy/format pass --------- Co-authored-by: Derek Gerstmann <dgerstmann@adobe.com>	10 May 2023, 15:55:38 UTC
53de4ce	Steven Johnson	09 May 2023, 18:24:07 UTC	Fix #7556 (#7557) * Fix #7556 * Update cast.cpp * Add user_assert that type lanes match * Revert "Add user_assert that type lanes match" This reverts commit e1f34e0c3098a4952af64ae88632bb2ada9763b1.	09 May 2023, 18:24:07 UTC
8f22013	Steven Johnson	09 May 2023, 17:00:40 UTC	Followup to #7551 for bool vectors (#7555) Need to cast to a type that is bool-with-lanes, not scalar bool	09 May 2023, 17:00:40 UTC
acde515	Derek Gerstmann	09 May 2023, 17:00:12 UTC	[vulkan] Fix missing initializer for vulkan memory config (#7554) Fix missing initializer for vulkan memory config that got munged in a previous merge. This gets the correctness_multiple_outputs test to pass. Co-authored-by: Derek Gerstmann <dgerstmann@adobe.com>	09 May 2023, 17:00:12 UTC
763d207	Steven Johnson	08 May 2023, 23:16:57 UTC	Fix fuzz/cse to avoid signed_integer_overflow() results (#7553) * Fix fuzz/cse to avoid signed_integer_overflow() results * Update cse.cpp	08 May 2023, 23:16:57 UTC
7afb343	Steven Johnson	08 May 2023, 20:41:31 UTC	Fix errors in fuzz/simplify.cpp (#7551) * Style Fix: don't use uppercase-T for non-template arguments * Boolean ops need extra type coercion	08 May 2023, 20:41:31 UTC
fb71862	Steven Johnson	08 May 2023, 01:39:41 UTC	Fix unused-thing warnings in fuzz/simplify.cpp (#7548) * Fix unused-thing warnings in fuzz/simplify.cpp * Update simplify.cpp	08 May 2023, 01:39:41 UTC
c86d418	Nathaniel Brough	05 May 2023, 01:33:25 UTC	fix(fuzz): Refactor fuzzers to fix off by 1 errors (#7547) Cleanup the fuzzers making them more readable and fix off by one errors caused by incorrect usage of FuzzedDataProvider::ConsumeIntegralInRange. Closes: #7546	05 May 2023, 01:33:25 UTC
dff1e38	Dmitry Babokin	02 May 2023, 20:38:17 UTC	Remove workaround for GCC 4.x.x in cpuid() (#7545) * Remove workaround for GCC 4.x.x	02 May 2023, 20:38:17 UTC
96acbc6	Steven Johnson	02 May 2023, 20:37:47 UTC	Workaround for Issue #7539 (#7540) * Workaround for Issue #7539 Partial fix for now * trigger buildbots	02 May 2023, 20:37:47 UTC
05316af	Marcos Slomp	02 May 2023, 19:01:29 UTC	metal : replacing spinlock by mutex (#7532) replacing spinlock by mutex	02 May 2023, 19:01:29 UTC
2945c71	Nathaniel Brough	02 May 2023, 16:22:26 UTC	fuzz: Port correctness/cse fuzzer over to libfuzzer (#7543)	02 May 2023, 16:22:26 UTC
7cdbc71	Steven Johnson	02 May 2023, 04:00:16 UTC	Rework CMake interface for Dawn/Node bindings (#7422) AOT pipelines that rely on Dawn/WebGPU now depend on a new Halide_WebGPU find-module. This module honors the make-ish HL_WEBGPU_NATIVE_LIB variable as a means of initializing the Halide_WebGPU_NATIVE_LIB cache variable. This is automatically handled by add_halide_generator and add_halide_runtime and is available to downstreams. The JIT tests no longer read the HL_WEBGPU_NODE_BINDINGS environment variable during the CMake configure or build phase. Instead, a test launcher reads it at CTest runtime. Co-authored-by: Alex Reinking <quic_areinkin@quicinc.com>	02 May 2023, 04:00:16 UTC
6db47d3	Nathaniel Brough	01 May 2023, 23:46:52 UTC	Fix flag check for fuzzers (#7542) On some systems size_t isn't available under <cstdint>; however, it is guaranteed to be available under <cstddef> on all systems.	01 May 2023, 23:46:52 UTC
38ed15d	Steven Johnson	01 May 2023, 17:08:54 UTC	Fix some autoscheduler build errors (#7538) - Remove inadvertent duplicate of PerfectHashMap.h from adams2019 - add some missing #includes - never pass negative values to exit()	01 May 2023, 17:08:54 UTC
044a8cf	Nathaniel Brough	01 May 2023, 14:00:36 UTC	Add libfuzzer compatible fuzz harness (#7512)	01 May 2023, 14:00:36 UTC
244e72c	Steven Johnson	26 April 2023, 22:37:36 UTC	Avoid endless loop in msan + zero-extent buffer (#7536) With MSAN enabled, we use `make_buffer_copy()` to build an efficient way to check the poison bits on buffers; unfortunately, if you are checking a buffer that has at least one dimension with zero-extent but nonzero-stride, the final while loop will never terminate. Add a trivial check so that it exits.	26 April 2023, 22:37:36 UTC
4d86539	Derek Gerstmann	25 April 2023, 00:21:15 UTC	[vulkan phase2] Vulkan Runtime (#6924) * Import Vulkan runtime changes from personal branch * Fix build to work with latest changes in main * Hookup Vulkan into Target, DeviceInterface and OffloadGPULoops * Add Vulkan runtime to Makefile * Add Vulkan target to Python bindings * Add runtime linker support to target Vulkan CodeGen * Add Vulkan windows decorator to runtime targets * Wrap debug messages for internal runtime classes with DEBUG_INTERNAL Error on failed string termination * Silence clang-tidy warnings for redundant expressions on Vulkan enum values * Clang tidy & format pass * Fix formatting for single line statements * Move Vulkan option to top-level CMakeLists.txt and enable SPIR-V as needed * Fix Vulkan & SPIRV dependencies for makefile * Add Halide version info to Makefile Add HALIDE_VERSION compiler definitions to compilation * Add HL_VERSION_FLAGS to RUNTIME_CXX_FLAGS * Finish refactoring of Vulkan CodeGen to use SpirV-IR. Added splitmix64 based hashing scheme for types and constants. Numerous fixes to instruction packing. Added debug symbols to all variables. * Clang tidy/format pass. * Fix formatting * Remove leftover ifdef * Fix build error for clang OSX for mismatched type comparison * Refactor loops and conditionals to use blocks * Clang tidy/format pass * Add detailed comments for acquire context parameters * Add comments describing loader method exports and dynamically resolved function pointers Other minor cleanups * Change aborts to debug asserts for context parameters. Add error handling to acquire context. * Cache Vulkan descriptor sets and other shader module objects in compilation cache for reuse * Replace platform specific strncpy for grabbing Extension strings with StringUtils::copy_upto * Enable device features for selected device * Fix alignment constraints for to match Vulkan buffer memory requirements. Add env vars to control Vulkan Memory Allocator config. 
* Add Vulkan to list of supported APIs in README.md Add Vulkan specific README_vulkan.md * Clang tidy/format pass * Fix conform_alignment to handle zero values * Fix declaration of custom_allocation_callbacks to be static. Change to constexpr for invalid values * Whitespace change to trigger build. * Handle Vulkan kernels that don't require storage buffers. Updated test status. Fixes 7 test cases. * Add src/mini_vulkan.h Apache 2.0 license requirements to License file * Add descriptor set binding info as pre-amble to SPIR-V code module Fix shared memory allocation to use global variables in workgroup storage space Add extern calls for spirv and glsl builtins Add memory fence call to gpu thread barrier Add missing visitors to Vulkan CodeGen Add scalar index & vector index methods for load/store * Clang tidy & format pass * Update test results for Vulkan docs. Passing: 326 Failing: 39 * Fix formatting * Remove extraneous parentheses for is_array_type() * Add Vulkan library to linkage fo Halide generator helpers * Add SPIR-V formatted output (for debugging) * Only declare SIMT intrinics that are actually used. Cleanup & refactor add_kernel method. * Add Vulkan handler to test targets * Clang format/tidy pass * Add doc-strings to SPIR-V interface * Adjust runtime array to widest vector width based on alignment and dense vector loads/stores Fix scalar and vector load/stores Fix casts for vectors Add missing nan, inf, neg_inf, is_finite builtins * Add missing bitwise and logical and methods. Cleanups. * Add comments about necessary packages on Ubuntu v22.04 vs earlier versions * Clang tidy & format pass. * Update Vulkan test results. 
Pass: 329 Fail: 36 * Remove unused Produce/Consume visitor method * Fix Molten VK initialization to work with v1.3+ loader Add support for direct casts for same-size types Add missing mux, mix, lerp, sinh, tanh, etc intrinsics Add explicit storage access for variables Add a macro to enable debug messages in Vulkan Memory Allocator * Disable dynamic shared memory portion of test for Vulkan (since its not supported yet) * Disable uncached portion of test for Vulkan (since it may OOM) * Disable float64 support in Type::supports_type() for Vulkan target since it's not widely supported * Fix Shuffle to handle all known cases Hookup VulkanMemoryAllocator to gpu allocation cache. Fix if_then_else to allow calls and statements to be used Fix loop counter comparison, and don't allow dynamic loops to be unrolled. Fix scalarize to use CompositeInsert instead of VectorInsertDynamic Fix FMod to use FRem (cause SPIR-V's FMod doesn't do what you'd expect ... but FRem does?!) Use exact same sematics for barriers as GLSL Compute ... still not passing everything Fix SPIR-V block termination checks, keys for null constants, and other cleanups * Clang tidy & format pass * Update correctness test results. PASS: 338, FAIL: 27 * Move counter inside debug #define to fix build * Relax tolerance for newton's method to match other GPU APIs Skip gpu dynamic shared testfor Vulkan (since dynamic shared allocations aren't supported yet) Update correctness test status. PASS: 340, FAIL: 25 * Clang format/tidy pass * Skip Vulkan for float64 for correctness test round (since f64 is optional) * Skip Vulkan for tests that rely upon device crop, and slice. 
* Only test small vector widths for Vulkan (since widths >=8 are optional) * Caninicalize gpu vars for Vulkan * Fix loop initialization, and increments Add all explicit types, and fix constant declarations Add missing fast intrinsics Convert results of logical ops into expected types (instead of bools) * Add SpvInstruction::add_operands(), add_immediates() and template based append() Make integer logical operations explicit. Better handling of constant data. * Clang format & tidy pass * Fix windows build ... refactor convert_to_bool to use std::vectors rather than dynamic fixed sized arrays * Skip asyn_device_copy, device_buffer_copy, device_crop, and device_slice tests for Vulkan (for now). * Don't test large vector widths for Vulkan (since they are optionally supported) * Clear Vulkan buffer allocations prior to use (tbd if this is necessary) * Skip Vulkan for async copy chain test * Skip Vulkan for interpreter test * Clang tidy/format pass * Fix formatting * Fix build ... use error messages for errors * Separate shared memory resources by element type for Vulkan. * Add Vulkan to conditional for fusing gpu loops * Reorder reset method to match declaration ordering. * Cleanup debug log messages for Vulkan resources * Assert alignment is power of two * Only split regions that have already been freed. Add more debug messages to log * Explicitly cleanup Vulkan command buffers as after they are used Avoid recreating descriptor sets Tidy up Vulkan debug messages * Fix Div, Mod, and div_round_to_zero for integer cases Cleanup reset method * Skip Vulkan for async_copy_chain * Skip 64-bit values on Vulkan since they are optionally supported * Skip interleave_rgb for Vulkan (which doesn't support cropping) * Skip interpreter for Vulkan (which doesn't support dynamic allocation of shared mem). 
* Clang Tidy/Format pass * Handle calls to pow with negative values for Vulkan Add integer and float constant helpers to SPIRV * Only test real numbers for pow with Vulkan * Clang tidy/format pass * Fix logic so a region request of an entire block matches if exactly the same size as an empty block * Create a zero size buffer to check for alignment Return null handles after freeing * Add more verbose debug output for malloc * Fix UConvert logic to avoid narrowing an integer type less than 8 bits Remove optimization path for division which seems to fail worse than DIV Cleanup DIV and MOD operators * Clang format/tidy pass * Fix SConvert & UConvert ops * Add retain semantics to block allocator interface Update test to validate retain/release/reclaim functionality * Implement device_crop, device_slice and release_crop for Vulkan. Re-enable device_crop, device_slice and interleave_rgb tests. * Clang format/tidy pass * Implement device copy for Vulkan. Enable device copy test. * Clang format/tidy pass * Fix signed mod operator and use euclidean identity (just like glsl) * Clang format/tidy pass * Fix to handle Mod on vectors (use vector constant for bitwise and) * Fix pow operator for Vulkan, and re-enable math test to full range. * Add error checking for return types for conditionals Use bool types for ops that require them, and adapt to expected return types * Handle deallocation for existing regions prior to coalescing. Cleanup region allocator logic for availability. Augment block_allocator test to cover allocation reuse. 
* Clang tidy/format pass * Fix reserved accounting for regions * Add more details to Windows specific Vulkan build config * Update SPIR-V headers to v1.6 * Add support for dynamic shared memory allocations for Vulkan Add dynamic workgroup dispatching to Vulkan Add optional feature flags for Vulkan capabilities Add Vulkan API version flags for target features Enable v1.3 path if requested Re-enable tests for added features Update Vulkan docs with status updates and feature flags * Enable Vulkan asyc_device_copy test. * Disable Vulkan performance test for async gpu (for now). * Disable Vulkan from python AOT tests and tutorials (since it requires linkage against the vulkan loader system library). * Update Vulkan readme with latest status. Everything works! More or less. =) * Clang format pass * Cleanup formatting for Halide version info in Makefile * Fix typos and address review comments for Vulkan readme * Change value casts to match Halide conventions * Fix typos in comments * Add static_assert to rotl to make compilation errors clearer (instead of using enable_if) Fix debug(3) formatting to avoid super long messages Use lookup table for SPIR-V op code names * Fix typos and logic for Vulkan capabilities * Remove leftover debug ifdef * Fix typo in comments * Rename copy_upto(...) method to be copy_up_to(...) 
* Handle error case for uninitialized buffer allocation (rather than abort) Fix typos in comments * Support any arbitary number of devices and queues for context creation Fix typos in comments * Add get/set alloc_config methods and API hooks for configuring the VulkanMemoryAllocator * Remove leftover debug ifdef * Hookup API methods for get/set alloc_config when initializing the VulkanMemoryAllocator * Remove empty lines in main * Add required capability flags for 8-bit and 16-bit uniform and storage buffer access Handle casts for GLSL ops (spec requires all args to be the same type as the return type) * Add VkPhysicalDevice8BitStorageFeaturesKHR and related constants * Query for 8-bit and 16-bit uniform and storage access support. Enable these as part of the device feature query chain. * Use VK_WHOLE_SIZE for setting buffer (to pass validation ... otherwise size has to be a multiple of alignment) Remove useless debug asserts for static variables Fix debug logging messages for allocations of scalars (which may not have a dim array) * Query for device limits to enforce min alignment constraints for storage and uniform buffers * Fix shutdown sequence to iterate over descriptor sets Avoid bug in validation layer by reordering destruction sequence * Clang format & tidy pass * Fix logic for locating entry point shader binding ... 
assume exact match for entry point name Cleanup entry point binding variables and clarify usage * Remove accidentally uncommented debug statements * Cleanup debug output for buffer related updates * Fix split and allocate methods in region allocator to fix issues with alignment constraints - discovered a hang if requested size couldn't be fulfilled after adjusting to aligned sizes - cause was incorrect splitting of existing regions Cleanup region allocator iteration, cleanup and shutdown Added maximum_pool_size configuration option to Vulkan Memory Allocator to restrict pool sizes * Added notes about TARGET_VULKAN=ON being the default now Added links to LunarG MoltenVK SDK installer, and brew packages * Fix markdown formatting * Fix error code handling in Vulkan runtime and internal datastructures. Refactor all (well nearly all) return values to use halide error codes. Reduce the usage of abort_if() for recoverable errors. * Fix typo in error message * Fix typo in readme * Skip GPU allocation cache test on MacOSX since MoltenVK only supports 30 buffers to be allocated * Skip widening reduction test on Vulkan for Mac OSX/IOS since MoltenVK fails to translate calls with vector types for builtins like min/max. etc * Skip doubles in vector cast test on Vulkan for Mac OSX/IOS since Molten doesn't support them * Skip gpu_dynamic_shared and gpu_specialize test for Vulkan on Mac OSX/IOS since MoltenVK doesn't support the dynamic shared memory allocation or dynamic grid size. * Clang format / tidy pass * Resolve conflicts for mini_webgpu.h ... 
revert to main * Use unique intrinsic var names for each kernel Cleanup constant value declarations with template helper methods Add comments on workgroup size usage * Wrap debug output under ifdef DEBUG_RUNTIME_INTERNAL macro guard Add nearest_multiple constraint to block/region allocator * Add vk_clear_device_buffer utility method Add nearest_multiple constraint to vulkan memory allocator + fixes correctness/multiple_outputs test Add vkCreateBuffer/vkDestroyBuffer debug output for gpu_object_lifetime_tracker Cleanup shutdown for shader_module destruction * Add note about nearest_multiple constraint for vulkan memory allocator * Hookup gpu_object_lifetime_tracker with Vulkan debug statements * Skip dynamic shared memory portion of test for Vulkan on iOS/OSX. * Fix stale comment for float type support. Fix incorrect lowering for intrinsic. --------- Co-authored-by: Derek Gerstmann <dgerstmann@adobe.com> Co-authored-by: Steven Johnson <srj@google.com>	25 April 2023, 00:21:15 UTC
fcddcf8	Marcos Slomp	24 April 2023, 13:13:21 UTC	metal : replacing `arg_sizes` by `arg_types` in kernel run interface (#7505) * replacing arg_sizes by arg_types * build fix * allocating and computing arg_sizes[] on the stack * clang-format * zero termination oopsie! * special case when argument is a buffer * telling runtime to pass argument types instead of argument sizes to the kernel run call * args[i] could well be 0! * removing arg_sizes[] * addressing code review comments --------- Co-authored-by: Marcos Slomp <slomp@adobe.com> Co-authored-by: Steven Johnson <srj@google.com>	24 April 2023, 13:13:21 UTC
e55834b	Steven Johnson	24 April 2023, 01:06:59 UTC	Fix Anderson2021 tests to avoid spurious failures on non-Cuda systems (#7518) * Fix Anderson2021 tests to avoid spurious failures on non-Cuda systems The Anderson2021 autoscheduler is pretty Cuda-specific, so some tests assume it is present; this is pretty much never true on macOS, and annoying spurious failures are annoying. This adds a new flag and capability to RunGenMain to try to sniff out the necessary runtime setup and make it a quiet [SKIP] failure when testing. * Use set instead of strstr() * Update LoopNest.cpp * Update RunGenMain.cpp * Update RunGenMain.cpp * Update RunGenMain.cpp * Update RunGenMain.cpp * trigger buildbots * Update RunGenMain.cpp	24 April 2023, 01:06:59 UTC
93a5887	Steven Johnson	20 April 2023, 17:58:33 UTC	Make stmt_html generation work correctly for submodules (#7522) * Don't erase stmt_html before resolving submodules * Fix stmt_html for submodules	20 April 2023, 17:58:33 UTC
294f80c	Andrew Adams	19 April 2023, 23:26:50 UTC	Forbid assigning to Buffer(Expr) by introducing an intermediate type. (#7517) * Forbid assigning to Buffer(Expr) by introducing an intermediate type. Fixes #7514 * Simpler solution * Silence clang-tidy	19 April 2023, 23:26:50 UTC
8670a25	Steven Johnson	19 April 2023, 18:28:11 UTC	Fix for top-of-tree LLVM (#7523)	19 April 2023, 18:28:11 UTC
2527c35	Steven Johnson	18 April 2023, 21:31:56 UTC	Don't accidentally embed .s files in .a files when emitting stmt_html (#7520) * Don't accidentally embed .s files in .a files when emitting stmt_html Followup fix for #7516 * format	18 April 2023, 21:31:56 UTC
42e71f2	Steven Johnson	18 April 2023, 16:28:26 UTC	Convert stmt_html output to use stmt_viz output (#7516) * Allow emitting `stmt_viz` without specifying `assembly` TL;DR: if we request `stmt_viz` without `assembly`, just generate the latter to a temp file that we dispose of later; this wasn't feasible before since we were previously requiring the assembly output to be generated with the same directory and basename as stmt_viz, but that was fixed. * Convert stmt_html output to use stmt_viz output Per discussion on #7507, this entirely removes the "classic" stmt_html output and replaces it with the "new" StmtToViz output. Using `compile_to_lowered_stmt` or requesting `stmt_html` will now always output the new output, and requesting `stmt_viz` output is no longer legal. (Note that this builds on top of #7515, which must be submitted first.) It's not clear to me whether https://github.com/halide/Halide/issues/7507#issuecomment-1511761706 is a blocker for this change, or a request to add back already-lost functionality. * Update Makefile * Update Generator.cpp	18 April 2023, 16:28:26 UTC
8efc688	Steven Johnson	17 April 2023, 22:22:44 UTC	Allow emitting `stmt_viz` without specifying `assembly` (#7515) TL;DR: if we request `stmt_viz` without `assembly`, just generate the latter to a temp file that we dispose of later; this wasn't feasible before since we were previously requiring the assembly output to be generated with the same directory and basename as stmt_viz, but that was fixed.	17 April 2023, 22:22:44 UTC
c9c85dc	Steven Johnson	16 April 2023, 00:43:43 UTC	Improve assembly-file finding logic in StmtToViz (#7513) (1) Avoid having to guess at location by just passing in the location, since we usually already know it. (2) If we don't know it, be more cautious when constructing it: the output html filename might not match our expectations, and all file extensions must use get_output_info() to work correctly on all platforms.	16 April 2023, 00:43:43 UTC
04f09d4	Andrew Adams	14 April 2023, 16:58:23 UTC	Add error message when casting multi-element Realization to Buffer (#7506) * Add error message when casting multi-element Realization to Buffer Fixes #7504 * Add missing test	14 April 2023, 16:58:23 UTC
e20d798	Svenn-Arne Dragly	13 April 2023, 19:17:43 UTC	Add build number to Python wheel before uploading (#7500) * Add build number to Python wheel before uploading This change adds a build number based on GitHub Actions' `github.run_id` to the Python wheel before uploading. This should work around the issue that causes the uploads to fail currently. Fixes #7293 * fixup! Add build number to Python wheel before uploading	13 April 2023, 19:17:43 UTC
bea0075	Steven Johnson	13 April 2023, 17:55:43 UTC	Deprecate ParamMap (#7121) (#7357) * Deprecate ParamMap (#7121) This PR deprecates ParamMap for Halide 16, with the plan of removing it entirely for Halide 17; it was added to provide a threadsafe way to provide parameters to the JIT, but `compile_to_callable()` now does this in a much less intrusive way. * Updated comments, removed mutexes (mutices?) * formatting * Go back to HALIDE_ATTRIBUTE_DEPRECATED	13 April 2023, 17:55:43 UTC
e7f7860	Marcos Slomp	12 April 2023, 16:13:50 UTC	d3d12: enforce weak linkage (#7496) * ensuring all symbols are weak, or static constexpr, to allow for merging runtimes together * clang-format fluke --------- Co-authored-by: Marcos Slomp <slomp@adobe.com>	12 April 2023, 16:13:50 UTC
24f1bdd	Maaz Ahmad	12 April 2023, 05:08:37 UTC	Feature Enhancement: Halide IR HTML Visualization (#7421) * placeholder for IR visualization work from Darya Verzhbinsky * placeholder for IR visualization work from Darya Verzhbinsky * New Feature: Halide Program IR Visualizer (#7056) Thanks for the feedback everyone! I will merge this into the ir-viz branch and work on it to get it ready for a PR into main. * initial commit * updates * added curr_loop_depth and changed throws for assert(false) * split into header/cpp file and added test file * adding changes to move to adobe laptop * added git ignore to ignore .vscode * attemping to get add_custom_lowering_pass() to work, not working yet * Can now compile to stmt_viz files * moved files into main src folder and added them to Makefile * fixed lesson_01_basics.cpp * pushed updates - very messy code * got side colors working and hiararchy tree. ready for code cleanup * cleaned up code. ready for split into .h/.cpp files * quick comment change * switched everything into .h/.cpp files * added CostPreProcessor class and removed def of mutate * removed definitions of mutate * added data movement costs and bar at the top * changed location of tooltip so it doesn't overflow left * updated cost function for laod/store based on vector size and type * updated colors of hiararchy tree * logic for deciding context of variables (messy) * cleaned up code. waiting for marcos * added context coloring. 
cleaned up code a bit * collapse/expand on hiararchy working * got depth expansion working for hiararchy * cleaned up code * cleaned up code and renamed some funcs/vars * fixed let hierarchy code and added down arrow to button * dependency graph stuff (still massive and busy) * Minor fixes -- please review * prod/cons built with if stmts and for loops * exit early if running on a module w/ >1 func * added var dependency button to mail html * fixed `add` benchmark and made error printing better * changed `m_assert` to `internal_error` * cleanedup dependency graph * added error for non concrete bounds in prodcons hier * made arrows change btwn up/down depending on sit. * fixed text for ConsProd tables to have strings * added logic for non-set bounds for for loop * added TODO * added syntax highlight to strings and ints * added dotdotdot logic for collapsed children * fixed small bug where 2nd tree wasn't starting correctly * changed colors of ... nodes based on parent color * added if flowchart * added bools for printing different HTML parts of code * added different background colors per object * cleaned up borders of objects * fixed prodCons spacing and started allocate logic * removed border for ifthenelse table * implemented anchoring for prodcons tables * fixed empty if-stmts * open and close anchor are now right after one another * added filename logic for anchors and add blocks for func args * pass in FindStmtCost instead of reruning traversal * fixed comment * heatmap for prodConsHierarchy * fixed consume values (i think) and changed block colors * fixed allocate "!is_const_one(op->condition" error * fixed StmtSizes::visit(const For op) (variables were Add) removed nested-ifs logic (edge case we don't have to worry about) * changed table headers to only show loop interations and no bubbling up * get unique values for loads with ramp<int, int, int> only * fixed !is_const_one(op->condition) in Allocate * changed allocate table to Dim-1, Dim-2, etc * (1) store: 
changed cost (2) load: added global/local (3) allocate: vized memtype (4) prodConsTable: changed to read/write * BOOTSTAP! added navigation pannel at top * line numbers!!! and removed tooltip (for now) * changed style of info buttons * adjusted and added icons for see-code and info buttons * removed a comment * condensed cost color classes * fixed ifthenelse line numbers * fixed if if-else anchor names * changed prodCons from table to div * adding cost colors for prodConsViz to the left side of div * made long conditions "..." in ProdConsViz * adding spacing for prodCons start viz and dependency graph viz * tooltip!!!! (still a little ugly, but functioning!!) * removed tooltip arrow and changed background to white * removed arrow for tooltip and added if-stmt condition tooltip * added more tooltips to prodCons * getStmtHierarchy popup implemented :) !! * moved css var definitions to respective files * added bubble_up() and multiple modules * converted some stringstream to string + reordered module functions in viz * added getStmtHierarchy js working for expanding/collapsing * calculate color ranges once and not every time * should be added to previous commit * changed everything from IRMutator -> IRVisitor * side by side view on main page * added expand code / viz buttons functionality * attempting to switch GetStmtHierarchy to 1 tree with colors on side * should revert this change later, but need to for now (merging with main) * added Reinterpret + fixed double graph in StmtHierarchy * removed omg!!!! for reinterpret * changed border colors of stmtHierarchy + removed print statements * visualized assert + added colors to assert + made all info buttons next to colors * added resize bar * removed navigation code (sticking to 2col layout) * changed colors spans to buttons (removed segfault???) 
* visualize entire LetStmt and cleaned up GetStmtHierarchy.cpp * added more info in info-buttons * added hover to side colors in stmthierarchy * made collapse buttons resizeBar icons + put see_code_button top right of div * (1) added code to viz buttons (2) display all if-stmts, even if they are empty (3) fixed store highlight cost span * added scrollTo for function buttons within modules * changed info-button style * (1) added hover over for colors in stmtHierarchy (2) removed =default constructor/destructor (3) changed getStmtHierarchy to string html instead of stringstream * made sure updated StmtToHtml code was in StmtToViz code * small style changes * added see code/viz buttons for module functions * (1) loop size -> loop span (2) made function names big in viz (3) load types in name * removed inline style tags * added VectorReduce code for stmt hierarchy * removed scopeName hack to fix previous scope error (hope it's not happening anymore) * fixed scrollTo if code is hidden * removed commented out includes * changed costs to inclusive vs exclusive (still might be a bit broken) * removed print statements * fixed loop_depth = 0 error * (1) tooltips include inclusive and exclusive sizes (2) moved tooltip HTML to FindStmtCost.cpp * reworked tooltip style * (1) made getStmtHierarchy exclusive costs (2) viuslizing costs for IfThenElse blocks * visualize For and ProducerConsumer blocks * got collapse of code to show cumulative color cost * removed context span button * removed bubble up code and associated logic (now only read/write for loads and stores) * change some variables to read/write instead of prod/cons * fixed range bug * dense/strided vector load * removed inline TODO comments * added loop var for for loops * made loading MUCHHHH faster!!!! 
* (1) fixed function box width in viz (2) fixed collapse/expand button for functions in code * compile assembly if stmt_viz flag is given * starting assembly stuff * got assembly code button working (button is still ugly) * (1) made assembly button prettier (2) started information bar at top (need to fill in content of info popup) * added content to information bar button popup * (1) fixed if statement costs (2) added percentages to cost tables instead of values * removed output_file_name from ProducerConsumerHierarchy.h and related code in StmtToViz.cpp * fixed IfThenElse cost if there are nested ifs * removed dependency graph logic and files * made tooltip table input vector of pairs so that we can specify order * made tooltip table input vector of pairs so that we can specify order * added collapse/expand to viz on right * (1) collapseCode works now (2) search works in assembly tab * changed codemirror to ARM assembly highlighting * start of refactor: commenting and cleanup * fixed bug!!!!! i think. i hope !!! * removed Stmt function versions (never run this code on Stmt input, only module) * changed ProdCons stuff to IRViz * removed print line for strided vectors (seems to be working now) * fixed bug!!!!!! changed things back to stringstream, because that wasn't the issue * have helper functions return strings isntead of being void * end of refactor (for now) - changed variables from camelCase to snake_case * fix ... 
error for collapsing nodes * fixed div issue + tooltips not being correct location * added error message for multiple modulse (doesn't currently support) * fixed spacing for boxBody divs * removed submodules logic because it's not supported right now * made assembly marker generation more accurate (added counters to have marker names be unique) * got 3 columns resizing mostly working (just a little glitchy, good enough for now) * got assembly button to populate assembly, kind of working * added assemblyInfoViz.h to makefile * fixed resize bars for 3 different visualizations * fixed linewrapping issue with codemirror * updated spacing for IRVisualization buttons in header * (1) fixed functionBox button sizing (2) dense vector load -> [Dense, Vector] load * fixed informationBar spacing * updated InformationBar content * removed current_loop_depth from consideration of cost * changed cost table tooltip: inclusive: show %, exclusive: show raw cost * simplified cost model * (1) updated InformationBar w/ info for assembly (2) added assembly by default to third col * added logic to collapseVizAssembly if curson passes resizeBar * moved all color range + tooltip logic into IRVisualization * fixed get_combined_color_range() error * reordered js/css strings * changed format and slighly changed content of cost tooltips * refactor: .h and .cpp files have same order * refactor: added comments * refactor: updated internal_error messages * fixed small import / #ifndef typo * updated get_loop_iterator to include more binary ops for extent * reverting some changes I made to get ready for PR * adding CMake build * refactoring namespace scoping * refactoring "endl" * refactoring header guards and includes * const vector reference * ostringstream all the things! 
* having a symbol for "canIgnoreVariableName" * string -> char, with raw string literal internal_error -> internal_assert() * clang-format * if-else chain to switch-case block * Upgrade wabt to 1.0.30 (#7058) * Add support for float16 buffer in python extension (#7060) * run clang-tidy and clang-format * run clang-tidy & clang-format * run clang-tidy and clang-format, again * run clang-tidy and clang-format, Phaze III * Minor PR Revision - If `stmt_viz` flag is used without the `assembly` flag, the compiler throws an error. - GetAssemblyInfoViz.cpp: replace regex with replace_all - GetSttmtHierarchy.cpp: Bug fix (line 721). Use raw strings for large literals - Restricted scope of default statement values - Added enum type for StmtCostModel. A single cost model config value is specified, instead of multiple booleans. * reminder for later --------- Co-authored-by: Darya Verzhbinsky <dverzhbinsky@adobe.com> Co-authored-by: Maaz Ahmad <maaz.c10@gmail.com> Co-authored-by: Marcos Slomp <slomp@adobe.com> Co-authored-by: Steven Johnson <srj@google.com> Co-authored-by: Steve Suzuki <shinsuke.suzuki@arm.com> Co-authored-by: Marcos Slomp <mslomp@gmail.com> * Refactor 1/N: Moved all static html, css and js code outside of the cpp files to reduce noise * Refactor 2/N: Broke down StmtToViz Class to two simpler classes - Disabled legacy StmtToViz implementation - Introduced three new classes instead: - IRVisualizer: generates the output page (WIP) - HTMLCodePrinter: prints IR code in HTML (Implementation complete) - HTMLVisualizationPrinter: visualizes IR code in HTML (WIP) * Refactor 3/N: Minor stylesheet and javascript improvements Thorough refactor of css and js pending * Fixed resize bars not working properly * Refactoring Viz tab (WIP) * Refactored 4/N: Visualization tab complete Refactored all visualization logic into a single HTMLVisualizationPrinter class * Refactor 5/N: Javascript and css cleanup - Visual improvements - Deleted unnecessary code - Refactored most of js 
code * Refactor 6/N: JS and CSS refactor complete - Deleted unused code - Simplified remaining code * Refactor 7/N: Added assembly support - Added assembly tab functionality - Jump to assembly buttons added - Tooltips added for all buttons * Refactor 8/N: Deleted info bar * Refactor 9/N: Deleting code These classes were simplified and refactored into the new StmtToViz file. * Remove stale files from build * Refactor 10/N: Cost model simplified and re-activated - Cost model is much simpler now - Re-integrated cost model into code tab - Re-integration to viz tab pending * Delete stale cost model code * Refactor 11/N: Reintegrate cost model into visualization * Minor fixes * Update CMakeLists.txt * Deleting more stale files * Improved documentation for new code in Codegen_LLVM * Static HTML, CSS and JS is now stored as large strings Avoids build shenanigans. * Fix: Build error for unused variable * Deleting dead code * Stmt visualizer should not run on submodules * Ran clang-format on the PR * Ran clang-tidy on the PR * Move boilerplate JS/CSS code into template files * CMake Build fix: Typo * Minor bug fix * Renamed variable to avoid any keyword conflict * Style fixes - Removed underscore-prefix on member variables - Fixed typos in documentation - Fixed indentation in Makefile * Replacing `internal_assert(false) << ...` with `internal_error << ...` * Style improvement: std prefix consistency * Minor fix: variable had greater scope than necessary * Renamed `datamovement` to `data_movement` for readability * Constructor fixes for HTMLCodePrinter, HTMLVizualizationPrinter and IRVisualizer - Made single parameter constructors explicit - Made HTMLCodePrinter and HTMLVizualizationPrinter non-moveable and non-copyable * Typo * Clang-formatting * Undo accidental Makefile change * Improved comments - Fix type in fn name `compute_all_costs` - Comments describing class are now in format consistent with rest of the codebase - Improved comment describing 
`print_cuda_gpu_source_kernels` * Replaced spaces with underscores in regex markers printed in assembly * Minor bug fix for Halide IR Visualizer Synchronization button from Viz to Code was not working for tail `else` cases. * Clang format fix * Assign deterministic node IDs for reproducibility * clang-format and clang-tidy * Update StmtToViz.cpp * Minor formatting fix * Bug fix: ProducerConsumer IDs were not generated correctly --------- Co-authored-by: Marcos Slomp <slomp@adobe.com> Co-authored-by: darya-ver <darya99@gmail.com> Co-authored-by: Darya Verzhbinsky <dverzhbinsky@adobe.com> Co-authored-by: Steven Johnson <srj@google.com> Co-authored-by: Steve Suzuki <shinsuke.suzuki@arm.com> Co-authored-by: Marcos Slomp <mslomp@gmail.com>	12 April 2023, 05:08:37 UTC
774359a	Andrew Adams	11 April 2023, 23:40:26 UTC	Pacify clang-tidy (#7498)	11 April 2023, 23:40:26 UTC
7b96356	郑启航	11 April 2023, 20:57:12 UTC	fix python binding method export (#7494)	11 April 2023, 20:57:12 UTC
264c440	Steven Johnson	11 April 2023, 17:17:57 UTC	Clean up various autoscheduler tool issues (#7483) * Clean up various autoscheduler tool issues - In CMake, not all of the tooling for Anderson2021 was included in packaging; now it is. - Since the featurization_to_sample and get_host_target tools are 100% identical across all autoschedulers, they now get built only once, and don't get the `adams2019_` (etc) prefix they formerly did. - Conversely, the autotune_loop.sh and weightsdir_to_weightsfile tools now do get autoscheduler-specific prefixes, because they aren't interchangeable. - Fixes to various scripts etc to use the correct tool names. - When the Anderson2021 autoscheduler was added, we tried to factor out common coding and tooling, but went a bit too far: the weightsdir_to_weightsfile source was moved into common/, and while it and the related Weights source were identical, they needed to include different Featurization.h files, and so basically relied on the build rules building it twice, with a different include path each time. IMHO this is unreasonably fragile and weird, so I moved those sources back into the folders in question, as there isn't any compelling reason to keep these sources exactly in sync anyway. - Same story for the test_function_dag test. Note that I started to add support for Anderson2021 to the main Makefile, but it became painful to do -- it works fine for CMake, so I will leave adding Make support for someone who wants to use it there. * Fix needless renames * Remove scalpel left in patient * Update CMakeLists.txt * Update Makefile	11 April 2023, 17:17:57 UTC
93e3d39	Marcos Slomp	11 April 2023, 16:32:54 UTC	d3d12 runtime: replacing spinlocks by mutex objects (#7489) * replacing spinlock by mutex * adding back weak linkage hint --------- Co-authored-by: Marcos Slomp <slomp@adobe.com>	11 April 2023, 16:32:54 UTC
d031408	Steven Johnson	06 April 2023, 18:26:56 UTC	Turn on WITH_UTILS in CMakePresets.json for 'package' preset (Fixes #7465) (#7479)	06 April 2023, 18:26:56 UTC
d2b320d	Luke Anderson	04 April 2023, 18:25:24 UTC	Use static dimensions in autoscheduler test generators (#7475)	04 April 2023, 18:25:24 UTC
582fb49	Andrew Adams	04 April 2023, 17:33:48 UTC	Increase test threshold for mullapudi histogram test (#7474) It uses fine-grained parallelism, which has a very noisy runtime.	04 April 2023, 17:33:48 UTC
df354c5	Luke Anderson	04 April 2023, 02:53:11 UTC	Add GPU autoscheduler (#6856) Add Anderson2021 GPU autoscheduler	04 April 2023, 02:53:11 UTC
ba590df	Steven Johnson	03 April 2023, 22:15:20 UTC	Port #6869 to PyCallable (Fix #7213) (#7472) Same issue was present in the JIT wrappers for Python; this is the same fix, adapted for JIT.	03 April 2023, 22:15:20 UTC
56490e0	Steven Johnson	03 April 2023, 17:04:52 UTC	Disable clang-format in mini_webgpu.h (#7468) In some configurations, clang-format insists on reformatting this file, despite its presence in .clang-format-ignore; at this point it's easier to 'fix' this by inserting `// clang-format off` rather than debug further.	03 April 2023, 17:04:52 UTC
62bc0f3	Andrew Adams	03 April 2023, 16:14:24 UTC	Avoiding having shader kernels depend on host arch/os by unsetting it (#7470)	03 April 2023, 16:14:24 UTC
ec51838	Steven Johnson	30 March 2023, 00:49:29 UTC	Remove unreachable code to pacify clang-tidy (#7462)	30 March 2023, 00:49:29 UTC
b6b15ac	Andrew Adams	29 March 2023, 21:17:06 UTC	Alternative approach to deprecating internal fixed-point intrinsics (#7461)	29 March 2023, 21:17:06 UTC
95b8543	Steven Johnson	29 March 2023, 17:43:12 UTC	Fix PseudoExpr for FuncRef (followup to #7446) (#7458) * Remove references to deprecated variants of fixed-point operators Fix PseudoExpr for FuncRef (followup to #7446) * format * Update pool_generator.cpp	29 March 2023, 17:43:12 UTC
d06498c	Andrew Adams	29 March 2023, 16:46:45 UTC	Remove apparently pointless shift in autoscheduler tutorial (#7455) Fixes #7451	29 March 2023, 16:46:45 UTC
6865960	Steven Johnson	29 March 2023, 03:01:34 UTC	Remove references to deprecated variants of fixed-point operators (#7457)	29 March 2023, 03:01:34 UTC
7ab0e8f	Andrew Adams	28 March 2023, 21:23:47 UTC	Skip simd_op_check for disabled targets (#7452) Co-authored-by: Steven Johnson <srj@google.com>	28 March 2023, 21:23:47 UTC
82ae713	Steven Johnson	28 March 2023, 18:16:16 UTC	Fix for top-of-tree LLVM (#7453)	28 March 2023, 18:16:16 UTC
1fb9293	Steven Johnson	28 March 2023, 17:04:13 UTC	Remove defunct Make and .gitignore for the external_code tests, now l… (#7449) Remove defunct Make and .gitignore for the external_code tests, now long gone	28 March 2023, 17:04:13 UTC
d57b53e	Steven Johnson	28 March 2023, 01:01:33 UTC	Fix correctness_pytorch for injection from #7443 (#7450)	28 March 2023, 01:01:33 UTC
55edef8	Steven Johnson	27 March 2023, 22:34:16 UTC	Use existing Halide Runtime atomic wrappers everywhere in the runtime (#7429) * Use existing Halide Runtime atomic wrappers everywhere in the runtime We currently have a set of wrappers around the __atomic/__sync primitives used by our threading model; for various reasons, we desire to use the (deprecated) __sync primitives for 32-bit builds instead of the __atomic primitives (see https://github.com/halide/Halide/pull/7427 for some discussion). This PR attempts to use these abstractions everywhere else in our runtime, for consistency, so that 64-bit builds consistently use the __atomic primitives for (e.g.) profiling and tracing too. This meant: - Splitting the wrappers into a new header (runtime_atomics.h), as synchronization_common.h can't be included into arbitrary other files, for valid reasons - Adding wrappers for the necessary primitives - Modifying the code elsewhere in the runtime Where new wrappers were needed, I generally defaulted to assuming that SEQ_CST was the safest memory order to use. Not entirely sure if this is a worthwhile goal or not, but putting this out there for consideration and discussion. * Update runtime_atomics.h	27 March 2023, 22:34:16 UTC
46c48b7	Steven Johnson	27 March 2023, 22:33:32 UTC	Ensure that return values from runtime calls are checked (#7403) * Ensure that return values from runtime calls are checked Fixes a handful of places that should have checked the error-code result from explicit calls to the runtime, but weren't. Also, drive-by change to HashMap::store(), which returned an int but was incapable of returning anything but zero -- changed to just return void. * fixes * trigger buildbots * trigger buildbots	27 March 2023, 22:33:32 UTC
231c88b	Steven Johnson	27 March 2023, 22:28:44 UTC	Cleanups in runtime/device_interface.cpp (#7408) * Cleanups in runtime/device_interface.cpp (Harvested from an experimental CL) - Add `UseModule` helper to make it easier to balance `use_module()` and `release_module()` - add `call_device_interface()` helper to make it easier to call device_interface functions safely - convert the one usage of `halide_abort_if_false()` to `halide_error() + return error` - drive-by changes from `0` to `halide_error_code_success` * trigger buildbots * trigger buildbots * Add halide_debug_assert * trigger buildbots * Update device_interface.cpp	27 March 2023, 22:28:44 UTC
4dc9ce5	Steven Johnson	27 March 2023, 21:47:07 UTC	Mark the Halide pipeline structs as aligned(8) (#7428) When compiling for 32-bit, LLVM assumes that these structs are only 4-aligned (since alignof(uint64_t) == 4 for x86-32), which means some atomic operations on these structs may require library calls. Since these structs are always malloc'ed, and malloc on all our platforms will return an 8-aligned pointer, we can improve this by telling Clang that the struct will always be at least 8-aligned.	27 March 2023, 21:47:07 UTC
d92cec9	Volodymyr Kysenko	27 March 2023, 21:42:18 UTC	Moves OptimizeShuffles pass into separate file (#7447) * Moves OptimizeShuffles pass into separate file * Update comment * Revert changes to src/runtime/mini_webgpu.h	27 March 2023, 21:42:18 UTC
37ac255	Andrew Adams	27 March 2023, 19:34:20 UTC	Promote fixed-point intrinsics out of the Internal namespace (#7446) * Promote fixed-point intrinsics out of the Internal namespace and add deprecated wrappers for them in the Internal namespace so that we don't break any existing code * Pacify clang-tidy * Remove HALIDE_NO_USER_CODE_INLINE	27 March 2023, 19:34:20 UTC
7976d05	Yongqi	27 March 2023, 17:00:23 UTC	Fix bugs in PyTorch codegen. (#7443)	27 March 2023, 17:00:23 UTC
ab5f042	Andrew Adams	25 March 2023, 17:43:11 UTC	Compute comparison masks in narrower types if possible (#7392) * Compute comparison masks in narrower types if possible * Remove reliance on infinite precision int32s * Further elaborate on comment * Lower signed saturating_add and sub to unsigned math The existing lowering was prone to overflow * cast -> reinterpret	25 March 2023, 17:43:11 UTC
2a51f71	Andrew Adams	24 March 2023, 18:13:36 UTC	Use pmaddubsw for non-RDom horizontal widening adds (#7440)	24 March 2023, 18:13:36 UTC
4fa913e	Volodymyr Kysenko	23 March 2023, 16:46:58 UTC	Add missing #include <exception> (#7445)	23 March 2023, 16:46:58 UTC
4cb6dba	Andrew Adams	19 March 2023, 20:55:40 UTC	Redo CPU schedule for bilateral grid (#7436)	19 March 2023, 20:55:40 UTC
badf486	Steven Johnson	17 March 2023, 17:41:48 UTC	Disable performance_boundary_conditions under WebGPU pending #7420 (#7435)	17 March 2023, 17:41:48 UTC
643b2f1	Steven Johnson	16 March 2023, 21:42:34 UTC	Modify runtime calls to always return a valid halide_error_code_t value (#7404) * Modify runtime calls to always return a valid halide_error_code_t value Currently, the return values from our runtime code are a mishmash -- there's lots of code that returns any random nonzero value to indicate an error. This isn't wrong per se, but it's not clean, and it's desirable that the return values are predictable. This PR doesn't change the call signature of any (public) Halide Runtime functions, but modifies the internal logic so that all return values are valid values of `enum halide_error_code_t`. Generally, there should be minimal change to the code otherwise, although I did leave in a few drive-by changes that I couldn't resist (e.g., better error-checking when dynamically loading symbols). My long-term goal here is to eventually propose changing the signature of runtime functions that return errors to actually return `enum halide_error_code_t`; as you might imagine, making that transition might be controversial for a number of reasons. This PR is intended to be a way to make such a future transition easier to reason about, while arguably improving the code quality of the runtime slightly. * tidy * Update opencl.cpp * trigger buildbots * trigger buildbots * Fix merge mistake * Update gpu_context_common.h * Update cuda.cpp * Werror * status_ * if-with-initializer format * Update cuda.cpp * Update opencl.cpp * Update cuda.cpp * Update device_interface.h * Update hexagon_cache_allocator.cpp * Update printer.h * remove prefixes * "device field is already non-zero" * Update opencl.cpp * Update cuda.cpp * Fix error spacing * trigger buildbots	16 March 2023, 21:42:34 UTC
88b3ef8	Steven Johnson	16 March 2023, 21:35:52 UTC	Split WebGPU runtime into two variants (#7248 workaround) (#7419) * Split WebGPU runtime into two variants (#7248 workaround) Halide promises that you can crosscompile to any supported target from a 'stock' build of libHalide. Unfortunately, the initial landing of WebGPU support breaks that promise: we compile the webgpu runtime support (webgpu.cpp) with code that is predicated on `WITH_DAWN_NATIVE` (for Dawn vs Emscripten, respectively). This means that if you build Halide with `WITH_DAWN_NATIVE` defined, you can only target Dawn with that build of Halide; similarly, if you build with `WITH_DAWN_NATIVE` not-defined, you can only target Emscripten. (Trying to use the 'wrong' version will produce link-time errors.) For people who build everything from source, this isn't a big deal, but for people who just pull binary builds, this is a big problem. This PR proposes a temporary workaround until the API discrepancies are resolved: - Compile the existing webgpu.cpp runtime both ways - in LLVM_Runtime_Linker.cpp, select the correct variant based on whether the Target is targeting wasm or not - Profit! This is a rather ugly hack, but it should hopefully be (relatively) temporary. * A few more fixes * Update HalideGeneratorHelpers.cmake * Update interpreter.cpp * Update interpreter.cpp	16 March 2023, 21:35:52 UTC
cc74ee8	Steven Johnson	16 March 2023, 18:22:23 UTC	Move long boilerplate C/C++ code into template files (#7426) * Move long boilerplate C/C++ code into template files Codegen_C has a couple of long strings with boilerplate code that is conditionally emitted; at least one of these is too long for a single string literal under MSVC. Let's try moving these into standalone files instead; this may make it easier to use conventional tooling on the C++ code, and make Codegen_C easier to read and think about. (Note that the not-yet-landed Xtensa branch should also use this approach, if we decide this approach is good, since it has even more such code.) TODO: probably would be good to augment `binary2cpp` to allow optional comments in the source file that are stripped in the output file (e.g. "This file is used in CodeGen_C.cpp for blah blah blah, look out for blah") * Remove detritus	16 March 2023, 18:22:23 UTC
50f8c85	Steven Johnson	15 March 2023, 17:16:01 UTC	Disable correctness_atomics on Windows with Cuda, alas (#7423) (#7424)	15 March 2023, 17:16:01 UTC
ae59b91	Steven Johnson	14 March 2023, 20:56:33 UTC	Ignore assertions inside WebGPU kernels (#7418) This is the approach that (e.g.) the OpenCL backend takes to assertions inside kernel code.	14 March 2023, 20:56:33 UTC
e966163	Steven Johnson	14 March 2023, 20:33:34 UTC	Log name of failing function for !function_takes_user_context (#7417)	14 March 2023, 20:33:34 UTC
05fa61a	Steven Johnson	14 March 2023, 18:40:40 UTC	Fix for top-of-tree LLVM (#7416)	14 March 2023, 18:40:40 UTC
9199849	Steven Johnson	14 March 2023, 18:01:46 UTC	A few minor cleanups in WebGPU backend (#7413) * A few minor cleanups in WebGPU backend Mostly just using halide_error_code_t values everywhere. * trigger buildbots * Update webgpu.cpp	14 March 2023, 18:01:46 UTC
b63139e	James Price	14 March 2023, 17:01:25 UTC	Update mini_webgpu.h with latest changes from Dawn (#7415)	14 March 2023, 17:01:25 UTC
d383cb9	James Price	14 March 2023, 16:58:12 UTC	Fix null device crash during WebGPU initialization (#7414) If RequestDevice fails, make sure we exit initialization early instead of trying to create a staging buffer with a nullptr device.	14 March 2023, 16:58:12 UTC
