Revision history - refs/heads/vksnk/lerp-intrinsics - origin: https://github.com/halide/Halide

Revision	Author	Date	Message	Commit Date
c1fb3e0	Volodymyr Kysenko	18 November 2021, 17:59:17 UTC	Comment	18 November 2021, 17:59:17 UTC
519930d	Volodymyr Kysenko	17 November 2021, 23:22:34 UTC	Rewrite integer lerp using intrinsics	17 November 2021, 23:22:34 UTC
16fa3ce	Steven Johnson	12 November 2021, 23:17:53 UTC	[hannk] Pacify clang-tidy (#6412) * [hannk] Pacify clang-tidy * One more ASAN fix We must use use_global_gc = false to work properly with the JIT * Revert "One more ASAN fix" This reverts commit 9ed07a70b4a656790236a5ff6966155df823a319. * Rework Op::mutate() to avoid UB	12 November 2021, 23:17:53 UTC
b63f6af	Steven Johnson	12 November 2021, 20:56:57 UTC	[hannk] Fix lower_tflite_fullyconnected (#6414) Fixed the bounds calculation in lower_tflite_fullyconnected() to preserve the invariants expected, and added a testcase that previously failed.	12 November 2021, 20:56:57 UTC
8c2dd5f	Steven Johnson	12 November 2021, 20:34:14 UTC	One more ASAN fix (#6413) We must use use_global_gc = false to work properly with the JIT	12 November 2021, 20:34:14 UTC
0153c6b	Steven Johnson	12 November 2021, 16:35:37 UTC	Revamp Hannk IR (#6379) Refactor Hannk IR and transforms to use a Mutator-based approach	12 November 2021, 16:35:37 UTC
79da2a0	Steven Johnson	12 November 2021, 16:34:30 UTC	Fix broken ASAN code (#6408) * Fix broken ASAN code Various changes and merges ended up with us using multiple ASAN passes, which was pretty crashy (we just didn't notice because it isn't tested well enough on our buildbots, but is elsewhere). I think we really only want to use the ModuleAddressSanitizerPass (not the non-Module version), which is what Clang does. * set UseAfterScope = true	12 November 2021, 16:34:30 UTC
02a394d	Steven Johnson	12 November 2021, 03:25:52 UTC	x86_cpuid_halide must preserve all 64 bits of rbx/rsi (#6409) The existing code attempts to preserve ebx (since the cpuid instruction can trash it), but it only preserves the lower 32 bits; on 64-bit systems, this (amazingly) usually works OK unless you are compiling in (e.g.) ASAN mode, which can subtly change codegen such that the full 64 bits of rbx must be preserved. I'm genuinely astonished this hasn't bitten us before now!	12 November 2021, 03:25:52 UTC
d763406	Volodymyr Kysenko	12 November 2021, 01:30:05 UTC	Change implementation of round_f* in CodeGen_C to use nearbyint to match CodeGen_LLVM (#6406)	12 November 2021, 01:30:05 UTC
9ff87ce	Steven Johnson	11 November 2021, 18:04:09 UTC	_halide_buffer_crop() needs to check for runtime failures (v2) (#6403) * _halide_buffer_crop() needs to check for runtime failures (v2) (Alternate to #6402) We currently assume that _halide_buffer_crop() will never fail. This is a bad assumption, as it can call device_crop(), which can fail due to unexpected runtime errors, or from a backend simply leaving the device_crop field at the default (unimplemented) case (as is currently the case for the OGLC backend). When this happens, the dst buffer was left in an inconsistent, invalid state (which was what led to the crashes fixed by #6401). This change modifies _halide_buffer_crop() to return nullptr in the event of an error, and ensure that all cropped buffers are checked for null at the right point. (This is not optimal, of course, since the specific error returned by device_crop is getting dropped on the floor, but the existence of an error is no longer ignored.) This addresses at least some of the failure issues we are seeing in performance_async_gpu with the OpenGLCompute backend. (Also: drive-by whitespace fix in CodegenC) * Oops	11 November 2021, 18:04:09 UTC
d343e76	Andrew Adams	11 November 2021, 17:06:00 UTC	Fix obscure bug in widening let substitution (#6405) Fix obscure bug in widening let substitution	11 November 2021, 17:06:00 UTC
8e34a35	Steven Johnson	09 November 2021, 23:10:40 UTC	Remove halide_abort_if_false() usage in runtime/metal (#6398) * Remove halide_abort_if_false() usage in runtime/metal This converts all the usage of `halide_abort_if_false()` in runtime/metal into either an explicit runtime check-and-return-error-code (if the check looks plausible), or `halide_debug_assert()` (if the check seems to be stating an invariant that shouldn't be possible in well-structured code). These changes are admittedly subjective, so feedback is especially welcome. Also, driveby change to sync-common.h to use `halide_debug_assert()` rather than a local equivalent. * nits	09 November 2021, 23:10:40 UTC
4f70271	Steven Johnson	09 November 2021, 22:51:05 UTC	Add defensive checks to halide_buffer_copy_already_locked (#6401) Found while debugging crashes with performance_async_gpu for OpenGLCompute: the 'if' tree wasn't robust enough for malformed buffers being passed, and could attempt to deref and use a null src->device_interface or dst->device_interface in some cases. This patch just improves this function to return an error in these cases (rather than crashing); the fact that we are getting malformed buffers passed to us is likely a separate bug.	09 November 2021, 22:51:05 UTC
b189722	Steven Johnson	09 November 2021, 21:35:24 UTC	[hannk] Upgrade hannk to use TFLite 2.7.0 by default (#6393) * [hannk] Upgrade hannk to use TFLite 2.7.0 by default * Fix unused-vars warnings	09 November 2021, 21:35:24 UTC
b021f87	Steven Johnson	09 November 2021, 21:25:26 UTC	Move PyTorch test into standalone tests (#6397) * Move PyTorch test into standalone tests It doesn't need to be internal. Also simplified to use only public API, updated the expected correctness, and avoided the need to have cuda present on the system to test for cuda output (since we can cross-compile to generate the C++ output anywhere). * fixes * Fix Windows text file endings * Update pytorch.cpp * Update pytorch.cpp	09 November 2021, 21:25:26 UTC
4286c78	Steven Johnson	09 November 2021, 17:13:09 UTC	Drop support for LLVM11 (#6396) * Drop support for LLVM11 With Halide 13 released, we should drop support for LLVM11 in Halide trunk, since we only promise to support LLVM trunk + two releases. * Update packaging.yml * Update config.cmake * Update CMakeLists.txt	09 November 2021, 17:13:09 UTC
d3ea755	Steven Johnson	09 November 2021, 17:03:17 UTC	Fix OGLC debug builds (#6399) If you try to build and run something with `openglcompute` and `debug`, you may crash with a div-by-zero, because the openglcompute runtime never calls `halide_start_clock()`, and all implementations of `halide_current_time_ns()` assume that it has been called. On (e.g.) OSX, this results in div by zero. This fixes it by inserting the correct call into openglruntime.cpp, and also adding debug-only asserts to all the `halide_current_time_ns()` implementations. (I was tempted to fix this by removing `halide_start_clock()` entirely and just lazily initing the initial value in `halide_current_time_ns()`, but I figured that would likely get pushback...)	09 November 2021, 17:03:17 UTC
d6f1345	Steven Johnson	08 November 2021, 23:13:13 UTC	Rename halide_assert -> halide_abort_if_false (#6382) * Rename halide_assert -> HALIDE_CHECK A crashing bug got mistakenly inserted because a new contributor (reasonably) assumed that the `halide_assert()` macro in our runtime code was like a C `assert()` (i.e., something that would vanish in optimized builds). This is not the case; it is a check that happens in all build modes and always triggers an `abort()` if it fires. We should remove any ambiguity about it, so this proposes to rename it to something more like the Google/Abseil-style CHECK() macro, to make it stand out more. (We may want to do a followup to verify that all of the uses really are unrecoverable errors that aren't better handled by returning an error.) * clang-format * Fix for top-of-tree LLVM * Fix for older versions * HALIDE_CHECK -> halide_abort_if_false * Update runtime_internal.h	08 November 2021, 23:13:13 UTC
1312817	Steven Johnson	08 November 2021, 22:01:29 UTC	Clean up CodeGen_LLVM names to match ASAN nomenclature changes (#6395)	08 November 2021, 22:01:29 UTC
6071cf6	Steven Johnson	08 November 2021, 20:13:44 UTC	Check results of all runtime function calls (#6389) * Check results of all runtime function calls This cherry-picks just the changes to callsites internal to Halide (and tests) from #6388. (It doesn't attempt to annotate runtime functions to enforce checking the results.) * Update write_debug_image.cpp * Add checks + comment to buffer_copy_aottest * Add comment to gpu_object_lifetime_aottest * Update memory_profiler_mandelbrot_aottest.cpp * Update user_context_insanity_aottest.cpp * Update process.cpp	08 November 2021, 20:13:44 UTC
a798909	Steven Johnson	08 November 2021, 20:13:28 UTC	Add halide_debug_assert() macro (#6390) * Add halide_debug_assert() macro Also convert usage of halide_assert()/HALIDE_CHECK() in hashmap.h and gpu_context_common.h to halide_debug_assert(), as all the usages looked to be appropriate for debug-mode only. (Rebased version of #6385, which this replaces) * appease clang-format	08 November 2021, 20:13:28 UTC
656c6b5	Steven Johnson	04 November 2021, 23:00:28 UTC	[hannk] Have CMake emit .s, .stmt, .ll files (#6392)	04 November 2021, 23:00:28 UTC
26ccb54	Omar Emara	04 November 2021, 00:35:13 UTC	Support vectorized Select in OpenGLCompute backend (#6371) The ternary operator in GLSL does not work with vector types. While the mix function has overloads for boolean vectors, it is only supported in version 4.5, so it is not exactly portable. To work around this, we use the ternary operator on all elements of the vector type. Necessary for #6348.	04 November 2021, 00:35:13 UTC
c005b9f	Omar Emara	04 November 2021, 00:32:20 UTC	Support vectorization in OpenGLCompute backend (#6348) * Support vectorization in OpenGLCompute backend This patch adds support for vector load and store operations. First, a pass identifies the buffers whose loads and stores are all dense, aligned, and have the same number of lanes. Such buffers are declared with a vector base type and accessed accordingly. Loads and stores that do not satisfy those criteria are implemented as gathers and scatters from buffers whose base type is scalar. Resolves #4976. Partially resolves #1687. * Move buffer name instead of copy (clang-tidy)	04 November 2021, 00:32:20 UTC
657bb03	Steven Johnson	03 November 2021, 23:29:38 UTC	Fix for top-of-tree LLVM (#6386) * Fix for top-of-tree LLVM * Fix for older versions	03 November 2021, 23:29:38 UTC
76315a2	Omar Emara	03 November 2021, 22:54:44 UTC	Vectorize Ramp in OpenGLCompute backend (#6372) Currently, ramps are generated as a number of independent scalar expressions that are finally gathered into a vector. For instance, indexing in vectorized code is filled with ramps like the following: ``` int _11 = int(1) * int(1); int _12 = _10 + _11; int _13 = int(2) * int(1); int _14 = _10 + _13; int _15 = int(3) * int(1); int _16 = _10 + _15; ivec4 _17 = ivec4(_10, _12, _14, _16); ``` This patch simplifies the generated code using a multiply add expression on a vector containing an arithmetic expression, such that the code is as follows: ``` ivec4 _11 = ivec4(0, 1, 2, 3) * int(1) + _10; ``` This is more performant due to vectorization, more compact, and more readable because the base and the stride are easily identifiable.	03 November 2021, 22:54:44 UTC
2cf3afb	Steven Johnson	03 November 2021, 22:47:03 UTC	[hannk] Fix MeanOp (#6336) * [hannk] Fix MeanOp The `reducing()` method didn't handle negative values for indices, and didn't reverse the value of the axis as we do elsewhere, so results were incorrect. Also, we now parse and save the value of `keep_dims`, though I can't find evidence that it does much of anything: test cases pass different values for it but none of them fail (even though we ignore it), and at least one reference implementation I see doesn't seem to do anything with it. * Remove keep_dims handling for MeanOp	03 November 2021, 22:47:03 UTC
7ec8d70	Steven Johnson	03 November 2021, 22:19:15 UTC	Convert various halide_assert -> static_assert (#6383) The type-size checks in d3d12compute.cpp don't need to be runtime checks.	03 November 2021, 22:19:15 UTC
a227440	Steven Johnson	03 November 2021, 20:55:28 UTC	Remove halide_assert() from halide_default_device_wrap_native (#6381) This was inserted in https://github.com/halide/Halide/pull/6310, probably mistakenly (since `halide_assert()` in the Halide runtime is not a debug-only assertion). Instead of a controlled runtime failure, we just abort, which is not OK.	03 November 2021, 20:55:28 UTC
415ce0c	Alex Reinking	03 November 2021, 20:27:46 UTC	Fix empty INSTALL_COMMAND in hannk super-build (#6387) * Fix empty INSTALL_COMMAND in hannk super-build * Fix 3.16 missing command * Fix the fix...	03 November 2021, 20:27:46 UTC
0d6b0f5	Steven Johnson	03 November 2021, 16:18:45 UTC	Fix for top-of-tree LLVM (#6380)	03 November 2021, 16:18:45 UTC
ac2673b	Alex Reinking	03 November 2021, 00:57:12 UTC	Add super-build for cross-compiling HANNK (#6374) * Add super-build for cross-compiling HANNK * Relax CMake version	03 November 2021, 00:57:12 UTC
6070821	Alex Reinking	02 November 2021, 19:42:02 UTC	Update README for Halide 13. (#6378)	02 November 2021, 19:42:02 UTC
5b8f473	Volodymyr Kysenko	02 November 2021, 15:36:19 UTC	Fix for the crash from #6367 (#6375) * Skip empty boxes * Address the comments	02 November 2021, 15:36:19 UTC
4225eba	Alex Reinking	01 November 2021, 23:03:09 UTC	Add helper for cross-compiling Halide generators. (#6366) * Add helper for cross-compiling Halide generators. Created a new function, `add_halide_generator`, that helps users write correct cross-compiling builds by establishing the following convention for creating a generator named `TARGET`: 1. Define Halide generators and libraries in the same project 2. Assume two builds: a host build and a cross build. 3. When creating a generator, check to see if we can load a pre-built version of the target. 4. If so, just use it. 5. If not, make sure the full Halide package is loaded and create a target for the generator. a. If `CMAKE_CROSSCOMPILING` is set, then _warn_ the user (the variable is unreliable on macOS) that something seems fishy. b. Create export rules for the generator. It creates a package `PACKAGE_NAME` and appends to its `EXPORT_FILE`. c. Create a custom target also named `PACKAGE_NAME` for building the generators. d. Create an alias `${PACKAGE_NAMESPACE}${TARGET}`. 6. Users are expected to use the alias in conjunction with `add_halide_library`. Users can test the existence of `TARGET` to determine whether a pre-built one was loaded (and set additional properties if not). 7. Setting `${PACKAGE_NAME}_ROOT` is enough to load pre-built generators. `PACKAGE_NAME` is `${PROJECT_NAME}-halide_generators` by default. `PACKAGE_NAMESPACE` is `${PROJECT_NAME}::halide_generators::` by default. `EXPORT_FILE` is `${PROJECT_BINARY_DIR}/cmake/${PACKAGE_NAME}-config.cmake` by default. Users are free to avoid the helper if it would not fit their workflow. * Make HANNK use the new add_halide_generator helper	01 November 2021, 23:03:09 UTC
f5ce5f3	Steven Johnson	01 November 2021, 20:40:36 UTC	[hannk] Clean up aliasing (v2) (#6364) * wip * [hannk] Clean up aliasing (v2) The code for aliasing tensors was janky. This cleans it up and makes a clear distinction between aliasing done to overlay buffers with crop-and-translate, vs the aliasing done when we reshape tensors. We no longer allow a given tensor to do both of these, and we give preference to Reshape aliasing first. (Cherry-picked from #6321) * Move alias_type into shared ptr	01 November 2021, 20:40:36 UTC
1a1c97f	Steven Johnson	01 November 2021, 20:28:50 UTC	[hannk] Add support for building/running for wasm (#6361) * [hannk] Allow disabling TFLite+Delegate build in CMake Preparatory work for allowing building of hannk with Emscripten; TFLite (and its dependees) problematic to build in that environment, but this will allow us to build a tflite-parser-only environment. (Note that more work is needed to get this working for wasm, as crosscompiling in CMake is still pretty painful; this work was split out to make subsequent reviews simpler) * [hannk] Add support for building/running for wasm * HANNK_BUILD_TFLITE_DELEGATE -> HANNK_BUILD_TFLITE * Use explicit host build strategy for cross compiling HANNK (#6365) * Ignore local emsdk clone * Fix usage of CMAKE_BUILD_TYPE * Only print the Halide target info once per CMake run * Fix Halide "cmake" target detection for Emscripten * Prefer target_link_options to _link_libraries when applicable * Validate, rather than find, NODE_JS_EXECUTABLE (set by emsdk) * Emscripten already wraps tests with node. * Add dependency on Android logging library. * For cross-compiling, find host tools instead of recursive call. Rather than shelling out via execute_process and potentially guessing the toolchain options wrong, expect to find our host tools (i.e. generators) in a package called "hannk_tools". The package is created by the host build via the CMake export() command. Importing this package in the cross build creates IMPORTED targets with the same names as our generators. We then use these generators rather than creating generators for the target build. * Rework cross-compiling script. * Respond to (easy) reviewer comments. * Add HANNK_AOT_HOST_ONLY option. Use in script. * [hannk] tests should only process .tflite files (#6368) currently, random dotfiles (e.g. .DS_Store on OSX) can creep in, causing bogus failures * Add comment about node wrapping. * Rename hannk_tools to hannk-halide_generators * Add comment about exporting targets. 
* Bump version to Halide 14.0.0 (#6369) Co-authored-by: Steven Johnson <srj@google.com> Co-authored-by: Alex Reinking <alex_reinking@berkeley.edu>	01 November 2021, 20:28:50 UTC
69d8ef0	Alex Reinking	30 October 2021, 01:33:50 UTC	Bump version to Halide 14.0.0 (#6369)	30 October 2021, 01:33:50 UTC
3c52df1	Steven Johnson	29 October 2021, 23:46:50 UTC	[hannk] tests should only process .tflite files (#6368) currently, random dotfiles (e.g. .DS_Store on OSX) can creep in, causing bogus failures	29 October 2021, 23:46:50 UTC
541bc37	Steven Johnson	28 October 2021, 21:14:42 UTC	[hannk] Allow disabling TFLite+Delegate build in CMake (#6360) * [hannk] Allow disabling TFLite+Delegate build in CMake Preparatory work for allowing building of hannk with Emscripten; TFLite (and its dependees) problematic to build in that environment, but this will allow us to build a tflite-parser-only environment. (Note that more work is needed to get this working for wasm, as crosscompiling in CMake is still pretty painful; this work was split out to make subsequent reviews simpler) * Update hannk_delegate.h * HANNK_BUILD_TFLITE_DELEGATE -> HANNK_BUILD_TFLITE	28 October 2021, 21:14:42 UTC
e10f104	Steven Johnson	28 October 2021, 17:34:27 UTC	Update Emscripten settings (#6362) The settings we use to build C++ in wasm were slightly out of date now that we've updated our runtime to Node instead of d8. Also drive-by gitignore fix.	28 October 2021, 17:34:27 UTC
1c7388a	Andrew Adams	28 October 2021, 17:25:58 UTC	Allow users to use their own cuda contexts and streams in JIT mode (#6345) * Deprecate JIT runtime override methods that take void * * Make it possible to use custom cuda contexts and streams in JIT mode * Clean up comments * Tolerate null handlers in the JITUserContext These can come up if a JITUserContext is passed to something like copy_to_device before getting fully populated by passing it to a call to realize. * Remove reliance on dlsym in test and reuse the runtime's name resolution mechanism instead * Handle case where cuda and cuda-debug runtime modules both exist This change means we'll only ever create one built-in cuda context in this circumstance. * Slight simplification * Improve comments	28 October 2021, 17:25:58 UTC
4f573bf	Volodymyr Kysenko	28 October 2021, 02:05:29 UTC	Add missing widening_absd patterns (#6359) * Add missing widening_absd patterns * Add a comment	28 October 2021, 02:05:29 UTC
8f1ae2a	Steven Johnson	27 October 2021, 20:37:00 UTC	Use Node instead of d8 for Wasm AOT testing (#6356) * Use Node instead of d8 for Wasm AOT testing This requires the right version of Node is installed on your system. Since EMSDK often puts a too-old version of Node in the path, allow overriding via an env var. * wip	27 October 2021, 20:37:00 UTC
34534f5	Steven Johnson	27 October 2021, 20:35:39 UTC	[hannk] Add missing call to Interpreter::prepare in benchmark app (#6358)	27 October 2021, 20:35:39 UTC
a15ffda	Volodymyr Kysenko	27 October 2021, 01:33:39 UTC	Add include for size_t in constants.h (#6353) * Add include for size_t in constants.h * Change to int	27 October 2021, 01:33:39 UTC
86cb6c7	Andrew Adams	26 October 2021, 23:12:52 UTC	Deprecate JIT runtime override methods that take void * (#6344) * Deprecate JIT runtime override methods that take void * * Clean up comments	26 October 2021, 23:12:52 UTC
6211da9	Andrew Adams	26 October 2021, 23:12:34 UTC	Add --help flag to rungenmain, fixing #5323 (#6354)	26 October 2021, 23:12:34 UTC
06a37ca	Steven Johnson	26 October 2021, 17:23:23 UTC	Add to various OpVisitors to avoid overload warnings for some compilers (#6337)	26 October 2021, 17:23:23 UTC
47fa87f	Steven Johnson	26 October 2021, 17:22:49 UTC	[hannk] Add a prepare() method for ops and interp (#6338) * [hannk] Add a prepare() method for ops and interp This adds a new method to the Interpreter, and to all ops, which allows the interpreter (and each op) to do any one-time preparation for future executions. Previously this was lumped into either the Interpreter's ctor, or the Ops' various other methods, but this has some nice advantages at minimal cost: - Since the new prepare() returns an error value, it allows the Interpreter to do sanity checking at startup and return an error to the caller (rather than simply crashing); this makes using it in some runtime environments less painful. - Ops can use this to prep and cache information for multiple subsequent runs; initially, Conv and DepthwiseConv use this to calculate and cache the alignment requirements they need later on. This is unlikely to be a huge performance hit, but it is likely nonzero. As an added bonus, this means that e.g. the map_bounds() method is no longer susceptible to runtime failures from Halide bounds queries. * Update interpreter.cpp * Update transforms.cpp * Update transforms.cpp	26 October 2021, 17:22:49 UTC
667836d	Steven Johnson	26 October 2021, 17:16:35 UTC	Harvest IWYU changes for LLVM, WABT (#6341) A couple of minor hygiene changes, extracted from https://github.com/halide/Halide/pull/6251: - Clean up LLVM_Headers.h to uniformly use <> instead of "" and to alphabetize properly - Clean up WABT includes to reflect what we need more accurately	26 October 2021, 17:16:35 UTC
b34919f	Omar Emara	26 October 2021, 17:09:48 UTC	Fix wrong type in Ramp CodeGen for OpenGLCompute (#6349) The variable type of the Ramp in OpenGLCompute is assigned the type of the base member of the ramp, which is a scalar, while the ramp is a vector. Instead, we should use the type of the ramp to take vectorization into account. Partially resolves #1687.	26 October 2021, 17:09:48 UTC
ab57ab1	Steven Johnson	26 October 2021, 17:07:44 UTC	[hannk] augment SoftmaxOp to allow specifying axis (#6351) (basically equivalent to #6335 but for softmax)	26 October 2021, 17:07:44 UTC
50517cb	Steven Johnson	26 October 2021, 17:07:00 UTC	[hannk] requantize() should never skip the operation (#6350) * [hannk] requantize() should never skip the operation Even if inq == outq, the incoming buffer can contain out-of-range values; we shouldn't try to optimize the op away, since it's cheap. * Update ops.cpp * Update ops.cpp	26 October 2021, 17:07:00 UTC
d6d7bbc	Steven Johnson	26 October 2021, 17:05:48 UTC	Make halide_type_t and halide_type_of constexpr (#6340) * Make halide_type_t and halide_type_of constexpr This allows us to do a bit more at compile time in some cases; e.g., we can more reliably collapse things like `t == halide_type_t(int, 8)` into `t.as_u32() == literal-integer`, avoiding temporaries. It also makes it tractable to do a `switch` on a series of `halide_type_t`, since we can now use halide_type_t::as_u32() as a constexpr. There were a number of places that did this in an ad-hoc manner previously; I updated those, and also converted at least one more repeated-if clause into a switch. (TBH, I'm not sure if I'm wild about the syntax, though; it is a bit weedy to scan. Suggestions welcome.) * Ensure there are no uninited vars in constexpr funcs * Update HalideRuntime.h	26 October 2021, 17:05:48 UTC
334e27a	Volodymyr Kysenko	26 October 2021, 05:10:41 UTC	Specify template parameter of ScopedValue (#6352)	26 October 2021, 05:10:41 UTC
5e374bc	Omar Emara	25 October 2021, 20:34:07 UTC	Fix default device wrap native function (#6310) * Fix default device wrap native function Currently, an attempt to call device_wrap_native on a target that uses the default device wrap native function will result in an error of type halide_error_device_interface_no_device, namely in the OpenGLCompute and the Hexagon targets. This happens because the default wrap native function calls debug_log_and_validate_buf after the device_interface is set in halide_device_wrap_native but before the device handle is set, which is validated as a bad state. This patch removes the validation call and adds an assert for the handle much like the other wrap_native implementations in other targets. * Use appropriate error code	25 October 2021, 20:34:07 UTC
6d9737d	Xuanda Yang	25 October 2021, 20:12:24 UTC	Fix cuda-debug logging: fix incorrect threads_per_core when SM >= 8.0; fix device memory smaller than actual size. (#6346)	25 October 2021, 20:12:24 UTC
ba81a06	Volodymyr Kysenko	25 October 2021, 16:22:38 UTC	Scheduling directive to set an explicit storage bound (#6327) * bound_allocation scheduling directive * Add a more specific error and test * Add a correctness test for bound_allocation * Remove debug output from test * Move expression * Per dim bound_storage * Update CMakeLists.txt * how hard can it be * Rename the error code * More detailed error message	25 October 2021, 16:22:38 UTC
297c30a	Steven Johnson	22 October 2021, 16:30:56 UTC	Fix Makefile for LLVM11 (#6343)	22 October 2021, 16:30:56 UTC
6c9224a	Steven Johnson	21 October 2021, 23:16:36 UTC	Fix HelloWasm (#6342) - Add workaround for Cross-origin isolation requirements - Add CXXFLAGS to allow benchmarks to compile	21 October 2021, 23:16:36 UTC
0078880	Thales Sabino	21 October 2021, 17:52:39 UTC	Add support for AMX instructions (#5818) * Add support for AMX tile instructions * Make AMX transform opt-in with memory type * Clean up tiled_matmul test * Handle AMX intrinsic attributes better * Format * Fix test to behave like other tests * Add doc and missing load check * Format * Throw error if user requests AMX for invalid operation * Add Tile lowering pass to makefile * Use spaces in Makefile * Place AMX intrinsics into a separate module (x86_amx.ll) This will only be included if LLVM >= 12 is used to build Halide * Fix CreateAlignedLoad() call in CodeGen_X86 Recent changes in LLVM trunk made the previous calling convention deprecated (and thus compiling with warning/error) * fix exporting to module * add llvm funcs for su, us, uu amx variants * add other amx intrinsics to intrinsic_defs * match with unsigned 8 bit integers This matching happens for the left and right side, each determining whether that side is unsigned or signed. In the end the proper 1024 byte buffer is created with (un)signed. 
* match for 32 bit integer and guard unsigned amx on llvm 13 * adjust test to cover unsigned tile operations * guard properly with llvm 12 * create explicit error if failed to use tile operations * pass types as template params rather than boolean This makes the intention clearer * clang-format patch * add x86_amx to makefile's runtime components * make tiled_matmul compatible with c++11 * add mattrs required for amx * fix formatting issues * remove outdated FIXME comments * add bf16 tile operations to the runtime * create a schedule that should map to amx * create full amx-bf16 schedule * allow amx operations to yield f32s * accept 32 bit float stores * add support for bf16 * add missing bf16 intrinsics * fix striding error when loading matrix * add checks to verify bf16 result * fix scaling of col_bytes on matmul call * move brace to previous line * derive result type using a function rather than lambda * run clang tidy and format * have tile_store return i32 * make is_3d_tile_index robust to indexing changes * apply formatting suggestions * both first and second can be const qualified * remove trailing whitespace in unformatted section * make requested style changes * rename NewMatmul -> Matmul * fix warning about missing return value * use get_1d_tile_index to handle special case When using `Buffer` instead of `ImageParam` the `Ramp` expression generated is 1D instead of 2D, therefore we recognize this with a special case. The lanes are still matched against the dimensions of the LHS 3d tile lanes. * add correctness test for AMX instructions * correctness part has been separated out * remove unused variables Co-authored-by: John Lawson <john@codeplay.com> Co-authored-by: Thales Sabino <thales@codeplay.com> Co-authored-by: Steven Johnson <srj@google.com> Co-authored-by: Frederik Engels <frederik.engels@codeplay.com>	21 October 2021, 17:52:39 UTC
05107ca	Steven Johnson	21 October 2021, 17:23:31 UTC	Fix WASM datalayout for top-of-tree LLVM (#6339)	21 October 2021, 17:23:31 UTC
ecf69b0	Xuanda Yang	21 October 2021, 04:16:55 UTC	Add support for CUDA capability 8.6 (#6334) * Add support for CUDA capability 8.6 * add assertion to guard LLVM version * fallback to sm80 if LLVM < 13.0	21 October 2021, 04:16:55 UTC
27f975f	Andrew Adams	20 October 2021, 22:32:08 UTC	Add ability to pass a user context in JIT mode (#6313) * Change type of first arg to all JITHandlers and expose struct to users * Make it possible to pass a custom JITUserContext per realize call * More comments * Fix type in python bindings * Fix type in python bindings * Fix more types in python bindings * Add user_context-accepting variants of other realize-like functions * Revert tests back to the way they are on master but with comments explaining why they are the way they are * Add example of passing a custom context to copy_to_host * Revert test to be closer to master. It was that way for a reason * The first arg to get_library_symbol isn't actually a user_context * Add copy_to_device example too * Fix python * Make bad_buf even worse * Comment clarifications	20 October 2021, 22:32:08 UTC
c3641b6	Steven Johnson	20 October 2021, 00:16:04 UTC	[hannk] augment L2NormOp to allow specifying axis (#6335)	20 October 2021, 00:16:04 UTC
d80bb23	Andrew Adams	19 October 2021, 16:43:07 UTC	Add a new unsigned division method (#6322) * Add a new unsigned division method It uses averages rounding up instead of averages rounding down, to reduce instruction count on x86. Division by 7 before: vpmulhuw .LCPI0_1(%rip), %ymm0, %ymm1 vpsubw %ymm1, %ymm0, %ymm0 vpsrlw $1, %ymm0, %ymm0 vpaddw %ymm1, %ymm0, %ymm0 vpsrlw $2, %ymm0, %ymm0 Division by 7 after: vpmulhuw .LCPI0_1(%rip), %ymm0, %ymm1 vpavgw %ymm0, %ymm1, %ymm0 vpsrlw $2, %ymm0, %ymm0 * Remove debugging code * Add comment elaborating on why this is a good idea	19 October 2021, 16:43:07 UTC
deeb6bc	Steve Suzuki	19 October 2021, 16:40:16 UTC	Rewrite double/triple narrowing from float on ARM (#6305) * Rewrite double/triple narrowing from float on ARM	19 October 2021, 16:40:16 UTC
7613f9d	Steven Johnson	18 October 2021, 16:55:35 UTC	[hannk] Improve GatherOp (#6328) We (mostly) implemented GatherOp for TFLite's Gather op, but missed some things: - There's a batch_dim param for Gather that we were ignoring. I added code to fill it in, but we punt for values != 0, because I haven't yet found a test case that handles it. Should be easy to fill in when we do. - TFLite's GatherNd op (and NNAPI's GATHER op) allow for the indices arg to be multidimensional; I rewrote the code to handle this and it's passing the acceptance tests for NNAPI's cases. (It doesn't yet handle the GatherNd op because, again, I haven't found a good test case. Should be simple to do when we do.)	18 October 2021, 16:55:35 UTC
8d098de	Steven Johnson	18 October 2021, 16:52:49 UTC	[hannk] Restructure BinaryOp to allow adding more temporary types (#6326) Change is a no-op as written, but I'd like to land it so this change doesn't get lost -- it's handy for debugging pipelines that happen to use op/type variants we don't yet support (eg arithmetic on floats), which can unblock the ability to run more tests (albeit not efficiently).	18 October 2021, 16:52:49 UTC
cd8146d	Steven Johnson	18 October 2021, 16:47:48 UTC	[hannk] Fix override annotation in hannk (#6315) Minor hygiene: add explicit override annotations and enable the compiler warnings. (I was about to tweak some of the virtual functions and this has been bothering me for a while.)	18 October 2021, 16:47:48 UTC
923025a	Steven Johnson	14 October 2021, 20:27:00 UTC	[hannk] Fix assert in dconv (#6320)	14 October 2021, 20:27:00 UTC
071f5f7	Steven Johnson	13 October 2021, 18:23:34 UTC	[hannk] Improve Op::dump() (#6314) * [hannk] Improve Op::dump() Rewrite the Op::dump() methods to be more verbose, so that we can determine all the details of the tensors used, and the hierarchy of OpGroups; also add a post-transform dump when verbosity >= 2. (I'm using this to track down a subtle bug, but landing this separate from other fixes seems appropriate) * Fixes	13 October 2021, 18:23:34 UTC
63cfd9d	Andrew Adams	12 October 2021, 20:52:24 UTC	Substitute in all widening lets prior to find_intrinsics (#6307) * Look through lets in find_intrinsics If an Expr like: narrow((widen(x) + y + 1)/2) gets lifted into a let, the simplifier will then substitute things in like so: let foo = widen(x) + y in narrow(foo + 1)/2, potentially breaking a pattern. This is a general problem for patterns that widen, do some math, and then narrow. They will always get cut at the widening operation, so this PR just substitutes in all widening operations. This can't cause combinatorial blow-up, because each substitution has a wider type than the values that it depends on, so the chains can be at most 2-3 lets deep. * Make substituting in widening lets a prepass instead * Move find_intrinsics a little earlier in lowering * Handle impure subexpressions by leaving them behind at the original let site * FindIntrinsics must be after the last simplification pass	12 October 2021, 20:52:24 UTC
a351021	Andrew Adams	12 October 2021, 19:20:33 UTC	Demosaic should be done unsigned (#6308) So that we can use pavgw instructions and the like. Speeds it up slightly on x86 (5% or so)	12 October 2021, 19:20:33 UTC
3931213	Steven Johnson	12 October 2021, 18:57:02 UTC	[hannk] SpaceDepthOp isn't limited to u8 Tensors (#6311) The code as written should work on all Tensor types; we just need to require the input and output types match.	12 October 2021, 18:57:02 UTC
d4e45bd	Steven Johnson	12 October 2021, 18:56:43 UTC	[hannk] Fix > and >= op implementations (#6312) a>b should be b<=a (not b <a) a>=b should be b<a (not b<=a)	12 October 2021, 18:56:43 UTC
89b36b4	Steven Johnson	12 October 2021, 18:00:09 UTC	[hannk] Fix faulty 'shallow' logic in dconv2d (#6309) * [hannk] Fix faulty 'shallow' logic in dconv2d	12 October 2021, 18:00:09 UTC
e058532	Andrew Adams	11 October 2021, 21:56:51 UTC	store_in(MemoryType::Stack) should use alloca if the size is small (#6289) * Test using a real alloca call instead of the pseudostack * Improve test and remove debugging prints * Fix test * Switch to heap based on cumulative size rather than current size and add a test case that illustrates why this matters. * Fix test that requires actual heap allocations * Make test actually test more than one trip through the loop * Fix alignment of stack allocation * Branching is cheaper than alloca(0) * Tweak test pass condition * Move shared constant to a single location * Namespace shuffling * Fix comment location	11 October 2021, 21:56:51 UTC
1e40a71	Steven Johnson	11 October 2021, 20:48:35 UTC	Fix for top-of-tree LLVM (#6306) * Fix for top-of-tree LLVM * drive-by fix for other bad LLVM_VERSION checks	11 October 2021, 20:48:35 UTC
2a2c4b0	Andrew Adams	10 October 2021, 21:06:57 UTC	At some point llvm re-added pavgw intrinsics (#6302) * At some point llvm re-added pavgw intrinsics This is a good thing, because these do not reliably trigger from the pattern in runtime/x86.ll * Delete more dead code	10 October 2021, 21:06:57 UTC
2bfa567	Alex Reinking	08 October 2021, 18:52:29 UTC	Add ClampUnsafeAccesses pass. (#6294) * Add ClampUnsafeAccesses pass. Fixes #6131 Inject clamps around func calls h(...) when all the following conditions hold: 1. The call flows into an indexing context, such as: `f(x) = g(h(x))` or `let y = h(x) in f(x) = g(y)` 2. The FuncValueBounds of h are smaller than those of its type 3. h's allocation bounds might be wider than its compute bounds Condition (3) is not yet implemented see #6297.	08 October 2021, 18:52:29 UTC
c6529ed	Steven Johnson	08 October 2021, 01:21:42 UTC	Modernize loops, part 4/final (#6296) * Modernize loops, part 4/final Final part getting code ready for clang-tidy's modernize-loop check, plus enabling the check * Update Module.cpp	08 October 2021, 01:21:42 UTC
0b297f2	Steven Johnson	07 October 2021, 20:04:57 UTC	Modernize loops, part 3 (#6295) * Modernize loops, part 3 Part 3 of getting code ready for clang-tidy's modernize-loop check * Update Func.cpp * Address review comments	07 October 2021, 20:04:57 UTC
e27db6f	Zalman Stern	07 October 2021, 16:26:52 UTC	Don't set environment for RISCV Linux as apparently it is not (#6282) used. Should not change anything. Per issue: https://github.com/halide/Halide/issues/6281	07 October 2021, 16:26:52 UTC
ed87acb	Steven Johnson	07 October 2021, 00:50:51 UTC	Modernize loops, part 2 (#6293) Part 2 of getting code ready for clang-tidy's modernize-loop check	07 October 2021, 00:50:51 UTC
9169734	Steven Johnson	06 October 2021, 20:51:06 UTC	[hannk] Add specialization for broadcast of input 0 (#6291) * [hannk] Add specialization for broadcast of input 0 Alternate fix for https://github.com/halide/Halide/pull/6290 that is Halide-only. * Update elementwise_generator.cpp * Oops, do Mul as well	06 October 2021, 20:51:06 UTC
71c47b3	Steven Johnson	06 October 2021, 20:37:26 UTC	Modernize loops, part 1 (#6292) Part 1 of getting code ready for clang-tidy's modernize-loop check: src/autoschedulers and src/runtime	06 October 2021, 20:37:26 UTC
81b34e2	Basile Clement	05 October 2021, 16:32:59 UTC	Remove unbound variable in documentation (#6287) In the example for RDom::where, the simplified case contains a free occurrence of `r.x`, which should be replaced with `10` since we are in the case `r.x == 10`.	05 October 2021, 16:32:59 UTC
da7c66e	Steven Johnson	04 October 2021, 16:33:15 UTC	Make parking_control (etc) use vtables (#6275) * Make parking_control (etc) use vtables This class hierarchy is clearly best modeled with virtual methods (rather than fn ptrs), but was not; we think this was due to COMDAT issues that have been resolved by other means. I refactored this to use virtual methods instead (and removed the unused unpark_all function); it seems to work locally. * Add -fno-rtti to runtime compile flags (needed to allow vtables in runtime code) * make all overrides 'final' * Make virtual methods protected * Make structs final too * pacify clang-tidy	04 October 2021, 16:33:15 UTC
2495bcc	Zalman Stern	02 October 2021, 17:38:40 UTC	Remove hopefully dead code. (#6280)	02 October 2021, 17:38:40 UTC
81ad45e	Andrew Adams	01 October 2021, 21:41:29 UTC	compiler stack usage improvements (#6239) * Reduce compiler stack usage, and grant more control over stack usage I found some code in the wild that needs 9mb of stack to lower. It's a pain to even diagnose the problem definitively, because it requires plumbing platform-specific linker flags to grant more stack. This commit: - Reduces peak stack usage of similar code in the repo (the FFT) - Increases the stack size for lowering and codegen to 32mb on all platforms, using stack switching techniques. We started doing this on Windows a while ago and it hasn't bitten us, so let's try on more platforms. - Gives user control over the amount of stack used for lowering and codegen. It shouldn't be necessary except when diagnosing problems like this in future. Using the control I was able to determine that the correctness tests all pass with 500k of stack, and the apps all pass with 1MB, so 32MB ought to be enough for anybody. I found a never-checked-in test for the mux helper which uses 10MB of stack and really shouldn't need to, so I added that (and opened an issue) as an example of how to grant more stack when necessary, even though 10MB is less than our default now. Also fixed an incorrect comment on the Block node. * Fixes for macos * Add test to cmake * Fix type of temporary * Reduce number of exprs in the mux * Fix quadratic memory usage in new test * Better comment * Variable name fix * Try giving windows a little more stack * Clarify why we want a live Stmt in scope * Review comments * Check some return values * tickle buildbots * Fixes for arm macos * Remove stray character * clang-tidy had some reasonable concerns * Comment fix * Maybe windows needs yet more stack Co-authored-by: Steven Johnson <srj@google.com>	01 October 2021, 21:41:29 UTC
4b9f728	Steven Johnson	30 September 2021, 17:16:12 UTC	Remove more obsolete MachO/COMDAT workarounds (#6274) * Remove more obsolete MachO/COMDAT workarounds (Followup to #6272) * Update metal.cpp * A few more fixes	30 September 2021, 17:16:12 UTC
ef387ad	Steven Johnson	30 September 2021, 17:03:07 UTC	Minor cleanups in thread_pool_common.h (#6276) Minor hygiene noticed when doing other patches: - prefer `constexpr int` over `#define`, since we can now use C++17 in runtime code - remove redundant def of MAX_THREADS - use `do .. while (0)` idiom for functional macros	30 September 2021, 17:03:07 UTC
a8d7013	Steven Johnson	30 September 2021, 02:09:12 UTC	Remove the runtime/ssp module (#6277) * Remove the runtime/ssp module It doesn't get included via any path in the runtime linker, and removing it doesn't seem to affect any tests. (I haven't looked at the revision history to see when it was added and/or when inclusion of it was removed.) * Update LLVM_Runtime_Linker.cpp * Update LLVM_Runtime_Linker.cpp	30 September 2021, 02:09:12 UTC
e092c01	Steven Johnson	29 September 2021, 23:08:38 UTC	Fix alignment issues in synchronization_common.h (#6272) * Fix alignment issues in synchronization_common.h To work around old COMDAT issues, we allocated the table as a char array and cast it to what we want; unfortunately this doesn't guarantee the right alignment for the table and in some environments (eg wasm) we can get unaligned-access failures. We could fix this by forcing the right alignment, but since we fixed COMDAT issues in another way a while back (adding smarts to LLVM_Runtime_Linker), let's just remove the hack and declare it normally. Also added some drive-by changes to ensure that the hashtable size and HASH_TABLE_BITS were safe (this happened to be the case before but wasn't enforced), and also to init all the fields in hash_bucket. (Q: do we really need `check_hash()` to exist? With the mods in place above, is it possible for addr_hash() to return a bad index?) * Always use HASH_TABLE_BITS in addr_hash() * Only use check_hash() in DEBUG_RUNTIME builds	29 September 2021, 23:08:38 UTC
4307645	aankit-ca	29 September 2021, 16:56:34 UTC	[Hexagon] Remove qurt_init_fini (#6271) Including qurt_init_fini generates the below error: dlopenbuf failed: undefined symbol #140 __DTOR_LIST__ Including qurt_init_fini was needed when the pipeline was loaded using mmap. This is not needed now. Co-authored-by: Ankit Aggarwal <aankit@quicinc.com>	29 September 2021, 16:56:34 UTC
e836ea6	Roman Lebedev	28 September 2021, 08:36:38 UTC	CMake: install docs into halide subdirectory of doc dir (#6267)	28 September 2021, 08:36:38 UTC
8fbc788	Steven Johnson	27 September 2021, 17:12:25 UTC	[hannk] Add Make target to rebuild just the Halide-generated code. (#6265) * [hannk] Add Make target to rebuild just the Halide-generated code. Also, drive-by comment fix about disabling Fortran (!) when building for Android. * More changes	27 September 2021, 17:12:25 UTC
ebb9f19	Roman Lebedev	24 September 2021, 16:13:58 UTC	Usage of C++ `<thread>` header requires linking to threading library (#6257) * Usage of C++ `<thread>` header requires linking to threading library `Generator.cpp` and `ThreadPool.h` both `#include <thread>`, but don't link to the threading implementation. This fixes build for me on debian sid, which is failing otherwise with: ``` $ ninja [ 0% 2/1480][ 0% 0:00:00 + 0:33:06] Linking CXX executable src/autoschedulers/adams2019/get_host_target FAILED: src/autoschedulers/adams2019/get_host_target : && /usr/local/bin/clang++ -pipe -O3 -DNDEBUG src/autoschedulers/adams2019/CMakeFiles/get_host_target.dir/get_host_target.cpp.o -o src/autoschedulers/adams2019/get_host_target -Wl,-rpath,/repositories/halide/build/src: src/libHalide.so.13.0.0 && : ld: error: /usr/lib/gcc/x86_64-linux-gnu/11/libstdc++.so: undefined reference to pthread_create [--no-allow-shlib-undefined] clang: error: linker command failed with exit code 1 (use -v to see invocation) [ 0% 5/1480][ 0% 0:00:00 + 0:12:47] Linking CXX executable src/autoschedulers/adams2019/test_apps_autoscheduler FAILED: src/autoschedulers/adams2019/test_apps_autoscheduler : && /usr/local/bin/clang++ -pipe -O3 -DNDEBUG src/autoschedulers/adams2019/CMakeFiles/test_apps_autoscheduler.dir/test.cpp.o -o src/autoschedulers/adams2019/test_apps_autoscheduler -Wl,-rpath,/repositories/halide/build/src src/libHalide.so.13.0.0 -ldl && : ld: error: /usr/lib/gcc/x86_64-linux-gnu/11/libstdc++.so: undefined reference to pthread_create [--no-allow-shlib-undefined] clang: error: linker command failed with exit code 1 (use -v to see invocation) [ 0% 7/1480][ 0% 0:00:00 + 0:08:53] Linking CXX executable src/autoschedulers/adams2019/test_function_dag FAILED: src/autoschedulers/adams2019/test_function_dag : && /usr/local/bin/clang++ -pipe -O3 -DNDEBUG src/autoschedulers/adams2019/CMakeFiles/test_function_dag.dir/test_function_dag.cpp.o 
src/autoschedulers/adams2019/CMakeFiles/test_function_dag.dir/FunctionDAG.cpp.o src/autoschedulers/adams2019/CMakeFiles/test_function_dag.dir/ASLog.cpp.o -o src/autoschedulers/adams2019/test_function_dag -Wl,-rpath,/repositories/halide/build/src src/libHalide.so.13.0.0 && : ld: error: /usr/lib/gcc/x86_64-linux-gnu/11/libstdc++.so: undefined reference to pthread_create [--no-allow-shlib-undefined] clang: error: linker command failed with exit code 1 (use -v to see invocation) [ 0% 9/1480][ 0% 0:00:00 + 0:06:41] Linking CXX executable src/autoschedulers/li2018/gradient_autoscheduler_test_cpp FAILED: src/autoschedulers/li2018/gradient_autoscheduler_test_cpp : && /usr/local/bin/clang++ -pipe -O3 -DNDEBUG src/autoschedulers/li2018/CMakeFiles/gradient_autoscheduler_test_cpp.dir/test.cpp.o -o src/autoschedulers/li2018/gradient_autoscheduler_test_cpp -Wl,-rpath,/repositories/halide/build/src src/libHalide.so.13.0.0 && : ld: error: /usr/lib/gcc/x86_64-linux-gnu/11/libstdc++.so: undefined reference to pthread_create [--no-allow-shlib-undefined] clang: error: linker command failed with exit code 1 (use -v to see invocation) [ 2% 35/1480][ 0% 0:00:00 + 0:05:09] Generating included_schedule_file.runtime.o ninja: build stopped: subcommand failed. ``` * Dommy commit to retrigger bots	24 September 2021, 16:13:58 UTC
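
The unsigned division change in d80bb23 (#6322) replaces the subtract, shift, add sequence with a single rounding-up average, the same narrow((widen(x) + y + 1)/2) shape that 63cfd9d (#6307) mentions. A minimal C++ sketch for 16-bit lanes and a divisor of 7 follows. The multipliers 9363 and 9362 and all function names are assumptions worked out for this illustration, not values taken from the commits.

```cpp
#include <cstdint>

// Rounding-up average of two 16-bit values, computed in a wider type so the
// sum cannot overflow. Per lane, this is what vpavgw computes.
inline uint16_t avg_round_up(uint16_t a, uint16_t b) {
    return static_cast<uint16_t>((static_cast<uint32_t>(a) + b + 1) >> 1);
}

// High 16 bits of the 32-bit product. Per lane, this is what vpmulhuw computes.
inline uint16_t mul_hi(uint16_t a, uint16_t b) {
    return static_cast<uint16_t>((static_cast<uint32_t>(a) * b) >> 16);
}

// "Before" shape: multiply-high, subtract, shift, add, shift.
inline uint16_t div7_round_down(uint16_t x) {
    const uint16_t m = 9363;  // assumed multiplier: ceil(2^19 / 7) - 2^16
    const uint16_t t = mul_hi(x, m);
    // t <= x, so the subtraction cannot go negative.
    const uint16_t half = static_cast<uint16_t>((x - t) >> 1);
    return static_cast<uint16_t>((half + t) >> 2);
}

// "After" shape: multiply-high, rounding-up average, shift.
inline uint16_t div7_avg_round_up(uint16_t x) {
    const uint16_t m = 9362;  // assumed multiplier: floor(2^19 / 7) - 2^16
    return static_cast<uint16_t>(avg_round_up(mul_hi(x, m), x) >> 2);
}

// Exhaustively compare both variants against plain integer division.
inline int count_mismatches() {
    int bad = 0;
    for (uint32_t i = 0; i <= 0xFFFF; ++i) {
        const uint16_t x = static_cast<uint16_t>(i);
        if (div7_round_down(x) != x / 7) ++bad;
        if (div7_avg_round_up(x) != x / 7) ++bad;
    }
    return bad;
}
```

Calling count_mismatches() checks every 16-bit input against x / 7 for both variants. The second variant rounds the multiplier down and relies on the +1 inside the average to compensate, which is why it needs one operation where the first needs three.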
