https://github.com/halide/Halide

Revision Author Date Message Commit Date
53577e5 More targeted fix for gather instructions being slow on intel processors See https://github.com/llvm/llvm-project/issues/70259 11 November 2023, 16:58:06 UTC
f25af7f Remove the deprecated API `llvm::Type::getInt8PtrTy` usage. (#7937) This API is removed in LLVM trunk now https://github.com/llvm/llvm-project/commit/7b9d73c2f90c0ed8497339a16fc39785349d9610. 09 November 2023, 05:27:20 UTC
3b4dc33 Make sure all Halide arithmetic scalar types can be named from the Generator interface. (#7934) * Make sure all Halide arithmetic scalar types can be named from the Generator interface. Specifically adding 64-bit signed and unsigned integers and making sure float16 and bfloat16 are fully supported and documented. Add a simple test for all the type names. (Don't use float16 and bfloat16 in the arithmetic as they do not compile with the C++ backend. The name mapping should still be tested but the types passed do not seem to be checked as the values are not used.) 07 November 2023, 21:23:31 UTC
256c2f2 Add missing serialization of Dim::partition_policy (#7935) add missing serialization of Dim::partition_policy 07 November 2023, 17:57:21 UTC
e5bf7ab Add special build for testing serialization via a serialization roundtrip in JIT compilation and fix serialization leaks (#7763) * add back JIT testing, enclosed in #ifdef blocks * fix typo * nits * WITH_SERIALIZATION_JIT->WITH_SERIALIZATION_JIT_ROUNDTRIP_TESTING * fix self-reference leaks: now uses weak function ptr in reverse function mappings * Move clang-tidy checks back to Linux Recent changes in the GHA runners for macOS don't play well with clang-tidy; rather than sink any more time into debugging it, I'm going to revert the relevant parts of #7746 so that it runs on the less-finicky Linux runners instead. * bogus * Update Generator.cpp * Update Generator.cpp * call copy_to_host before serializing buffers * throw an error if we serialize on-device buffer * Skip specialize_to_gpu * Update Pipeline.cpp * Skip two more tests * use serialize to memory during jit testing * makefile update * makefile fix * skip the tutorial if flatc is not there * fix * fix signature * fix makefile * trigger buildbot --------- Co-authored-by: Steven Johnson <srj@google.com> 06 November 2023, 23:36:56 UTC
e5ee753 Remove use of dynamic_cast. (#7931) Remove use of dynamic_cast to preserve compiling the Halide compiler without RTTI. 03 November 2023, 00:27:03 UTC
1865101 Loop Partitioning Policy through Stage::partition(VarOrRVar, LoopPartitionPolicy) (#7914) * Loop Partitioning Policy through Stage::partition(VarOrRVar, LoopPartitionPolicy) * Renamed LoopPartitionPolicy to Partition. Added tests in boundary_conditions to verify correctness of the code with and without loop partitioning. Added tests that validates that disabling loop partitioning works. * Include error-test for when partitioning is always requested, but none was performed. 31 October 2023, 17:38:55 UTC
0134c40 Improve the error message if you store_at without a compute_at (#7923) * Improve an error message * Clean up * Update messages 30 October 2023, 21:39:17 UTC
97573c6 Scheduling directive to hoist the storage of the function (#7915) * Minimal hoist_storage plumbing * HoistedStorage placeholder IR node * Basic hoist_storage test * Fully plumb through the HoistedStorage node * IRPrinter for HoistedStorage * Insert hoisted storage at the correct loop level * Progress * Formatted * Move out common code for creating Allocate node * Format * Emit Allocate at the HoistedStorage site * Collect all dependant vars * Basic test working * Progress * Substitute lets into allocation extents instead of lifting stuff * Infer bounds for the extends dependant on loop variables * Update tests * Remove old code * Remove old code * Better tests * More tests * Validate schedules with hoist_storage * Error test * Fix stupid mistake * More tests * Remove debug prints * Better errors * Add missing handler for inlined functions * Format * Comments * Format * Add some missing visit handlers * New line * Fix comment * Luckily we only have two build systems * Adds hoist_storage_root * Comment for IR node * Serialization support for HoistedStorage * Handle hoist_storage fo tuples * Handle multiple realize nodes * Move assert up * Better error message * Better loop bounds * Format * Updated error message * Happy clang-tidy happy me * An error message when compute is inlined, but store is not inlined * Only mutate lets which are needed * Update apps to use hoist_storage Some very minor performance gains, but mostly in the noise. Also switched the apps makefiles to emit stmt html by default instead of stmt, to take advantage of the new and improved stmt html. 
* Switch to stack of hoisted storages * Limit scope of lets for expansion * Break early * Skip substitute_in_all_lets * Re-use expanded min/extents * WebAssembly JIT does not support custom allocators * Change debug level to get more info about segfault * More debugging prints * Let's try aligned malloc * Revert "Change debug level to get more info about segfault" This reverts commit a5a689be8c6ad351674f3ced3bbf542335f91d75. * Revert "More debugging prints" This reverts commit bb6b8c1313cbdb9f355df20fd203ee02d485042e. --------- Co-authored-by: Andrew Adams <andrew.b.adams@gmail.com> 27 October 2023, 21:21:26 UTC
ed357c2 Fix bug mentioned by @antonysigma. (#7916) 27 October 2023, 17:23:42 UTC
cf01e97 Turn off SLP vectorization for avx512 only (#7918) Fixes #7917 27 October 2023, 17:22:31 UTC
fffb8bd Fix read-after-write hazard analysis in storage folding (#7910) Explicitly mark which loops get loop-carry-dependencies inserted by sliding window to assist storage folding. Storage folding needs to know about this so it doesn't try to fold in a way that invalidates these read-after-write dependencies. It currently tries to prove the absence of hazards with box_contains(box_provided, box_required), but this is sometimes incorrect because box_provided could be conservatively large, and the code it analyses might not actually provide (store to) all the required (loaded from) values. It's simpler for sliding window to just tell storage folding when it inserts loop-carry-dependencies, and this is most simply done directly in the IR itself. Fixes #7909 24 October 2023, 17:23:49 UTC
d023065 Hotfix reinterpret HTML (#7912) Hotfix reinterpret 22 October 2023, 19:20:47 UTC
739053d Check returned result in the test (#7911) * Check returned result of Callable * Format 22 October 2023, 19:11:00 UTC
872264c Static analysis (MSVC) fixes for device_buffer_utils.h (#7904) * Static analysis (MSVC) fixes for device_buffer_utils.h * clang-format happiness * signed integer cast 20 October 2023, 21:33:13 UTC
2918854 Highlight groups for the HTML Stmt file and tooltips to reveal types. (#7887) * Highlight groups for the HTML Stmt file and tooltips to reveal types. * Cleaned up JS using eslint. * Remove commented code. 20 October 2023, 17:37:38 UTC
bd1d4df Stop interleaver from expanding the scope of letstmts (#7908) In the following code: `let a = b in X` followed by `let a = c in Y`, if Stmt X successfully had stores interleaved, it was re-nesting it so that the second let fell inside the scope of the first: `let a = b in (X; let a = c in Y)`. This introduces a shadowed variable 'a', which is illegal at this stage of lowering. Fixes #7906 Also some drive-by fixes to earlier tests that had debugging code left in. 20 October 2023, 17:21:50 UTC
eb66c06 Don't lift loop vars outside of their loops in sliding window (#7896) Sliding window, when operating in the mode that shifts the consumer's loop min backwards a few iterations to cover the warmup, was capable of inappropriately lifting for loop vars inside that loop but outside the produce node of the slid Func. Fixes #7891 18 October 2023, 21:45:47 UTC
5c97c3c Assignment is not associative (#7894) * Assignment is not associative * Fix internal tests 17 October 2023, 16:17:58 UTC
f9b90cb Disable warning for mismatched new/delete (#7897) 17 October 2023, 16:17:20 UTC
db207b9 Mutating if branches in isolation can break reachability analysis (#7895) Fixes #7892 17 October 2023, 16:16:33 UTC
667d6ed Check for overflow in Type constructor (#7889) * Check for overflow in Type constructor * Don't try to construct illegal types 16 October 2023, 17:12:50 UTC
7e35494 Generate simpler LLVM IR for shuffles that recursively become broadcasts (#7902) * Generate simpler LLVM IR for shuffles that recursively become broadcasts * Don't re-codegen arg 14 October 2023, 11:29:33 UTC
51ad730 Attempted fixed datalayouts for llvm trunk (#7898) * Attempted fixed datalayouts for llvm trunk * Missed a few i128:128s 13 October 2023, 21:32:01 UTC
a3911bb Explicitly name the allocgroups on GPU schedules "allocgroup__..." (#7883) * 50cents readability improvement to allocgroups on GPU schedules. * Improve allocation group prefix: only if the alloc group cluster contains more than 1 allocation prepend the prefix. 12 October 2023, 18:38:44 UTC
509140a Implement elementwise complex value division (#7848) Implement the logic: (a + bj) / (c + dj) as an inline operator/() function. Use case: direct FFT method to solve linear least square problem, namely: ```math \begin{align} f(x) &=\Vert F^T D F x - b \Vert_2^2 \\ \arg \min_{x \in \mathbb{R}} f(x) &= F^T \left[ D^{-1} F b \right] \end{align} ``` where `D` is a diagonal complex-valued matrix representing image blur kernel, `b` is an ordinary image in vectorized form. 09 October 2023, 19:08:06 UTC
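The division rule named in the commit above, (a + bj) / (c + dj), can be sketched in plain C++ as follows. This is a minimal illustration only: the `Complex` struct is a hypothetical stand-in, not the type used in the Halide sources, and the commit applies the same arithmetic elementwise rather than to scalar doubles.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for a complex value (assumption: not Halide's type).
struct Complex {
    double re;
    double im;
};

// (a + bj) / (c + dj) = ((ac + bd) + (bc - ad)j) / (c^2 + d^2),
// obtained by multiplying numerator and denominator by the conjugate (c - dj).
inline Complex operator/(const Complex &x, const Complex &y) {
    const double denom = y.re * y.re + y.im * y.im;
    assert(denom != 0.0 && "division by complex zero");
    return {(x.re * y.re + x.im * y.im) / denom,
            (x.im * y.re - x.re * y.im) / denom};
}
```

For example, `Complex{1, 2} / Complex{3, 4}` gives 11/25 + (2/25)j.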
9293655 Update README.md to include RISCV in llvm build instructions (#7878) 09 October 2023, 19:07:28 UTC
b607129 HTML Stmt IR with conceptual code and device code. (#7843) * WIP: Conceptual Stmt IR and HTML cleanup. * WIP: Lots of progress on Stmt HTML. Cleanup almost complete. * Support scrolling to device code. * Resizing works decent enough for me. Fix cost-model allocate block costs. * Print better vector_reduce calls. * Optionally enable VizTree through an env variable. * Fix the device code tab for non-PTX. * StmtHTML: Tabs renamed to panes. Fix linter warnings. Cut trailing 0 byte from device code buffers. * Fix clang-format. * Fixed typos and copy paste error. * Fix HL_EXTRA_OUTPUTS behaviour to respect the defaults. * Nuked VizTree * Finalize StmtToViz nuke and rename StmtToHTML. * Improved HTML correctness by running output through an online validator. Quite some bugs fixed. * Cost model visualization improvement. Fix button not being allowed in the checkbox/label combination. * Fix collapsing being triggered by jump-to-xxx buttons. * How did this work? * Process Andrew's feedback. * Process Andrew's feedback. * Process Andrew's feedback, part 3. * Improve color palette. Few minor improvements. * Clang-format... 06 October 2023, 19:40:35 UTC
24a64f8 Update onnx app to work with newer versions of protobuf (#7879) and to work on mac 06 October 2023, 18:40:46 UTC
120e5fd Consider all dimensions before deciding to slide over a new dimension (#7875) * Don't deduce unreachability from predicated out of bounds stores Fixes #7873 * Consider all dimensions before deciding to slide over a new dimension Even ones we've already slid over. The previous version of this code could try to slide over a loop where multiple dimensions depend on the loop var, because it ignored dimensions that had already been slid over. Moving a check resolves the issue. Fixes #7872 05 October 2023, 23:52:58 UTC
51ab364 Validate for types when fusing Vars with RVars (#7877) * Fix for llvm trunk * Validate for types when fusing Vars with RVars Fixes #7871 * Commit test 05 October 2023, 17:42:23 UTC
39f12a7 Fix for llvm trunk (#7876) 05 October 2023, 16:15:09 UTC
c31e8f7 Don't deduce unreachability from predicated out of bounds stores (#7874) Fixes #7873 04 October 2023, 18:12:09 UTC
a24071c [serialization] Add support to serialize to memory, and a basic serialization tutorial (#7760) * Add in-memory buffer serialize/deserialize support. * Add basic serialization tutorial * Clang format pass * Update doc strings to use Doxygen formatted args * Clear out data buffer during serialization * Update serialization tutorial to use simple blur example with ImageParam * Make parameter map optional for serialize #7849 Add error messages to deserializer for missing params Update tutorial * Clang format pass --------- Co-authored-by: Derek Gerstmann <dgerstmann@adobe.com> Co-authored-by: Steven Johnson <srj@google.com> 28 September 2023, 21:30:44 UTC
76ac233 Handle unreachable code in bounds inference (#7866) * Handle unreachable code in bounds inference * Avoid ambiguous constructor * IRVisitor -> IRGraphVisitor * Add success print 27 September 2023, 16:42:07 UTC
9f96b25 Prevent use of uninitialized scalar Parameters in JIT code (#7847, partial) (#7853) * Prevent use of uninitialized scalar Parameters in JIT code (#7847, partial) * Fix broken tests * Update Parameter.h * Update func_clone.cpp * Fix Generators too * Fixes * Update InferArguments.cpp * Fixes * pacify clang-tidy * fixes 27 September 2023, 01:55:12 UTC
3926b02 Respect input buffer constraints in root-level bounds inference exprs (#7865) * Respect input buffer constraints in bounds inference lets Fixes #7761 * Add test 26 September 2023, 01:58:20 UTC
05d5efa Handle nested vectorization in store predicates (#7864) Fix #7851 In one place in PartitionLoops and in another place in the simplifier we were neglecting to consider nested vectorization. I added the fuzzer output as a new test, because I have no idea how I'd generate this error with human-readable code. It stems from an interaction of several tail strategies. 25 September 2023, 19:14:25 UTC
26619d2 [Hexagon] - Fix 8-bit unsigned saturating downcasts for HVX (Fixes #7806) (#7825) * Dump the IR more frequently in HexagonOptimize.cpp * Fix 8bit unsigned saturating downcasts for HVX We do not have a way of reliably lowering the following expression to LLVM bitcode for HVX. u8_sat(uint16x) where uint16x is a vector (preferably a HVX double vector) with element type uint16. Since there is no native HVX instruction to do this, this patch introduces two helper functions in hvx_128.ll to perform this operation. One function interleaves its input (trunc_satub.vuh) and the other does not (pack_satub.vuh). This patch also removes declarations of some intrinsics no longer used in hvx_128.ll * Make IR dump messages in HexagonOptimize.cpp consistent with those in CodeGen_Hexagon.cpp * fix clang-format complaints --------- Co-authored-by: Steven Johnson <srj@google.com> 18 September 2023, 19:48:34 UTC
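As a scalar reference for what u8_sat(uint16x) in the commit above is expected to compute, here is a small C++ sketch. It is not HVX code and does not model the interleaving done by the helpers in hvx_128.ll; the function names `u8_sat` and `u8_sat_lanes` are illustrative assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Saturating downcast of one lane: values above 255 clamp to 255.
inline uint8_t u8_sat(uint16_t v) {
    return static_cast<uint8_t>(std::min<uint16_t>(v, 255));
}

// Apply the downcast lane by lane (hypothetical helper, plain scalar loop).
inline void u8_sat_lanes(const std::vector<uint16_t> &in,
                         std::vector<uint8_t> &out) {
    assert(in.size() == out.size() && "input and output need the same lane count");
    std::transform(in.begin(), in.end(), out.begin(),
                   [](uint16_t v) { return u8_sat(v); });
}
```

A plain truncating cast would instead map 256 to 0, which is the difference the saturating form avoids.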
68a0341 [api] Promote Internal::Parameter to Halide::Parameter (#7829) * Promote Internal::Parameter to Halide::Parameter (to support Serialization API refactoring) * Make raw_buffer(), scalar_address(), and scalar_raw_value() methods protected. Make Pipeline and Serializer protected friend classes. * Add Parameter public interface to python bindings. Remove old stub internal interface from PyParam. * Remove blank line at start of function --------- Co-authored-by: Derek Gerstmann <dgerstmann@adobe.com> Co-authored-by: Steven Johnson <srj@google.com> 18 September 2023, 17:09:11 UTC
ab4067f Fixes for top-of-tree Halide (#7850) * Fixes for top-of-tree Halide * I am a bonehead 15 September 2023, 19:45:10 UTC
d7760f5 [tutorials] Add tutorial on JIT compile/execute performance (#7838) * Add tutorial on JIT compile/execute performance * Addressing comments from review. Fix punctuation and comment nits. Add timing estimates as comments. Add std::function example. Enable advanced scheduling directives. * Addressing comments from review. Added cases that match real usage patterns: 1. Defining and compiling the whole pipeline every time you want to run it (i.e. in the benchmarking loop) 2. Defining the pipeline outside the benchmarking loop, and realizing it repeatedly. 3. (optional) Same as 2), but calling compile_jit() outside the loop, saying what it does, and saying why the time isn't actually different to case 2 (benchmark() runs multiple times and takes a min, and realize only compiles on the first run) 4. Compiling to a callable outside the benchmarking loop and showing that it has lower overhead than case 3 (if indeed it does. If not we may need to change the example so that it does, e.g. by adding a real input buffer.) * Addressing comments from review for style nits, and typos in comments. --------- Co-authored-by: Derek Gerstmann <dgerstmann@adobe.com> Co-authored-by: Steven Johnson <srj@google.com> 15 September 2023, 01:05:12 UTC
8797287 Update arguments in driver.cpp to match what correctness/simd_op_check has (#7842) 12 September 2023, 20:35:23 UTC
6569a83 Zen4 support (#7840) * Enable emission of float16/32 casts on x86 Fixes #7836 Fixes #4166 * Add support for zen4 * Add avx512_Zen4 target flag It's a superset of cannon lake, and a subset of sapphire rapids * Fix runtime detection, sapphire rapids CPUID bits * Fix comment * Don't catch bfloat casts * Fix Zen4 model number * Use llvm BFloat type for bfloat intrinsics * Give up on native bfloat16 conversion for now * Don't use llvm's bfloat type at all * Add missing enum * Fix constant in comment * clang-format 11 September 2023, 17:40:29 UTC
b704abd Iterate over lets in the correct order in VectorizeLoops (#7830) * Iterate over lets in correct order * Comments * Comments * Comments 06 September 2023, 21:32:44 UTC
836879e Enable emission of float16/32 casts on x86 (#7837) * Enable emission of float16/32 casts on x86 Fixes #7836 Fixes #4166 * Fix comment * Don't catch bfloat casts * Fix missing word in comment 06 September 2023, 21:29:27 UTC
02865e2 Add a check that PredicateLoads must be used in the outermost split of a dimension (#7788) * add a check that PredicateLoads must be used in the outermost split of a dimension * newline * use the repro example * fix * avoid check for every other tail strategy * update error message to point out what's not allowed --------- Co-authored-by: Steven Johnson <srj@google.com> 05 September 2023, 20:28:11 UTC
8188b42 Avoid generating name collisions in CSE (#7821) * Avoid generating name collisions in CSE Alternative to #7801 (See the discussion there) Fixes #4124 * Add missing test * Minor cleanup * clang-format 01 September 2023, 17:38:19 UTC
ddfb1dc Don't return an undefined Stmt() from IfThenElse visitor (#7816) Fixes #7815 01 September 2023, 17:37:50 UTC
24d846c Remove dead `auto-schedule` label in CMake (#7818) These were replaced by more granular labels. Also, drive-by fix to comment that needed plurals. 30 August 2023, 23:54:48 UTC
afc61b2 Update 'Check CMake file lists' action (#7809) * Update 'Check CMake file lists' action Several subcategories were missing -- let's add them and see if they should be there or not * bogus change * Add missing comments * Revert "bogus change" This reverts commit 80454b1313e1c06b5432d15287fa1f51185f70b6. 30 August 2023, 23:54:09 UTC
3a1dffe Move clang-tidy checks back to Linux (#7817) * Move clang-tidy checks back to Linux Recent changes in the GHA runners for macOS don't play well with clang-tidy; rather than sink any more time into debugging it, I'm going to revert the relevant parts of #7746 so that it runs on the less-finicky Linux runners instead. * bogus * Update Generator.cpp * Update Generator.cpp 29 August 2023, 16:23:44 UTC
fa136cb Ensure that multitarget AOT builds have consistent random sequence (#7717) * Fix CMake test for generator_aot_multitarget * Ensure that multitarget AOT builds have consistent random numbers If a Generator uses random_float() (or the int or uint versions), and is used in a multitarget build, we weren't resetting the counters for random generation between each subtarget... meaning that each subtarget would get a different random sequence, leading to some very hard-to-debug test failures when running on different hardware variants. This PR ensures that the relevant counters are all reset before each subtarget is generated, so that each should see the same sequence of random number generation. * Update CMakeLists.txt * Update multitarget_aottest.cpp * Combine float/uint counters 29 August 2023, 16:21:59 UTC
fe9f0b7 [serialization] Add serialization support to generator interface (#7792) * Add serialization support to Generator interface * Clang format pass * Make target required when emitting a serialized pipeline (since schedule may be target dependent). Apply auto-scheduler before serialization so that schedules can be serialized. * Fix enum ordering for hlpipe. Fix hlpipe comments. Add missing hlpipe enum to pyenums. * Remove unused Serialization build_mode * Fix formatting * Remove unused serializable flag. Remove redundant cpp_stub check. Fix comments. * Safeguard emit_hlpipe calls with #ifdef WITH_SERIALIZATION --------- Co-authored-by: Derek Gerstmann <dgerstmann@adobe.com> Co-authored-by: Steven Johnson <srj@google.com> 28 August 2023, 18:13:53 UTC
79d2be3 Update clang-tidy action to stop breaking (#7808) * Switch clang-tidy action from macos-13 to macos-latest `macos-latest` is actually macos-12 (macos13 is considered "beta" on the GHA runners). Hopefully this will fix the recent install snafus that are breaking clang-tidy. * Bogus change to trigger check * Update presubmit.yml * Update presubmit.yml * Update presubmit.yml * Revert "Bogus change to trigger check" This reverts commit a70f9ed8e6032d4b7799ff0cf6c009a7d2f92b3a. * Update presubmit.yml 28 August 2023, 17:21:38 UTC
8ac1e1c Add jump-buttons to get from Stmt directly to Assembly (#7793) Co-authored-by: Steven Johnson <srj@google.com> 28 August 2023, 16:46:45 UTC
69c75b3 Update WebGPU to latest Emscripten/Dawn API (#7804) * Update WebGPU to latest Emscripten/Dawn API - Updated mini_webgpu.h to be in sync with Dawn as of commit ded6610f45a8826db37b52d73121a66b74d8aa61 - Updated the use of SetDeviceLost callbacks to be in the DeviceDescriptor instead of a separate call - Updated a couple of fields that got renamed - Update webgpu.cpp and gpu_context.h to always use wgpuCreateInstance() and wgpuInstanceRelease(), since the Dawn node bindings now support & require them * clang-tidy 24 August 2023, 23:12:19 UTC
84faa68 [wasm] Enable PIC for WebAssembly on LLVM v18.x (#7803) * Enable PIC code generation for WebAssembly for LLVM >18. Enable +mutable-globals to support dynamic linking * Fix LLVM v18 interface changes for writeArchive() Add RelLookupTableConverterPass for PIC (in LLVM v18) * Resolve conflict for writeArchive interface changes. * Clang format pass --------- Co-authored-by: Derek Gerstmann <dgerstmann@adobe.com> 24 August 2023, 22:22:22 UTC
84af2cd Add support to the makefile for serialization (#7762) * Add support to the makefile for serialization * Fix deps * Fix for no flatc, and for homebrew --------- Co-authored-by: Steven Johnson <srj@google.com> 24 August 2023, 22:18:09 UTC
f56b9ad Remove some unused includes (#7799) 24 August 2023, 21:48:26 UTC
678ea32 [ARM] support new udot/sdot patterns (#7800) 24 August 2023, 19:49:57 UTC
88c75ec [ARM] Distribute shifts as muls (#7790) * [ARM] distribute shifts as muls This reverts commit eba8f325edfaaa7b11c52a19435200f6b28e539a. --------- Co-authored-by: Steven Johnson <srj@google.com> 24 August 2023, 17:31:26 UTC
e8df5cf Fix for top-of-tree LLVM (#7798) 23 August 2023, 18:05:37 UTC
acc9413 Don't inject undef() in the simplifier (#7791) We shouldn't be using undef() in the simplifier. This replaces a load with a constant false predicate with a zero instead. I also added a guard around some dubious logic about out of bounds loads. out of bounds loads may be reachable if they have a false predicate, so I changed this simplification to only trigger if the load is unpredicated. 22 August 2023, 15:49:44 UTC
6efecbe slice IRMatcher should only match on slices (#7772) * slice IRMatcher should only match on slices Fixes #7768 * Add test 22 August 2023, 15:49:29 UTC
fcc1c3b [Hexagon] -Build Hexagon runtime components using the Hexagon SDK (Clone of #7671) (#7741) * Add CMakeLists.txt to build the hexagon_remote runtime. * Print an error message if libhalide_hexagon_host.so is not found. * Fix case mismatch in hexagon_remote/CMakeLists.txt * Remove some code that had been commented out in hexagon_remote/CMakeLists.txt * Remove unused argument in macro in hexagon_remote/CMakeLists.txt * add find module for Hexagon * move more variables to find module * Build binary modules with ExternalProject * group platform-specific sources into subdirectories * Pass HEXAGON_TOOLS_ROOT, too * Use the desired layout for the build-tree artifacts * Use SYSTEM for Hexagon SDK include dirs * trigger buildbots * Ignore code in src/runtime/hexagon_remote/bin/src for clang-tidy * Just skip hexagon_remote entirely for Halide_CLANG_TIDY_BUILD * Add an option to enable the building of the hexagon remote runtime --------- Co-authored-by: Alex Reinking <quic_areinkin@quicinc.com> Co-authored-by: Steven Johnson <srj@google.com> 21 August 2023, 21:16:51 UTC
708d41b Don't introduce reinterprets in find/lower intrinsics (#7776) 21 August 2023, 18:45:05 UTC
f11e80d Fix out of bounds access in anderson2021_test_apps_autoscheduler (#7771) * Fix out of bounds access in anderson2021_test_apps_autoscheduler * clang-format 21 August 2023, 17:06:07 UTC
36eb0b2 Try to fix remaining ASAN-reported leaks (#7767) This fixes all but one of the known remaining ASAN-related leaks; the remaining is in `tutorial_lesson_19_wrapper_funcs` I can't debug that one locally because the leaks are in OpenCL and I am temporarily relegated to using a 'cloud' machine with no real GPU for linux-x64 -- if someone with access to such a machine could take a look, I'd appreciate it (examples of leakage at https://buildbot.halide-lang.org/master/#/builders/154/builds/79/steps/12/logs/tutorial_lesson_19_wrapper_funcs) 21 August 2023, 17:05:25 UTC
c50d11a Speedup page loading of VizStmt. (#7755) * Speedup page loading of VizStmt. Disabled line numbers in the syntax highlighting of the assembly. Made syntax highlighting on-demand with a button. * Fix computedStyleMap() not available in Firefox. * Re-enable assembly highlighting by default. 21 August 2023, 17:00:17 UTC
840ed4d Remove fragile simd_op_check test for mlal/mlsl on ARM (#7775) 18 August 2023, 22:31:32 UTC
4e6fe00 Fix vector reduce HTML (#7773) VectorReduce: Div cannot be in Span 17 August 2023, 18:07:11 UTC
f2f2af2 Define `cast<i32>(u32)` overflow behavior (#7769) uint32 -> int32 casting should not produce SIO 17 August 2023, 16:11:21 UTC
f75f68d Experimental serializer (#7594) * init * sync * single func pipeline round-trip test * roundtrip test framework completed, single output function tested, no Dag yet * serialize Stmt, partially done (cuz no support of Expr yet), not fully tested * deviceAPI MemoryType ForType * Expr, with a grain of salt * fix exprs in stmts * format everything * Range * fix undefined exprs and stmts * address some review comments: - proper using - proper includes - rename Serdes -> Serialize * address more review comments - rename .hlb/.hlr to .hlpipe - reserve vectors - proper memory management * deserialize_expr_vector * support bound, storageDim, loopLevel and funcSchedule * Specialization, Definition * sync commit * temporarily comment out func mapping stuff to remove blockers * helper funcs * call_type and reduction_domain * ModulusRemainder and VectorReduceOp, some minor refactoring * prefetch directive * name mangling and closing on function's odds and ends * split * dim * stage schedule * tidy * parameter * more parameter * check nullptr and some minor fix * fix crashing * func index replacing func ptr during serialization * extern func arg, some minor cleanup * replace cerr with halide assert * buffer?? 
* remove printer * fix * wrappers in func_schedule * clear func mapping to use serializer for more than 1 pipelines, use unordered_map also * attempt to move serialization into core, get cmake working for now * fix * we maybe don't need submodule * fix cmake * make headers work again, with some hacks ofc * serialization now lives in libHalide * testing 101 * don't include flatbuffers header in Halide.h * fix * namespace adjust * user_assert * fix a missing field * fix missing type info in some exprs * fix bug in function mapping * fix function DAG broken issue * format * rm cout in cpp files and change test group name * fix the case func ptr is not defined * add a missing call type deserialization * serialize unique parameters * serialize unique buffers * fix missing type in parameter * fix a missing tail stra * change find_transive_call to build_enviroment to include wrappers in the DAG * upstream current test strategy, intercept JIT compilation for each pipeline, serdes ronudtrip and back * make sure buffer memory layout are the same * don't use ir comparator to compare pipelines, we will use jit tests from now * don't serialize Parameter's buffer, compute external buffers from Call, Variable and ExternFuncArgument and don't serialize them as well * fix, 35 tests remaining * fix output function orders * reuse jit_externs since we cannot really serialize it, 29 tests to go * fix that buffer_constraints, host_alignment and memory_type are incorrectly removed, also add missing exact in split * only use outputs and requirements from deserialized pipeline during testing * nits * add missing requirements during deserialization * restore original pipeline's contents after lowering * address some review comments * Install flatbuffers for clang-tidy * use std::map to make results the same on different compiler * proper way to handle cropped buffers * fix cmake build using alex's branch * try set flatbuffers_DIR explicitly * case sensitive? 
* rename serialization test env var * cleanup Serialization.cpp * format * have halide version embedded in the file identifier * nits and comments * format * try make clang-tidy happy and const a lot of things * const more things * support istream input * nit * add template function deserialize_vector * nit * attempt to integrate serialization test * line breaks * remove hack in compile_jit, at least for now * fix * add #ifdef guards * format * try nolint * special case two files so clang-tidy will be happy * Make Flatbuffers-missing error more useful * Make a few final changes - change BUILD_SERIALIZATION -> WITH_SERIALIZATION to match other flags better - fix capitalization of the CMake package (must be `FlatBuffers` for some Linux usage) - add stub calls to the de/serialization calls when building without Flatbuffers * Oops addition * clang-format * Add temporary debug hackery * more hackery * grr * sdfsdf * sigh, capitalization * One more try * Update presubmit.yml * No more mr nice guy * Update CMakeLists.txt * Revise build rules & script to allow clang-tidy for the new files * Update CMakeLists.txt * Apply clang-tidy fixes * Fix target for generated header * Prefer to use FetchContent for flatbuffers * Fixes * set PIC on * more pic * fix attempt * fix attempt * try macos * coreutils * Update run-clang-tidy.sh * noquiet * final again? --------- Co-authored-by: Steven Johnson <srj@google.com> 11 August 2023, 21:45:38 UTC
93514c3 StmtViz: Search for tooltip only in the child node (#7754) Search for tooltip only in the child node Further cut ~5 seconds of StmtVisualizer rendering by searching for the tooltip text-box in the child node of the current button. Previously, the script composed the global ID with a regular expression and then searched the entire DOM, causing delays. 10 August 2023, 20:42:49 UTC
7054828 Improve error-handling in Anderson2021, and ensure build deps are cor… (#7748) * Improve error-handling in Anderson2021, and ensure build deps are correct * clang-format 10 August 2023, 17:01:12 UTC
150a930 [vulkan] Fix SPIR-V IR references causing leaks (#7739) * Remove unnecessary parent refs and owning function/block refs. Add explicit clear methods for contents structs and destructors. * Move objects when changing ownership --------- Co-authored-by: Derek Gerstmann <dgerstmann@adobe.com> 07 August 2023, 19:30:41 UTC
7b45542 [vulkan] Fix heap buffer overflow in Vulkan extension handling discovered by ASAN (#7740) Fix heap buffer overflow in Vulkan extension handling discovered by ASAN Co-authored-by: Derek Gerstmann <dgerstmann@adobe.com> 07 August 2023, 19:29:25 UTC
2f5c4d2 Revert accidental typo change in #7746 (#7747) 07 August 2023, 18:14:59 UTC
af56605 Permit llvm 15 on windows (#7744) Our build instructions for windows are currently broken, because vcpkg is still on llvm 15. This PR unbreaks them. Re-enabling any testing of llvm 15 to be discussed. 07 August 2023, 16:24:03 UTC
25028cd Allow optional sorting of profiler output via HL_PROFILER_SORT env var (Fixes #7638) (#7639) * Allow optional sorting of profiler output via HL_PROFILER_SORT env var (Fixes #7638) * trigger buildbots * Update profiler_common.cpp * Update float16_t.cpp * Update float16_t.cpp * Update float16_t.cpp * Update float16_t.cpp 07 August 2023, 16:13:41 UTC
c254043 Fix leaks in test/correctness/memoize.cpp (#7705) * Fix leaks caused by self-referential parameter constraints * Add comment * Add missing overrides * Fix reported leaks in memoize test by explicitly releasing the shared runtime at the end of the test * Use const refs for non-mutated args * Hopefully fix for windows * Fix for 32-bit pointers * Don't use _aligned_malloc It requires _aligned_free, which the runtime aint gonna do * Fix other memoize test * Use runtime built-in malloc/free On windows mixing and matching mallocs and frees doesn't work well. * Fix comment --------- Co-authored-by: Steven Johnson <srj@google.com> 05 August 2023, 18:52:46 UTC
f39576f Fix infinite recursion in loop partitioning (#7743) * Fix infinite recursion in partition loops We weren't stripping the likely tags off the unlikely case on a store/load predicate, resulting in infinite recursion. * Add test * Remove accidental return 05 August 2023, 18:52:23 UTC
87087f1 Run clang-tidy on macOS runners instead of Linux (#7746) * Run clang-tidy on macOS runners instead of Linux The current macOS runners have twice the RAM and more CPU power. Also, drive-by change to allow specifying the parallelism that the run-clang-tidy script should use (defaults to nproc) * Update Generator.cpp * Update run-clang-tidy.sh * Update run-clang-tidy.sh 04 August 2023, 23:36:26 UTC
48b3df6 Speedup the VizIR HTML. (#7713) * From 12s to 2s, by eliminating the bulk of the $() calls. * Speed up recursive depth function by not using jQuery. * Changed out CodeMirror for Speed-Highlight. Additionally several fixes regarding the StmtViz. --------- Co-authored-by: Steven Johnson <srj@google.com> 04 August 2023, 17:00:41 UTC
bc30d6f Revise labels on autoscheduler tests (#7732) * Revise labels on autoscheduler tests This is step 1 in fixing https://github.com/halide/Halide/issues/7731: it replaces the `autoschedulers` tag with more granular ones, so that we can modify the build script to test the right autoscheduler(s) for a given backend. (Note that the `autoschedulers` tag was unused by the buildbots, which only used the generic `auto_schedule` tag.) Step 2 will be to modify the buildbot script after this lands to use the new tags above. Step 3 will be to remove the `auto_schedule` tag. * Fix anderson2021 labels 03 August 2023, 00:18:33 UTC
734df3f Clean up really long line lengths in Anderson2021 (#7728) * Clean up really long line lengths in Anderson2021 We don't have an explicit line length limit in Halide, but generally consider 120 to be a reasonable extent; a lot of code in Anderson2021 went waaaay over this limit, especially function/method calls. I did a semi-manual cleanup to try to clean up the worst offenders. Should be 100% cosmetic. * Add LoopNestMap * Fixes 02 August 2023, 18:12:25 UTC
ef24391 Ignore code in src/runtime/hexagon_remote/bin/src for clang-format (#7736) 02 August 2023, 17:21:17 UTC
8fe4f99 Fix leak on cloning functions with update defs (#7735) * Fix leak on cloning functions with update defs When cloning a Func with an update def, the remapping map resulting from the deep copy may already contain a key for the wrapped function pointing to a strong reference to itself. The reasons are unclear to me, but it means that emplace silently does nothing and we get a memory leak because the cloned Func's update definition has a strong self-reference after the remapping is applied. We want to replace it with a weak reference, so this PR changes things to use operator[] instead of emplace. * Add comment 02 August 2023, 16:39:44 UTC
0839270 Attempt to fix #7703 (#7706) * Attempt to fix #7703 * fixes * Update LoopNest.cpp * Update GPULoopInfo.h * Fixes. * clang-tidy 01 August 2023, 20:55:28 UTC
831fd1a Fix RDom usage in anderson2021_test_apps_autoscheduler (Fixes #7729) (#7734) 01 August 2023, 16:36:15 UTC
3ced617 [Hexagon] - Fix problems in sim_host.cpp (#7725) * Fix problems in src/runtime/hexagon_remote/sim_host.cpp reported by clang-tidy and clang-format 01 August 2023, 14:45:00 UTC
ef51a23 Remove unused using decl (#7730) Also convert a std::vector to a vector in a file that has using std::vector 01 August 2023, 00:07:29 UTC
9f43580 Change default generator timeout to infinite (#7718) 31 July 2023, 21:51:21 UTC
f54bc08 Fix handling of thread features for scalars in Anderson2021 (#7726) * Fix handling of thread features for scalars * Remove unneeded change 31 July 2023, 21:27:01 UTC
fca8d96 Making Metal code-gen a bit faster (#7720) removing redundant print_expr() call 28 July 2023, 16:55:19 UTC
89ffae2 Making HLSL code-gen a couple orders of magnitude faster... (#7719) Removing redundant print_expr() 28 July 2023, 16:22:57 UTC
649a224 Fix CMake test for generator_aot_multitarget (#7716) * Fix CMake test for generator_aot_multitarget * Update CMakeLists.txt 27 July 2023, 22:56:50 UTC
df4c981 Throw an error if split is called with the same outer and inner var name (#7715) * throw an error if split is called with the same outer and inner name * update * fix naming * rewording * add test --------- Co-authored-by: Steven Johnson <srj@google.com> 27 July 2023, 15:13:02 UTC
09c5d1d Default WITH_TEST_FUZZ to OFF (#7695) * Fix for top-of-tree LLVM * Default WITH_TEST_FUZZ to OFF Just because our compiler supports fuzzing doesn't mean we want to build the fuzz tests, because they won't really build properly without the right preset specified. (This will be followed up with a change to the buildbot to set WITH_TEST_FUZZ to ON for fuzz tests) 26 July 2023, 22:25:43 UTC