https://github.com/halide/Halide

Revision Message Commit Date
ddd614a TFLiteParser should handle the 2-input form of Reshape The new shape can be specified in a 2nd input (instead of the ReshapeParams); we should handle this properly 22 December 2020, 23:53:35 UTC
7916c84 Merge branch 'master' into interpret_nn 22 December 2020, 21:52:12 UTC
8a0f4a1 Fix sketchy shadowing that breaks on some compilers. Fixes #5581 (#5587) * Fix sketchy shadowing that breaks on some compilers. Fixes #5581 * Fix another sketchy shadowing. 22 December 2020, 01:54:36 UTC
eeb0bcf Upgrade to TFLite 2.4.0 (#5586) TFLite v2.4.0 has gone final, and there is a prebuilt version in AAR form for us to use on Android, so let's upgrade the code to assume 2.4 as a baseline. 21 December 2020, 23:44:53 UTC
b22598c Remove unused wildcard/type info. (#5582) 21 December 2020, 22:56:00 UTC
5ac8808 Fix several bugs on Hexagon and some cleanup (#5570) * Fix several bugs on Hexagon. * clang-format actually found a bug 21 December 2020, 21:54:37 UTC
80d8028 Merge branch 'master' into interpret_nn 21 December 2020, 21:53:29 UTC
1dbcf19 Decouple wasm's +bulk-memory from wasm_threads (#5574) * Decouple wasm's +bulk-memory from threads When `wasm_threads` was added, `+bulk-memory` codegen was enabled in conjunction with this feature, due to some inscrutable error which apparently didn't get recorded. From inspection of the spec for bulk-memory, and experimentation with the most recent version of Emscripten (2.0.10), I can't find any reason that this actually needs to be enabled, so I've moved it into its own new feature flag. Also: drive-by fix in Target to format the tables in `get_runtime_compatible_target()` better, and to remove some wasm-related entries from the 'must match' table that didn't actually need to match. * Create .gitignore 21 December 2020, 17:08:20 UTC
ef45c87 Fix for trunk LLVM (#5576) 19 December 2020, 00:45:34 UTC
724ce83 Fix code that will break with dsharletg/lower-patterns 17 December 2020, 23:56:29 UTC
7b821ce Merge branch 'interpret_nn' of https://github.com/halide/Halide into interpret_nn 17 December 2020, 22:45:05 UTC
83b040d Add a feature to name cached memoizations and to evict them by name. (#5510) This PR adds an optional ```EvictionKey``` parameter to the ```memoize``` scheduling option. EvictionKeys are user provided labels of up to 64-bits that can be used to request that labeled items in the cache be removed to free up space. Co-authored-by: Steven Johnson <srj@google.com> 17 December 2020, 21:40:16 UTC
6aa31e6 Merge branch 'master' into interpret_nn 17 December 2020, 19:35:39 UTC
590b253 D3D12: refactoring of kernel argument constant buffer packing (#5569) * initial setup for Direct3D 12 support for Windows-on-ARM * fixing runtime modules for Windows on ARM 64 * typo * wrapping windows_clock for Windows on ARM support * temporarily disabling windows_clock_[x86/arm] * wip * Set -fshort-wchar on generic Windows runtime target. * removing windows_clock specializations * replacing accidental tabs * addressing code review comments * Hoist fpic in CMakeFile. Mirror CMake changes to Makefile. * Add explanatory comment to Makefile * Add arm64-windows and -windows-d3d12compute targets to correctness_cross_compilation * fixed kernel parameter packing * Run clang-format * handling Bool -- UInt(1), and Int(1) as well just in case * Add previously-failing D3D12 test * Add new test to CMake Co-authored-by: Marcos Slomp <slomp@adobe.com> Co-authored-by: Shoaib Kamil <kamil@adobe.com> Co-authored-by: Shoaib Kamil <shoaibkamil@gmail.com> Co-authored-by: Steven Johnson <srj@google.com> 17 December 2020, 19:22:09 UTC
7fd3a7b Windows on ARM64 support (CPU, and also GPU through D3D12) (#5544) * initial setup for Direct3D 12 support for Windows-on-ARM * fixing runtime modules for Windows on ARM 64 * typo * wrapping windows_clock for Windows on ARM support * temporarily disabling windows_clock_[x86/arm] * wip * Set -fshort-wchar on generic Windows runtime target. * removing windows_clock specializations * replacing accidental tabs * addressing code review comments * Hoist fpic in CMakeFile. Mirror CMake changes to Makefile. * Add explanatory comment to Makefile * Add arm64-windows and -windows-d3d12compute targets to correctness_cross_compilation Co-authored-by: Marcos Slomp <slomp@adobe.com> Co-authored-by: Shoaib Kamil <kamil@adobe.com> Co-authored-by: Shoaib Kamil <shoaibkamil@gmail.com> 16 December 2020, 23:23:50 UTC
10a01dd Move CodeGen_Hexagon to internal linkage and don't include it without WITH_HEXAGON (#5567) * Move CodeGen_Hexagon to internal linkage. * Move using out of the #ifdef * clang-tidy 16 December 2020, 23:08:32 UTC
0cfa6db Fix minor wasm issues (#5566) * Fix minor wasm issues - If wasm_threads is in the target string, be sure to launch the external shell with --experimental-wasm-threads. - All the test/performance tests should detect wasm and explicitly skip * Disable noisy warning in WABT 16 December 2020, 18:19:52 UTC
34d35a3 Add special case for printing broadcast shuffles. (#5565) 16 December 2020, 18:18:36 UTC
94da4f6 autoscheduler: prepend, don't override LD_LIBRARY_PATH in adams2019 test (#5563) 15 December 2020, 22:18:36 UTC
fef82c2 cmake: detect ppc64le arch (#5558) 15 December 2020, 17:57:59 UTC
a0bbf43 Fix fragile Makefile for apps/onnx (#5540) The protoc usage happened to work when building in the app folder but often failed when building from the toplevel Makefile. Also, drive-by silencing of noise from curl, and drive-by fix of "redundant copy" warning in onnx_converter.cc. 15 December 2020, 17:57:33 UTC
ee2e5df Upgrade WABT version to 1.0.20 (#5557) 15 December 2020, 17:57:00 UTC
d37b995 Better document how LLVM_DIR works. (#5560) 14 December 2020, 23:04:36 UTC
9fea59f Bug fix for lossless_cast with minor additions (#5459) * Bug fix for lossless_cast with minor additions The bug can be seen for types where the lossless_cast type can represent cast->value.type() but not cast->type. For example: lossless_cast(UInt(16), cast(Int(8), Variable::make(UInt(16), e))) returns (uint16)e, which is incorrect. The patch also adds lossless_cast of Mod and Ramp expressions. * Handle Mod for negative numbers in lossless_cast. * Add lossless_cast test for VectorReduce. * Rename check to check_lossless_cast. * clang-format complains * Remove Ramp and Mod from lossless_cast. * Minor changes * Update test/correctness/CMakeLists.txt Co-authored-by: Ankit Aggarwal <aankit@quicinc.com> 14 December 2020, 17:01:27 UTC
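To see why returning (uint16)e is wrong in the 9fea59f example, here is a small numeric sketch. It is not Halide code: `wrap_signed` and `wrap_unsigned` are hypothetical helpers, and the sketch assumes narrowing integer casts wrap in two's complement.

```python
def wrap_unsigned(value, bits):
    """Reduce value modulo 2**bits (assumed behaviour of a cast to UInt(bits))."""
    return value % (1 << bits)


def wrap_signed(value, bits):
    """Two's-complement wraparound (assumed behaviour of a cast to Int(bits))."""
    v = value % (1 << bits)
    return v - (1 << bits) if v >= (1 << (bits - 1)) else v


def inner_cast(e):
    """Value of cast(Int(8), e) where e is a uint16."""
    return wrap_signed(wrap_unsigned(e, 16), 8)


# e = 300 fits in uint16, but the int8 cast changes it to 44,
# so dropping the inner cast and returning e itself changes the value.
e = 300
print(inner_cast(e), e)

# e = 200 becomes -56 after the int8 cast, which uint16 cannot hold at all.
print(inner_cast(200))
```

The inner cast discards information, so an outer type that can hold e says nothing about whether the cast expression can be represented losslessly.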
ed8f7c2 Hide inaccessible symbols in internal linkage (#5548) * Hide inaccessible symbols in internal linkage. * clang-format * Remove redundant static. 14 December 2020, 04:46:38 UTC
f9153e8 Mark Target::OpenGL (etc) as deprecated (#5475) (#5551) 13 December 2020, 22:19:38 UTC
4fa78b6 change StrongestExprNodeType for rewriter (#5554) 13 December 2020, 18:34:17 UTC
b962495 Merge branch 'interpret_nn' of https://github.com/halide/Halide into interpret_nn 12 December 2020, 00:26:00 UTC
8fe96ca Merge branch 'master' of https://github.com/halide/Halide into interpret_nn 12 December 2020, 00:25:20 UTC
a0ddabe Remove <iostream> from the code generated by CodeGen_C (#5547) 12 December 2020, 00:16:35 UTC
67fbecb Merge branch 'master' into interpret_nn 11 December 2020, 00:28:55 UTC
968f6b3 VectorReduce peephole matching for Hexagon (#5424) * CodeGen for VectorReduce for Hexagon * Remove use of MAKE_ID_PAIR. * Fix clang-format errors. * Spelling correction. * Address comments from PR. Use Shuffle::make_concat instead of vcombine. * Remove IROperator changes. * Address comments * Move even-odd shuffling for vrmpy to runtime .ll func * clang-format + hvx_128 changes.ll changes * clang-format * Minor changes * Minor changes * interchange vshuffvdd operand Co-authored-by: Ankit Aggarwal <aankit@quicinc.com> Co-authored-by: Steven Johnson <srj@google.com> 10 December 2020, 21:32:56 UTC
000677b Merge branch 'master' into interpret_nn 10 December 2020, 21:23:20 UTC
ad414e2 Add possible simplify to GL(Compute) `pow` function (#5517) OpenGL(Compute) generates `select` IR for `pow(a, b)` function, which can be simplified when `a` or `b` is const. 10 December 2020, 17:10:50 UTC
5e526d4 Modify memoization code to allow using min/extent/stride of an input. (#5542) Modify memoization code to allow using min/extent/stride of a buffer as part of memoization without wrapping in memoize_tag. This seems reasonable and failure to support this causes tricky to diagnose errors if one uses the extent of an input in an RDom that is then used in a memoized Func. The code pattern here is a bit heuristic in that I can't think of a case where a Var has a buffer but the reference isn't to a *field* of the buffer. If this turns out to be incorrect or to become invalid in the future, the code could be extended to pattern match the variable name. 10 December 2020, 09:40:20 UTC
b05f23b Fix rounding-shift, tweak compare_vs_tflite (#5541) * Add code to allow running compare_vs_tflite on Android devices * Fix rounding-shift, tweak compare_vs_tflite round_shift_right() now mimics the logic of ARM rounding-shift instructions (and should emit them once some upstream changes land). Tweaked compare_vs_tflite to try to minimize error reporting as a result. (Note, this is additive to the change in https://github.com/halide/Halide/pull/5537) * Use a saturating add in round_shift_right * Fix saturating_add 09 December 2020, 23:11:00 UTC
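As a rough illustration of the rounding shift described in b05f23b, the sketch below uses the common add-half-then-shift formulation with a saturating add. The commit message only says that round_shift_right() mimics ARM rounding-shift instructions and uses a saturating add; the 32-bit width, the helper names, and the exact arithmetic here are assumptions, not the code in the repository.

```python
INT32_MAX = 2**31 - 1
INT32_MIN = -(2**31)


def saturating_add_i32(a, b):
    """Add two values and clamp the result to the int32 range (hypothetical helper)."""
    return max(INT32_MIN, min(INT32_MAX, a + b))


def round_shift_right_i32(x, shift):
    """Shift right by `shift` bits, rounding to nearest (ties toward +infinity)."""
    if shift == 0:
        return x
    rounding = 1 << (shift - 1)  # half of the divisor 2**shift
    # Python's >> on negative ints is an arithmetic (floor) shift.
    return saturating_add_i32(x, rounding) >> shift


print(round_shift_right_i32(5, 1))
print(round_shift_right_i32(-5, 1))
print(round_shift_right_i32(7, 2))
```

In this sketch the saturating add keeps the intermediate sum inside int32, at the cost of a slightly low result when x is within `rounding` of INT32_MAX.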
89b6558 Add code to allow running compare_vs_tflite on Android devices (#5537) 09 December 2020, 21:47:47 UTC
382c807 Update images used in apps/ tests (#5538) Some of them weren't the same as the Make equivalents, which meant that the test diverged between the two build systems (sometimes causing failures due to too-large images). 09 December 2020, 12:06:57 UTC
b83de89 Pathnames may or may not be absolute so loosen comparison to allow for this. (#5535) 09 December 2020, 04:25:41 UTC
437025e CHECK() was broken My recent changes meant the log output for CHECK() never got emitted (abort() was called first). Refactored to fix this. 09 December 2020, 01:40:26 UTC
873c8f1 Solve the COMDAT in runtime failing on Mac OS X problem once and for all. (#5532) Solve the COMDAT in runtime failing on Mac OS X problem once and for all by removing Comdat IR annotations in runtime on Mac OS and iOS. 08 December 2020, 21:39:04 UTC
e2f463f Revamp TFLite glue code and versioning (#5534) Some work to enable running compare_vs_tflite on-device in subsequent PRs... - Downgrade the TFLite version we target to 2.3.x - Add TFLITE_VERSION define to avoid losing changes needed for 2.4.x (etc) - Switch to relying on a shared-library build of TFLite (rather than static library), as getting static-library builds for Android is apparently ~impossible without heavy customization of TF build files 08 December 2020, 20:15:00 UTC
aa1cd2b Fix various Linux/GCC compile errors 08 December 2020, 19:44:12 UTC
dcce815 Merge branch 'master' into interpret_nn 08 December 2020, 18:45:47 UTC
42b1a6e Add overloaded intrinsic mechanism to simplify code generation (#5527) * Add table of instructions for ARM. * Add CodeGen_LLVM::Intrinsic and related helpers. * Use call_elementwise_intrinsic for more patterns. * Clean up intrinsics a bit. * Use call_elementwise_intrinsic for x86. * More clean-up and comments. * Add comment * Use call_elementwise_intrinsic for pmaddwd * Remove stray comment. * Move a few more things to overloaded intrinsics * Remove unused runtime functions. * Fix some corner case target flags * ssse... * Run clang-format * Replace introspection test. * Remove x86_avx512 initmod * clang-tidy * Remove x86_avx512 from makefile too * Revert simd_op_check * clang-format off on tables * Update Generator.cpp * Update Generator.cpp * Fix requirement for abs_i8x32 * Review fixes * Temporarily work around webassembly strangeness. Co-authored-by: Steven Johnson <srj@google.com> 08 December 2020, 01:11:28 UTC
8a8e441 Use the C API in compare_vs_tflite (not C++) 08 December 2020, 01:07:18 UTC
d1651d5 Upgrade logging code in interpret_nn (#5530) To make some integration with the Delegate code easier, I upgraded our logging code to mimic their usage pattern a bit. This is a little gratuitous but will likely pay off in the Delegate code. - Replaced LOG_FATAL with CHECK(0) everywhere - Added LOG(INFO/WARN/ERROR), all of which are wrappers that output to stderr (and also to __android_log on android) - Made the guts of CHECK() a little more purpose-built - Moved guts of Logger into a cpp file to hopefully reduce code bloat from logging - Made inclusion of __FILE__ and __LINE__ info contingent on NDEBUG, to reduce internal info of logging that might be desirable to leave in release builds. - Converted some (but not all) usage of std::cerr to use LOG(whatever). 07 December 2020, 21:53:58 UTC
6f5dc6b Ensure that QuantizationInfo.dimension is initialized Currently, it can contain garbage after parsing 07 December 2020, 19:27:17 UTC
4f554a7 Change #include path style to disambiguate headers (#5529) 07 December 2020, 19:21:35 UTC
8e7c992 Merge branch 'master' into interpret_nn 07 December 2020, 18:53:46 UTC
7f70907 Combine align and slice for the small vectors in align_loads (#5497) * Combine align and slice for the small vectors in align_loads * Fix format 07 December 2020, 17:17:14 UTC
1800dc2 Simplify a slice of slice (#5495) * Simplify a slice of slice * Fix format * Simplify for slice of concats + tests * format * format * New line to improve readability Co-authored-by: Steven Johnson <srj@google.com> 07 December 2020, 01:45:39 UTC
bd53b47 Allow creation of IntImm/UIntImm with any number of bits up to 64 (#5441) * Allow creation of IntImm/UIntImm with any number of bits up to 64 * Changes: - check that the number of bits is >= 1 - modify upgrade_* functions - allow printing of type with arbitrary number of bits. * Fix format * next_power_of_two which will end Co-authored-by: Steven Johnson <srj@google.com> 06 December 2020, 20:39:26 UTC
7ea09cd Point fft JIT tests to Halide binary (#5521) 06 December 2020, 07:38:56 UTC
d325e13 Add simd_op_check tests and a few more patterns (#5519) * Add simd_op_check coverage of some ARM ops we generate. * Remove local filter option. * Fix expected patterns for arm32. 04 December 2020, 16:38:05 UTC
c1885fc Fixes to bounds inference on shift_left (#5477) * Add shift_left fix for signed integers by possibly negative values + regression test * add required condition on shift_left integer fix * add type check to shift_left minimum condition * fix constant folding of shifts with |b| >= type.bits() for types that allow overflow (fails correctness/simplify test) * make regression tests use scoped bindings * change condition in case int24/int48 proposal happens soon * revert changes based on overflow expectations * add more regression tests * clarify comment * add shift_left min handler for b only UB * fix clang-tidy complaint * relax shift_left of non-negative value constraint * pull case outside of unnecessary preconditions * fix clang-format complaint * fix broken precondition * add typecheck to possibly save a can_prove() call * add easy-out type check to precondition * Add descriptive comment to bug fix + add another early-exit precondition Co-authored-by: Steven Johnson <srj@google.com> 04 December 2020, 00:14:07 UTC
28f9aef Enable commented clang-format option. (#5520) 03 December 2020, 22:05:21 UTC
927edeb Merge branch 'master' into interpret_nn 03 December 2020, 18:31:24 UTC
759b241 Add version-checking to the clang-tidy and clang-format scripts (#5513) Using the 'wrong' version of the tools will produce results out of sync with our presubmit tests, so add checking to ensure the user has their env set up correctly. 03 December 2020, 18:04:00 UTC
2ddd0b0 Revert "Make context handling in GPU runtimes more consistent and robust. (#5474)" (#5515) This reverts commit f47c5c99deac86c6d1f16cfcb1743a0e9e79317d. 03 December 2020, 02:10:58 UTC
2c8e3ea Revert "Fix broken destroy_context() in gpu_multi_context_threaded_aottest.cpp (#5512)" (#5514) This reverts commit 445ed5ee5ba5e23efaabe0b8d6971c0678b5a569. 03 December 2020, 02:08:31 UTC
445ed5e Fix broken destroy_context() in gpu_multi_context_threaded_aottest.cpp (#5512) 03 December 2020, 00:35:48 UTC
a34d00d Adding CMake build for FFT (#5508) * Add fft build * Fix properties * Fix generator argument * Add "Success!" message to fft aot test. * Formatting. * Fix target directory for bench_fft 02 December 2020, 22:44:43 UTC
f47c5c9 Make context handling in GPU runtimes more consistent and robust. (#5474) This PR adds a consistent GPU compiled kernel cache across the Cuda, Direct3D, OpenCL, and Metal runtimes. This cache is robust for kernels being used across multiple contexts and threads as well as using common code via a template. OpenGL and OpenGLCompute are not addressed due to issues in their implementation. There should be no regressions for those runtimes however. Adds tests for many GPU kernels and kernels across contexts and threads. Fixes a bug in CUDA runtime where some error message text in cuda_do_multidimensional_copy was not initialized. Fixes a bug in CUDA runtime where device release code did not run if CUDA libraries are directly linked into the executable. (This would have caused crashes due to the device allocation caching among other issues.) 02 December 2020, 22:40:21 UTC
073b8e4 Add CMake presets for 3.19+ users (#5506) * add CMakePresets.json and update docs * fix Windows presets * remove NDEBUG from GCC options * fix typo in README 02 December 2020, 22:19:34 UTC
1c0f824 Restructure apps to be fully external. (#5507) * Restructure apps to be fully external. * drive-by fix default Halide_TARGET * patch up fused apps build * remove doubled line * fixing multiple import for 3.16 * fix naming convention * Add missing #include <cstdio> 02 December 2020, 22:15:23 UTC
329a405 Enable constant folding of broadcasted constants (#5500) * Enable constant folding of broadcasted constants. * Make some scalar constant folding tests vectors. * Remove excessive simplify calls causing infinite recursion. Co-authored-by: Steven Johnson <srj@google.com> 02 December 2020, 18:29:08 UTC
ce684c6 Merge branch 'master' into interpret_nn 01 December 2020, 21:55:19 UTC
6cc24bb Fix compile time regression in fft (#5494) * Use equal instead of can_prove equality when examining enclosing scope There can be a lot of things in there, and can_prove is expensive. * Speed up bounds_of_inner_var By only expanding enclosing let stmts if the variable is actually used in the result, and by finding the last usage and then skipping anything earlier (skipping over nested producer nodes) Co-authored-by: Steven Johnson <srj@google.com> 01 December 2020, 20:49:09 UTC
6af4361 Fixes for trunk LLVM (#5499) 01 December 2020, 16:58:13 UTC
44c9a72 Reduce size of test image (#5496) 01 December 2020, 04:32:46 UTC
1ad6fb8 Fix case where simplifying interleaves might need a slice of the original vector (#5492) * Replace is_negative_negatable_const and associated cruft with lossless_negate. * Don't assume an interleave consumes all of the vectors it is shuffled from. * Add test of slices of interleaves. * Fix formatting * Rephrase logic. 01 December 2020, 04:31:39 UTC
491791d Simplify signed shifts more strongly (#5491) * Simplify signed shifts more strongly. * Simplify after negating b. * Also mutate other possibly simplifying cast. 01 December 2020, 04:31:00 UTC
edfc98b Restructure interpret_nn (#5498) * Restructure interpret_nn - Shuffle stuff into subdirs, remove some dead files, do some Makefile cleanup * Move tflite_parser -> tflite/ 01 December 2020, 00:52:36 UTC
7df01a5 Track TFLite 2.4, not master To simplify ongoing upkeep, let's have apps/interpret_nn track the TF 2.4 release (which is at 2.4.0-rc3 right now), rather than master. (This means removing the support for uint64 types in our TFLite-adjacent code, which was added to master post-2.4) 30 November 2020, 23:08:46 UTC
960f857 Fix All value from the ValType table (#5493) 30 November 2020, 22:58:10 UTC
682b771 Merge branch 'master' into interpret_nn 30 November 2020, 22:50:58 UTC
21afdc4 Align the base when doing strided loads from constant addresses (#5489) When we codegen something like f[ramp(x + 1, 2, 16)], where f is an internal allocation, we subtract the 1, do the dense load f[ramp(x, 1, 32)] and then take the odd lanes of the result. The reason for this is that it's likely that there's an f[ramp(x, 2, 16)] nearby, and aligning down the x+1 to x means we can share the dense loads and just deinterleave. This PR does the same when there's no x, just an odd constant. This means that cases like f[ramp(64, 2, 16)] + f[ramp(65, 2, 16)] now generate much better assembly. In one case I have it speeds up an entire pipeline by 8%, because aligning the loads in this way causes them to all be promoted off the stack into registers. 30 November 2020, 21:14:56 UTC
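The load-sharing idea in 21afdc4 can be sketched outside Halide with plain lists. `ramp` here mirrors the ramp(base, stride, lanes) notation from the commit message; `strided_load` and `aligned_strided_load` are hypothetical names, and aligning the base down with a modulo is an assumption made for illustration.

```python
def ramp(base, stride, lanes):
    """Indices base, base + stride, ..., as in ramp(base, stride, lanes)."""
    return [base + stride * i for i in range(lanes)]


def strided_load(f, base, stride, lanes):
    """Reference version: gather each lane directly."""
    return [f[i] for i in ramp(base, stride, lanes)]


def aligned_strided_load(f, base, stride, lanes):
    """Align the base down, do one dense load, then keep every stride-th lane."""
    offset = base % stride
    aligned = base - offset
    dense = [f[i] for i in ramp(aligned, 1, stride * lanes)]
    return dense[offset::stride]


f = list(range(1000, 1200))  # stand-in for an internal allocation

# f[ramp(64, 2, 16)] and f[ramp(65, 2, 16)] both come from the same
# dense load of f[64..95]: one takes the even lanes, the other the odd lanes.
evens = aligned_strided_load(f, 64, 2, 16)
odds = aligned_strided_load(f, 65, 2, 16)
print([a + b for a, b in zip(evens, odds)])
```

Because both strided loads reduce to the same dense load, a compiler can perform it once and deinterleave, which is the sharing the commit message describes.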
226b12c Improve speed of testing apps/ (#5482) * Improve speed of testing apps/ - Skip all app tests that are labeled as 'benchmarks' - Specify `--build-noclean` to avoid unnecessary full rebuilds * Change label 'benchmark' -> 'slow_tests' 30 November 2020, 19:12:07 UTC
16929df Add Type::widen and Type::narrow helpers. (#5478) * Add Type::widen and Type::narrow helpers. * widen -> wide, more uses of wide. * wide back to widen. Co-authored-by: Dillon Sharlet <dsharlet@gmail.com> 30 November 2020, 18:27:56 UTC
78489d0 Small cleanups/fixes (#5479) * Small cleanups/fixes peeled from lower-patterns2. * Fix derp * Fix possibly undefined evaluation order. * Smaller code. * Work around test issue. 30 November 2020, 16:15:16 UTC
49ca720 Replace is_negative_negatable_const and more logic with lossless_negate (#5490) * Replace is_negative_negatable_const and associated cruft with lossless_negate. * Add comment 30 November 2020, 15:43:18 UTC
bfbfacd Revert formatting of Hexagon intrinsic table (#5484) * Revert formatting of Hexagon intrinsic table * Revert one extra find and replace. 27 November 2020, 20:31:02 UTC
f911a89 Add as_intrinsic helper (#5480) * Add as_intrinsic helper. * Rename calls of known intrinsics. * Fix check_sio. 26 November 2020, 07:40:25 UTC
c9d7806 Add quantize_test 25 November 2020, 20:18:22 UTC
59bbc4d Simplify intrinsics of broadcasts to broadcasts of intrinsics (#5473) * Simplify intrinsics of broadcasts to broadcasts of intrinsics. * Fix broadcast elementwise simplifications for nested broadcasts. * broadcasted -> broadcast. 25 November 2020, 19:36:02 UTC
2ee4828 Add reshape_test 25 November 2020, 02:01:34 UTC
771e1ea Update buffer_util.h 25 November 2020, 01:13:08 UTC
27fa4b4 Fix bonehead mistake 25 November 2020, 01:05:34 UTC
92bcf19 Add pad_test 25 November 2020, 00:55:37 UTC
726ab95 Fix dopey code 25 November 2020, 00:37:24 UTC
b311b84 Add concatenation_test Also, drive-by fix to 'axis' parsing 25 November 2020, 00:19:18 UTC
073542e Reverse order of tensor axes in our tests 25 November 2020, 00:05:57 UTC
eebcd69 Add max_pool_test 24 November 2020, 23:18:33 UTC
84244eb Add stub test for FullyConnectedOp 24 November 2020, 23:09:34 UTC
2758853 Make more types amenable to use with CHECK 24 November 2020, 22:49:12 UTC
4c87186 Update conv2d_test.cpp 24 November 2020, 22:19:15 UTC
f5f1b20 Revert changes in Makefile.inc 24 November 2020, 22:04:30 UTC
9fbbfa4 Revert "Revert changes in Makefile.inc" This reverts commit cbff3c369ff2f491c52e994009c3e33060bc1ae1. 24 November 2020, 22:03:23 UTC
cbff3c3 Revert changes in Makefile.inc 24 November 2020, 21:54:15 UTC