Revision history - refs/heads/dsharletg/bitwise-intrinsics - origin: https://github.com/halide/Halide

19 April 2024, 08:20:39 UTC

Revision	Author	Author Date	Message	Commit Date
c84785d	Dillon Sharlet	27 January 2021, 04:06:17 UTC	Add lowering for bitwise and patterns.	27 January 2021, 04:06:17 UTC
6e3fb56	Steven Johnson	26 January 2021, 04:09:53 UTC	Fix intermittent OSX Python crash (#5667) * Fix intermittent OSX Python crash The OSX buildbot has been crashing intermittently on some Python tests; debugging showed that in some situations, Introspection's calls to `backtrace()` include bogus addresses (e.g. 0x08), which cause segfaults when you try to inspect memory near them. The reasons for this aren't entirely clear; for instance, it only seems to reproduce reliably when using the Makefile rather than CMake, and only when doing an 'out-of-tree' build. Rather than try to run this to ground further, this PR just checks for address fields that seem obviously unreasonable (the first 256 bytes of address space) and ignores them. * Add -fno-omit-frame-pointer, update sanity check * Update Introspection.cpp	26 January 2021, 04:09:53 UTC
d4c27ca	Dillon Sharlet	25 January 2021, 21:51:57 UTC	Lower saturating arithmetic without widening (#5662) * Lower saturating arithmetic without widening, and handle it in lower_intrinsic. * clang-format, fix saturating sub * cout -> cerr * trigger buildbots Co-authored-by: Alex Reinking <alex.reinking@gmail.com> Co-authored-by: Steven Johnson <srj@google.com>	25 January 2021, 21:51:57 UTC
38be3e3	aankit-ca	23 January 2021, 01:06:20 UTC	Add rounding shift right instructions (#5664) Co-authored-by: Ankit Aggarwal <aankit@quicinc.com>	23 January 2021, 01:06:20 UTC
7cff481	Dillon Sharlet	22 January 2021, 21:18:47 UTC	Fix VSX min/max intrinsics. Fixes #5661. (#5663)	22 January 2021, 21:18:47 UTC
6b398a3	Andrew Adams	22 January 2021, 18:23:48 UTC	Better codegen for switch-statement-like if-else chains (#5595) Better codegen for switch-statement-like if-else chains And added a test that demonstrates writing a little interpreter in Halide and scheduling it.	22 January 2021, 18:23:48 UTC
8c57a1a	Steven Johnson	21 January 2021, 22:07:20 UTC	Use linker tools on OSX & Linux to limit exports (#4651) (#5659) * Use linker scripts on OSX & Linux to limit exports * Write script to detect appropriate linker flags. Co-authored-by: Alex Reinking <alex.reinking@gmail.com>	21 January 2021, 22:07:20 UTC
be7a6a3	Alexander Root	21 January 2021, 21:23:43 UTC	is_positive_const and is_negative_const broken for (some) casts (#5615) * let signed_const checkers fail for non-widening integral casts Co-authored-by: Steven Johnson <srj@google.com>	21 January 2021, 21:23:43 UTC
0ca0415	Steven Johnson	21 January 2021, 01:59:10 UTC	Remove all deprecated methods for Halide 12 (#5656) * Remove all deprecated methods for Halide 12 These were all marked as deprecated in Halide 11 (and probably Halide 10 too); let's go ahead and remove them in Halide 12. * Remove function bodies too	21 January 2021, 01:59:10 UTC
a785b53	Steven Johnson	20 January 2021, 02:25:51 UTC	Add Lambda.cpp (#5651) Functions/methods that are part of the Halide public API should (generally) not be inline, to ensure the function instantiation is always in libHalide.	20 January 2021, 02:25:51 UTC
7713b3a	Steven Johnson	20 January 2021, 02:25:27 UTC	Add Python & PyBind version checking to PyStubImpl.cpp (#5653) It's built separately from the rest of the Python bindings and could get out of sync separately.	20 January 2021, 02:25:27 UTC
57083e4	Andrew Adams	19 January 2021, 17:44:27 UTC	Fix cuda warp shuffle issue for narrow types (#5624) * Fix cuda warp shuffle issue for narrow types In the case where no shuffle was necessary, we were upcasting the type to 32-bits needlessly and causing chaos.	19 January 2021, 17:44:27 UTC
8a12c43	Alex Reinking	18 January 2021, 18:53:20 UTC	Upgrade pybind11 to 2.6.x (#5644) * Use pybind11 2.6.0, which fixes Python-finding bugs. * Update Generator.cpp * Update Generator.cpp * Update PyHalide.cpp * 2.6.0 -> 2.6.1 Co-authored-by: Steven Johnson <srj@google.com>	18 January 2021, 18:53:20 UTC
42c5182	Alex Reinking	15 January 2021, 19:34:45 UTC	Shrink tile size to fit in Mac Mini GPU memory. (#5647) * Shrink tile size to fit in Mac Mini GPU memory. * Fix comment per Shoaib's correction.	15 January 2021, 19:34:45 UTC
bb1ca3c	Steven Johnson	14 January 2021, 22:57:28 UTC	correctness_vector_math: skip hypot() test for LLVM10 (#5643)	14 January 2021, 22:57:28 UTC
722b93e	Steven Johnson	13 January 2021, 20:27:12 UTC	Add 11.1 as an acceptable LLVM version (#5640) * Add 11.1 as an acceptable LLVM version Apparently 11.1 was released but our Makefile only allows for 11.0. * Update Makefile	13 January 2021, 20:27:12 UTC
61ca4d2	Steven Johnson	13 January 2021, 20:12:26 UTC	Simplify CodeGen_OpenGLCompute_C (#5636) * Simplify CodeGen_OpenGLCompute_C Combines CodeGen_GLSLBase and CodeGen_OpenGLCompute_C into one class, removing unnecessary stuff from the OpenGL support code.	13 January 2021, 20:12:26 UTC
0dfdc0d	Alexander Root	13 January 2021, 08:11:56 UTC	fix typo on assert in lerp() (#5638)	13 January 2021, 08:11:56 UTC
6620563	Steven Johnson	13 January 2021, 01:06:52 UTC	Add TARGET_OPENGLCOMPUTE (#5637) #5626 inadvertently removed the code for properly enabling/disabling OGLC; this restores it.	13 January 2021, 01:06:52 UTC
4ed4db8	Steven Johnson	12 January 2021, 23:09:16 UTC	Remove CodeGen_OpenGL_Dev (#5635) #5626 removed OpenGL support, but didn't remove this no-longer-needed class; it is removed here. Moved CodeGen_GLSLBase into CodeGen_OpenGLCompute_Dev.cpp (which is now the only subclass), but didn't yet attempt to consolidate them into a single class.	12 January 2021, 23:09:16 UTC
3defb66	Dillon Sharlet	12 January 2021, 23:02:59 UTC	Pattern match intrinsics in a target independent lowering pass (#5531) * Simplify intrinsics of broadcasts to broadcasts of intrinsics. * Add pattern matching of intrinsics lowering pass. * Fix broadcast elementwise simplifications for nested broadcasts. * More target independent pattern matching. * Progress on pattern matching for ARM and Hexagon. * broadcasted -> broadcast. * Broken pattern matching. * Fix broken build. * Match without patterns. * Try to match saturating_add. * Pattern matching working for some intrinsics. * x86 simd_op_check passing. * Most x86 and Hexagon patterns working. * Rename subtract -> sub, multiply -> mul. * Add widening_left_shift. * Remove bad simplification. * Fix some missed pattern issues. * Hexagon patterns mostly working. * Fix pmaddwd patterns * Start on ARM intrinsics. * Use shift intrinsics. * Revert formatting of Hexagon intrinsic table * Revert one extra find and replace. * Add table of instructions for ARM. * Add more patterns for rounding_halving_add. * Remove unused unsigned widening subtracts * Add intrinsics test * Remove bogus patterns. * Match lanes in shifts. * Progress on multiply-add Hexagon pattern matching. * Fix multiply-subtracts * Enable constant folding of broadcasted constants. * Fix some widening patterns * Fix double widening lossless casts * Use widen/narrow helpers * All Hexagon and x86 patterns working * Fix return type of widening subtracts. * Fix rounding shift right patterns * WIP simd op check * Add CodeGen_LLVM::Intrinsic and related helpers. * Use call_elementwise_intrinsic for more patterns. * Clean up intrinsics a bit. * Use call_elementwise_intrinsic for x86. * More clean-up and comments. * Add comment * Use call_elementwise_intrinsic for pmaddwd * Remove stray comment. * Move a few more things to overloaded intrinsics * Remove unused runtime functions. * Fix some corner case target flags * ssse... * Run clang-format * Replace introspection test. * Remove x86_avx512 initmod * clang-tidy * Remove x86_avx512 from makefile too * Revert simd_op_check * clang-format off on tables * Fix merge conflicts * Add abs and absd support * Pattern match some absd patterns * Clean up dead logic. * Remove duplicate merge content. * Update Generator.cpp * Update Generator.cpp * Also check one sided saturating add. * Fix requirement for abs_i8x32 * Fix some saturating add/sub patterns. * More ways to express rounding halving add/sub * Add widening_shift_right * Use lower_intrinsic to handle unknown calls. * Use pattern match results. * Fix incorrect patterns * All(?!) ARM shifts working * Remove unused declarations * Don't substitute all lets * Reduce code duplication in tables * Simplify negated shifts. * Handle possible pmaddwd overload resolution failures. * Fix some overflow cases. * Small cleanups. * Review fixes * Fix some broken patterns * Don't hardcode 4 * PatternMatchIntrinsics -> FindIntrinsics * Re-enable uhsub patterns * Add useful simplification to lower_int_uint_mod * Lower unknown intrinsics. * Also check for bitwise_and * Simplify bitwise_and(x, -1). * Add back some necessary simplifications. * Fix incorrect narrowing of widening add/sub. * Fix boneheaded add/sub swap. * Skip finding intrinsics for scalar types. * Add accidentally removed break. * Improve comments. * clang-format, clang-tidy, and other fixes * Try to fix compiler-specific errors, more clang-tidy. * More clang-tidy fixes. * Tweak comments on rounding_shift_left/right. * Move default mulhi_shr and sorted_avg lowering to lower_intrinsics. * Argh clang-format. * Better coverage of shift correctness. * Fix unknown intrinsic visitor. * Add comments to rounding shifts. * Add TODO for C++14 * Use pattern matching for vector reduce. * Simplify pattern matching a little. * Remove stray newline in vectorreduce ops. * Add HL_SIMD_OP_CHECK_FILTER env variable * Fix and refactor vector reduce codegen * clang-format * pmulhrsw doesn't exist until sse3! * Fix incorrect lack of fall-through * Don't lower int/uint division in FindIntrinsics. * Fix missing check of op type. * Don't handle Mod at all. * Programmatically generate split argument intrinsic wrappers. * clang-format * More clang-format * Clean up IRMatch helpers. * Small cleanups/review comments. * Fix boneheaded bug. * Fix addp * Fix float addp * Fix incorrect rounding shift saturation patterns. * Update Hexagon vector reductions. * clang-format * Small cleanups. * clang-tidy * Fix sign of shifts on arm32 * Fix LLVM10 workaround * Work around opaque LLVM failures. * Pattern match dp4a/dp2a * clang-format * Don't try to use shift_right_narrow patterns on invalid shifts. * Don't rely on mul visitor to produce shifts. * Fix arm32 * Address some review comments. * Speculatively fix non-locally-reproducing dot product failures * Fix CUDA dot products. * clang-format * Remove debugging code. * clang-tidy * Put needed check back. * Bring back some tests/patterns. * Re-enable saturating_add pattern. * Bring back a few more tests. * Fix mixed sign swapped ops case. * Better implementation of handling mixed signs. * Fix some issues. * Avoid ADL fights with common user helper functions. * Don't try to find intrinsics for bool operations. * Remove redundant patterns. * Bring back div/mod lowering. * Update PowerPC to use pattern matched intrinsics. Co-authored-by: Steven Johnson <srj@google.com>	12 January 2021, 23:02:59 UTC
27f55dd	Steven Johnson	12 January 2021, 17:55:36 UTC	Remove OpenGL support (part 1) (#5626) * Remove OpenGL support (part 1) Fixes #5475 This removes the OpenGL backend (but not the OpenGLCompute backend) from public use: - Remove Target::OpenGL - remove DeviceAPI::GLSL - remove Func::glsl() and Func::shader() - remove all OpenGL-specific apps and tests - remove HalideRuntimeOpenGL.h - remove some internal code that is OpenGL-only Note that there is still internal code that needs trimming; since the OpenGLCompute backend uses some of the same code, and some of the same build deps, and some of the same runtime shared-library loading, I tried to err on the side of leaving code/buildrules/etc in place for now, with the plan to clean that up in subsequent PRs. Note also that feature Target::EGL is still present, as I believe it is still useful in conjunction with OpenGLCompute.	12 January 2021, 17:55:36 UTC
9b99acb	Dillon Sharlet	11 January 2021, 21:25:36 UTC	Clean up includes (#5584) * Remove unused wildcard/type info. * Use std::unique_ptr to avoid sketchy lifetime management. * Clean up includes * Pull some changes from small-cleanups3 * clang-format * Add missing include. * Add missing include. * clang-format. * Fix function type * clang-format * Add missing include Co-authored-by: Steven Johnson <srj@google.com>	11 January 2021, 21:25:36 UTC
cc03c9c	Andrew Adams	09 January 2021, 00:28:07 UTC	Prototype of multiple scattering update definitions (#5553) Add "gather" and "scatter" intrinsics, which let you write update definitions which store multiple values at once to different computed locations. Useful for doing things like swapping or permuting elements in-place. See comments in IROperator.h for more details.	09 January 2021, 00:28:07 UTC
9c59d94	Alex Reinking	09 January 2021, 00:22:40 UTC	Set version to 12.0.0. Fixes #5259 (#5289)	09 January 2021, 00:22:40 UTC
2e5f1e0	xndcn	09 January 2021, 00:06:05 UTC	Fix issues in OpenGL backend (#5545) Co-authored-by: Alex Reinking <alex_reinking@berkeley.edu> Co-authored-by: Steven Johnson <srj@google.com> Co-authored-by: Alex Reinking <alex.reinking@gmail.com>	09 January 2021, 00:06:05 UTC
392b53e	Steven Johnson	08 January 2021, 23:21:24 UTC	Check error results from all egl calls (#5619) * Check error results from all egl calls We were ignoring the result from a couple of calls. * Update opengl_egl_context.cpp * Update Generator.cpp * Update Generator.cpp	08 January 2021, 23:21:24 UTC
2b3aaa8	xndcn	08 January 2021, 19:46:56 UTC	Add max threads checking for Metal (#5588) * Add max threads checking for Metal Previously, this condition was only caught by Metal API Validation in Xcode; otherwise the program would crash or produce wrong results. * Disable the max threads checking for Metal in non-debug runtime * Disable error/metal_threads_too_large test for non-OSX target	08 January 2021, 19:46:56 UTC
081f472	xndcn	07 January 2021, 22:49:08 UTC	Add CLDoubles feature check for OpenCL double type (#5610) Similar to CLHalf feature check for half type.	07 January 2021, 22:49:08 UTC
46fc56a	Steven Johnson	07 January 2021, 18:29:38 UTC	Don't allow CUDACapability80 on LLVM10 (#5617) LLVM10 can't handle that version of Cuda; we never noticed till now because we didn't have a buildbot with a GPU that could handle it. Modify the sniffers to cap capability at 75 for LLVM10 builds, and fail with user errors if that capability is explicitly requested.	07 January 2021, 18:29:38 UTC
f38801e	Dillon Sharlet	06 January 2021, 17:52:38 UTC	Use std::unique_ptr to manage CodeGen classes (#5583) * Remove unused wildcard/type info. * Use std::unique_ptr to avoid sketchy lifetime management. * Pull some changes from small-cleanups3 * Use auto for some loops. * clang-tidy	06 January 2021, 17:52:38 UTC
8063879	prdelgado	05 January 2021, 21:50:53 UTC	replaced indentation in line 20 with spaces to show proper error message (#5580) * replaced indentation in line 20 with spaces to show proper error message * added error message detail with alternative solution based on PR feedback	05 January 2021, 21:50:53 UTC
8383cc9	Andrew Adams	28 December 2020, 23:49:45 UTC	Delete lane extraction code in vectorization (#5596)	28 December 2020, 23:49:45 UTC
890a519	pkubaj	23 December 2020, 01:15:44 UTC	Fix build on FreeBSD/powerpc64 (#5572) * Fix build on FreeBSD/powerpc64 FreeBSD uses elf_aux_info rather than getauxval. * Make the conditional only work on Linux and FreeBSD * Make the conditional apply only to FreeBSD and Linux	23 December 2020, 01:15:44 UTC
8a0f4a1	Dillon Sharlet	22 December 2020, 01:54:36 UTC	Fix sketchy shadowing that breaks on some compilers. Fixes #5581 (#5587) * Fix sketchy shadowing that breaks on some compilers. Fixes #5581 * Fix another sketchy shadowing.	22 December 2020, 01:54:36 UTC
b22598c	Dillon Sharlet	21 December 2020, 22:56:00 UTC	Remove unused wildcard/type info. (#5582)	21 December 2020, 22:56:00 UTC
5ac8808	Dillon Sharlet	21 December 2020, 21:54:37 UTC	Fix several bugs on Hexagon and some cleanup (#5570) * Fix several bugs on Hexagon. * clang-format actually found a bug	21 December 2020, 21:54:37 UTC
1dbcf19	Steven Johnson	21 December 2020, 17:08:20 UTC	Decouple wasm's +bulk-memory from wasm_threads (#5574) * Decouple wasm's +bulk-memory from threads When `wasm_threads` was added, `+bulk-memory` codegen was enabled in conjunction with this feature, due to some inscrutable error which apparently didn't get recorded. From inspection of the spec for bulk-memory, and experimentation with the most recent version of Emscripten (2.0.10), I can't find any reason that this actually needs to be enabled, so I've moved it into its own new feature flag. Also: drive-by fix in Target to format the tables in `get_runtime_compatible_target()` better, and to remove some wasm-related entries from the 'must match' table that didn't actually need to match. * Create .gitignore	21 December 2020, 17:08:20 UTC
ef45c87	Steven Johnson	19 December 2020, 00:45:34 UTC	Fix for trunk LLVM (#5576)	19 December 2020, 00:45:34 UTC
83b040d	Zalman Stern	17 December 2020, 21:40:16 UTC	Add a feature to name cached memoizations and to evict them by name. (#5510) This PR adds an optional `EvictionKey` parameter to the `memoize` scheduling option. EvictionKeys are user-provided labels of up to 64 bits that can be used to request that labeled items in the cache be removed to free up space. Co-authored-by: Steven Johnson <srj@google.com>	17 December 2020, 21:40:16 UTC