81074fd | Shoaib Kamil | 05 May 2020, 17:09:14 UTC | Second try at merge | 05 May 2020, 17:09:14 UTC |
ff78629 | Andrew Adams | 05 May 2020, 02:55:47 UTC | Merge branch 'super_simplify_v2' of https://github.com/Halide/Halide into super_simplify_v2 | 05 May 2020, 02:55:47 UTC |
48df1a4 | Andrew Adams | 05 May 2020, 02:55:42 UTC | Add tool to dedup and simplify exprs | 05 May 2020, 02:55:42 UTC |
636b8df | Andrew Adams | 04 May 2020, 21:18:23 UTC | Merge pull request #4896 from halide/srj-simp2 minor super_simplify fixes | 04 May 2020, 21:18:23 UTC |
91ccc9d | Steven Johnson | 04 May 2020, 21:16:59 UTC | minor super_simplify fixes - add a sanity check in super_simplify to bail from the loop if we get too many counterexamples (preventing missed pathological cases from hanging) - remove unnecessary lock in find_rules loop - make BLACKLISTING logging debug(1) instead of debug(0) | 04 May 2020, 21:16:59 UTC |
4513165 | Andrew Adams | 04 May 2020, 18:33:27 UTC | More aggressive predicate synthesis | 04 May 2020, 18:33:27 UTC |
d20114d | Andrew Adams | 04 May 2020, 18:33:01 UTC | Allow arbitrary z3 timeouts. | 04 May 2020, 18:33:01 UTC |
12f0694 | Andrew Adams | 04 May 2020, 18:32:14 UTC | Don't use prove_me rules. Some of them are wrong. | 04 May 2020, 18:32:14 UTC |
ff754f0 | Andrew Adams | 04 May 2020, 15:26:38 UTC | Merge branch 'super_simplify_v2' of https://github.com/Halide/Halide into super_simplify_v2 | 04 May 2020, 15:26:38 UTC |
97512a5 | Andrew Adams | 04 May 2020, 15:26:23 UTC | Fresh rules from apps | 04 May 2020, 15:26:23 UTC |
4c34a6b | Andrew Adams | 04 May 2020, 15:26:06 UTC | Generate statistical significance results | 04 May 2020, 15:26:06 UTC |
35dd284 | Andrew Adams | 04 May 2020, 15:25:30 UTC | Turn on optimizations for the tools | 04 May 2020, 15:25:30 UTC |
f36385b | Shoaib Kamil | 04 May 2020, 14:04:33 UTC | Checkpoint pre-merge | 04 May 2020, 14:04:33 UTC |
2907b99 | Andrew Adams | 03 May 2020, 01:42:05 UTC | Merge pull request #4889 from halide/srj-simp Add optional output path for filter-rewrite-rules | 03 May 2020, 01:42:05 UTC |
a3edcf2 | Steven Johnson | 02 May 2020, 23:15:01 UTC | Add optional output path for filter-rewrite-rules | 02 May 2020, 23:15:01 UTC |
fadc168 | Andrew Adams | 02 May 2020, 23:00:29 UTC | Merge pull request #4888 from halide/srj-simp error-checking fixes | 02 May 2020, 23:00:29 UTC |
059ede8 | Steven Johnson | 02 May 2020, 22:57:10 UTC | Update find_rules.cpp | 02 May 2020, 22:57:10 UTC |
a575c17 | Steven Johnson | 02 May 2020, 22:42:12 UTC | Update find_rules.cpp | 02 May 2020, 22:42:12 UTC |
a84b5ac | Steven Johnson | 02 May 2020, 22:34:42 UTC | error-checking fixes - make blacklist.txt a required arg (so it doesn't assume 'current directory') - add error-checking to the iostream open() calls to ensure that failure to open files isn't overlooked | 02 May 2020, 22:34:42 UTC |
c1ae0ac | Andrew Adams | 02 May 2020, 19:07:06 UTC | Require that at least one var is eliminated in all new rules. | 02 May 2020, 19:07:06 UTC |
c4ad8fb | Andrew Adams | 02 May 2020, 19:06:22 UTC | Add ratios and new apps to results script | 02 May 2020, 19:06:22 UTC |
7b7f6e1 | Andrew Adams | 02 May 2020, 19:06:01 UTC | Add Mul to RHS recognition | 02 May 2020, 19:06:01 UTC |
353f92a | Andrew Adams | 02 May 2020, 19:05:36 UTC | Increase default accuracy of all benchmarks | 02 May 2020, 19:05:36 UTC |
bf69774 | Andrew Adams | 02 May 2020, 19:05:19 UTC | Add more apps and more random schedules for them | 02 May 2020, 19:05:19 UTC |
b772f5b | Andrew Adams | 01 May 2020, 19:45:15 UTC | Fix for repeated constants of different types | 01 May 2020, 19:45:15 UTC |
6d7a445 | Andrew Adams | 01 May 2020, 19:16:57 UTC | Fix float parsing | 01 May 2020, 19:16:57 UTC |
244fb81 | Andrew Adams | 01 May 2020, 19:11:31 UTC | Fresh rules from the apps | 01 May 2020, 19:11:31 UTC |
6350d18 | Andrew Adams | 01 May 2020, 18:35:24 UTC | Delete two human-added rules, one of which was incorrect! | 01 May 2020, 18:35:24 UTC |
c653925 | Andrew Adams | 01 May 2020, 18:18:01 UTC | Fresh batch of rules from the apps | 01 May 2020, 18:18:01 UTC |
7185b23 | Andrew Adams | 01 May 2020, 00:05:53 UTC | Add rules synthesized from second pass at blacklist.txt | 01 May 2020, 00:05:53 UTC |
3f3d37e | Andrew Adams | 01 May 2020, 00:04:49 UTC | Turn back on parallel rule filtering | 01 May 2020, 00:04:49 UTC |
b0f67bc | Andrew Adams | 01 May 2020, 00:04:36 UTC | Add initial batch of rewrite rules from apps | 01 May 2020, 00:04:36 UTC |
cb227cf | Andrew Adams | 30 April 2020, 23:35:11 UTC | Merge remote-tracking branch 'origin/master' into super_simplify_v2 | 30 April 2020, 23:35:11 UTC |
8dde2ed | Andrew Adams | 30 April 2020, 22:05:22 UTC | Add instructions for running synthesis experiment | 30 April 2020, 22:05:22 UTC |
f4cf2a6 | Andrew Adams | 30 April 2020, 22:03:02 UTC | Predicate synthesis improvements | 30 April 2020, 22:03:02 UTC |
6c5e292 | Andrew Adams | 30 April 2020, 22:00:22 UTC | Fix z3 model parsing | 30 April 2020, 22:00:22 UTC |
2e064d0 | Andrew Adams | 30 April 2020, 22:00:06 UTC | Fix parsing of let exprs | 30 April 2020, 22:00:06 UTC |
98ce581 | Andrew Adams | 30 April 2020, 21:59:50 UTC | Make find_rules output to file | 30 April 2020, 21:59:50 UTC |
f9da218 | Andrew Adams | 30 April 2020, 21:59:37 UTC | Handle cases in predicate synthesis where bootstrapping doesn't terminate | 30 April 2020, 21:59:37 UTC |
98622c9 | Andrew Adams | 30 April 2020, 21:59:07 UTC | Add mod to more_general_than | 30 April 2020, 21:59:07 UTC |
1f875b0 | Steven Johnson | 29 April 2020, 17:41:13 UTC | Merge pull request #4883 from halide/srj-logging Add object-size and compile-time to CompilerLogger | 29 April 2020, 17:41:13 UTC |
7b8c573 | Dillon Sharlet | 29 April 2020, 07:02:58 UTC | Merge pull request #4875 from halide/revert_bounds_of_promise_clamped Don't inject min/max exprs in bounds of a promise_clamped of constant | 29 April 2020, 07:02:58 UTC |
7544c42 | Steven Johnson | 29 April 2020, 00:38:30 UTC | Emit object_code_size as a number, not a string | 29 April 2020, 00:38:30 UTC |
8557e80 | Steven Johnson | 29 April 2020, 00:37:10 UTC | Merge pull request #4879 from halide/clang_tidy_introspection Run clang-tidy on Introspection.cpp | 29 April 2020, 00:37:10 UTC |
db9ec76 | Steven Johnson | 29 April 2020, 00:16:38 UTC | Add object-size and compile-time to CompilerLogger | 29 April 2020, 00:30:36 UTC |
3b8f634 | Andrew Adams | 28 April 2020, 23:47:56 UTC | Merge remote-tracking branch 'origin/master' into super_simplify_v2 | 28 April 2020, 23:47:56 UTC |
d267874 | Andrew Adams | 28 April 2020, 23:45:54 UTC | Learn from (x % 2) == y and similar | 28 April 2020, 23:45:54 UTC |
f1a062c | Steven Johnson | 28 April 2020, 23:20:18 UTC | Merge pull request #4882 from halide/srj-dlsym Call dlsym() directly in runtime/metal.cpp | 28 April 2020, 23:20:18 UTC |
35c275d | Steven Johnson | 28 April 2020, 23:18:22 UTC | Merge pull request #4871 from halide/srj-telemetry Add basic Compiler Logging engine to Halide compiler | 28 April 2020, 23:18:22 UTC |
a252109 | Steven Johnson | 28 April 2020, 21:50:06 UTC | Merge branch 'master' into revert_bounds_of_promise_clamped | 28 April 2020, 21:50:06 UTC |
2762561 | Steven Johnson | 28 April 2020, 21:49:21 UTC | Fix Mac code | 28 April 2020, 21:49:21 UTC |
d40d978 | Steven Johnson | 28 April 2020, 21:46:37 UTC | Merge pull request #4881 from halide/srj-nits Minor clang-tidy issues in IRPrinter | 28 April 2020, 21:46:37 UTC |
53c90c7 | Steven Johnson | 28 April 2020, 21:45:49 UTC | Merge pull request #4880 from halide/double_braces Fix std::array initialization | 28 April 2020, 21:45:49 UTC |
8cb1ce6 | Steven Johnson | 28 April 2020, 21:32:59 UTC | Call dlsym() directly in runtime/metal.cpp Followup to https://github.com/halide/Halide/pull/4877 -- `halide_get_symbol` isn't defined in our iOS runtime; it might be useful to add it there, but the bottlenecking is unnecessary here, so a spot-fix should be simpler. | 28 April 2020, 21:35:52 UTC |
f5ea389 | Steven Johnson | 27 April 2020, 18:34:12 UTC | Up to now, we've typically gathered metadata about Halide compilation on an ad-hoc basis (e.g. via debug() statements); this adds a CompilerLogger output to Halide to allow us to gather data in a more structured way, so that we can do batch analysis of things like Simplifier rules more easily. This is modeled as a Generator output for now (though it could be used manually with the JIT if desired); the `compiler_log` output is a JSON file that contains basic information about the Generator being driven, along with information about: - Expressions that failed `can_prove` (including a text version of the original and failed version) - Loop vars that are non-monotonic - Eventually, the usage rate for each Simplifier rule (this requires some additional scaffolding in the Simplifier to enable) The CMake build file was modified to emit `compiler_log` files by default for all Generators (I did not attempt to modify the Makefile in this way, though). For debugging purposes, you can use HL_DEBUG_COMPILER_LOGGER=1 to dump all CompilerLogger to stderr (whether or not compiler_log output is specified for a specific Generator). You can also optionally specify HL_OBFUSCATE_COMPILER_LOGGER=1, which will cause all logged data to try to scrub interesting identifiers away; this is intended to allow gathering data on proprietary code and contributing it to a public analyzer (e.g. to find missed simplifier opportunities). 
The JSON looks something like this: ``` { "generator_name" : "interpolate", "function_name" : "interpolate", "target" : "x86-64-linux-avx-no_runtime-sse41", "generator_args" : "auto_schedule=false", "non_monotonic_loop_vars" : { "downsampled_1.s0.y.v10" : [ "(((downsampled_1.s0.y.v10*-2) + ((t140.s*-2) + ((t140.s*2) + (select((0 < downsampled_1.s0.y.v10), -14, -15) + (downsampled_1.s0.y.v10*2)))))*-1)", "(((downsampled_1.s0.y.v10*-2) + ((t142.s*-2) + ((t142.s*2) + (select((0 < downsampled_1.s0.y.v10), -14, -15) + (downsampled_1.s0.y.v10*2)))))*-1)", "((downsampled_1.s0.y.v10*-2) + (select((0 < downsampled_1.s0.y.v10), -14, -15) + (downsampled_1.s0.y.v10*2)))" ], "downsampled_1.s0.y.v9" : [ "(let t140.s = min(((downsampled_1.s0.y.v9*8) + 4), downsampled_1.s0.y.max) in (((downsampled_1.s0.y.v10*2) + ((t140.s*2) - (select((0 < downsampled_1.s0.y.v10), -14, -15) + ((downsampled_1.s0.y.v10 + t140.s)*2)))) + -14))", ...etc... ], ...etc... }, "failed_to_prove" : { "(v0 <= (min(v0, 16) + (((max(v0, 16) + -1)/16)*16)))" : [ "(let t2839 = min(input.extent.0, 16) in (let t2840 = (normalize.s0.y.v10 + normalize.s0.y.v10.base) in (((((((uint1)1 && ((0 + (t2839 + -16)) <= (t2839 + -16))) && ((15 + (min((t2839 + (((max(input.extent.0, 16) + -1)/16)*16)), input.extent.0) + -16)) >= (min((((input.extent.0 + -1)/16)*16), (input.extent.0 + -16)) + 15))) && (t2840 <= t2840)) && (t2840 >= t2840)) && (0 <= 0)) && (3 >= 3))))" ] }, "version": "HalideJSONCompilerLoggerV1" } ``` If you specify HL_OBFUSCATE_COMPILER_LOGGER=1, this will instead look something like: ``` { // names omitted "non_monotonic_loop_vars" : { "anon0" : [ "(((anon0*-2) + ((anon1*-2) + ((anon1*2) + (select((0 < anon0), -14, -15) + (anon0*2)))))*-1)", "(((anon0*-2) + ((anon2*-2) + ((anon2*2) + (select((0 < anon0), -14, -15) + (anon0*2)))))*-1)", "((anon0*-2) + (select((0 < anon0), -14, -15) + (anon0*2)))" ], "anon11" : [ "((((((select((0 < anon11), 1022, 0) + anon11) + anon12)/512)*-512) + anon12) + anon11)", ...etc... 
], ...etc... }, "failed_to_prove" : { "(anon58 <= (min(anon58, 16) + (((max(anon58, 16) + -1)/16)*16)))" : [ "(let anon59 = min(anon60, 16) in (let anon61 = (anon11 + anon62) in (((((((uint1)1 && ((0 + (anon59 + -16)) <= (anon59 + -16))) && ((15 + (min((anon59 + (((max(anon60, 16) + -1)/16)*16)), anon60) + -16)) >= (min((((anon60 + -1)/16)*16), (anon60 + -16)) + 15))) && (anon61 <= anon61)) && (anon61 >= anon61)) && (0 <= 0)) && (3 >= 3))))" ] }, "version": "HalideJSONCompilerLoggerV1" } ``` If generating a multitarget, the JSON will be an array of these objects (one per subtarget): ``` [ { "generator_name" : "interpolate", "target" : "x86-64-linux-avx-sse41", ... }, { "generator_name" : "interpolate", "target" : "x86-64-linux-avx", ... }, { "generator_name" : "interpolate", "target" : "x86-64-linux", ... } ] ``` There are definitely more things that we want to add to this that aren't yet implemented; off the top of my head: - Halide IR-building/lowering time - LLVM codegen time - total compilation time - final code size | 28 April 2020, 20:25:49 UTC |
8d844f2 | Andrew Adams | 28 April 2020, 20:04:24 UTC | Fix simplify test | 28 April 2020, 20:04:24 UTC |
f3b57f2 | Andrew Adams | 28 April 2020, 19:54:50 UTC | Merge remote-tracking branch 'origin/super_simplify_v2' into rule_removal_experiments | 28 April 2020, 19:54:50 UTC |
116b69b | Andrew Adams | 28 April 2020, 19:45:49 UTC | clang-tidy and clang-format disagree | 28 April 2020, 19:45:49 UTC |
d9daf0e | Andrew Adams | 28 April 2020, 19:38:29 UTC | Fix std::array initialization | 28 April 2020, 19:38:29 UTC |
e1ea3a7 | Andrew Adams | 28 April 2020, 19:07:28 UTC | Run clang-tidy on Introspection.cpp | 28 April 2020, 19:07:28 UTC |
f7bbab0 | Andrew Adams | 28 April 2020, 19:02:02 UTC | Improve comments. | 28 April 2020, 19:02:02 UTC |
6c29235 | Steven Johnson | 28 April 2020, 18:46:44 UTC | Minor clang-tidy issues in IRPrinter | 28 April 2020, 18:46:44 UTC |
fd5c263 | Andrew Adams | 28 April 2020, 16:55:07 UTC | Add test | 28 April 2020, 16:55:07 UTC |
d7ceeb2 | Andrew Adams | 28 April 2020, 16:40:16 UTC | Teach is_monotonic about unsafe_promise_clamped | 28 April 2020, 16:40:16 UTC |
40d2067 | Shoaib Kamil | 28 April 2020, 14:21:23 UTC | Merge pull request #4877 from halide/shoaibkamil/fix_potential_ios_link_error Switch to using dlsym for MTLCopyAllDevices() | 28 April 2020, 14:21:23 UTC |
69fda07 | Andrew Adams | 28 April 2020, 01:48:45 UTC | Synthesized new simplifier rules from autoscheduled apps Predicate synthesis currently set to z3 only | 28 April 2020, 01:48:45 UTC |
2d245ee | Shoaib Kamil | 28 April 2020, 00:08:50 UTC | Formatting | 28 April 2020, 00:08:50 UTC |
bae49b0 | Shoaib Kamil | 27 April 2020, 23:00:14 UTC | Switch to using dlsym for MTLCopyAllDevices() | 27 April 2020, 23:00:14 UTC |
d44cb3e | Andrew Adams | 27 April 2020, 19:14:38 UTC | Don't inject min/max exprs in the bounds of a promise_clamped of a constant No effect in public Halide, but I hear it causes at least one regression inside Google. The reason for injecting the min/max exprs in the case where the first arg doesn't vary has since gone away now that we push producers inside the guard-with-if if statement of a consumer. | 27 April 2020, 19:15:24 UTC |
5b4eac0 | Andrew Adams | 24 April 2020, 18:17:04 UTC | Merge pull request #4870 from halide/reenable_introspection Reenable introspection by default, and add support for the endbr64 in… | 24 April 2020, 18:17:04 UTC |
1e91a20 | Andrew Adams | 24 April 2020, 18:16:43 UTC | Merge pull request #4869 from halide/delete_uphill_div_rules Adjust simplifier rules so that they're provably cycle-free | 24 April 2020, 18:16:43 UTC |
994428a | Andrew Adams | 24 April 2020, 02:05:00 UTC | Delete debugging code | 24 April 2020, 02:05:00 UTC |
e8243b8 | Steven Johnson | 23 April 2020, 20:02:26 UTC | Merge branch 'master' into delete_uphill_div_rules | 23 April 2020, 20:02:26 UTC |
7d8f1f6 | Andrew Adams | 23 April 2020, 19:38:30 UTC | Reenable introspection by default, and add support for the endbr64 instruction Was accidentally deleted in b0b6a4a84c53b47a91a05b4487ef504fe1345e73 Also add handling for newer gccs. They now inject endbr64 instructions at function entry. | 23 April 2020, 19:38:30 UTC |
8975aaf | Andrew Adams | 23 April 2020, 18:34:41 UTC | Adjust simplifier rules so that they're provably cycle-free Several of these division rules weren't strength reducing - they created more ops (or stronger ops) than they removed. This commit tries just adjusting them to be strength-reducing. I'm a little nervous of it because it means in (x*4 - y)/2 we no longer move the x out of the numerator. Needs thorough testing. | 23 April 2020, 18:34:41 UTC |
91eb7e3 | Steven Johnson | 23 April 2020, 18:13:01 UTC | Merge pull request #4847 from pelikan/fix-plain-c-with-libcxx fix build when -stdlib=libc++ was passed into the plain C test | 23 April 2020, 18:13:01 UTC |
a277e60 | Steven Johnson | 23 April 2020, 17:31:36 UTC | Merge pull request #4868 from halide/shift_overflow Internal compiler code can introduce shifts by greater than the type size | 23 April 2020, 17:31:36 UTC |
9fc5b79 | Julie Newcomb | 22 April 2020, 22:41:54 UTC | fix parens in Simplify_Sub | 22 April 2020, 22:41:54 UTC |
754ab2a | Andrew Adams | 22 April 2020, 22:36:06 UTC | Internal compiler code can introduce shifts by greater than the type size. So this should be an overflow IR node instead of being an eager user assert. The specific case was a can_prove(x << y) substituting in random values for y when HL_DEBUG_CODEGEN>0 to probe for counterexamples. Some of these random values were greater than the bit-size of x. | 22 April 2020, 22:36:06 UTC |
658474b | Julie Newcomb | 22 April 2020, 22:23:00 UTC | comment out failing min/max/select likely tests in correctness_simplify | 22 April 2020, 22:23:00 UTC |
5bcdf80 | Steven Johnson | 22 April 2020, 20:41:27 UTC | Merge pull request #4840 from Infinoid/python-rdom-getitem Add RDom[i] method to Python bindings. | 22 April 2020, 20:41:27 UTC |
6a13b23 | Julie Newcomb | 22 April 2020, 06:49:29 UTC | Fixup | 22 April 2020, 07:42:11 UTC |
e682691 | Julie Newcomb | 22 April 2020, 06:48:20 UTC | merge master | 22 April 2020, 06:48:20 UTC |
a841d91 | Steven Johnson | 20 April 2020, 22:45:51 UTC | Merge pull request #4861 from Infinoid/python-buffer-leak Fix a leak in python extension wrapper code (Issue #4859) | 20 April 2020, 22:45:51 UTC |
f3af437 | Steven Johnson | 20 April 2020, 22:41:59 UTC | Merge pull request #4844 from pelikan/fix-eigen3-build fix eigen3 build on Gentoo Linux by using pkg-config | 20 April 2020, 22:41:59 UTC |
a3ed181 | Andrew Adams | 20 April 2020, 16:37:38 UTC | Merge remote-tracking branch 'origin/master' into super_simplify_v2 | 20 April 2020, 16:37:38 UTC |
8815280 | Steven Johnson | 19 April 2020, 21:17:32 UTC | Merge pull request #3037 from halide/add_image_checks_after_bounds_inference Move add_image_checks after bounds inference (See #3036) | 19 April 2020, 21:17:32 UTC |
7cb4f11 | Andrew Adams | 17 April 2020, 23:21:24 UTC | Don't add impure or non-integer things to the containing loops vector in trim_no_ops | 19 April 2020, 16:55:48 UTC |
daac7e5 | Andrew Adams | 17 April 2020, 22:04:26 UTC | Compute func inside GuardWithIf if statement if possible If a Func is computed inside all of the splits on a variable, it should also be computed inside the guardwithif if statement. This means it doesn't have to rely on bounds inference understanding said if statement. | 19 April 2020, 16:55:48 UTC |
649451c | Andrew Adams | 17 April 2020, 21:20:06 UTC | Fix bug with zero- or negative-sized realizations They get dead-stripped *before* allocation bounds inference, now that simplification runs earlier, so we need to make sure allocation bounds inference doesn't get confused by the lack of any access to the buffers. | 19 April 2020, 16:55:48 UTC |
9698c68 | Andrew Adams | 06 April 2020, 23:12:55 UTC | Added test that composes tail strategies in different ways and checks that bounds are tight when all the splits are exact. | 19 April 2020, 16:55:48 UTC |
020caba | Andrew Adams | 06 April 2020, 23:09:58 UTC | Don't use the loop_min in the promise_clamped from GuardWithIf | 19 April 2020, 16:55:48 UTC |
e8f0dd4 | Andrew Adams | 06 April 2020, 23:08:52 UTC | Don't keep IR just because it refers to the bounds of an input/output | 19 April 2020, 16:55:48 UTC |
1ec4226 | Andrew Adams | 24 March 2020, 21:02:18 UTC | Move printf before assert So that you actually get to see it if the assert fails. | 19 April 2020, 16:55:48 UTC |
bbfd486 | Andrew Adams | 24 March 2020, 21:02:00 UTC | Remove constant substitution in storage folding No point now that full simplification has run | 19 April 2020, 16:55:48 UTC |
0eb29f2 | Andrew Adams | 24 March 2020, 21:01:44 UTC | Remove max on fuse factor Should no longer be necessary | 19 April 2020, 16:55:48 UTC |
8f69a51 | Andrew Adams | 24 March 2020, 21:01:03 UTC | Rearrange lowering to fix input bounds inference issue add_image_checks needs to happen after bounds inference so that it detects buffer overreads on the input. In its current place it only detects if the input is large enough for the output to be correct, but it doesn't check for the presence of guard bands to cover loading out-of-bounds values that then get discarded. In theory this could fault, if a guard band is needed on a dimension with large stride. For this to not break a whole bunch of code due to false positives with overly strict bounds, we need better simplification. So the first simplification pass is moved before allocation bounds inference. As a side benefit, this also reduces the amount of IR that pass needs to chew through, so it speeds up lowering. Fixes #3036 | 19 April 2020, 16:55:48 UTC |
1010cd1 | Andrew Adams | 19 April 2020, 16:55:08 UTC | Merge pull request #4864 from halide/fix_trunk_llvm Fix for llvm trunk | 19 April 2020, 16:55:08 UTC |
c0226f9 | Andrew Adams | 19 April 2020, 06:08:47 UTC | Deal with failure case in llvm::Expected | 19 April 2020, 06:08:47 UTC |
aad1ab9 | Andrew Adams | 19 April 2020, 00:53:07 UTC | Fix for llvm trunk | 19 April 2020, 00:53:07 UTC |