Revision 53401f1cb4e829b3a0d74295473c9a48fff2bbab authored by Keno Fischer on 29 October 2020, 21:36:53 UTC, committed by Keno Fischer on 29 October 2020, 21:48:17 UTC
When inference detects a call cycle, one of two things could happen: 1. It determines that in order for inference to converge it needs to limit the signatures of some methods to something more general, or 2. The cycle is determined to be harmless at the inference level (because though there is a cycle in the CFG there is no dependency cycle of type information). In the first case, we simply disable optimizations, which is sensible, because we're likely to have to recompute some information anyway, when we actually get there dynamically. In the second case however, we do do optimizations, but it's a bit of an unusual case. In particular, inference usually delivers methods to inlining in postorder (meaning callees get optimized before their callers) such that a caller can always inline a callee. However, if there is a cycle, there is of course no unique postorder of functions, since by definition we're looking a locally strongly connected component. In this case, we would just essentially pick an arbitrary order (and moreover, depending on the point at which we enter the cycle and subsequently cached, leading to potential performance instabilities, depending on function order). However, the arbitrary order is quite possibly suboptimal. For example in #36414, we have a cycle randn -> _randn -> randn_unlikely -> rand. In this cycle the author of this code expected both `randn` and `_randn` to inline and annotated the functions as such. However, in 1.5+ the order we happed to pick would have inlined randn_unlikely into _randn (had it not been marked noinline), with a hard call break between randn and _randn, whch is problematic from a performance standpoint. This PR aims to address this by introducing a heuristic: If some functions in a cycle are marked as `@noinline`, we want to make sure to infer these last (since they won't ever be inlined anyway). To ensure this happens, while restoring postorder if this happens to break the cycle, we perform a DFS traversal rooted at any `@noinline` functions and then optimize the functions in the cycle in DFS-postorder. Of course still may still not be a true postorder in the inlining graph (if the `@noinline` functions don't break the cycle), but even in that case, it should be no worse than the default order. Fixes #36414 Closes #37234
1 parent 0d5c38b
File | Mode | Size |
---|---|---|
checksums | ||
patches | ||
tools | ||
valgrind | ||
.gitignore | -rw-r--r-- | 26 bytes |
Makefile | -rw-r--r-- | 5.3 KB |
NATIVE.cmake | -rw-r--r-- | 192 bytes |
SuiteSparse_wrapper.c | -rw-r--r-- | 1.6 KB |
Versions.make | -rw-r--r-- | 1.0 KB |
blas.mk | -rw-r--r-- | 8.5 KB |
curl.mk | -rw-r--r-- | 2.8 KB |
dsfmt.mk | -rw-r--r-- | 2.4 KB |
gfortblas.alias | -rw-r--r-- | 706 bytes |
gfortblas.c | -rw-r--r-- | 4.4 KB |
gmp.mk | -rw-r--r-- | 2.6 KB |
libdSFMT.def | -rw-r--r-- | 778 bytes |
libgit2.mk | -rw-r--r-- | 4.2 KB |
libgit2.version | -rw-r--r-- | 76 bytes |
libssh2.mk | -rw-r--r-- | 2.3 KB |
libssh2.version | -rw-r--r-- | 83 bytes |
libuv.mk | -rw-r--r-- | 2.1 KB |
libuv.version | -rw-r--r-- | 82 bytes |
libwhich.mk | -rw-r--r-- | 1.2 KB |
libwhich.version | -rw-r--r-- | 78 bytes |
llvm-options.mk | -rw-r--r-- | 907 bytes |
llvm-ver.make | -rw-r--r-- | 359 bytes |
llvm.mk | -rw-r--r-- | 22.7 KB |
mbedtls.mk | -rw-r--r-- | 3.4 KB |
mpfr.mk | -rw-r--r-- | 2.6 KB |
nghttp2.mk | -rw-r--r-- | 2.1 KB |
objconv.mk | -rw-r--r-- | 1.2 KB |
openblas.version | -rw-r--r-- | 79 bytes |
openlibm.mk | -rw-r--r-- | 1.4 KB |
openlibm.version | -rw-r--r-- | 78 bytes |
p7zip.mk | -rw-r--r-- | 2.7 KB |
patchelf.mk | -rw-r--r-- | 1.8 KB |
pcre.mk | -rw-r--r-- | 2.2 KB |
suitesparse.mk | -rw-r--r-- | 6.9 KB |
unwind.mk | -rw-r--r-- | 5.2 KB |
utf8proc.mk | -rw-r--r-- | 1.5 KB |
utf8proc.version | -rw-r--r-- | 78 bytes |
zlib.mk | -rw-r--r-- | 1.4 KB |
zlib.version | -rw-r--r-- | 71 bytes |
Computing file changes ...