Revision - dd43551 - Further refactoring of semantic checking (#1102)

Revision dd435512219f435ea13498e6124930fd4cf823a9 authored by Tim Foley on 18 November 2019, 18:36:38 UTC, committed by GitHub on 18 November 2019, 18:36:38 UTC

Further refactoring of semantic checking (#1102)

* Split apart `SemanticsVisitor`

The existing `SemanticsVisitor` type was the visitor for expressions, statements, and declarations, and its monolithic nature made it hard to introduce distinct visitors for different phases of checking (despite the fact that we had, de facto, multiple phases of declaration checking).

This change splits up `SemanticsVisitor` as follows:

* There is now a `SharedSemanticsContext` type which holds the shared state that all semantics visiting logic needs. This includes state that gets mutated during the course of semantic checking.

* The `SemanticsVisitor` type is now a base class that holds a pointer to a `SharedSemanticsContext`. Most of the non-visitor functions are still defined here, just to keep the code as simple as possible. The `SemanticsVisitor` type is no longer a "visitor" in any meaningful way, but retaining the old name minimizes the diffs to client code.

* There are distinct `Semantics{Expr|Stmt|Decl}Visitor` types that have the actual `visit*` methods for an appropriate subset of the AST hierarchy. These all inherit from `SemanticsVisitor` primarily so that they can have easy access to all the helper methods it defines (which used to be accessible because these were all the same object).

Any client code that was constructing a `SemanticsVisitor` now needs to construct a `SharedSemanticsContext` and then use that to initialize a `SemanticsVisitor`. Similarly, any code that was using `dispatch()` to invoke the visitor on an AST node needs to construct the appropriate sub-class and then invoke `dispatch()` on it instead.
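A minimal sketch of the resulting shape, to make the migration for client code concrete. The type names `SharedSemanticsContext`, `SemanticsVisitor`, `SemanticsExprVisitor`, and `SemanticsStmtVisitor` come from the description above; everything else (the `Expr`/`Stmt` stand-ins, the `diagnostics` list, the `diagnose` helper, and the toy `dispatch` bodies) is hypothetical and only illustrates the layering, not the real Slang code.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for AST nodes; not the real Slang classes.
struct Expr { std::string text; };
struct Stmt { std::string text; };

// State shared by all semantic checking logic (assumed here: a diagnostic list).
struct SharedSemanticsContext {
    std::vector<std::string> diagnostics;
};

// Base class: holds a pointer to the shared context and the common helpers.
// It no longer has any `visit*` methods of its own.
class SemanticsVisitor {
public:
    explicit SemanticsVisitor(SharedSemanticsContext* shared) : m_shared(shared) {}
    SharedSemanticsContext* getShared() const { return m_shared; }

protected:
    // Hypothetical helper that every sub-visitor can reach through inheritance.
    void diagnose(const std::string& message) { m_shared->diagnostics.push_back(message); }

private:
    SharedSemanticsContext* m_shared;
};

// Sub-visitors carry the actual visiting logic for one slice of the AST.
class SemanticsExprVisitor : public SemanticsVisitor {
public:
    using SemanticsVisitor::SemanticsVisitor;
    void dispatch(Expr& expr) {
        if (expr.text.empty()) diagnose("empty expression");
    }
};

class SemanticsStmtVisitor : public SemanticsVisitor {
public:
    using SemanticsVisitor::SemanticsVisitor;
    void dispatch(Stmt& stmt) {
        if (stmt.text.empty()) diagnose("empty statement");
    }
};

// Client code: build the shared context once, then construct the
// appropriate sub-visitor for each node and call `dispatch()` on it.
std::size_t checkExample() {
    SharedSemanticsContext shared;
    Expr expr{""};
    Stmt stmt{"return;"};

    SemanticsExprVisitor exprVisitor(&shared);
    exprVisitor.dispatch(expr);

    SemanticsStmtVisitor stmtVisitor(&shared);
    stmtVisitor.dispatch(stmt);

    return shared.diagnostics.size();
}
```

Because the sub-visitors only hold a pointer, creating one per node is cheap, which is what makes the "construct on the fly, then dispatch" pattern practical.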

This is a pure refactoring change, so no effort has been made to move state or logic onto the visitor sub-types even when it is logical. Similarly, no attempt has been made to hoist any code out of the common headers to avoid duplication between `.h` and `.cpp` files. Those cleanups will follow.

The one cleanup I allowed myself while doing this was getting rid of the `typeResult` member in `SemanticsVisitor` that appears to be a do-nothing field that got written to in a few places (for unclear reasons) but never read.

* Remove some statefulness around statement checking

Some of the state from the old `SemanticsVisitor` was used in a mutable way during semantic checking:

* The `function` field would be set and then restored when checking the body of a function so that things like `return` statements could find the outer function.

* The `outerStmts` list was used like a stack to track lexically surrounding statements to resolve things like `break` and `continue` targets.

Both of these meant that semantic checking code was doing fine-grained mutations on the shared semantic checking state even though the statefulness wasn't needed.

This change moves the relevant state down to `SemanticsStmtVisitor`, which is a type we create on-the-fly to check each statement, so that we now only need to establish the state once at creation time.

The list of outer statements is handled as a linked list threaded up through the stack (a recurring idiom through the codebase).
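The idiom can be sketched roughly as follows (C++14 or later). Only the name `SemanticsStmtVisitor` is taken from this change; `OuterStmtInfo`, `StmtKind`, `findOuterStmt`, and `withOuterStmt` are hypothetical names chosen for the illustration. Each link lives in the stack frame that checks the enclosing statement, so nothing needs to be pushed to or popped from shared state.

```cpp
#include <string>

// Hypothetical statement kinds and AST node for the illustration.
enum class StmtKind { Block, Loop, Switch };

struct Stmt {
    StmtKind kind;
    std::string label;
};

// One link per lexically enclosing statement; `next` points outward.
struct OuterStmtInfo {
    const Stmt* stmt;
    const OuterStmtInfo* next;
};

class SemanticsStmtVisitor {
public:
    // The list of outer statements is fixed when the visitor is created.
    explicit SemanticsStmtVisitor(const OuterStmtInfo* outerStmts)
        : m_outerStmts(outerStmts) {}

    // Walk outward to find the nearest enclosing statement of a given kind,
    // e.g. the loop that a `continue` refers to.
    const Stmt* findOuterStmt(StmtKind kind) const {
        for (const OuterStmtInfo* info = m_outerStmts; info; info = info->next) {
            if (info->stmt->kind == kind) return info->stmt;
        }
        return nullptr;
    }

    // Check something nested inside `outer`: the new link is a local variable,
    // and a fresh visitor is created that sees the extended list.
    template <typename F>
    auto withOuterStmt(const Stmt& outer, F&& checkBody) const {
        OuterStmtInfo link{&outer, m_outerStmts};
        SemanticsStmtVisitor inner(&link);
        return checkBody(inner);
    }

private:
    const OuterStmtInfo* m_outerStmts;
};

// Example: a `continue` nested in a block nested in a loop resolves to the loop.
const Stmt* resolveContinueTarget(const Stmt& loop, const Stmt& block) {
    SemanticsStmtVisitor top(nullptr);
    return top.withOuterStmt(loop, [&](const SemanticsStmtVisitor& inLoop) {
        return inLoop.withOuterStmt(block, [](const SemanticsStmtVisitor& inBlock) {
            return inBlock.findOuterStmt(StmtKind::Loop);
        });
    });
}
```

The links never outlive the calls that created them, so the list stays valid for exactly as long as the nested statement is being checked.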

There was one place where the `function` field was being used that wasn't strictly inside statement checking: it appears that we were using it to detect whether a variable declaration represents a local, so I added an `_isLocalVar` function to serve the same basic purpose.

With this change, the only stateful part of `SharedSemanticsContext` is the information to track imported modules, which seems like a necessary thing (since deduplication requires statefulness).

* Refactor declaration checking to avoid recursion

The flexibility of the Slang language makes enforcing ordering on semantic checking difficult. In particular, generics (including some of the built-in standard library types) can take value arguments, so that type expressions can include value expressions. This means that being able to determine the type of a function parameter may require checking expressions, which may in turn require resolving calls to an overloaded function, which in turn requires knowing the types of the parameters of candidate callees.

Up to this point there have been two dueling approaches to handling the ordering problem in the semantic checking logic:

1. There was the `EnsureDecl` operation, supported by the `DeclCheckState` type. Every declaration would track "how checked" it is, and `EnsureDecl(d, s)` would try to perform whatever checks are needed to bring declaration `d` up to state `s`.

2. There was top-down orchestration logic in `visitModuleDecl()` that tried to perform checking of declarations in a set of fixed phases that ensure things like all function declarations being checked before any function bodies.

Each of these options had problems:

1. The `EnsureDecl()` approach wasn't implemented completely or consistently. It only understood two basic levels of checking: the "header" of a declaration was checked, and then the "body," and it relied on a single `visit*()` routine to try and handle both cases. Things ended up being checked twice, or in a circular fashion.

2. Rather than fix the problems with `EnsureDecl()` we layered on the top-down orchestration logic, but doing so ignores the fact that no fixed set of phases can work for our language. The orchestration logic was also done in a relatively ad hoc fashion that relied on using a single visitor to implement all phases of checking, but it added a second metric of "checked-ness" that worked alongside `DeclCheckState`.

This change strives to unify the two worlds and make them consistent. One of the key changes is that instead of doing everything through a single visitor type, we now have distinct visitors for distinct phases of semantic checking, and those phases are one-to-one aligned with the values of the `DeclCheckState` type.

More detailed notes:

* Existing sites that used to call `checkDecl` to directly invoke semantic checking recursively now use `ensureDecl` instead. This makes sure that `ensureDecl` is the one bottleneck that everything passes through, so that it can guarantee that each phase of checking gets applied to each declaration at most once.

* The existing `visitModuleDecl` was revamped into a `checkModule` routine that does the global orchestration, but now it is just a driver routine that makes sure `ensureDecl` gets called on everything in an order that represents an idealized "default schedule" for checking, while not ruling out cases where `ensureDecl()` will change the ordering to handle cases where the global order is insufficient.

* Because `checkModule` handles much of the recursion over the declaration hierarchy, many cases where a declaration `visit*()` would recurse on its members have been eliminated. The only case where a declaration should recursively `ensureDecl()` its members is when its validity for a certain phase depends on those members being checked (e.g., determining the type of a function declaration depends on its parameters having been checked).

* All cases where a `visit*()` routine was manually checking the state/phase of checking have been eliminated. It is now the responsibility of `ensureDecl` to make sure that checking logic doesn't get invoked twice or in an inappropriate order.

* Most cases where a `visit*()` routine was manually *setting* the `DeclCheckState` of a declaration have been eliminated. The common case is now handled by `ensureDecl()` directly, and `visit*()` methods only need to override that logic when special cases arise. For example, when a variable is declared without a type (e.g., `let foo = ...;`), we need to check its initial-value expression to determine its type, so we must check it further than was initially expected/required.

* This change goes to some lengths to try and keep semantic checking logic at the same location in the `slang-check-decl.cpp` file, so each of the per-phase visitor types is forward declared at the top of the file, and then the actual `visit*()` routines are interleaved throughout the rest of the file. A future change could do pure code movement (no semantic changes) to arrive at a more logical organization, but for now I tried to stick with what would minimize the diffs (although the resulting diffs can still be messy at times).

* One important change to the semantic checking logic was that the test for use of a local variable ahead of its declaration (or as part of its own initial-value expression) was moved around, since its old location in the middle of the `ensureDecl` logic made the overall flow and intention of that function less clear. There is still a need to fix this check to be more robust in the future.
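The interplay between `ensureDecl` and `checkModule` described in these notes can be sketched as below. This is a simplified illustration under stated assumptions, not the code in `slang-check-decl.cpp`: the three-value `DeclCheckState`, the `isBeingChecked` circularity guard, the `headerDeps` list, the `log`, and `checkDeclPhase` (standing in for dispatch to a per-phase visitor) are all hypothetical.

```cpp
#include <initializer_list>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified phases; the real `DeclCheckState` has more values.
enum class DeclCheckState { Unchecked, HeaderChecked, BodyChecked };

struct Decl {
    explicit Decl(std::string n) : name(std::move(n)) {}
    std::string name;
    DeclCheckState state = DeclCheckState::Unchecked;
    bool isBeingChecked = false;    // assumed guard against circular dependencies
    std::vector<Decl*> headerDeps;  // e.g. the parameters of a function
};

struct SharedSemanticsContext {
    std::vector<std::string> log;   // records the order in which phases ran
};

void ensureDecl(SharedSemanticsContext& shared, Decl& decl, DeclCheckState target);

// Stand-in for dispatching to the visitor that implements `phase`.
void checkDeclPhase(SharedSemanticsContext& shared, Decl& decl, DeclCheckState phase) {
    if (phase == DeclCheckState::HeaderChecked) {
        // Recurse only where this phase depends on other declarations.
        for (Decl* dep : decl.headerDeps)
            ensureDecl(shared, *dep, DeclCheckState::HeaderChecked);
        shared.log.push_back(decl.name + ":header");
    } else {
        shared.log.push_back(decl.name + ":body");
    }
}

// The single bottleneck: runs each missing phase in order, at most once.
void ensureDecl(SharedSemanticsContext& shared, Decl& decl, DeclCheckState target) {
    if (decl.state >= target) return;
    if (decl.isBeingChecked) {
        shared.log.push_back(decl.name + ":circular");
        return;
    }
    decl.isBeingChecked = true;
    while (decl.state < target) {
        auto next = static_cast<DeclCheckState>(static_cast<int>(decl.state) + 1);
        checkDeclPhase(shared, decl, next);
        // Only move the state forward, so a phase that advanced it further is respected.
        if (decl.state < next) decl.state = next;
    }
    decl.isBeingChecked = false;
}

// Driver: the "default schedule" applies each phase to every declaration,
// while `ensureDecl` remains free to pull individual declarations ahead.
void checkModule(SharedSemanticsContext& shared, const std::vector<Decl*>& decls) {
    for (DeclCheckState phase : {DeclCheckState::HeaderChecked, DeclCheckState::BodyChecked})
        for (Decl* decl : decls)
            ensureDecl(shared, *decl, phase);
}

// Example: `func` needs `param` checked before its own header can be checked.
std::vector<std::string> demoCheckModule() {
    SharedSemanticsContext shared;
    Decl func("func");
    Decl param("param");
    func.headerDeps.push_back(&param);
    checkModule(shared, {&func, &param});
    return shared.log;
}
```

In this sketch `param` appears after `func` in the module, yet its header is checked first because `func` demands it; when the driver later reaches `param` in the header phase, `ensureDecl` sees the recorded state and does nothing.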

* Add some design documentation on semantic checking

The main thing this tries to lay out is the strategy for declaration checking and the rules/constraints on programmers that follow from it.

* fixup: typos found during review

1 parent 1123ff2


