Skip to content

Dyno: unify return, break, throw etc. handling across Resolver, type inference, and split init etc. #26955

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 32 commits into from
Mar 26, 2025

Conversation

DanilaFe
Copy link
Contributor

@DanilaFe DanilaFe commented Mar 20, 2025

Makes progress on https://github.com/Cray/chapel-private/issues/7229, but does not resolve it.

This PR takes the first step towards handling early-return or early-error patterns in the standard library. At the same time, it improves various passes' handling of diverse control flow (enables continue, return, etc in resolution, break and continue in init/deinit etc.). It also reduces cases in which default init etc. was resolved for declarations that are not reachable.

See future work below for cases this PR does not handle.

I approached this by making the following observation: a large number of passes / visitors rely on an approximating control flow information.

  • ReturnTypeInferrer (used for computing function return types) keeps a stack of frames along with information about whether they continued, had a break, or returned. This is used for iterating over param loops and ignoring unreachable returns.
  • VarScopeVisitor (used for not handling init/deinit/etc. in dead code, but also for enabling more split-init cases)
  • Resolver (not used prior to this PR, but would need to reason about returns etc. to avoid resolving dead code)

In each case, this is done by creating a stack of frames (which contain data like returnsOrThrows, continues, etc), and updating that stack during traversals. In each case, there is distinct logic to reconcile information from various frames, and to handle constructs like if and select where param-true branches cause other branches to be ignored.

This PR unifies those traversals by defining a new abstract base class called BranchSensitiveVisitor. This class does not have any visit methods of its own (the subclasses are intended to provide that). It provides methods such as enterScope and exitScope which take care of tracking and combining control flow information (e.g., deciding if all branches return/throw). It also provides methods branchSensitivelyTraverse, which implement common param-folding logic for conditionals and if/elses. It then switches all frame-based traversals to use this new BranchSensitiveVisitor. This has numerous advantages:

  1. There is less duplicated code (resolving a TODO in return-type-inference.cpp, which started as a watered down duplication of VarScopeVisitor's code, but evolved in an orthogonal direction).
  2. Each pass is guaranteed to behave the same way w.r.t. control flow. If, e.g., Resolver skips code because it's dead, we know that ReturnTypeInferrer and VarScopeVisitor will skip that code too).
  3. It's easy to enhance analyses (like Resolver) with control flow handling without much additional code.

As a nice side effect, VarScopeVisitor became aware of continue and break (previously only handled by return type inference), and Resolver became better at handling break and continue in param loop).

This PR takes an opinionated stance on #26968 and disallows split-init across returns. This is a consequence of not visiting dead code for computing split-init types.

AST Traversal Changers

In all traversals, we want to skip visiting code when it's dead. This is both a performance concern (why waste time resolving functions that are not reached?) and a correctness concern (e.g., compilerError is guaranteed not to do anything directly after a return). How does one add this logic to every single visit / traverse call?

I considered several options:

  1. Adding a "prelude" to each visit method that returns early if the BranchSensitiveVisitor has determined that it will not be reached. This prelude could be a macro to avoid duplicating a lot of code. However, this approach is undesirable because it leads to some duplication (invocations of the macro at least; duplicate code at worst) in every single visit method. Further, authors of visit methods must remember to include this prelude lest the dead-code sensitivity be affected.
  2. Following the lead of ResolvedVisitor and having a custom visit method that dispatches to some other method (e.g., visitImpl). This way, the visit can be defined once, and the dead-code checking can happen before the dispatch to visitImpl. However, this introduces a second dispatch, introduces churn (changing all visit to visitImpl on resolver), and is anti-compositional (if visit has different signatures, like visit(node) in Resolver and visit(node, RV) in VarScopeVisitor, visitImpl would have to also have a different signature; this introduces an m-by-n implementation problem).
  3. Configuring the traverse function to exit early before even invoking enter and exit. This can be done using a template specialization to provide a "default value", and to avoid runtime overhead.

I settled for 3. Specifically, all invocations of traverse now reference a template class with a static method, AstVisitorPrecondition<T>::skipSubtree(node, visitor) to decide whether to invoke enter / exit on a node and its children. The default template for AstVisitorPrecondition always returns true, which means the default behavior of traverse remains the same. However, visitors that want to return early / skip visiting code altogether can specialize the template and define a condition. Resolver, VarScopeVisitor, and ReturnTypeInferrer all do this; they check the current frame for signs that return has already been encountered, and if so, do not descend into any more AST nodes. For Resolver, this does not happen when scopeResolveOnly is set, since we always want to enter every node in that mode.

I find this appealing because user code saw very little churn (10 lines to specialize the template for Resolver, e,g.), and each new enter / exit method added to Resolver will automatically work with early returns (since the traverse call which invokes enter/ exit knows to check). The precondition could be easily defined for all visitors affected by this PR.

Macro Tweaks

@jabraham17 and I have settled on a pattern in chapel-py in which the X-macro files (e.g., uast-methods.h) auto-defined missing X-macros and undefine them automatically. This helps reduce a lot of boilerplate. As I was experimenting with dispatch (approach 2) above, I found it convenient to apply this pattern to the Dyno code proper. Rather than polluting uast-classes-list.h and type-classes-list.h (which IMO serve as good references), I added wrapper files that handle the aforementioned niceties. I then switched AstNode.h and Type.h to using these wrappers, saving a considerable amount of noise. I did not end up with new uses of the adapters in my code (since I went with template specialization instead), but I thought the cleanup is worth including here.

Module Changes

This PR reverts (some) changes to module code from b789f5b and ee57b16, which were previously required to work around the resolver's inability to reason about early returns. Some code, however (ChapelDomain.chpl from b789f5b), relies on compilerError being treated as early return, which is future work.

Reviewed by @arezaii -- thanks!

Future Work

Testing

  • dyno tests
  • performance testing -- frontend tests did not get any slower in CI or locally
  • paratest

Tested cases

  • VarScopeVisitor now knows about break and continue
  • VarScopeVisitor now stops traversing after return, which should enable reasoning about early returns (already tested)
  • Resolver now reasons about control flow and skips resolving dead code
  • Resolver now properly handles continue, break, and return in param loops

@DanilaFe DanilaFe changed the title Dyno: expand Resolver to handle flow sensitivity Dyno: unify return, break, throw etc. handling across Resolver, type inference, and split init etc. Mar 22, 2025
@DanilaFe DanilaFe force-pushed the flow-sensitivity branch 2 times, most recently from b77fb80 to 3391feb Compare March 22, 2025 00:25
DanilaFe added 13 commits March 21, 2025 17:27
Signed-off-by: Danila Fedorin <[email protected]>
Signed-off-by: Danila Fedorin <[email protected]>
Signed-off-by: Danila Fedorin <[email protected]>
Signed-off-by: Danila Fedorin <[email protected]>
Signed-off-by: Danila Fedorin <[email protected]>
This reverts commit ee57b16.

The changes introduced in preceding commits mean that the
Resolver can handle early returns.

Signed-off-by: Danila Fedorin <[email protected]>
@DanilaFe DanilaFe marked this pull request as ready for review March 23, 2025 02:40
Copy link
Collaborator

@arezaii arezaii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great improvements and the refactored code is a plus too!

All my comments are just nits that are either simple typos or an inconsistency with use of single quotes or back ticks around terms in the comments. Specifically there is inconsistent use of '' around the reserved words used in control flow and the use of back ticks around param-known and param-decided, etc. (note that I didn't call all these out with individual comments)

return false;
}

/** Incororate the control flow information from a child statement into
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Incororate" -> "Incorporate"

}
}

/** When exiting an AST node, incorporate the information form its
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"form" -> "from"

}

/** code to run right after we entered a new scope and pushed a new frame.
The default implemnetation sets up subFrame slots for handling Conditionals etc. */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"implemnetation" -> "implementation"

virtual const types::Param* determineWhenCaseValue(const uast::AstNode* ast, ExtraData extraData) = 0;

/** Overriden by subclasses to determine the value of an if condition.
This is used for evaluating short-circuited / `param known` (or not) Conditional statements. */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

discrepancy in tick marks used with this and the above usage of "param known"

}
return false;
}
// Not param-known condition; visit both branches as normal.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

`param`-known?

@DanilaFe
Copy link
Contributor Author

the check_large_files check is malfunctioning. I will merge if all other checks pass.

@DanilaFe DanilaFe merged commit e0df7a3 into chapel-lang:main Mar 26, 2025
8 of 9 checks passed
DanilaFe added a commit that referenced this pull request Mar 28, 2025
…7004)

Resolves Cray/chapel-private#7229.

Depends on #26999.

Quoting from that issue: a lot of Chapel production code (including in
library modules) looks like this:

```Chapel
proc foo(param cond) {
    if cond then compilerError("This condition should not be met");
    codeThatOnlyCompilesIfCondIsFalse();
}
```

These rely on the fact that `codeThatOnlyCompilesIfCondIsFalse()` will
not be resolved if an error is already emitted. This means terminating
resolution of `foo()`, much like we do with `return` in
#26955.
This PR implements that.

One conceptual curiosity is that this conflicts with Dyno's existing
approach of attempting to continue resolving programs to get partial
information. Specifically, expressions that emitted errors are marked
with `ErroneousType`, and if the rest of the types can be inferred
despite the error, we infer them. This is desirable (partial information
is useful in editors, for example), but counter to stopping resolution
in its tracks. This PR goes for a two-phase handling of `compilerError`,
which distinguishes functions that call compiler error directly and
functions that simply observe errors emitted by a call.

* __For functions that invoke `compilerError` directly__, this PR makes
use of the changes in #26955
to stop resolution in its tracks. This requires some communication
between the resolver and the other branch-sensitive passes to inform
them of the fatal error, which I achieve using a new flag on
`ResolvedExpression` called `causedFatalError_`[^1].
* Functions that are meant as "helpers" for invoking `compilerError`
(i.e., those that are ignored using the `depth` argument) are considered
"direct" callers to `compilerError` for the purposes of this phase.
That's because they mostly just serve to augment the `compilerError` in
some way.
* __For invocations of functions that error__, this PR simply marks the
call as `ErroneousType`, without interrupting the resolution of the
current function etc. Thus, if we call `foo()`, and `foo()` issues a
`compilerError`, we print the error and mark `foo()` as `ErroneousType`,
but (as described above) continue resolving with partial information.

This way, I believe we can reconcile the expectation of `compilerError`
as terminating resolution, while still allowing us to compute partial
information in other contexts. See the program tested by this PR:

```Chapel
      proc foo() {
        bar();
        return 42;
      }
      proc bar() {
        compilerError("Hello");
        compilerWarnning("inside bar, after call to compilerError");
      }
      var x = foo();
```

Here, the error "Hello" immediately terminates resolution of `bar()` (so
the "inside bar" warning is never printed). This error is issued on the
`bar();` line, but we proceed past this line, and, since the `return 42`
doesn't depend on the call to `bar()`, correctly determine that the `var
x` is an `int`.

## Testing
- [x] dyno tests
- [x] paratest
- [x] spectests still work as before

Reviewed by @benharsh -- thanks!

[^1]: I have checked and on my machine, this flag does not cause the
`ResolvedExpression` type to increaase in size, presumably due to the
way that struct is padded. So this does not cause size / memory
penalties .
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants