Dyno: unify return, break, throw etc. handling across Resolver, type inference, and split init etc. #26955

DanilaFe · 2025-03-20T19:50:44Z

Makes progress on https://github.com/Cray/chapel-private/issues/7229, but does not resolve it.

This PR takes the first step towards handling early-return or early-error patterns in the standard library. At the same time, it improves various passes' handling of diverse control flow (enabling continue, return, etc. in resolution, and break and continue in init/deinit etc.). It also reduces the cases in which default init etc. was resolved for unreachable declarations.

See future work below for cases this PR does not handle.

I approached this by making the following observation: a large number of passes / visitors rely on approximate control flow information.

ReturnTypeInferrer (used for computing function return types) keeps a stack of frames along with information about whether they continued, had a break, or returned. This is used for iterating over param loops and ignoring unreachable returns.
VarScopeVisitor (used for not handling init/deinit/etc. in dead code, but also for enabling more split-init cases)
Resolver (did not use such information prior to this PR, but needs to reason about returns etc. to avoid resolving dead code)

In each case, this is done by creating a stack of frames (which contain data like returnsOrThrows, continues, etc), and updating that stack during traversals. In each case, there is distinct logic to reconcile information from various frames, and to handle constructs like if and select where param-true branches cause other branches to be ignored.

This PR unifies those traversals by defining a new abstract base class called BranchSensitiveVisitor. This class does not have any visit methods of its own (the subclasses are intended to provide those). It provides methods such as enterScope and exitScope, which take care of tracking and combining control flow information (e.g., deciding if all branches return/throw). It also provides the branchSensitivelyTraverse methods, which implement common param-folding logic for conditionals and if/elses. The PR then switches all frame-based traversals to use this new BranchSensitiveVisitor. This has numerous advantages:

There is less duplicated code (resolving a TODO in return-type-inference.cpp, which started as a watered down duplication of VarScopeVisitor's code, but evolved in an orthogonal direction).
Each pass is guaranteed to behave the same way w.r.t. control flow. (If, e.g., Resolver skips code because it's dead, we know that ReturnTypeInferrer and VarScopeVisitor will skip that code too.)
It's easy to enhance analyses (like Resolver) with control flow handling without much additional code.

As a nice side effect, VarScopeVisitor became aware of continue and break (previously only handled by return type inference), and Resolver became better at handling break and continue in param loops.
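The frame bookkeeping described above can be sketched roughly as follows. This is a simplified, hypothetical model and not the code in the PR: the class name FrameStackSketch, the breaks field, and the mark* / merge* helpers are assumptions made for illustration. Only the names enterScope, exitScope, returnsOrThrows, and continues come from the description, and their signatures here are guesses.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified frame: one per scope being traversed.
struct Frame {
  bool returnsOrThrows = false;
  bool breaks = false;
  bool continues = false;

  // True once control can no longer reach the next statement in this scope.
  bool exits() const { return returnsOrThrows || breaks || continues; }
};

// Hypothetical stand-in for the shared frame logic; not the real base class.
class FrameStackSketch {
 public:
  void enterScope() { frames_.emplace_back(); }

  // Pops the innermost frame and returns it so the caller can merge it.
  Frame exitScope() {
    assert(!frames_.empty());
    Frame top = frames_.back();
    frames_.pop_back();
    return top;
  }

  void markReturnOrThrow() { frames_.back().returnsOrThrows = true; }
  void markBreak() { frames_.back().breaks = true; }
  void markContinue() { frames_.back().continues = true; }

  // Statements visited while this is true are unreachable in this model.
  bool isDeadCode() const {
    return !frames_.empty() && frames_.back().exits();
  }

  // Runtime conditional: a flag reaches the parent only if both branches set it.
  void mergeBranches(const Frame& thenFrame, const Frame& elseFrame) {
    Frame& parent = frames_.back();
    parent.returnsOrThrows |=
        thenFrame.returnsOrThrows && elseFrame.returnsOrThrows;
    parent.breaks |= thenFrame.breaks && elseFrame.breaks;
    parent.continues |= thenFrame.continues && elseFrame.continues;
  }

  // Param-true conditional: the other branches are ignored entirely.
  void mergeTakenBranch(const Frame& taken) {
    Frame& parent = frames_.back();
    parent.returnsOrThrows |= taken.returnsOrThrows;
    parent.breaks |= taken.breaks;
    parent.continues |= taken.continues;
  }

 private:
  std::vector<Frame> frames_;
};
```

In this sketch, a conditional whose branches both return marks the enclosing scope as dead, while one whose branches exit differently (continue in one, break in the other) does not, which mirrors the limitation listed under future work.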

This PR takes an opinionated stance on #26968 and disallows split-init across returns. This is a consequence of not visiting dead code for computing split-init types.

AST Traversal Changes

In all traversals, we want to skip visiting code when it's dead. This is both a performance concern (why waste time resolving functions that are not reached?) and a correctness concern (e.g., compilerError is guaranteed not to do anything directly after a return). How does one add this logic to every single visit / traverse call?

I considered several options:

Adding a "prelude" to each visit method that returns early if the BranchSensitiveVisitor has determined that it will not be reached. This prelude could be a macro to avoid duplicating a lot of code. However, this approach is undesirable because it leads to some duplication (invocations of the macro at least; duplicate code at worst) in every single visit method. Further, authors of visit methods must remember to include this prelude lest the dead-code sensitivity be affected.
Following the lead of ResolvedVisitor and having a custom visit method that dispatches to some other method (e.g., visitImpl). This way, the visit can be defined once, and the dead-code checking can happen before the dispatch to visitImpl. However, this introduces a second dispatch, introduces churn (changing all visit to visitImpl on resolver), and is anti-compositional (if visit has different signatures, like visit(node) in Resolver and visit(node, RV) in VarScopeVisitor, visitImpl would have to also have a different signature; this introduces an m-by-n implementation problem).
Configuring the traverse function to exit early before even invoking enter and exit. This can be done using a template specialization to provide a "default value", and to avoid runtime overhead.

I settled on option 3. Specifically, all invocations of traverse now reference a template class with a static method, AstVisitorPrecondition<T>::skipSubtree(node, visitor), to decide whether to invoke enter / exit on a node and its children. The default template for AstVisitorPrecondition never skips a subtree, which means the default behavior of traverse remains the same. However, visitors that want to return early / skip visiting code altogether can specialize the template and define a condition. Resolver, VarScopeVisitor, and ReturnTypeInferrer all do this; they check the current frame for signs that return has already been encountered, and if so, do not descend into any more AST nodes. For Resolver, this does not happen when scopeResolveOnly is set, since we always want to enter every node in that mode.

I find this appealing because user code saw very little churn (10 lines to specialize the template for Resolver, e.g.), and each new enter / exit method added to Resolver will automatically work with early returns (since the traverse call which invokes enter / exit knows to check). The precondition could be easily defined for all visitors affected by this PR.
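A minimal sketch of the template-specialization approach, under stated assumptions: Node, traverse, RecordingVisitor, and DeadCodeAwareVisitor below are hypothetical stand-ins, not Dyno's types, and the real skipSubtree signature may differ. The default here returns false from skipSubtree so that nothing is skipped for visitors that do not specialize the template.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an AST node; not Dyno's AstNode.
struct Node {
  std::string name;
  std::vector<Node> children;
};

// Primary template: never skip, so unspecialized visitors behave as before.
template <typename Visitor>
struct AstVisitorPrecondition {
  static bool skipSubtree(const Node&, Visitor&) { return false; }
};

// Simplified traverse: consult the precondition before enter / exit.
template <typename Visitor>
void traverse(const Node& node, Visitor& v) {
  if (AstVisitorPrecondition<Visitor>::skipSubtree(node, v)) return;
  if (v.enter(node)) {
    for (const Node& child : node.children) traverse(child, v);
  }
  v.exit(node);
}

// A visitor that opts out of nothing: it sees every node.
struct RecordingVisitor {
  std::vector<std::string> seen;
  bool enter(const Node& n) { seen.push_back(n.name); return true; }
  void exit(const Node&) {}
};

// A visitor that remembers having seen a return. A real visitor would
// consult its current frame instead of a single flag (assumption).
struct DeadCodeAwareVisitor {
  std::vector<std::string> seen;
  bool sawReturn = false;
  bool enter(const Node& n) {
    seen.push_back(n.name);
    if (n.name == "return") sawReturn = true;
    return true;
  }
  void exit(const Node&) {}
};

// The only extra code the dead-code-aware visitor needs.
template <>
struct AstVisitorPrecondition<DeadCodeAwareVisitor> {
  static bool skipSubtree(const Node&, DeadCodeAwareVisitor& v) {
    return v.sawReturn;
  }
};

// Hypothetical body: { x; return; y; }
inline Node sampleBody() {
  return Node{"block", {Node{"x", {}}, Node{"return", {}}, Node{"y", {}}}};
}
```

Traversing sampleBody() with RecordingVisitor visits all four nodes, while DeadCodeAwareVisitor never enters y. Because the check lives in traverse, neither visitor's enter / exit methods mention dead code at all.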

Macro Tweaks

@jabraham17 and I have settled on a pattern in chapel-py in which the X-macro files (e.g., uast-methods.h) auto-define missing X-macros and undefine them automatically. This helps reduce a lot of boilerplate. As I was experimenting with dispatch (approach 2 above), I found it convenient to apply this pattern to the Dyno code proper. Rather than polluting uast-classes-list.h and type-classes-list.h (which IMO serve as good references), I added wrapper files that handle the aforementioned niceties. I then switched AstNode.h and Type.h to using these wrappers, saving a considerable amount of noise. I did not end up with new uses of the adapters in my code (since I went with template specialization instead), but I thought the cleanup was worth including here.

Module Changes

This PR reverts (some) changes to module code from b789f5b and ee57b16, which were previously required to work around the resolver's inability to reason about early returns. Some code, however (ChapelDomain.chpl from b789f5b), relies on compilerError being treated as early return, which is future work.

Reviewed by @arezaii -- thanks!

Future Work

compilerError, which is considered an "early return" in production
branches that both "exit" but differently (e.g., code after if bla then continue else break does not execute)
param loops in the VarScopeVisitor family of passes (init/deinit, split init, copy elision) (see https://github.com/Cray/chapel-private/issues/4209)
labeled break / continue (https://github.com/Cray/chapel-private/issues/7245)

Testing

dyno tests
performance testing -- frontend tests did not get any slower in CI or locally
paratest

Tested cases

VarScopeVisitor now knows about break and continue
VarScopeVisitor now stops traversing after return, which should enable reasoning about early returns (already tested)
Resolver now reasons about control flow and skips resolving dead code
Resolver now properly handles continue, break, and return in param loops

…itor Signed-off-by: Danila Fedorin <[email protected]>

This reverts commit ee57b16. The changes introduced in preceding commits mean that the Resolver can handle early returns. Signed-off-by: Danila Fedorin <[email protected]>

arezaii

Great improvements and the refactored code is a plus too!

All my comments are just nits that are either simple typos or an inconsistency with use of single quotes or back ticks around terms in the comments. Specifically there is inconsistent use of '' around the reserved words used in control flow and the use of back ticks around param-known and param-decided, etc. (note that I didn't call all these out with individual comments)

arezaii · 2025-03-24T21:29:29Z