-
Notifications
You must be signed in to change notification settings - Fork 336
Runtime Analysis replacing static Dataflow analysis #14633
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft
hubertp
wants to merge
19
commits into
develop
Choose a base branch
from
wip/hubert/10525-visualizations
base: develop
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Draft
+2,454
−478
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Distinguishing UUID into External and Internal to allow to better control (required) dependencies.
Tracking runtime dependencies no longer affects regular execution*. All tracking is done in instrumentation by carefully wrapping and unwrapping Ref objects at specific nodes. *writing/reading from a frame slot is limited to objects and not optimized for longs
Rather than associating visualizations with either RuntimeCache or Ref entries, fallback to the old logic using synchronization. This is due to the fact on how instrumentation collects node information when entering some non-top context. It was impossible to track when to send visualization updates when popping context without a separate synchronization map which now finally made sense. 5 tests are failing, mostly due to subexpression invalidation (or rather lack of it). This will be introduced in the next PR. RuntimeAnalysis should be somehow accessible via state but that is currently problematic. This will be resolved eventually.
1. Updating IdMap in a subexpression now invalidates the whoe RHS expression. 2. RHS with an explicit type ascription will NOT be cached because a) the ascribed expression shares the same external ID with type ascription b) (more importantly) `TypeCheckExpressionNode` is currently treated specially as an `ExpressionNode` and is NOT instrumentable. Maybe we can mitigate those 2 limitations but for now, let's leave it as is.
All logic for building ref dependencies is now encapsulated in callbacks implementation. Limit exposure to invalid runtime dependency construction by limiting to 1 ExecuteJob.
1. Subexpressions execution Correctly detect when a parent expression needed to be updated and therefore invalidated, when re-executing subexpressions. That way we avoid spurious updates to expressions/visualizations 2. ChangesetBuilder would detect changes too precisely. ChangesetBuilder would identify the subexpression update (like applying an argument to partially applied function) but not the RHS of an expression. This led to stale computations. Now all changes are mapped to RHS, if possible.
We are now more precise
Many of tests require proper runtime analysis despite missing identifiers for RHSs. GUI could also be missing explicit IDs (by accident or on purpose) and runtime analysis should continue to work in such cases. Added an unfortunate hack to `ForceNode` when a Ref is not being unwrapped. A proper fix is still TBD.
5 tasks
Some of the persisted objects were equal despite reporting the opposite.
2 tasks
Member
|
Member
|
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
CI: Clean build required
CI runners will be cleaned before and after this PR is built.
CI: No changelog needed
Do not require a changelog entry for this PR.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Pull Request Description
!DRAFT PR/WIP!
This is an experiment in replacing Dataflow analysis with runtime tracking.
(compared to other attempt) This change tracks/caches only results of variable assignments. The tracking of dependencies is only performed during instrumentation/interactive mode thus not affecting batching execution.
In a nutshell the problem can be simply observed with an example of
where we want to collect runtime information of local variables
xdepends on nothingydepends onxzdepends on nothingadepends onzandyThat way, if
xchanges, then onlyyandaget re-executed.Refis an object that wraps every instrumented RHS value of anAssignmentNode. As a resultAssignmentNodesetsRefobjects to theVirtualFrameslot rather than the value itself. Whenever instrumentedReadLocalVariableNodegets value from the frame it gets aRefobject, allowing to register a dependency between variables. The code above would roughly translate during instrumentation toThe main logic that creates Ref objects and unwraps them when necessary is in
onEnter/onReturnValuemethods ofIdExecutionInstrumentation.Care had to be taken in order to properly track runtime dependencies between function calls, e.g.,
makes sure that
xis a dependency ona.As
Refis an object that is only manipulated during instrumentation/invalidation it should not be present in any of Truffle nodes.AssignmentNodes,ReadLocalVariableNode) we make sure that each has a unique identifier. As every expression has a (lazily-generated) UUID, we had to enhance it with additional information for the correct identification of instrumentable and cache-ready nodes.Every expression node is assigned a
RuntimeIDthat apart from the existing UUID informs if the UUID has been assigned externally (via GUI,ExternalUUID) or internally (via a randomly generated UUID,InternalUUID). AdditionallyRuntimeIDcarries information if the given expression should be cached, based on its position in the program tree representation.Note that both
ExternalUUIDandInternalUUIDmay refer to a cached node but not allExternalUUIDare always cached (GUI assigns UUID to many expressions). This covers scenarios where many of our tests do not explicitly set UUID for all expressions but also situations where GUI might be missing those as well (uncommon but still possible). We want runtime tracking to continue to work in all such scenarios.Only RHSs of local assignments are being cached (compared to additional self arguments in the existing solution).
When upserting visualizations, visualizations are being evaluated ASAP, rather than only whenever the corresponding expression was executed. Most of the time such evaluation should not require write lock on compilation, thus reducing the time to user feedback (TTUF). Additionally, whenever a cached value is already available, visualization is being executed with the cached value immediately, reducing the TTUF as well. If no cached value is present a fresh execution is scheduled.
Re-execution still executes from the beginning of a given program. Most of the time, whenever RHSs of assignments are cached, the execution of expressions is therefore immediate. Assignments need to be re-executed in order to initialize frames' slots.
Visualizations for subexpressions is problematic with the new approach (e.g. widgets) and common enough not to be ignored. The solution is for every visualization to always identify a parent expression. An additional map will carry information about such (subexpression) visualizations. If during execution of an assignment a RHS is not cached, a subexpression and all associated visualizations will be executed, as usual. However if RHS is cached but has subexpressions with not yet executed visualizations, it is still fully executed without updating or invalidating RHS or its dependencies. This diverges from to the current Dataflow analysis approach which would often trigger excessive cache invalidations and re-executions on new visualization requests.
Care had to be taken in order to execute only 1(!)
ExecuteJobat a time. If parallel execution is enabled, Truffle instrumentation appears to execute (Probe-) Nodes sometimes in an out-of-order fashion, making the logic that tracks dependencies unpredictable.ChangesetBuilder had to be adapted in order to correctly identify RHSs that need to be invalidated on edits. Similarly IdMaps had to be taken into account, as revealed by failing tests.
Currently every context has its own
RuntimeCache. This slightly complicates the dependency tracking between function calls as instrumentation has to be able to still make a dependency between return value of a function call and caller.RefInvalidationencapsulates a logic for invalidating cached RHSs. It takes a special care to invalidate caches between contexts (every function call has its own runtime cache). This is related to the fact that care had to be taken to only send visualizations/expression updates whenever we are in the right context, as revealed by some tests inRuntimeVisualizationsTest.Mostly tested with real/big projects and unit testing in
RuntimeVisualizationsTest/RuntimeServerTest.There are still failing tests/broken logic so this is very much WIP.
Checklist
Please ensure that the following checklist has been satisfied before submitting the PR:
Scala,
Java,
TypeScript,
and
Rust
style guides. In case you are using a language not listed above, follow the Rust style guide.
or the Snowflake database integration, a run of the Extra Tests has been scheduled.