-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Define extension API for user-defined invariants. #14029
Comments
Thanks @wiedld -- I don't fully understand the usecase
Could you provide an example of such an invariant? I normally think of "invariants" as some property that always holds true for a certain type of node (for example that It would perhaps make sense to provide a way to define invariants for Defining an invarint check for |
For example, if we added a function like this to the impl ExecutionPlan {
...
fn check_node_invariants(&self, invariant_level: InvariantLevel) -> Result<()>
Ok(())
}
...
} |
Agreed. If this^^ is the final form of the execution plan invariants, then we don't need an additional interface for the execution plan invariant extensions. But that would still leave the logical plan invariant extensions for consideration. If that WIP keeps the |
I would recommend adding a method to: https://docs.rs/datafusion/latest/datafusion/logical_expr/trait.UserDefinedLogicalNode.html Perhaps even the same API: pub trait UserDefinedLogicalNode {
...
fn check_invariants(&self, invariant_level: InvariantLevel) -> Result<()>
Ok(())
}
... 🤔 Then you could extend
|
Another use case, as discussed in person with @wiedld and @crepererum, would be enforcing the invariant that all nodes that had |
Is your feature request related to a problem or challenge?
As part of the work to automatically check invariants for the logical and execution plans, we have provided infrastructure to run an invariant checker. This invariant checker runs at limited time points in order to not degrade planning performance; e.g. after all optimizations are completed. In debug mode, it runs these checks more often and can therefore help to quickly isolate at which point (e.g. which specific optimizer run) make the plan become invalid.
We want to also enable users to add their own invariants. Users are already able to add their own Logical and Execution plan extensions, as well as their own optimization runs which modify these plans. Therefore it may be useful for an invariant extension interface for user-defined invariants. e.g. If a change in Datafusion core's optimizer passes will cause a problem in a user-defined Logical plan extension, then the user could define an invariant based upon what their Logical plan extension requires.
Refer to specific examples in this conversation, for plan extensions which have their own invariants. For the example case of our own
ProgressiveEval
-- we require the input partition streams to have specific sort orders, non-overlapping column ranges, and no pushdown of the offset (issue) in order to provide the correct result. An invariant check, performed after each optimizer run (while in debug mode), would enable us to quickly isolate the problem during DF upgrade.(We have several other, more complex, examples of how changes in the optimization of UNIONs has produced invalid plans for our
SortPreservingMerge
. So this is not a one-off example, the above is merely the simplest concrete example.)Describe the solution you'd like
Take the existing invariant infrastructure provided as part of this issue, and provide extension points for users to define their own invariants.
Describe alternatives you've considered
Alternative 1: for a user-defined Execution plan extension, have a runtime check of invariants be performed.
Alternative 2: for either Logical or Physical plan extensions, the user can define an optimization run which is intended to detect invariant violations which are in conflict with their plan extensions.
Additional context
No response
The text was updated successfully, but these errors were encountered: