-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a method for connecting providers to different inputs #192
base: main
Are you sure you want to change the base?
Conversation
I need some more time to wrap my head around this, but I feel this mechanism is backwards. What you don't show above is the follow-up, namely a provider def next_correction(da: FootprintCorrectedData) -> MoreCorrectedData: ... There are two cases:
Taken together, the mechanism is thus either incorrect (case 1), or we have covered up one lie with a second lie (case 2). All this also only works because the types are all aliases for the same underlying data type. While I have no concrete suggestion for now, I feel it would be better to look into how the problems in the underlying code (definition of Maybe the root cause of our trouble is that what the functions (providers) actually want is to define a certain input contract. For lack of a better mechanism we have done so using domain types, but this is a "one dimensional" approach and too entangled into how Sciline builds the graphs. Is there a better way to define input contracts and relate them to a graph structure that does not have this problem? |
I agree that this proposal can't work in all cases. It is a global change of the graph. So it can't deal with local changes that you described. When it comes to skipping steps, I see two distinct cases: A user wants to experiment with what would happen if a certain step was not performed. In this case, quickly rewriting the graph is convenient. But it does not matter that the final result may be wrong. It was never intended to skip the step. And regardless of whether we skip it globally or only for part of the graph, the final result will be wrong to some degree. There is an optional step. In this case, we need to build the pipeline around this fact. One way to do this is to turn the types around. Instead of using def fn_that_can_use_any_data(da: RawData) -> Processed: ...
def fn_that_needs_footprint_corrected(da: FootprintCorrectedData) -> MoreCorrectedData: ... and pl = pl.override_input(RawData, FootprintCorrectedData) This would insert the footprint correction before any use of |
This is only a draft because implementing proper support for generics is non-trivial and I would like to get feedback before doing so. However, even in its current state, this implementation is useful.
This PR proposes a mechanism for replacing the inputs of providers with arbitrary nodes. I initially outlined the idea on Slack after our discussion about workflows on 2025-01-15.
override_inputs
allows side-stepping domain types and telling Sciline to use a different type instead.Benefits
'Lying' domain types
We have some pipelines where users can switch off operations. E.g., footprint correction in ESSreflectometry. Doing so keeps a
FootprintCorrectedData
node in the graph which is misleading.Given a provider like
and using
effectively removes footprint correction from the graph. The resulting pipeline still contains
correct_by_footprint
andFootprintCorrectedData
but they are no longer ancestors to any other nodes. This means that the correction won't be used bycompute
. And it won't show up in visualisations or when serialising to JSON.Choosing alternative algorithms
In powder diffraction, we allow the user to select between normalisation by proton charge and monitor. This can be achieved like so:
ProtonChargeNormalizedRunData
andHistogramMonitorNormalizedRunData
and use those as return types of normalisation providers.NormalizedRunData
and use that as input type for all providers consuming normalised data.Any providers that request
NormalizedRunData
will now getProtonChargeNormalizedRunData
instead.I implemented this in TODO
Downsides
This is a one time operation.
The providers are not removed. This means that adding providers later can reintroduce the overridden type. This can partially be ameliorated by removing the overridden nodes. Or fixed full by creating an alias instead that affects all providers added later. (A bit like aliases in coord transforms)
No type safety
We normally benefit from some level of type safety by relying on type hints. This is lost if we allow arbitrary overrides. I don't think this is a major problem given that Dask has no type safety either.
Complex implementation
Especially for generic types. This may not be worth the effort. But as I showed above, even a simple implementation fixes the most pressing issues.