Support requesting TaskGraph in providers #113
Comments
Without having thought too deeply, I think this is likely a dangerous idea that might cause a lot of pain. Can you explain why other solutions are not feasible?
The only alternative I see is doing this 'in post', e.g.

```python
pipeline = ...
task_graph = pipeline.get(Result)
result = task_graph.compute(Result)
metadata = make_metadata(task_graph)
save(result, metadata)
```

However, the only danger I see is that the provider could modify the task graph. But that would be just as illegal as modifying any other input.
Apart from being an architectural monster, what we would be creating will be a nightmare to test, maintain, and modify in the future. Why not a couple of lines of Python?

```python
def run_and_save(pipeline: sciline.Pipeline):
    graph = pipeline.get((Result, MetaData))
    results = graph.compute()
    metadata = assemble_metadata(results[MetaData], graph)
    save(results[Result], metadata)
```
Because that function needs to be copied and adjusted in every workflow. Can you elaborate on what you mean by 'architectural monster'?
How is that different from your suggested
I think we would need to do something like this, but probably there is more:
Given the simple solution provided above, I do not see a justification for adding such complexity.
I suspect that the automatic metadata is rarely good enough. So I was thinking of having a post-processing step that a user can hook into by adding their own provider to the graph. With your function, that would look like this:

```python
def run_and_save(
    pipeline: sciline.Pipeline,
    post_process: Callable[[MetaData], MetaData],
):
    graph = pipeline.get((Result, MetaData))
    results = graph.compute()
    metadata = assemble_metadata(results[MetaData], graph)
    metadata = post_process(metadata)
    save(results[Result], metadata)
```

So it is possible, but it builds a new customisation mechanism outside of the graph. And then there is the question whether a single
How about every pipeline starting with a provider for
How would this lead to an infinite recursion?
I am not sure what you are trying to say here, or what would be better about the self-referential solution.
Not sure what that changes. You would then also need to add special-case handling to the "check if all inputs/providers are used" step, requiring modifications to the graph structure analysis, ...
Call
How so? I'm suggesting a provider like this:

```python
from typing import NewType
import weakref

from sciline import Pipeline
from sciline.task_graph import TaskGraph


class TaskGraphProvider:
    def __init__(self) -> None:
        self._task_graph: weakref.ReferenceType | None = None

    def set_task_graph(self, tg: TaskGraph) -> None:
        self._task_graph = weakref.ref(tg)

    def __call__(self) -> TaskGraph:
        if self._task_graph is None:
            raise RuntimeError(
                "Broken task graph: TaskGraph reference has not been initialized"
            )
        if (tg := self._task_graph()) is None:
            raise RuntimeError(
                "Broken task graph: TaskGraph reference has been cleared"
            )
        return tg


RawData = NewType('RawData', int)
ReducedData = NewType('ReducedData', int)


def load() -> RawData:
    return RawData(3)


def reduce(corrected: RawData) -> ReducedData:
    return ReducedData(corrected + 3)


pl = Pipeline((load, reduce))
pl[TaskGraph] = TaskGraphProvider()
tg = pl.get(ReducedData)
find_task_graph_provider(tg).set_task_graph(tg)
result = tg.compute(ReducedData)
```

Except that
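The late-binding weakref trick used by `TaskGraphProvider` can be tried without Sciline. A minimal self-contained sketch (all names here are illustrative stand-ins, not Sciline API):

```python
import weakref


class LateBoundProvider:
    """Holds a weak reference filled in after construction,
    mimicking the TaskGraphProvider pattern above."""

    def __init__(self) -> None:
        self._ref: weakref.ReferenceType | None = None

    def bind(self, obj) -> None:
        self._ref = weakref.ref(obj)

    def __call__(self):
        if self._ref is None:
            raise RuntimeError("reference has not been initialized")
        obj = self._ref()
        if obj is None:
            raise RuntimeError("reference has been cleared")
        return obj


class FakeTaskGraph:  # stand-in for a real task graph object
    pass


provider = LateBoundProvider()
try:
    provider()  # calling before binding raises
    unbound_ok = False
except RuntimeError:
    unbound_ok = True

tg = FakeTaskGraph()
provider.bind(tg)
bound_result = provider()  # once bound, the provider returns the live object
```

Using a weak reference means the provider does not keep the task graph alive on its own, avoiding a reference cycle between graph and provider.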
Only if the requested output depends on the task graph. But then, yes, that should of course not be done. Still, I don't see it as a strong reason against supporting this.
I am not claiming it is impossible, but I still don't see any reason for significantly complicating the design/architecture and adding "unnecessary" features, given that there is a very simple solution that solves the "problem" differently. Have you actually tried this in practice and gained experience showing that such a solution is clearly not feasible? Unless it proves infeasible, my answer would be a categorical "No, this will not be implemented".
I have not tried it yet. But this will be part of the ORSO file writer scipp/essreflectometry#27 |
I suggest we try it for 6-12 months in practice. Marking as won't fix until we have information to the contrary.
When building metadata for the output (e.g. in scipp/essreflectometry#6), it can be useful or necessary to inspect the pipeline or task graph. To this end, if we want to do everything in a pipeline instead of in a separate step after the pipeline, we need to be able to request the task graph in a provider.
So I suggest supporting providers like this:
where Sciline detects `TaskGraph` and automatically provides the current task graph. Of course, this is only usable if we have methods to inspect the task graph, as discussed in #107
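A hypothetical sketch of what such a provider could look like. The `TaskGraph` class here is a plain stand-in, and the automatic injection is exactly the feature being proposed, so the provider is called by hand below; none of this is existing Sciline API:

```python
from typing import NewType

MetaData = NewType('MetaData', dict)


class TaskGraph:  # stand-in for sciline.task_graph.TaskGraph
    def __init__(self, nodes):
        self.nodes = nodes


def make_metadata(tg: TaskGraph) -> MetaData:
    # Under the proposal, Sciline would detect the TaskGraph parameter
    # and inject the current task graph automatically; here we pass it
    # in manually to illustrate the provider's shape.
    return MetaData({'n_nodes': len(tg.nodes)})


meta = make_metadata(TaskGraph(['load', 'reduce']))
```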