Support requesting TaskGraph in providers #113

Closed
jl-wynen opened this issue Jan 31, 2024 · 11 comments
Labels
enhancement (New feature or request), wontfix (This will not be worked on)

Comments

@jl-wynen
Member

jl-wynen commented Jan 31, 2024

When building metadata for the output (e.g. in scipp/essreflectometry#6), it can be useful or even necessary to inspect the pipeline or task graph. If we want to do everything within the pipeline instead of in a separate step afterwards, we need to be able to request the task graph in a provider.

So I suggest supporting providers like this:

def make_metadata(task_graph: sciline.TaskGraph, other_args...):
    ...

where Sciline detects TaskGraph and automatically provides the current task graph.
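
For illustration, detection could be based on the provider's parameter annotations. A minimal sketch, assuming exactly that rule (wants_task_graph is a hypothetical helper, not existing Sciline API):

import typing

from sciline.task_graph import TaskGraph

def wants_task_graph(provider) -> bool:
    # Hypothetical rule: the provider declares a parameter annotated with
    # TaskGraph, so Sciline would inject the current task graph for it.
    hints = typing.get_type_hints(provider)
    hints.pop('return', None)
    return TaskGraph in hints.values()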

Of course, this is only usable if we have methods to inspect the task graph, as discussed in #107.

@jl-wynen added the enhancement (New feature or request) label on Jan 31, 2024
@SimonHeybrock
Member

Without having thought too deeply about it, I think this is likely a dangerous idea that might cause a lot of pain. Can you explain why other solutions are not feasible?

@jl-wynen
Member Author

Since make_metadata needs to inspect the actual task graph to figure out which corrections were applied, it cannot run as a provider right now.

The only alternative I see is doing this 'in post'. E.g.

pipeline = ...
task_graph = pipeline.get(Result)
result = task_graph.compute(Result)
metadata = make_metadata(task_graph)
save(result, metadata)

However, make_metadata needs access to intermediate results (in ORSO: the wavelength range and the scattering angle). So it either has to recompute those, or we need to request all of them manually in the code above.
Also, this means that we cannot use bind_and_call for saving the data but have to provide all parameters manually.
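
Requesting the intermediates manually might look roughly like this (WavelengthRange and ScatteringAngle are stand-ins for the actual domain types):

task_graph = pipeline.get((Result, WavelengthRange, ScatteringAngle))
results = task_graph.compute()
metadata = make_metadata(task_graph, results[WavelengthRange], results[ScatteringAngle])
save(results[Result], metadata)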

The only danger I see is that the provider could modify the task graph. But that would be just as illegal as modifying any other input.

@SimonHeybrock
Member

The only danger I see is that the provider could modify the task graph. But that would be just as illegal as modifying any other input.

Apart from the architectural monster that we would be creating, this will be a nightmare to test, maintain, and modify in the future.

Why not a couple lines of Python?

def run_and_save(pipeline: sciline.Pipeline):
    graph = pipeline.get((Result, MetaData))
    results = graph.compute()
    metadata = assemble_metadata(results[MetaData], graph)
    save(results[Result], metadata)

@jl-wynen
Member Author

jl-wynen commented Feb 1, 2024

Because that function needs to be copied and adjusted in every workflow.

Can you elaborate what you mean by 'architectural monster'?

@SimonHeybrock
Member

Because that function needs to be copied and adjusted in every workflow.

How is that different from your suggested make_metadata above? Doesn't that also have to be copied and adjusted for every workflow?

Can you elaborate what you mean by 'architectural monster'?

I think we would need to do something like this, but probably there is more:

  • Add special handling to prevent insertion of providers or params that provide sciline.TaskGraph.
  • In the graph building logic, add special handling if a provider depends on TaskGraph and insert a placeholder.
  • Once the graph is built, iterate over the graph, look for the placeholder, and replace it with a reference to the graph (or probably a copy); a toy sketch of this step follows the list.
  • Add special handling to the visualization code.
  • Figure out how to store such task graphs (which we plan to do), because this looks very non-standard.
  • Write lots of documentation, explaining how the mechanism works, and the pitfalls (such as infinite recursion, ...).
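
To make the placeholder step concrete, a toy sketch on a plain dict-based graph (PLACEHOLDER, inject_task_graph, and the graph layout are purely illustrative, not Sciline internals):

# Illustrative only: a dask-style graph mapping keys to (func, *args) tuples.
PLACEHOLDER = object()

def inject_task_graph(graph: dict) -> dict:
    # After the graph has been built, walk it and replace the placeholder
    # argument with a reference to the (now complete) graph itself.
    out = dict(graph)
    for key, (func, *args) in graph.items():
        out[key] = (func, *(out if a is PLACEHOLDER else a for a in args))
    return out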

Given the simple solution provided above, I do not see a justification for adding such complexity.

@jl-wynen
Member Author

jl-wynen commented Feb 1, 2024

How is that different from your suggested make_metadata above? Doesn't that also have to be copied and adjusted for every workflow?

I suspect the automatic metadata is rarely good enough on its own. So I was thinking of having a post-processing step that a user can hook into by adding their own provider to the graph. With your function, that would look like this:

def run_and_save(pipeline: sciline.Pipeline, post_process: Callable[[MetaData], MetaData]):
    graph = pipeline.get((Result, MetaData))
    results = graph.compute()
    metadata = assemble_metadata(results[MetaData], graph)
    metadata = post_process(metadata)
    save(results[Result], metadata)

So it is possible but builds a new customisation mechanism outside of the graph.

And then there is the question of whether a single Result type makes sense, or whether we compute several distinct results and write them all out to the same (set of) files, all sharing the same metadata.

I think we would need to do something like this, but probably there is more:

How about every pipeline starting with a provider for TaskGraph? When building the actual graph, that provider would need to be given a (weak) ref to the task graph. Everything else would stay the same.

and the pitfalls (such as infinite recursion, ...).

How would this lead to an infinite recursion?

@SimonHeybrock
Member

How is that different from your suggested make_metadata above? Doesn't that also have to be copied and adjusted for every workflow?

I suspect the automatic metadata is rarely good enough on its own. So I was thinking of having a post-processing step that a user can hook into by adding their own provider to the graph. With your function, that would look like this:

def run_and_save(pipeline: sciline.Pipeline, post_process: Callable[[MetaData], MetaData]):
    graph = pipeline.get((Result, MetaData))
    results = graph.compute()
    metadata = assemble_metadata(results[MetaData], graph)
    metadata = post_process(metadata)
    save(results[Result], metadata)

So it is possible but builds a new customisation mechanism outside of the graph.

And then there is the question of whether a single Result type makes sense, or whether we compute several distinct results and write them all out to the same (set of) files, all sharing the same metadata.

I am not sure what you are trying to say here, or what would be better about the self-referential solution.

I think we would need to do something like this, but probably there is more:

How about every pipeline starting with a provider for TaskGraph? When building the actual graph, that provider would need to be given a (weak) ref to the task graph. Everything else would stay the same.

Not sure what that changes. You would then also need to add special-case handling to the "check that all inputs/providers are used" logic, requiring modifications to the graph structure analysis, ...

and the pitfalls (such as infinite recursion, ...).

How would this lead to an infinite recursion?

Call graph.compute() in make_metadata?
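
For instance, a provider along these lines (purely hypothetical, just to illustrate the pitfall) would never terminate:

def make_metadata(task_graph: TaskGraph) -> MetaData:
    # Computing MetaData requires calling make_metadata, which computes
    # MetaData again, and so on: infinite recursion.
    return task_graph.compute(MetaData)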

@jl-wynen
Member Author

jl-wynen commented Feb 1, 2024

Not sure what that changes. You would then also need to add special-case handling to the "check that all inputs/providers are used" logic, requiring modifications to the graph structure analysis, ...

How so? I'm suggesting a provider like this:

from typing import NewType
import weakref

from sciline import Pipeline
from sciline.task_graph import TaskGraph

class TaskGraphProvider:
    def __init__(self) -> None:
        self._task_graph: weakref.ReferenceType | None = None

    def set_task_graph(self, tg: TaskGraph) -> None:
        self._task_graph = weakref.ref(tg)

    def __call__(self) -> TaskGraph:
        if self._task_graph is None:
            raise RuntimeError("Broken task graph: TaskGraph reference has not been initialized")
        if (tg := self._task_graph()) is None:
            raise RuntimeError("Broken task graph: TaskGraph reference has been cleared")
        return tg

RawData = NewType('RawData', int)
ReducedData = NewType('ReducedData', int)

def load() -> RawData:
    return RawData(3)

def reduce(corrected: RawData) -> ReducedData:
    return ReducedData(corrected + 3)

pl = Pipeline((load, reduce))
pl[TaskGraph] = TaskGraphProvider()

tg = pl.get(ReducedData)
find_task_graph_provider(tg).set_task_graph(tg)  # hypothetical helper, see note below

result = tg.compute(ReducedData)

Except that pl[TaskGraph] = TaskGraphProvider() would be done in Pipeline.__init__ and find_task_graph_provider(tg).set_task_graph(tg) in Pipeline.build. Apart from that, everything should work as is. And I think it already would work if #114 were possible.

How would this lead to an infinite recursion?

Call graph.compute() in make_metadata?

Only if the requested output depends on the task graph. In that case, yes, it should of course not be done. But I don't see that as a strong reason against supporting this.

@SimonHeybrock
Member

I am not claiming it is impossible, but I still don't see any reason for significantly complicating the design/architecture and adding "unnecessary" features, given that there is a very simple solution that solves the "problem" differently.

Have you actually tried this in practice, and gained experience that such a solution is clearly not feasible?

Unless it turns out not to be feasible, my answer would be a categorical "No, this will not be implemented".

@jl-wynen
Member Author

jl-wynen commented Feb 5, 2024

I have not tried it yet. But this will be part of the ORSO file writer: scipp/essreflectometry#27.

@SimonHeybrock
Member

I suggest we try it for 6-12 months in practice. Marking as won't fix until we have information to the contrary.

@github-project-automation bot moved this from Next to Done in Development Board on Feb 6, 2024
@SimonHeybrock added the wontfix (This will not be worked on) label on Feb 6, 2024