feat: add method to get provider from type #135

jokasimr · 2024-02-19T13:33:35Z

Fixes #98 and #108

jokasimr · 2024-02-19T13:37:09Z

I think that now after adding the type matching logic we don't need to treat subproviders separately anymore.

That would simplify adding the __getitem__ and __contains__ methods. So therefore I'm considering making that refactoring here, what do you think @SimonHeybrock ?

SimonHeybrock · 2024-02-19T13:54:55Z

src/sciline/pipeline.py

+        matches = [
+            (provider, bound)
+            for ptype, provider in self._providers.items()
+            if (bound := _find_bounds_to_make_compatible_type_tuple((tp,), (ptype,)))
+            is not None
+        ]


This replaces a constant-time lookup by a linear-time iteration. And this is called for every requirement, recursively. That is we can the complexity of Pipeline.get from something like O(N) to O(N^2).

I do not know how large pipelines will get, but this may be severe problem we should not simply ignore.

Can you somehow keep the rest of your refactor, but still have a constant-time lookup by default, falling back to iteration only if there is no match?

You're absolutely correct. We could group the providers by their origin type ahead of the iteration to achieve the same effect as the old constant time lookup.

However, I tested this for the esssans workflow using

%%timeit iofq = pipeline.get(BackgroundSubtractedIofQ)

and the result is that the time goes from 2ms to 10ms.

I'd say that's not particularly significant and right now I don't think it is worth complicating the code for.
We can always add this in the future if it becomes an issue.

I think I disagree. The ESSsans workflow is "tiny", and I am surprised that even there this takes 5x the time! Since the effect is quadratic I do not think we should ignore this.

Why do yousuggest we would have to pre-group? Can't we look for an exact match, simply calling self._providers.get(tp) and use that if found?

Can't we look for an exact match, simply calling self._providers.get(tp) and use that if found?

Yeah sure we can do that.

I don't agree 5x is dramatic. Consider that if the effect was truly O(n^2) we could expect a slowdown proportional to the number of providers in the pipeline, that is, >50.

Can't we look for an exact match, simply calling self._providers.get(tp) and use that if found?

Yeah sure we can do that.

I don't agree 5x is dramatic. Consider that if the effect was truly O(n^2) we could expect a slowdown proportional to the number of providers in the pipeline, that is, >50.

I would not expect 50x, since the code also does other things, apart from provider lookup.

But to clarify:

Switching to an algorithm with much worse scaling (here O(N^2) instead ofO(N) should never be done lightheartedly and without great care, in particular if we are not in full control over all the contexts where the algorithm is used (see next point).

Sciline is a generic package that is meant to be useful well beyond the context of the neutron-scattering data-reduction workflows that we are writing. Therefore, the benchmark you ran above is not a sufficient data point (or rather, it shows that we should not make the change).

SimonHeybrock · 2024-02-19T14:43:07Z

src/sciline/pipeline.py

+        if tp in self._providers:
+            # Optimization to quickly find non-generic providers
+            matches = [(self._providers[tp], {})]


Nitpicking, but you replaced a single lookup by two lookups. Can we use the walrus approach like the original implementation?

jl-wynen · 2024-02-20T10:45:08Z

src/sciline/pipeline.py

@@ -434,6 +433,18 @@ def __setitem__(self, key: Type[T], param: T) -> None:
            )
        self._set_provider(key, Provider.parameter(param))

+    def __getitem__(self, tp: Union[Type[T], Item[T]]) -> Provider:


This makes Provider part of the public interface. So can you import it as well as ArgSpec, ProviderLocation, ProviderKind, ToProvider in the top level init.py? (ToProvider maybe should be exposed in typing instead)

Hm maybe we want to avoid that actually.
How about we make the interface such that it returns the same type as was passed into the pipeline.
That is, if the tp matches a provider with kind='function' it returns a function, and if tp matches a provider with kind='parameter' then it returns the value?

I'm not sure how the parameter tables should fit here.

jl-wynen · 2024-02-20T10:57:31Z

src/sciline/pipeline.py

About the conceptual issues, I was wrong stating that this makes Pipeline a Mapping. It does not because we would have to also implement __iter__ and __len__. And actually, since there is a __setitem__, if anything, it should be MutableMapping, thus we would also need __delitem__.
However, Pipeline implements part of the interface (and also the interface of Sequence). I'm not entirely sure how it fits in.

More concretely, implementing __getitem__ makes Python think that pipelines are iterable with integer indices:

import sciline as sl def to_string(i: int) -> str: return str(i) pl = sl.Pipeline([to_string], params={int: 3}) list(pl)

raises

sciline.handler.UnsatisfiedRequirement: ('No provider found for type', 0)

This not great. So you would also need to implement __iter__. And since you would be going for a dict-like interface, you would also need keys, values, and items.

And that leaves us with the asymmetry between __getitem__ and __setitem__. For generic providers, the former currently returns different values and accepts different keys than the latter. Even if __setitem__ were extended to support providers on top of parameters.
Also, what about parameter tables or sentinels? Would __getitem__ return those special, internal providers? Or how would it represent them?

With the proposed semantics, the new functions allow checking what a pipeline can produce and how. So it does not allow treating Pipeline like a container. So named methods seem more appropriate to me.

I think this makes a lot of sense.
How about adding a get_provider method instead of __getitem__?

Also, what about parameter tables or sentinels? Would getitem return those special, internal providers? Or how would it represent them?

I don't know, what do you think? I'm thinking that it's likely that a user wants to access the parameter tables that a pipeline uses, but they should preferably be returned in the same form that they were inserted, i.e. as sl.ParamTable.

I don't know, what do you think? I'm thinking that it's likely that a user wants to access the parameter tables that a pipeline uses, but they should preferably be returned in the same form that they were inserted, i.e. as sl.ParamTable.

Just like the discussion about using a graph data structure inside Pipeline, this shows that the implicit way in which we handle param-tables is problematic. We may need to re-think that (come up with a more explicit interface, or way the express the nesting?).

jl-wynen · 2024-02-23T10:45:54Z

src/sciline/pipeline.py

+        provider = self._get_unique_provider(tp, HandleAsBuildTimeException())[0]
+        if provider.kind == 'function':
+            return provider.func
+        return provider.func()


What about other kinds?

Yes that's something we need to talk about. What other cases are there, I looked at the ProviderKinds type but didn't recognize all of them. However, I think it's unlikely that a user will request other types, so for now I'm open to just raising NotImplemented in the other cases.

Well, users might request Series. But that, along with the other kinds, may be subject to change if param tables are removed.

Yes, that's what I thought as well, so I don't think it makes sense to invest time to solve that now.

jl-wynen · 2024-02-23T10:48:03Z

src/sciline/pipeline.py

+        parameter the method returns the value of the parameter.'''
+        provider = self._get_unique_provider(tp, HandleAsBuildTimeException())[0]
+        if provider.kind == 'function':
+            return provider.func


Why does it return the function and not the provider object?

Because I agreed with the concern that the Provider class isn't currently exposed, and returning a Provider here would expose it. I think just returning the same type that was provided by the user keeps things simpler.

I'm not concerned about exposing it. This can be useful in its own right.

Your solution makes it difficult to distinguish between providers and parameters. You can't check the type of the return value of get_provider because parameters can be callable and providers can now be any callable object, not just functions. So you would have to compare the returned type to tp.

My only concern about expose Provider is that it would increase the api surface area. I agree that it's a problem that you can't generally determine if the returned provider is a parameter or a function.

refactor: remove subprovider concept

596b517

jokasimr requested a review from SimonHeybrock February 19, 2024 13:34

SimonHeybrock reviewed Feb 19, 2024

View reviewed changes

jokasimr added 2 commits February 19, 2024 15:32

optimization for nongeneric providers

f86bae4

fix: use more suitable method

5a3a900

SimonHeybrock reviewed Feb 19, 2024

View reviewed changes

jokasimr added 7 commits February 19, 2024 15:56

feat: add __getitem__ and __contains__

2d48817

fix: do only one lookup

ded4f8a

mypy

7b4149b

fix

86da6fc

mypy

81fba82

test: getitem

33c015b

test: contains

49b088a

jokasimr requested a review from jl-wynen February 20, 2024 09:03

jokasimr self-assigned this Feb 20, 2024

jokasimr marked this pull request as ready for review February 20, 2024 09:04

jokasimr changed the title ~~refactor: remove subprovider concept~~ feat: add method to get provider from type Feb 20, 2024

jl-wynen reviewed Feb 20, 2024

View reviewed changes

jokasimr added 2 commits February 23, 2024 10:19

better names + don't expose provider class

11002b7

fix tests

bc53626

jokasimr requested a review from jl-wynen February 23, 2024 09:44

jl-wynen reviewed Feb 23, 2024

View reviewed changes

jokasimr mentioned this pull request Apr 15, 2024

Simplify remove subproviders #157

Closed

jokasimr closed this Sep 4, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add method to get provider from type #135

feat: add method to get provider from type #135

jokasimr commented Feb 19, 2024

jokasimr commented Feb 19, 2024

SimonHeybrock Feb 19, 2024

jokasimr Feb 19, 2024 •

edited

Loading

SimonHeybrock Feb 19, 2024

jokasimr Feb 19, 2024

SimonHeybrock Feb 19, 2024

SimonHeybrock Feb 19, 2024

jl-wynen Feb 20, 2024

jokasimr Feb 20, 2024

jl-wynen Feb 20, 2024

jokasimr Feb 20, 2024

SimonHeybrock Feb 20, 2024

jl-wynen Feb 23, 2024

jokasimr Feb 23, 2024

jl-wynen Feb 23, 2024

jokasimr Feb 23, 2024

jl-wynen Feb 23, 2024

jokasimr Feb 23, 2024

jl-wynen Feb 23, 2024

jokasimr Feb 23, 2024

feat: add method to get provider from type #135

feat: add method to get provider from type #135

Conversation

jokasimr commented Feb 19, 2024

jokasimr commented Feb 19, 2024

Choose a reason for hiding this comment

jokasimr Feb 19, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jokasimr Feb 19, 2024 •

edited

Loading