-
Notifications
You must be signed in to change notification settings - Fork 130
rtree.index.Index: avoid use of args/kwargs #344
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| def __init__( | ||
| self, | ||
| filename: str | bytes | None = None, | ||
| stream: Any | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't yet figured out how to type this
| arrays = None | ||
| basename = None | ||
| storage = None | ||
| if args: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No more guessing games!
| del r1 | ||
| self.assertTrue(storage.hasData) | ||
|
|
||
| r2 = index.Index(storage, properly=settings, overwrite=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actual typo/bug in our tests discovered by removing **kwargs
| :param interleaved: True or False, defaults to True. | ||
| This parameter determines the coordinate order for all methods that | ||
| :param arrays: A tuple of arrays for bulk addition. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wasn't documented
| hits = list(self.idx.intersection((0, 0, 60, 60))) | ||
|
|
||
| self.assertTrue(len(hits), 10) | ||
| self.assertEqual(len(hits), 10) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also a bug
|
Could use help figuring out the final remaining test failure... |
Options ordering changed? The custom filename test failure must mean an unexpected default is getting used or something? 🤷 |
|
That's what I thought too, but that is just a simple test for |
|
First let's discuss whether or not this is something we want and later we can fix things. |
Custom filenames? Yes, definitely. The janky kwargs/args configuration of the options? It was the style back in the day. I'm sure it could be remade to be something much better if someone has the time to put into it. I'm certainly not adverse to that, but it probably means making deprecating and then breaking things. |
|
We can convert kwargs to parameters immediately without deprecation. Let me see if I can deprecate args and add parameters, it should actually be relatively easy. |
|
Still need to deprecate (instead of remove) some properties and fix the test. |
|
My preference here would be to create a subclass of Index which has a nicer and cleaner interface for construction. It can then map this to the legacy API. This would allow us to retain backwards compatibility and provide a route for code to be gradually updated. Also, |
|
@FreddieWitherden that would make it impossible to deprecate the old interface, because any instantiation of the subclass would also instantiate the parent class. The reverse would actually be better, new classes that implement the logic with the old class being a wrapper around those. However, I still think the current approach is easiest, as users will have the fewest things to change. My only remaining complaint about the current class is that it is highly dynamically typed, making type checking almost impossible. Any attempts to change that would likely result in a completely different library. |
I'm not entirely sure what you mean? The old interface absolutely can be deprecated and marked internal. Then, after a suitable period of time it can be removed (and the code underpinning the new supported interface updated accordingly). No code using the new interface would need to change (as how it is actually coded is very much an implementation detail so if it provides functionality itself or inherits it from a super class is immaterial to consumers).
The problem with the current class is mostly around construction. The actual query interface is okay. Hence my suggestion around subclassing it and having these subclasses provide more reasonable constructors/static factory methods. |
|
Are you suggesting to create a subclass of |
|
Rename |
|
@hobu how would you like to proceed here? I think designing new subclasses is outside the scope of what I have time to work on. Could convert many of the args/kwargs to formal inputs though. |
We can't break the old API. This library is way too old and entrenched and breaking things only for the sake of making them better is going to annoy people and break downstream users who might not even know they're using the library. We can make a new API and replumb the old API to take advantage of new internals, however (anyone want to jump to pybind11 or similar while we're at? 😄). The amount of work to do this is what prevents anyone from starting, unfortunately. |
|
So I think we should get a release out the door and then decide how far we want to go in terms of a new API. The limiting factor for me at the moment is on the libspatialindex side where I feel the MVR tree is not particularly useful in its current form and the TPR tree really is quite a different beast entirely. |
|
What's left to do before we cut a release? |
|
#346 would be nice to have, but I am unsure how to handle the type annotations properly for the new interfaces. |
|
This PR has turned out to be more work than expected to ensure backwards compatibility. TorchGeo is also planning on moving away from R-tree in the future, so I don't want to spend too much time on this. I'll close for now, but if anyone wants to revive this to at least get rid of kwargs, I would be happy to review. |
I'm curious what prompted this. Need for pure Python? |
|
Actually no, need for more features. We have a huge ongoing effort to rewrite TorchGeo with full time series support: torchgeo/torchgeo#2382. We did a big cross-comparison study between
The 2nd bullet point is actually the main feature we need that prompted this search. We want to be able to separately factorize space and time, so that we can guarantee we are sampling from all locations only once and iterating across time. To do that with R-tree, we could have two separate R-trees (one 2D space, one 1D time), but even then we can't guarantee we don't repeat the same location multiple times. We also did a suite of benchmarking tests to compare performance. While R-tree is significantly faster at search (as expected since it only supports bounding boxes), geopandas is significantly faster for intersection/union (because it is implemented natively instead of in Python). I'm planning on doing a big write-up of this at some point. Will share it with you to get feedback and make sure there's nothing I'm missing w.r.t. features or performance. We will likely write a whole paper on the TorchGeo 1.0 time series release. |
I spent longer than I would have liked trying to figure out why:
wasn't working. If we weren't using
*args, **kwargseverywhere, I would have immediately figured this out. This PR starts withIndex, but eventually I plan to move to other classes as well.*args, **kwargsshould be avoided for the following reasons:Basically the only time to use
*args, **kwargsis when you have one function that wraps around another function and you need to pass through all arguments.At the moment, this change is backwards incompatible since I replace
*argswith individual parameters and remove support for additionalPropertypassthroughs. I could undo this change if we want to only remove**kwargsand keep things backwards compatible.