Skip to content

Conversation

@adamjstewart
Copy link
Collaborator

I spent longer than I would have liked trying to figure out why:

index = Index(properties=Property(dimension=3, interleaved=False))

wasn't working. If we weren't using *args, **kwargs everywhere, I would have immediately figured this out. This PR starts with Index, but eventually I plan to move to other classes as well.

*args, **kwargs should be avoided for the following reasons:

  • Does not report an issue when unknown kwargs are used (my problem)
  • Does not report an issue when misspelled kwargs are used (actually discovered this in the tests)
  • Does not support type hints

Basically the only time to use *args, **kwargs is when you have one function that wraps around another function and you need to pass through all arguments.

At the moment, this change is backwards incompatible since I replace *args with individual parameters and remove support for additional Property passthroughs. I could undo this change if we want to only remove **kwargs and keep things backwards compatible.

def __init__(
self,
filename: str | bytes | None = None,
stream: Any | None = None,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't yet figured out how to type this

arrays = None
basename = None
storage = None
if args:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No more guessing games!

del r1
self.assertTrue(storage.hasData)

r2 = index.Index(storage, properly=settings, overwrite=False)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actual typo/bug in our tests discovered by removing **kwargs

:param interleaved: True or False, defaults to True.
This parameter determines the coordinate order for all methods that
:param arrays: A tuple of arrays for bulk addition.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wasn't documented

hits = list(self.idx.intersection((0, 0, 60, 60)))

self.assertTrue(len(hits), 10)
self.assertEqual(len(hits), 10)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also a bug

@adamjstewart
Copy link
Collaborator Author

Could use help figuring out the final remaining test failure...

@hobu
Copy link
Member

hobu commented Jan 5, 2025

Could use help figuring out the final remaining test failure...

Options ordering changed? The custom filename test failure must mean an unexpected default is getting used or something? 🤷

@adamjstewart
Copy link
Collaborator Author

That's what I thought too, but that is just a simple test for filename, the first argument. I tried explicitly adding filename= but nothing changed.

@adamjstewart
Copy link
Collaborator Author

First let's discuss whether or not this is something we want and later we can fix things.

@hobu
Copy link
Member

hobu commented Jan 6, 2025

something we want

Custom filenames? Yes, definitely.

The janky kwargs/args configuration of the options? It was the style back in the day. I'm sure it could be remade to be something much better if someone has the time to put into it. I'm certainly not adverse to that, but it probably means making deprecating and then breaking things.

@adamjstewart
Copy link
Collaborator Author

We can convert kwargs to parameters immediately without deprecation. Let me see if I can deprecate args and add parameters, it should actually be relatively easy.

@adamjstewart
Copy link
Collaborator Author

Still need to deprecate (instead of remove) some properties and fix the test.

@FreddieWitherden
Copy link
Contributor

My preference here would be to create a subclass of Index which has a nicer and cleaner interface for construction. It can then map this to the legacy API. This would allow us to retain backwards compatibility and provide a route for code to be gradually updated.

Also, Index is not a great name for a class; from rtree.index import Index also looks a bit weird as an Index could really be anything (database Index, B tree, ...).

@adamjstewart
Copy link
Collaborator Author

@FreddieWitherden that would make it impossible to deprecate the old interface, because any instantiation of the subclass would also instantiate the parent class. The reverse would actually be better, new classes that implement the logic with the old class being a wrapper around those.

However, I still think the current approach is easiest, as users will have the fewest things to change. My only remaining complaint about the current class is that it is highly dynamically typed, making type checking almost impossible. Any attempts to change that would likely result in a completely different library.

@FreddieWitherden
Copy link
Contributor

@FreddieWitherden that would make it impossible to deprecate the old interface, because any instantiation of the subclass would also instantiate the parent class. The reverse would actually be better, new classes that implement the logic with the old class being a wrapper around those.

I'm not entirely sure what you mean? The old interface absolutely can be deprecated and marked internal. Then, after a suitable period of time it can be removed (and the code underpinning the new supported interface updated accordingly). No code using the new interface would need to change (as how it is actually coded is very much an implementation detail so if it provides functionality itself or inherits it from a super class is immaterial to consumers).

However, I still think the current approach is easiest, as users will have the fewest things to change. My only remaining complaint about the current class is that it is highly dynamically typed, making type checking almost impossible. Any attempts to change that would likely result in a completely different library.

The problem with the current class is mostly around construction. The actual query interface is okay. Hence my suggestion around subclassing it and having these subclasses provide more reasonable constructors/static factory methods.

@adamjstewart
Copy link
Collaborator Author

Are you suggesting to create a subclass of Index or an entirely new class that is independent of Index? If the former, we can't raise a warning message about deprecation in Index without also raising it in the subclasses. If the latter, this would be a lot of duplicated code, but I guess it would be fine in principle.

@FreddieWitherden
Copy link
Contributor

Rename Index to _Index. Create a new class Index(_Index) and mark as deprecated. Create a new class Rtree(_Index) (and friends) with a tweaked API that we want to support long term.

@adamjstewart
Copy link
Collaborator Author

@hobu how would you like to proceed here? I think designing new subclasses is outside the scope of what I have time to work on. Could convert many of the args/kwargs to formal inputs though.

@hobu
Copy link
Member

hobu commented Feb 5, 2025

how would you like to proceed here?

We can't break the old API. This library is way too old and entrenched and breaking things only for the sake of making them better is going to annoy people and break downstream users who might not even know they're using the library.

We can make a new API and replumb the old API to take advantage of new internals, however (anyone want to jump to pybind11 or similar while we're at? 😄). The amount of work to do this is what prevents anyone from starting, unfortunately.

@FreddieWitherden
Copy link
Contributor

So I think we should get a release out the door and then decide how far we want to go in terms of a new API.

The limiting factor for me at the moment is on the libspatialindex side where I feel the MVR tree is not particularly useful in its current form and the TPR tree really is quite a different beast entirely.

@hobu
Copy link
Member

hobu commented Feb 6, 2025

What's left to do before we cut a release?

@FreddieWitherden
Copy link
Contributor

#346 would be nice to have, but I am unsure how to handle the type annotations properly for the new interfaces.

@adamjstewart
Copy link
Collaborator Author

This PR has turned out to be more work than expected to ensure backwards compatibility. TorchGeo is also planning on moving away from R-tree in the future, so I don't want to spend too much time on this. I'll close for now, but if anyone wants to revive this to at least get rid of kwargs, I would be happy to review.

@adamjstewart adamjstewart deleted the index/no-kwargs branch April 5, 2025 19:30
@hobu
Copy link
Member

hobu commented Apr 6, 2025

TorchGeo is also planning on moving away from R-tree in the future

I'm curious what prompted this. Need for pure Python?

@adamjstewart
Copy link
Collaborator Author

Actually no, need for more features. We have a huge ongoing effort to rewrite TorchGeo with full time series support: torchgeo/torchgeo#2382. We did a big cross-comparison study between rtree.index.Index, shapely.STRtree, geopandas.GeoDataFrame, and pystac.Catalog. We ended up choosing geopandas, which has the following features that R-tree doesn't (as far as I know):

The 2nd bullet point is actually the main feature we need that prompted this search. We want to be able to separately factorize space and time, so that we can guarantee we are sampling from all locations only once and iterating across time. To do that with R-tree, we could have two separate R-trees (one 2D space, one 1D time), but even then we can't guarantee we don't repeat the same location multiple times.

We also did a suite of benchmarking tests to compare performance. While R-tree is significantly faster at search (as expected since it only supports bounding boxes), geopandas is significantly faster for intersection/union (because it is implemented natively instead of in Python).

I'm planning on doing a big write-up of this at some point. Will share it with you to get feedback and make sure there's nothing I'm missing w.r.t. features or performance. We will likely write a whole paper on the TorchGeo 1.0 time series release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants