-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bulk loading using space-filling curve algorithm(s) #102
Comments
Some related issues:
|
Don't have a reference for this but I thought |
I'm not sure if much of the existing functionality can be adapted to be honest - the top-down approach seems to be "too different" from the bottom up approach. I don't think that this is a problem though - Hilbert curves are probably easier to implement (?) than the existing implementation anyway. The existing code is somewhat complex as it has to work in arbitrary dimensions. From memory, the current code successively partitions the elements along all axis and then recurses into the final clusters. For example:
This results in 2 * 3 = 6 sub-partitions with known AABB that will form a parent node in the rtree. Then, each sub-partition is broken down recursively until the number of items is small enough to be put into a leaf node. For Hilbert curves, higher dimensions don't seem to be a concern for the rtree - it only matters for calculating the Hilbert score. That's why I can well imagine that this implementation is going to be quite different and simpler. I think the reason why I've decided against using space filling curves is that I've never really understood how to adapt them well to data that is heavily skewed. From my understanding, the Hilbert curve wouldn't be able to assign different Hilbert scores for elements that are very close together, especially in higher dimensions. I didn't look too deeply into that though - maybe this is more a theoretical concern that doesn't happen in practice? |
I think maybe it's worthwhile to split this into two issues? 1) a bulk loading API and 2) implementing OMT? I think bulk loading is very useful for a lot of cases where you don't need the tree to be dynamic. I'm not familiar with the OMT algorithm; from the paper's introduction it seems they claim it's an improvement on STR?
I've never worked with so heavily skewed data that a hilbert-packed r tree was a bad fit. I have noticed that hilbert-packed r tree implementations seem to first compute the total bounds of the data and constrain the scale of the hilbert values to that space. So if you're working with lon/lat data covering the US you don't need to compute hilbert values over |
For the avoidance of confusion: OMT is the current bulk-loading algorithm – it's just that the API is intertwined with the implementation because in the distant past rstar didn't have a bulk-loading API and I asked for one using OMT, which @Stoeoef duly implemented. |
Oh whoops I missed that a bulk loading API already existed 🤦♂️ |
rstar currently supports bulk-loading using the OMT algorithm. However, using a Hilbert curve to order the nodes can result in a tree with
better storage utilisation andless overlap. There is an extant implementation for an R tree at https://github.com/flatgeobuf/flatgeobuf/blob/master/src/rust/src/packed_r_tree.rs#L193 that should be relatively easily portable.A related issue is then the creation of a new bulk-loading API which allows selection of the bulk loading algorithm, depending on the application.
The text was updated successfully, but these errors were encountered: