-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make node constructors public, add RTree::new_from_root #109
base: master
Are you sure you want to change the base?
Make node constructors public, add RTree::new_from_root #109
Conversation
While there is no |
@adamreichold Thanks, completely forgot about that. I've added some warnings to the documentation of |
I'd like to merge this and then cut a new release this week. Are people happy with the state of this PR? |
I am happy with the state of the changes but I am still undecided whether we should merge this at all. We did never discuss alternatives, for example driving custom |
In that case I'm happy to continue to let it bake for now. |
Sorry for the late reply. I don't really understand how one would provide a custom implementation for |
I'm also for merging this as is; it helps a reasonable use-case mentioned above. @nickguletskii could you provide us more details on the compact representation that required custom serialization? |
My comment was alluding to the serializer, not the |
Sure! I'm storing leafs separately from the tree inside an arena, and serializing only the topology of the tree (naively or using a something like a Prüfer code), without the AABBs. During deserialization, I reconstruct the AABBs on-the-fly. This significantly reduces the size of the tree when transferred over the network, and reduces the CPU time and allocations necessary to load it in comparison to performing |
This sounds like something we'd want to integrate into the generic Serde implementation. |
I think it should be possible to implement a
I'll try to find some time to contribute and benchmark a custom implementation of However, I would still appreciate |
cd05425
to
14d859d
Compare
I can see merit to this argument and I also did not write that we must go for a custom deserializer, just that we should discuss alternatives before committing to an API which basically makes no guarantees and needs two warning notes. Consider the description of your approach in #109 (comment), do you think it at all possible to implement a safe (in the sense domain correctness) API which takes the topology (in some suitable representation), recomputes the AABB of parent nodes and the fly and necessarily produce a valid r* tree or fails? (As an example of the direction I am thinking about, you could probably pass a root node where all envelopes in all parent nodes are empty and recompute those envelopes in the constructor which would at least ensure that the envelopes stored in the tree are consistent without significant overhead compared to computing them upfront and the constructor relying on them being correct. Of course, I am not sure if it possible to validate topology in similar manner with reasonable efficiency.) |
The current API proposed in this PR already doesn't allow the user to provide bogus envelopes for parent nodes, since The tricky part is checking the RTree invariants. We could always call something like the sanity check used in the tests (sans the envelope checks), but that would be an extra tree traversal. Alternatively, we could make a node builder struct that would keep track of node subtree depth and verify the child counts. This is how I would probably do this if I wanted to allow the user to build their own RTrees manually. That, however, was not the intended purpose of the proposed APIs - I just wanted a way to load an R* tree generated by rstar from my own custom format with minimal overhead, akin to what is possible through serde, which is why the warnings mention that the tree must originate from the current version of rstar. If you want, I can probably find time to make a prototype for the node builder, but I think that adding it would imply that arbitrary trees satisfying the R* tree invariants are supported by this library. |
I understand that, but we have to consider what happens when such an API is made available, e.g. what kind of bug reports we will be getting by opening this door. There are costs to you-are-holding-it-wrong API and hence I have a strong desire for safer alternatives. Especially since thinking back when this was an struct MyRTreeLookalike {
root: rstar::ParentNode<MyObject>,
size: usize,
}
let tree = unsafe { std::mem::transmute::<rstar::RTree<MyObject, rstar::DefaultParams>>(MyRTreeLookalike { root, size }) }; (assuming that Of course, the ergonomics of this are horrible as you now have to ensure that Of course, at the end of day, I will probably just throw in the towel and stop arguing against merging this, because as you point out even if we do add a safe API to build up arbitrary trees, then
which means that we still have to be very very careful if we ever write any unsafe code in this crate to rely on nothing but the invariants that we can actually check through these API which might still be too high a cost when taking into account that someone has to build them. (I think it would be unfair to ask that of you because this was not what you were trying to achieve.) |
Stepping back, it looks like the std data-structs like btreemap are indeed being serialized as a sequence, and not the tree itself. Ideally we should switch to the packed serialization suggested in this PR. This would need:
@nickguletskii Do you think you can contribute this / something similar? Would be quite useful for many others I believe. Would also circumevent the issues pointed above with wanting to make internal functions pub. On the pub issue, how about an |
I think what you propose would certainly be nice as an efficency improvement to our internal serialization support, but I think it goes beyond the linked implementations as those just build up the tree structures upon deserialization using the public API which has considerable overhead. I think the equivalent would be using Finally, the API taking the vector of leaves and split indices basically has the same issue as the one proposed here if made public. It is not clear if the resulting tree is a valid R* tree and therefore which invariants other code in this crate can assume when operating on this tree.
I'll just withdraw my objections. As long as we have no |
rstar/CHANGELOG.md
if knowledge of this change could be valuable to users.This PR makes some internal APIs public to allow users to implement custom tree serialization and deserialization. I am currently using this functionality to save and load large rstar RTrees to a custom compact representation in order to deliver them over the network.