Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bulk Load via Iterator #174

Open
bugeats opened this issue Aug 14, 2024 · 3 comments
Open

Bulk Load via Iterator #174

bugeats opened this issue Aug 14, 2024 · 3 comments

Comments

@bugeats
Copy link

bugeats commented Aug 14, 2024

This would be nice purely for ergonomics, whether or not the elements are ultimately collected into a vec internally. Poking around the source however, it does look like the bulk load might be able to avoid such intermediate collection. Either way is a win.

// The current way
let elements = my_iter.collect::<Vec<_>>();
let tree = RTree::bulk_load(elements)

// When std::iter::FromIterator is implemented for RTree
let tree = my_iter.collect::<RTree>();

Bonus easy sub-trees:

let subtree = my_tree.into_iter().filter(x).collect::<RTree<_>>();

I understand the sub-trees can also be created by using drain_with_selection_function, but that requires setting up a SelectionFunction, possibly cloning the original tree, and discarding the drained items. I'm not actually sure which method would be most performant.

@adamreichold
Copy link
Member

it does look like the bulk load might be able to avoid such intermediate collection.

I have not tried recently, but I do not think this is straight forward as we use `Vec::split_off´ internally to divide the workload for recursion.

@Stoeoef
Copy link
Collaborator

Stoeoef commented Aug 15, 2024

it does look like the bulk load might be able to avoid such intermediate collection.

I have not tried recently, but I do not think this is straight forward as we use `Vec::split_off´ internally to divide the workload for recursion.

Agreed, there is not a good way around collecting into a Vec. In addition to split_off, Rstar needs to sort (more specifically, selection sort) the points frequently, that only works efficiently if the input is in a contiguous memory slice (= Vec).

That being said, I think this does make sense from ergonomics side and should be easy to add.
I never considered using collect for rtrees but like the look of it!

@rmanoka
Copy link
Contributor

rmanoka commented Aug 20, 2024

Without some specialization, where we know the Iterator involved in FromIterator is an exact-sized one, I don't see how we could do anything more efficient that simply collecting the iterator into a Vec and then calling bulk_load.

I do agree that it's more ergonomic, but we need a caveat explaining it'll double-allocate if used on a large iterator.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants