Add all-nearest-neighbors (a1NN) iterator for tree-to-tree lookup #41
Conversation
Add the ability to efficiently find the nearest neighbor in a target tree for each point in a query tree. This works by traversing the query tree depth-first and, at each query tree node, pruning nodes from the set of candidate subtrees of the target tree that cannot possibly hold the nearest neighbor of any point in the query tree node. This results in speedups on the order of 1.3x versus individual lookup of the query points, and this speedup increases with the size and dimensionality of the trees.
Dear Andrew, I realised I dropped the ball on responding to an earlier query on this. Your implementation looks neat and is along the lines of what I was thinking (the reference you linked is one of the all-kNN references I had looked at). However, I have to say that I am rather disappointed by the final speed-up. I had honestly expected something in the range of an order of magnitude, given consistent locality for many points. Is it possible that the density of points in neuron data is just a bit low? Do you know where the algorithm is spending time right now? Is it very different from the naive implementation of scanning all points with regular kNN?
Hi Greg, I think you may have meant to respond at clbarnes/nblast-rs#20, so I'll respond there.
Reverting this to draft status, as there are performance pitfalls for 3D data that will need to be looked into.
Thank you very much for this contribution! I'm still trying to understand how all of this works. Thanks a lot for the detailed description and comments; those make this much easier.
There are a few points to consider wrt #40 and precision issues. Most important is that the discrepancy caused by rounding errors, etc., can be of much smaller magnitude than the resulting discrepancy in the distance of the returned neighbor: a discrepancy near the epsilon level in the wrong direction can cause the wrong neighbor to be returned entirely. I'm running into this right now after making several fixes and refactors to this PR, in that the all-nearest-neighbors iterator and the single nearest neighbor function rarely return different results. This isn't because one is wrong, but because both are: sometimes one yields the correct neighbor, sometimes the other. This only shows up consistently when exercised with large (> 100K node) 3D trees, but of course may be happening undetected with one or both methods more frequently. As a real example, here's a node being ignored by the existing single nearest neighbor lookup via
which results in this being ignored, and thus a much larger difference (4%!) in the returned neighbor vs the correct one given by
One solution could be adding an
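For intuition, the precision trap described above doesn't need R-trees at all: any two floating-point code paths that agree mathematically but not bit-for-bit can land on opposite sides of a pruning comparison. A minimal, purely illustrative snippet (not rstar code):

```rust
fn main() {
    // Two computations of the same mathematical value, 0.3:
    let via_sum = 0.1_f64 + 0.2_f64; // rounds to 0.30000000000000004
    let direct = 0.3_f64;
    assert!(via_sum != direct); // they differ by one rounding step

    // A pruning test "distance <= bound" then disagrees between the
    // two code paths, so one path keeps a candidate the other drops:
    let bound = 0.3_f64;
    assert!(direct <= bound); // candidate survives on this path
    assert!(!(via_sum <= bound)); // the same candidate is pruned here
}
```

When the two lookup implementations compute the same distance along different code paths, a discrepancy like this is enough to make them prune differently and disagree on the returned neighbor.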
This all-nearest-neighbors or "a1NN" lookup is common in point cloud comparison algorithms, and is our primary use of `rstar`.

The goods
Note that this difference in performance grows as the trees get larger.
Details of the algorithm
The algorithm works by traversing the query tree in DFS order and keeping a stack of pruned subtrees of the target tree for each depth of that traversal. These subtrees cover the potential nearest neighbors for any point in the query tree node at that depth. This allows points in the query tree to effectively reuse computation by only needing to search for their nearest neighbor in these subtrees.
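The DFS-with-inherited-candidates idea can be sketched without any R-tree machinery at all. The following is a toy 1-D analogue (names and structure are mine, not the PR's): query "nodes" are nested groups of points, candidates are flat target points, and each level of the DFS re-prunes the candidate set it inherited from its parent.

```rust
// Toy sketch of the a1NN traversal idea, not the rstar implementation.
enum QueryNode {
    Leaf(Vec<f64>),
    Inner(Vec<QueryNode>),
}

// Envelope (interval) covering all points under a query node.
fn interval(node: &QueryNode) -> (f64, f64) {
    match node {
        QueryNode::Leaf(pts) => pts
            .iter()
            .fold((f64::INFINITY, f64::NEG_INFINITY), |(lo, hi), &p| {
                (lo.min(p), hi.max(p))
            }),
        QueryNode::Inner(children) => children
            .iter()
            .map(interval)
            .fold((f64::INFINITY, f64::NEG_INFINITY), |(lo, hi), (clo, chi)| {
                (lo.min(clo), hi.max(chi))
            }),
    }
}

// Smallest possible distance from any point in [lo, hi] to target t.
fn min_dist((lo, hi): (f64, f64), t: f64) -> f64 {
    if t < lo { lo - t } else if t > hi { t - hi } else { 0.0 }
}

// Drop candidates that cannot be the nearest neighbor of ANY query
// point in the interval: some candidate is within `bound` of every
// query point, so anything whose best case exceeds `bound` is out.
fn prune(q: (f64, f64), candidates: &[f64]) -> Vec<f64> {
    let bound = candidates
        .iter()
        .map(|&t| (t - q.0).abs().max((t - q.1).abs()))
        .fold(f64::INFINITY, f64::min);
    candidates
        .iter()
        .copied()
        .filter(|&t| min_dist(q, t) <= bound)
        .collect()
}

// DFS over the query tree, re-pruning the inherited candidate set at
// each node and doing the final NN scan only at the leaves.
fn a1nn(node: &QueryNode, candidates: &[f64], out: &mut Vec<(f64, f64)>) {
    let pruned = prune(interval(node), candidates);
    match node {
        QueryNode::Leaf(pts) => {
            for &p in pts {
                let nn = pruned
                    .iter()
                    .copied()
                    .min_by(|a, b| (a - p).abs().partial_cmp(&(b - p).abs()).unwrap())
                    .expect("non-empty candidate set");
                out.push((p, nn));
            }
        }
        QueryNode::Inner(children) => {
            for c in children {
                a1nn(c, &pruned, out); // children reuse the parent's pruning
            }
        }
    }
}

fn main() {
    let query = QueryNode::Inner(vec![
        QueryNode::Leaf(vec![0.0, 1.0]),
        QueryNode::Leaf(vec![10.0, 11.0]),
    ]);
    let mut out = Vec::new();
    a1nn(&query, &[0.2, 9.5, 50.0], &mut out);
    // The far-away 50.0 is pruned at the root; each leaf then only
    // scans the one nearby survivor.
    assert_eq!(out, vec![(0.0, 0.2), (1.0, 0.2), (10.0, 9.5), (11.0, 9.5)]);
}
```

The real implementation replaces the flat candidate list with target-tree subtrees and the interval arithmetic with envelope distances, but the shape of the recursion is the same.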
The pruning works similarly to the existing pruning for single point lookups with `min_max_dist_2`, only instead of a point-to-envelope tight upper bound, we need an envelope-to-envelope tight upper bound. This is implemented in `Envelope::max_min_max_dist_2` for `AABB`. Conceptually it is the maximum of `min_max_dist_2` over any point in the envelope, and is naively implemented that way by iterating over the extrema (corners) of the envelope.

Here's an illustrative diagram:
Here `p` is `A.max_min_max_dist_2(&B)` and `q` is `B.max_min_max_dist_2(&A)`. `m` is the maximum distance between the envelopes, included so the diagram can prime your intuition that this bound can do better than that naive maximum distance.

So for each query tree node we consider, we look at the candidate target subtrees of the parent (or children of those subtrees), and find the minimum `max_min_max_dist_2` of that set of subtrees. This means that for any point in the query node, there must be a nearest neighbor within that distance in some subtree. Thus we can prune any subtree whose minimum distance to the query node is greater than that distance.

For more details see the implementation. I tried to provide plenty of comments.
Notes
It may be possible to find an optimized implementation of `max_min_max_dist_2` like the one for `min_max_dist_2` in Optimizations to nearest neighbor #35, but I haven't thought of one and am somewhat skeptical because of the combinatorial aspect.