Parallel KD-Tree construction #165

Open
B1ueber2y opened this issue Feb 11, 2022 · 12 comments
@B1ueber2y

Hey thanks for the great project! I am wondering if there is any easy way (or forked project) to enable parallelism for KD-Tree construction with nanoflann? Or do you have plans to integrate this feature in the future?

@AlphaHot

What is the MatrixType & coeff?

@Mintpasokon
Contributor

Hey, I forked the project and made a commit that builds nanoflann::KDTreeBaseClass concurrently. It works by adding an extra parameter to KDTreeSingleIndexAdaptorParams (1 by default). In my tests, concurrent builds with 8-16 threads improve build performance by up to around 3x.
See the docs about the parameter:
https://github.com/Mintpasokon/nanoflann-concurrent-build/tree/concurrent_build#23-kdtreesingleindexadaptorparamsn_thread_build
I was wondering if I could propose a PR for the project. Please let me know if you have any suggestions or expectations. @jlblancoc

@jlblancoc
Owner

I totally love your changes and how you approached it 👍

I checked it with valgrind and the gcc sanitizer and both are happy, so I am too. Well, actually valgrind --tool=helgrind tests/unit_tests complains about "potential" data races, but I think they are false positives due to the use of atomics. Take a look at it first if you want (build with "-g" for debug symbols).

Then, yes, feel free to open a PR, after adding the corresponding line (and your name at the end?) to the CHANGELOG.

Thanks!

@Mintpasokon
Contributor

Mintpasokon commented Mar 31, 2023

Thank you for your reply! I have submitted a PR with some explanation about helgrind. If you have any suggestions or concerns, please let me know. Thanks again!

@jlblancoc
Owner

Closing: this feature was already merged and is released in the latest v1.5.0.

@dokempf
Contributor

dokempf commented Jun 22, 2023

@Mintpasokon Can you share a bit of detail on the dataset/parameters you used to achieve a 3x speedup? I tried it for my application and the results were far off that (including being slower at very large thread counts on an HPC node). I would love to understand whether there is room for improvement - sequential kd-tree construction has become a bottleneck in our application...

@Mintpasokon
Contributor

@dokempf I tested on a Ryzen 7 5800X (8c/16t) with DDR4-3200, using 1-20 million 2D points, to get the 3x speedup. Since you are using a large number of threads, I suspect there may be a problem with memory-access locality, or false sharing, in this line: the indices in vAcc point at effectively random locations in the dataset, so in the worst case your dataset would need a much larger CPU cache than the amount of data actually accessed.

It might help to:

  1. Limit the thread count, maybe to 8 or 16.
  2. Change nanoflann's behavior: make a copy of the original dataset and actually move the data in the dataset, rather than the indices in vAcc, when building the kd-tree (in planeSplit()), which may help with locality.

I have no access to my PC now; I will try the second option when available. Good luck!
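A self-contained sketch of the second suggestion (hypothetical names, not nanoflann's actual code): the first function permutes an index array the way an index-based build does, so every comparison chases a random offset into the full dataset; the second partitions a copy of the points themselves, so later build steps work on contiguous, cache-friendly ranges.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Pt3 { double x, y, z; };

// Index-based split: comparisons dereference pts[i] at scattered offsets,
// which is hard on the cache when the dataset is large.
size_t splitByIndex(const std::vector<Pt3>& pts, std::vector<size_t>& idx,
                    size_t lo, size_t hi, int axis, double cut) {
    auto key = [&](size_t i) {
        return axis == 0 ? pts[i].x : axis == 1 ? pts[i].y : pts[i].z;
    };
    auto mid = std::partition(idx.begin() + lo, idx.begin() + hi,
                              [&](size_t i) { return key(i) < cut; });
    return static_cast<size_t>(mid - idx.begin());
}

// Value-based split: moves the points themselves, so the subranges handed to
// worker threads are contiguous in memory.
size_t splitByValue(std::vector<Pt3>& pts, size_t lo, size_t hi,
                    int axis, double cut) {
    auto key = [axis](const Pt3& p) {
        return axis == 0 ? p.x : axis == 1 ? p.y : p.z;
    };
    auto mid = std::partition(pts.begin() + lo, pts.begin() + hi,
                              [&](const Pt3& p) { return key(p) < cut; });
    return static_cast<size_t>(mid - pts.begin());
}
```

Both return the same split position; the trade-off is one extra copy of the dataset in exchange for sequential memory access during the build.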

@Mintpasokon
Contributor

@dokempf I ran some performance tests varying the number of points and the thread count in kd-tree building. It looks like this:
[benchmark figure: build time vs. number of points and thread count]
It seems that concurrent builds have a performance advantage once there are at least 50,000 points in the dataset, and the speedup ratio grows roughly linearly with the logarithm of the point count. What is the scale of your problem? It may be related.

@jlblancoc
Owner

Thanks for the benchmarking, @Mintpasokon !!
If you feel like it, it would be great if you wanted to open a Pull Request contributing your benchmark to the nanoflann benchmark repo... just in case we have time someday to put everything there in order and produce a decent report :-) In that case, feel free to add your name and/or affiliation to your files, of course.

@dokempf
Contributor

dokempf commented Jun 26, 2023

The scale of my problems is much larger: I tested datasets of 100k-400M 3D double-precision points. The smallest of these do still fit in L3 - yet I only see speed-ups of up to 1.3x, if any.

@jlblancoc
Owner

@dokempf Probably worth investigating an alternative implementation for such large datasets, with T threads:

  1. Split the input dataset into T clusters (to exploit data locality, just split it into T consecutive blocks).
  2. Run the "classic" single-thread build for each cluster.
  3. Merge the resulting tree nodes.
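The three steps above could be sketched roughly like this (a hypothetical, self-contained toy, not nanoflann code; step 3 is approximated here by keeping the T trees as a forest and merging only at query time, taking the best answer over all trees):

```cpp
#include <algorithm>
#include <future>
#include <limits>
#include <memory>
#include <vector>

struct Pt { double x, y; };

struct Node {
    Pt p;
    std::unique_ptr<Node> left, right;
};

// Classic sequential median-split build over pts[lo, hi).
static std::unique_ptr<Node> build(std::vector<Pt>& pts, size_t lo, size_t hi,
                                   int axis) {
    if (lo >= hi) return nullptr;
    const size_t mid = lo + (hi - lo) / 2;
    std::nth_element(pts.begin() + lo, pts.begin() + mid, pts.begin() + hi,
        [axis](const Pt& a, const Pt& b) { return axis == 0 ? a.x < b.x
                                                            : a.y < b.y; });
    auto n = std::make_unique<Node>();
    n->p = pts[mid];
    n->left  = build(pts, lo, mid, 1 - axis);
    n->right = build(pts, mid + 1, hi, 1 - axis);
    return n;
}

static double sq(double v) { return v * v; }

// Standard kd-tree nearest-neighbor descent with pruning.
static void nearest(const Node* n, const Pt& q, int axis, double& best) {
    if (!n) return;
    best = std::min(best, sq(n->p.x - q.x) + sq(n->p.y - q.y));
    const double d = (axis == 0) ? q.x - n->p.x : q.y - n->p.y;
    nearest(d < 0 ? n->left.get() : n->right.get(), q, 1 - axis, best);
    if (sq(d) < best)  // other half-space may still hold a closer point
        nearest(d < 0 ? n->right.get() : n->left.get(), q, 1 - axis, best);
}

// Steps 1+2: build T trees over T consecutive blocks, in parallel.
std::vector<std::unique_ptr<Node>> buildForest(std::vector<Pt>& pts, size_t T) {
    std::vector<std::future<std::unique_ptr<Node>>> futs;
    const size_t block = (pts.size() + T - 1) / T;
    for (size_t t = 0; t < T; ++t) {
        const size_t lo = t * block, hi = std::min(pts.size(), lo + block);
        if (lo >= hi) break;
        // Each task works on a disjoint, contiguous range of pts.
        futs.push_back(std::async(std::launch::async,
            [&pts, lo, hi] { return build(pts, lo, hi, 0); }));
    }
    std::vector<std::unique_ptr<Node>> forest;
    for (auto& f : futs) forest.push_back(f.get());
    return forest;
}

// Step 3 (approximated): query every tree and keep the global best.
double nearestSq(const std::vector<std::unique_ptr<Node>>& forest, const Pt& q) {
    double best = std::numeric_limits<double>::infinity();
    for (const auto& t : forest) nearest(t.get(), q, 0, best);
    return best;
}
```

Queries cost roughly T times a single smaller-tree lookup here; an actual node-level merge (true step 3) would avoid that overhead but is considerably more involved.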

Overall, for n points and T threads, the expected average time complexity would be (if I'm right... just a quick draft):

original: O(n log n) 
parallel: O( n/T · log(n/T)  + T · n/T · log(n/T) ) = O(T · n/T · log(n/T)) = O( n log(n/T) )

Of course, this is just the expected average, and the constants (omitted above) are key in real implementations, but in theory we could obtain some gain.

PS: I would love to, but I don't have the bandwidth at present to try to implement and benchmark it, though... :-(
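As a quick sanity check on the estimate above (taking the O(n log n) and O(n log(n/T)) terms literally and ignoring constants, which is of course a rough assumption), the predicted speedup is log(n) / log(n/T), and it stays small for large n: for n = 4·10^8 and T = 16 it is only about 1.16, the same ballpark as the 1.3x reported earlier in this thread.

```cpp
#include <cmath>

// Predicted speedup from the back-of-envelope estimate above:
// O(n log n) sequential vs. O(n log(n/T)) parallel, constants ignored.
double predictedSpeedup(double n, double T) {
    return std::log2(n) / std::log2(n / T);
}
```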

@jlblancoc jlblancoc reopened this Jun 26, 2023
@jlblancoc
Owner

PS: @dokempf if your data is not so sparse that memory becomes a showstopper, you could try a grid-based representation instead; e.g., here I have an implementation
