Performance issue with sum
#238
Comments
For modest-size cubes, the recommended approach is to use numpy/scipy directly instead of Dask. This is the first point on the Dask best practices page. Because all the individual tasks are very fast to run, the overhead involved with Dask becomes overwhelming: the task graph itself is time-consuming to build, and the scheduler can't distribute the tasks onto the workers efficiently, so you'll see lots of blank white space between teeny tiny tasks on the task stream graph in the Dask dashboard.

I've done some very haphazard profiling of the example you give above, and it seems that some of the slowness may be due to slicing. There is work currently being done on high level graphs in Dask that might help with this, but I couldn't say how much it would affect this specific circumstance.
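As a rough illustration of that recommendation (the shapes and label counts below are made up, not taken from this issue): if the cube fits comfortably in memory, you can skip Dask entirely and reduce with scipy.ndimage on plain NumPy arrays.

```python
# A minimal sketch of the "use numpy/scipy directly" suggestion, assuming the
# cube fits in memory. Shapes and label counts are placeholders.
import numpy as np
from scipy import ndimage

cube = np.random.random((512, 512, 128))             # hypothetical modest-size cube
labels = np.random.randint(0, 100, size=cube.shape)  # hypothetical label map

# One eager call; no task graph to build or schedule.
sums = ndimage.sum(cube, labels=labels, index=np.arange(1, 100))
```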
That's pretty reasonable - it is a very big difference! There is a lot of work happening on high level graphs in Dask to try to reduce how costly it is to build the task graphs (or at least to shift some of that cost onto the workers instead of the client). I'll bring this use case up as an example of something we could try to tackle after slicing and array overlaps.
Hi @keflavich, I was running into a similar performance issue with the sum_labels function. In addition, my arrays are multi-channel and big... Here's a little workaround using dask + numba.jit, in case this is still of interest to you: https://gist.github.com/sommerc/e72b59bee50a8b01efb53f57b95a1c97 (also tagging @m-albert, as we discussed this at I2K). Cheers!
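The general shape of that kind of dask + numba workaround might look like the sketch below. This is not the gist's code; the kernel, array shapes, chunking, and label count are all illustrative assumptions.

```python
# A rough sketch of combining dask blocks with a numba-jitted kernel:
# compute per-block label sums, then add the per-block results together.
import numpy as np
import dask
import dask.array as da
from numba import njit

@njit
def block_label_sum(values, labels, n_labels):
    # Accumulate the sum of `values` for each integer label within one block.
    out = np.zeros(n_labels, dtype=np.float64)
    v = values.ravel()
    l = labels.ravel()
    for i in range(v.size):
        out[l[i]] += v[i]
    return out

n_labels = 100                                                    # hypothetical
cube = da.random.random((512, 512, 512), chunks=(128, 128, 128))  # hypothetical
labels = da.random.randint(0, n_labels, size=cube.shape, chunks=cube.chunks)

# One delayed task per block, then a cheap final reduction on the client.
partials = [
    dask.delayed(block_label_sum)(v, l, n_labels)
    for v, l in zip(cube.to_delayed().ravel(), labels.to_delayed().ravel())
]
sums = np.sum(dask.compute(*partials), axis=0)
```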
Working on a modest-size cube, I found that scipy.ndimage.sum is ~100-300x faster than dask_image.ndmeasure.sum_labels on the same data and labels. Note also that building the task graph itself takes nontrivial time.
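A hedged sketch of the kind of comparison being described, with made-up shapes, chunks, and label counts rather than the original data or timings:

```python
# Illustrative comparison only; assumes a dask-image version where
# ndmeasure.sum_labels is available (older releases call it ndmeasure.sum).
import numpy as np
import dask.array as da
from scipy import ndimage
from dask_image import ndmeasure

data = np.random.random((512, 512, 128))              # hypothetical cube
labels = np.random.randint(0, 100, size=data.shape)   # hypothetical label map
index = np.arange(1, 100)

d_data = da.from_array(data, chunks=(128, 128, 128))
d_labels = da.from_array(labels, chunks=(128, 128, 128))

# scipy: runs eagerly on the in-memory arrays.
scipy_result = ndimage.sum(data, labels=labels, index=index)

# dask-image: building the graph (first line) and computing it both take time.
dask_result = ndmeasure.sum_labels(d_data, d_labels, index=index)
dask_result = dask_result.compute()
```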
While I understand that there ought to be some cost to running this processing through a graph with dask, this seems excessively slow. Is there a different approach I should be taking, or is this a bug?