Working on a modest-sized cube, I found that scipy.ndimage.sum is ~100-300x faster than dask_image.ndmeasure.sum_labels:
import numpy as np, scipy.ndimage
blah = np.random.randn(19,512,512)
msk = blah > 3
lab, ct = scipy.ndimage.label(msk)
%timeit scipy.ndimage.sum(msk, labels=lab, index=range(1, ct+1))
# 117 ms ± 2.85 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
vs
from dask_image import ndmeasure
rslt = ndmeasure.sum_labels(msk, label_image=lab, index=range(1, ct+1))
rslt
# dask.array<getitem, shape=(6667,), dtype=float64, chunksize=(1,), chunktype=numpy.ndarray>
rslt.compute()
# [########################################] | 100% Completed | 22.9s
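For what it's worth, the chunksize=(1,) in the repr above suggests the result is assembled from one chunk per label index, so my guess (only a guess) is that the generated graph is simply very large. A quick way to check how many tasks back the lazy result:
# Count the tasks in the graph behind the lazy result (diagnostic only)
len(rslt.__dask_graph__())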
Note also that the task creation takes nontrivial time:
%timeit ndmeasure.sum_labels(msk, label_image=lab, index=range(1, ct+1))
# 15.4 s ± 2.02 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
While I understand that there ought to be some cost to pushing this computation through a dask graph, this seems excessively slow. Is there a different approach I should be taking, or is this a bug?
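In case it helps, the workaround I've been using for now is to keep the whole reduction in a single task with dask.delayed; this is only a sketch of what works on my side with in-memory arrays, not a suggestion for how dask_image should implement it:
import dask
import scipy.ndimage
# Wrap the whole per-label reduction in one delayed task, so the scheduler
# only has to run a single scipy.ndimage.sum call over the in-memory arrays.
delayed_sums = dask.delayed(scipy.ndimage.sum)(msk, labels=lab, index=range(1, ct + 1))
sums = delayed_sums.compute()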