Replies: 3 comments 6 replies
-
It's important to distinguish between zarr the format and As for We are trying to make indexing faster in |
-
I think the tensorstore tutorial is a good place to start. You would probably not want to use the format in that example (n5); instead, you can use tensorstore to read and write zarr v2 and v3 arrays (to write zarr groups you can just use |
-
Sooo, 5 months later and I'm still here 😓 Since you mentioned you are using cloud storage for your zarr datasets, I wondered which kind of cloud technology you use. Is it as simple as some S3 buckets, or are you using anything that is tuned for latency and throughput? Is there anything you can recommend for reading data over a network as fast as possible?
-
I'm looking for ways to improve the performance of my data loading pipeline and I found Zarr. To get an idea of the throughput, I started a small benchmark script in Python. To get a baseline, I also ran tests using NumPy memory-mapped arrays.
I'm working with 4D arrays which are quite large. One of my criteria is that I need to access them as a key-value store. Within each value, I read at random indices along the first axis.
I created some dummy arrays to test throughput.
Here is my complete benchmarking code that compares Zarr to accessing raw NumPy arrays on disk:
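The script itself is not preserved in this copy of the thread. As a stand-in only, here is a minimal sketch of the kind of measurement described: random reads along the first axis of a 4D array, reported in MB/s. The function name `random_read_throughput`, the dummy shape, and the file name are assumptions made for illustration, not the original code. Only the memory-mapped NumPy baseline is exercised; the same function should also accept a Zarr array, since it relies only on `.shape` and `arr[i]` indexing.

```python
import os
import tempfile
import time

import numpy as np


def random_read_throughput(arr, n_reads=200, seed=0):
    """Read `n_reads` random slices along axis 0 and return throughput in MB/s.

    `arr` can be anything that exposes `.shape` and returns array data
    from `arr[i]`, e.g. an np.memmap or a Zarr array.
    """
    rng = np.random.default_rng(seed)
    indices = rng.integers(0, arr.shape[0], size=n_reads)
    total_bytes = 0
    start = time.perf_counter()
    for i in indices:
        # np.array(...) forces a real copy into memory. A bare memmap
        # slice is lazy and would make the memmap look faster than it is.
        block = np.array(arr[int(i)])
        total_bytes += block.nbytes
    elapsed = time.perf_counter() - start
    return total_bytes / elapsed / 1e6


# Hypothetical dummy data: 64 samples of shape (8, 32, 32), float32.
shape = (64, 8, 32, 32)
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "dummy.npy")

# Write the array to disk as a .npy file through a writable memmap.
writer = np.lib.format.open_memmap(path, mode="w+", dtype=np.float32, shape=shape)
writer[:] = np.random.default_rng(1).random(shape, dtype=np.float32)
writer.flush()
del writer

# Reopen read-only as a memory-mapped array and time random access.
reader = np.load(path, mmap_mode="r")
mb_per_s = random_read_throughput(reader)
print(f"memmap random reads: {mb_per_s:.1f} MB/s")
```

Two caveats when interpreting numbers from a sketch like this: a file this small is served from the OS page cache, so a realistic test needs data larger than RAM (or a dropped cache), and for Zarr each `arr[i]` read has to load every chunk that the slice touches, so the chunk shape along the first axis strongly affects the result.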
It turns out that accessing NumPy arrays outperforms Zarr by a factor of ~6-7.
My maximum disk speed is 500MB/s and I reach roughly 400MB/s using NumPy. With Zarr I see a throughput of ~50-60MB/s.
This difference is so big that I feel like I must be missing something. I tried different chunk sizes and disabled compression completely. Still, Zarr never reaches a throughput that comes even close to NumPy's memory-mapped arrays.
Does anyone have a hint on what I'm missing? Is Zarr generally slow for my use case of accessing large 4D arrays?
Appreciate any help