-
Notifications
You must be signed in to change notification settings - Fork 21
Feature/zarr #675
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/zarr #675
Conversation
Tests are failing because zarr version 3 requires Python version 3.11 onward |
Hi @Oisin-M, thank you very much for this development. I had a quick look at the code and it seems to me that the "zarr" source is not really needed. |
For datasets on the local filesystem, yes. But the source can also access URLs and other locations supported through xarray. The url source doesn't handle zarrs properly at the moment because they are directories and can't be cached easily. We have to look into the dataset first to know which files are on the server. And I'd argue that most of the time we wouldn't want to eagerly download and cache the dataset anyway, because the on-demand, chunk-based access that xarray supports for remote zarrs is really useful. We can create an exception in the url source, but other sources then need that too. The question "is this location a zarr dataset?" is not easy to answer by the zarr reader's detection and will depend on the source. Starting with In some aspects, zarr is an API (see, e.g., kerchunk and virtualizarr, as far as I understand these) and so matches the source paradigm of ek-data as well as the reader. |
@chpolste, thank you for the explanation. I agree, this should be a source, just like e.g. "opendap" . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for adding this new source! Once it is in sync with the latest develop it can be merged.
@sandorkertesz synced with develop now. Just to flag
If you think it's better, can remove from the |
@Oisin-M unfortunately we cannot afford disabling all the tests due to this dependency. So I suggest you remove the zarr dependency from |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #675 +/- ##
===========================================
- Coverage 90.98% 90.92% -0.06%
===========================================
Files 167 168 +1
Lines 12663 12679 +16
Branches 613 614 +1
===========================================
+ Hits 11521 11529 +8
- Misses 960 967 +7
- Partials 182 183 +1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Hi @sandorkertesz, thanks for taking a look. I have now made zarr optional and also made the zarr test automatically skip if it can't find the zarr package, so tests are all passing |
@Oisin-M , I have a question about the source name. What was the reason of choosing "xarray_zarr" and not simply "zarr"? E.g. the "opendap" source is also using xarray to open the data but it does not have "xarray" in the name. |
Hi @sandorkertesz, this was to leave room for having a reader based on the zarr API directly |
@Oisin-M, for simplicity, I suggest that the name should be just "zarr". If you want to add a zarr API based reader in the future it could be controlled by adding a key e.g. "reader" to "from_source". E.g. from_source("zarr", ..., reader="xarray")
from_source("zarr", ..., reader="zarr") What do you think? |
Hi @sandorkertesz, I'd be happy to move towards that as long as we return objects with the same methods for both readers |
Well that is the catch. Not sure what |
Implements a simple
zarr
source for reading zarr files (only version 2 at the moment)