PR adding support for Zarr V3 #3068
base: main
Conversation
I am posting this PR as a draft so that Manuel Reis and others can work with it. I will revise this PR description later.

### Notes:
1. It does not yet run on GitHub Actions without errors.
2. It appears to have no memory leaks.
3. It has problems with the testing infrastructure that I am working on.
4. CMake does not work for testing v3_nczarr_test because of CMake Policy [CMP0002](https://cmake.org/cmake/help/latest/policy/CMP0002.html).
5. Ubuntu + Automake appears to work, including testing v3_nczarr_test.
6. make distcheck does not work because of difficulties with VPATH.
Ok, this branch is usable except for MSYS2 and Visual Studio.
Thanks a lot for including the feature from #3066 @DennisHeimbigner, and apologies for the delay in replying!
There are many ways to improve this, but here is one quick fix I made on my side that reduces the time it takes to get the header information of the big dataset. (Edited for breaking tests)

diff --git a/libnczarr/zinfer.c b/libnczarr/zinfer.c
index e11afd6f4..6181b7e32 100644
--- a/libnczarr/zinfer.c
+++ b/libnczarr/zinfer.c
@@ -51,11 +51,11 @@ struct ZarrObjects {
int zarr_version;
int haszmetadata;
} zarrobjects[] = {
-{"zarr.json", ZARRFORMAT3, 0},
-{".zgroup", ZARRFORMAT2, 0},
-{".zarray", ZARRFORMAT2, 0},
-{".zattrs", ZARRFORMAT2, 0},
-{".zmetadata", ZARRFORMAT2, 1},
+{"/zarr.json", ZARRFORMAT3, 0},
+{"/.zgroup", ZARRFORMAT2, 0},
+{"/.zarray", ZARRFORMAT2, 0},
+{"/.zattrs", ZARRFORMAT2, 0},
+{"/.zmetadata", ZARRFORMAT2, 1},
{NULL, 0, 0},
};
@@ -187,7 +187,6 @@ done:
return THROW(stat);
}
-int
NCZ_infer_open_zarr_format(NC_FILE_INFO_T* file)
{
int stat = NC_NOERR;
@@ -198,8 +197,19 @@ NCZ_infer_open_zarr_format(NC_FILE_INFO_T* file)
NC_UNUSED(file);
/* Probe the map for tell-tale objects and dict keys */
-
if(zarrformat == 0) {
+ size_t seglen = 0;
+ struct ZarrObjects *zo = NULL;
+ stat = NC_ENOTZARR; // Until proven otherwise we aren't sure it's a zarr dataset
+ /* We search on the root path for V2 or V3 tags */
+ for (zo = zarrobjects; zo->name; zo++) {
+ if ((stat = nczmap_exists(zfile->map,zo->name)) == NC_NOERR) {
+ zarrformat = zo->zarr_version;
+ break; /* No need to look for more keys */
+ }
+ }
+ }
+ if(zarrformat == 0 || stat != NC_NOERR) {
/* We need to search subtree for a V2 or V3 tag */
switch(stat = nczmap_walk(zfile->map,"/",tagsearch, &param)) {
case NC_ENOOBJECT:
@@ -308,7 +318,7 @@ tagsearch(NCZMAP* map, const char* prefix, const char* key, void* param)
if(seglen == 0) return NC_NOERR;
for(zo=zarrobjects;zo->name;zo++) {
- if(strcasecmp(segment,zo->name)==0) {
+ if(strcasecmp(segment,zo->name+1)==0) {
formats->zarrformat = zo->zarr_version;
return NC_NOERR; /* tell walker to stop */
}
The first suggestion was breaking some tests, so I added a commit to tackle this in a better way: mannreis@806c1a7 (s3consolidate-patch.txt)
H/T Manuel Reis. Also turn off some debugging output.
Added a fix for issue #3075.
Yes. I think a separate PR for that will help with closing this one.
re: Unidata#3068 (comment) At Manuel Reis's request, this commit disables the current Zarr-Over-HTTP (ZOH) support for URLs.
I see the out of space problem is still there.
I had hoped to include this, and it feels like we're almost there, but I'm going to get v4.9.3 out and then come back to this for the subsequent v4.10.0 release.
I agree. This PR has been nothing but a disaster and I take full responsibility for it.
re: PR Unidata#3068 Update the ncjson and ncproplist code to latest.
re: PR Unidata#3068
1. Update the ncjson and ncproplist code to latest.
2. Fix some references to older API functions.
3. Update the netcdf_json and netcdf_proplist builds.
@DennisHeimbigner Refreshing my memory on this, the original issue appears to have been fixed by some of the changes I made. Now the failures are somehow related to the string comparison output, where character counts differ. See 6875ae3 and the surrounding commits I made to see what I changed.
I am still seeing the "out of space" error.
re: PR Unidata#3068
1. Fix a namespace problem in tinyxml2.cpp. Note that this is a Visual Studio problem, hence the use of _MSC_VER.
I made some fixes in this branch to be able to build it (command below). For building with CMake:
cmake ../ -DNETCDF_ENABLE_NCZARR=ON -DNETCDF_ENABLE_NCZARR_S3=ON -DNETCDF_ENABLE_S3_AWS=ON -D{NETCDF_,}ENABLE_PLUGIN_INSTALL=YES -DNETCDF_ENABLE_LOGGING=ON -DNETCDF_ENABLE_NETCDF_4=ON -DNETCDF_ENABLE_ZOH=ON
re: PR Unidata#3068
Part of splitting PR 3068. The general goal is to fix minor bugs and issues involving S3 code outside of libnczarr.
1. libsrc/s3io.c:
   - Fix handling of error output of NC_s3sdkinfo.
   - Implement the delete arg to s3io_close.
2. Add support for the Zarr-Over-HTTP (ZOH) protocol, but leave it disabled until given the go-ahead from Manuel Reis.
3. Change the ncs3sdk API in light of the eventual addition of Zarr V3 support.
4. Modify s3cleanup.in to catch and remove some previously overlooked entries in the Unidata S3 test bucket.
5. Modify s3gc.in to catch and properly remove Unidata S3 test bucket entries with old UIDs.
6. Modify ds3util.c:
   - Support use of a non-standard port -- H/T Manuel Reis.
   - Clean up some memory leaks.
   - Add disabled ZOH support.
   - Better doc on how URLs are interpreted.
   - Document how .aws/config and .aws/credentials files are parsed.
7. Repair bugs in the internal S3 reader/writer module:
   - Allow API functions to return an http code (see also nch5s3comms.h).
   - Fix some casting warnings.
8. Fixes to ncs3sdk_h5.c, the dispatch wrapper for the internal S3 reader/writer:
   - Move httptonc to this file.
   - Make it conform to the API modifications.
   - Refactor the key search code.
   - Rename getkeys to list and searchkeys to listall.
9. Fixes to nczarr_test/s3util.c:
   - Reflect ds3util.c API changes, e.g. search->listall.
10. Add s3util.c to v3_nczarr_directory.
11. Extend unit_test/test_s3dk.c to reflect the S3 API changes.
12. Extend unit_test/run_s3sdk.sh to add new tests involving the S3 API changes.
re: PR Unidata#3068
Part of splitting PR 3068.
1. Libdap2+Libdap4: Update to change "_FillValue" to the NC_FillValue constant.
2. Libdap4: Update the algorithm for sizing the packet buffer.
3. Libdap4: Dynamically sort the ATOMICTYPEINFO table so it can be used with binary search in the lookupAtomicType function.
4. Mark some unused function arguments.
5. Subsume the Hyrax changes in PR Unidata#3089.
Update: 1/6/2025: Ok, this branch is usable except for MSYS2 and Visual Studio. They are failing on some kind of "out of space" error whose meaning I cannot figure out.
=============================================
Update (12/29/24): I am going ahead and merging the code from PR #3066 since that code is pretty easy to follow and I have the best knowledge of what kind of tailoring is needed to adapt to the numerous changes.
=============================================
I am posting this PR as a draft so that Manuel Reis and others can work with it. I will revise this PR description later.
Notes:
~~make distcheck does not work because of difficulties with VPATH.~~ It works with distcheck now.