OK, so `target_options` in your case is something that needs to go to `open()`, not something configured for the whole target filesystem. Since the file in question was opened correctly a couple of lines above, I'm not sure why it's being unbundled like this.
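One way to picture that distinction: an option such as `compression` describes how a single file is opened, while an option such as s3fs's `anon` configures the filesystem itself. The `split_target_options` helper below is hypothetical (it is not a kerchunk or fsspec function); it sorts a `target_options` dict into those two groups, using a subset of the keyword names from `fsspec.open`'s own signature as the "open-time" set.

```python
# Per-file keyword arguments taken from fsspec.open()'s signature (a subset);
# every other key is treated as filesystem configuration. This split is an
# illustration only, not how kerchunk or fsspec actually route options.
OPEN_KWARGS = frozenset({"mode", "compression", "encoding", "errors", "newline"})


def split_target_options(target_options: dict) -> tuple[dict, dict]:
    """Hypothetical helper: return (open-time options, filesystem options)."""
    open_opts = {k: v for k, v in target_options.items() if k in OPEN_KWARGS}
    fs_opts = {k: v for k, v in target_options.items() if k not in OPEN_KWARGS}
    return open_opts, fs_opts


open_opts, fs_opts = split_target_options({"compression": "gzip", "anon": True})
print(open_opts)  # {'compression': 'gzip'}
print(fs_opts)    # {'anon': True}
```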
I was looking through the repo history, and it seems that in the past the `of` objects returned by `fsspec.open` were passed to the filesystem call directly. At some point this was changed to pass only their `of.full_name` attributes instead, which forces the filesystem to re-open the files.
Hi there! I was trying to get `MultiZarrToZarr` to work on a set of files that have been gzip-compressed. The `fsspec.open` function has a `compression` parameter that tells fsspec to decompress the files on the fly. I have gotten this to work with zarr files by doing something like this:

I was hoping that this would also work with `MultiZarrToZarr` and be just as simple as passing the `target_options`. However, it seems that while `MultiZarrToZarr` opens the files using the `target_options` here to get the file name:

kerchunk/kerchunk/combine.py
Line 265 in dc66b2c
It then does not pass the target options to the `fsspec.filesystem` call here:

kerchunk/kerchunk/combine.py
Lines 277 to 283 in dc66b2c
Is this a bug, or is there some reason why the `target_options` can't be passed to `fsspec.filesystem` there?

Analysis of Compressed files
From what I can tell by running this with a debugger, this is what's happening:

1. `open_files` here succeeds:
   kerchunk/kerchunk/combine.py
   Line 265 in dc66b2c
2. `fs.cat` does not include the `target_options`, so the file is not decompressed when read:
   kerchunk/kerchunk/combine.py
   Line 270 in dc66b2c
3. `fo_list` is set to the original list of filenames:
   kerchunk/kerchunk/combine.py
   Line 274 in dc66b2c
4. The `fsspec.filesystem` call opens the files again, but since the `target_options` are not passed in, the data is not decompressed and the JSON decoding fails:
   kerchunk/kerchunk/combine.py
   Line 277 in dc66b2c
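The decoding failure described in the steps above can be reproduced without kerchunk or S3. The sketch below gzip-compresses a small kerchunk-style reference dict (its contents are made up for illustration), then shows that parsing the stored bytes directly fails, while parsing them after decompression works. Reading the raw bytes stands in for what `fs.cat` returns when no compression option reaches it.

```python
import gzip
import json

# A tiny kerchunk-style reference set; the content is illustrative only.
refs = {"version": 1, "refs": {".zgroup": '{"zarr_format": 2}'}}

# What actually sits in storage: gzip-compressed JSON bytes.
stored = gzip.compress(json.dumps(refs).encode("utf-8"))

# Parsing the raw bytes fails: gzip data is not valid UTF-8 JSON text.
try:
    json.loads(stored)
    raw_parse_ok = True
except ValueError:  # UnicodeDecodeError and JSONDecodeError both subclass ValueError
    raw_parse_ok = False

# Decompressing first (what compression="gzip" does on open) succeeds.
decoded = json.loads(gzip.decompress(stored))

print(raw_parse_ok)     # False
print(decoded == refs)  # True
```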
Solution
I think there are two additional places where the target options need to be passed in:

1. In the `fs.cat` call, via the `**kwargs` parameter:
   kerchunk/kerchunk/combine.py
   Line 270 in dc66b2c
   `fo_list = fs.cat(self.path)`
   API docs for `fs.cat`: https://filesystem-spec.readthedocs.io/en/latest/api.html#fsspec.spec.AbstractFileSystem
   Source code for `fs.cat`: https://github.com/fsspec/filesystem_spec/blob/4517882f67d635d50b54cd53fd04ee3a37b6943c/fsspec/spec.py#L844
   EDIT: After trying this out, it seems that the `s3fs` implementation of `cat_file` doesn't work the way the synchronous abstract class does, where the `kwargs` are passed on to a call to `fs.open`. The s3fs `_cat_file` doesn't support kwargs at all: https://github.com/fsspec/s3fs/blob/f3f63cbfbfe71a4355abd63cafd8c678c4a5a0af/s3fs/core.py#L1113
2. In the `fsspec.filesystem` call:
   kerchunk/kerchunk/combine.py
   Line 277 in dc66b2c
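Because the s3fs `_cat_file` ignores extra kwargs, a different approach from the one proposed above is to decompress the bytes after `cat` returns them, which does not depend on the backend. `fs.cat` returns a `{path: bytes}` dict when given a list of paths; the hypothetical `decode_cat_result` helper below decompresses and parses such a dict. The name-to-decompressor table is an assumption that stands in for fsspec's own compression registry, and the `s3://bucket/...` path is a placeholder.

```python
import bz2
import gzip
import json
import lzma

# Assumed mapping from fsspec-style compression names to stdlib decompressors.
DECOMPRESSORS = {"gzip": gzip.decompress, "bz2": bz2.decompress, "xz": lzma.decompress}


def decode_cat_result(raw: dict, compression=None) -> dict:
    """Hypothetical helper: decompress (if requested) and parse {path: bytes}."""
    decode = DECOMPRESSORS[compression] if compression else (lambda b: b)
    return {path: json.loads(decode(data)) for path, data in raw.items()}


raw = {"s3://bucket/a.json.gz": gzip.compress(b'{"version": 1, "refs": {}}')}
print(decode_cat_result(raw, compression="gzip"))
# {'s3://bucket/a.json.gz': {'version': 1, 'refs': {}}}
```

The trade-off is that this duplicates compression handling that fsspec already performs when a file is opened; reading through the already-opened file objects, as in the earlier comment about passing the `of` objects directly, avoids that duplication.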
Workaround
I believe I can work around this by opening the files myself and passing in the zarr dictionaries directly. It's just more code for me to write :)
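A minimal sketch of that workaround for local files, assuming the references are gzip-compressed JSON: `load_gzipped_refs` is a hypothetical helper that decompresses and parses each file, producing the list of dicts to hand to `MultiZarrToZarr`. For remote files, `fsspec.open(path, "rt", compression="gzip")` would take the place of `gzip.open`. The `concat_dims=["time"]` value in the usage comment is an assumption about the dataset.

```python
import gzip
import json


def load_gzipped_refs(paths):
    """Hypothetical helper: read gzip-compressed reference JSON files into dicts."""
    refs = []
    for path in paths:
        # Text mode decompresses on the fly, much like
        # fsspec.open(path, "rt", compression="gzip") does for remote files.
        with gzip.open(path, "rt", encoding="utf-8") as f:
            refs.append(json.load(f))
    return refs


# Usage (concat_dims is an assumed dimension name for your data):
# from kerchunk.combine import MultiZarrToZarr
# mzz = MultiZarrToZarr(load_gzipped_refs(paths), concat_dims=["time"])
# combined = mzz.translate()
```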