-
-
Notifications
You must be signed in to change notification settings - Fork 316
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support async FSMap objects in zarr.open #2774
base: main
Are you sure you want to change the base?
Conversation
I recall there being issues with fsmap before, but I confess I don't really know what an fsmap is -- can someone explain what an fsmap is, how it differs from an fsspec filesystem, and why people would use one over the other? I have a vague feeling that it could be useful to have a |
FSMap was created specifically for the needs of Zarr, and it could have been essentially the same as the v2 FSStore, but was much quicker to get out and working within dask/fsspec. FSMap is a dict-compatible interface (mutable-mapping) on top of a FS instance, which zarr worked with since forever and ignores some FS functionality like the file-like API. To make it work with v3 might be complex, because zarr makes async calls, and the FSMap interface is blocking, even if the underlying FS is async. That means that there will be sync() within sync(), which might still work as zarr maintains its own event loop in a thread separate from fsspec's. |
To be clear: this PR does not use the mapper, but constructs a normal store from the mapper's attributes. I support this path. |
thanks for the questions and answers, Davis and Martin! IIUC there are three cases that we'd need to account for:
|
The instance has all the arguments it was made with as attributes, so you can make a new instance with asynchronous=True from those.
Is there a reason to bother doing this? |
I was thinking based on https://filesystem-spec.readthedocs.io/en/latest/async.html#limitations that LocalStore may be faster since it was designed around providing an async interface. |
IMO it's simpler if the path is always fsmap -> fsspecstore |
I doubt it. The disk is not really async (at least with the standard syscalls python uses), so none of the async code should make any difference for local reads at all. |
I ran into #2808 when trying to wrap sync filesystems. My preference for this PR would be to bring the codecov up to 100% and request that this be considered for review, with the plan that the conversion of sync instances to async and the wrapping of sync filesystems could be handled separately. My reasoning is that some people could use this current implementation and I'm not sure how quickly I'll be able to find time to add the other missing features. Update: I added conversion of sync instances to async |
@martindurant are you able to give this a review today? We'd like to get out a release with this bugfix asap. |
FYI I believe |
This sounds good to me. It's better to raise informative error messages when we can. |
These people are currently broken, so I don't think that giving them a "doesn't work" error is any better. |
@martindurant I get some of these warnings if I only run the tests in Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x11f2b5d10>
Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x118259a90>
Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x107b0ec10> |
Your advice in #2808 (comment) works - in this PR we now raise an informative error if |
I suppose that's OK, but why doesn't the store make directories? Of course, for object stores it doesn't matter... |
FYI I believe this is ready for review, ideally by @martindurant |
There is one failure due to a not-captured warning. It will need to be handled to get the greens. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My comments amount to only nitpicks.
@@ -228,7 +234,7 @@ def __eq__(self, other: object) -> bool: | |||
|
|||
|
|||
async def make_store_path( | |||
store_like: StoreLike | None, | |||
store_like: StoreLike | FSMap | None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this fail if fsspec isn't present, because FSMap isn't imported? Maybe you need
if _has_fsspec:
from fsspec.mapping import FSMap
else:
FSMap = None
or make fsspec a required dependency.
@@ -311,9 +317,18 @@ async def make_store_path( | |||
# We deliberate only consider dict[str, Buffer] here, and not arbitrary mutable mappings. | |||
# By only allowing dictionaries, which are in-memory, we know that MemoryStore appropriate. | |||
store = await MemoryStore.open(store_dict=store_like, read_only=_read_only) | |||
elif _has_fsspec: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you
elif _has_fsspec: | |
elif _has_fsspec and isinstance(store_like, FSMap): |
and then you fall to the same exception in else
rather than having the Type Error twice
else: | ||
msg = f"Unsupported type for store_like: '{type(store_like).__name__}'" # type: ignore[unreachable] | ||
raise TypeError(msg) | ||
raise (TypeError(f"Unsupported type for store_like: '{type(store_like).__name__}'")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The linter requires the redundant outer parentheses?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apparently this is not covered; switching the condition as I suggested above would make it so, or making a test explicitly with a bad type of store.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll check on the linter issue, as for the bad coverage I think we still need this in case fsspec isn't installed. #2872 should fix the code coverage gap, but it's taking longer than I expected to finish that PR because it uncovered other issues with the test matrix design (I'm working on it but might not finish today)
A quick note - I tried this PR on the example I dropped in #2706 and got the following: # See linked example for MRE setup
>>> z = zarr.group(store=store)
Traceback (most recent call last)
...
AttributeError: module 'fsspec.implementations' has no attribute 'asyn_wrapper' Now, there may very well be a problem with my example rather than this PR - all I can say is that the example does work with zarr v2. |
Thanks for the feedback! Could you please which version of fsspec you're using? |
The fsspec version is envPackage Version ----------------------- --------------------- asttokens 3.0.0 asyncssh 2.20.0 cffi 1.17.1 crc32c 2.7.1 cryptography 44.0.1 decorator 5.2.1 Deprecated 1.2.18 donfig 0.8.1.post1 executing 2.2.0 fsspec 2025.2.0 ipython 9.0.0 ipython_pygments_lexers 1.1.1 jedi 0.19.2 matplotlib-inline 0.1.7 numcodecs 0.15.1 numpy 2.2.3 packaging 24.2 parso 0.8.4 pexpect 4.9.0 pip 25.0 prompt_toolkit 3.0.50 ptyprocess 0.7.0 pure_eval 0.2.3 pycparser 2.22 Pygments 2.19.1 PyYAML 6.0.2 sshfs 2025.2.0 stack-data 0.6.3 traitlets 5.14.3 typing_extensions 4.12.2 wcwidth 0.2.13 wrapt 1.17.2 zarr 3.0.5.dev24+gd4d22560 |
This is rough code, but I made some progress on supporting FSMap types and wanted to open a PR for early feedback. This isn't a priority for me, so I'd welcome anyone to take over this PR and/or close it and work in a different PR.
Addresses #2706
TODO:
docs/user-guide/*.rst
changes/