Distributed read-write oriented storage options #434

Open
jmosbacher opened this issue Apr 27, 2021 · 0 comments


jmosbacher commented Apr 27, 2021

What's the problem?

The most developed storage option is directory storage, but it's not designed for distributed access.

Proposed solution

On the backend side, there are a few interesting options:

  • Use fsspec to abstract away file-system access.
  • Delegate storage management to a package focused on distributed access, such as zarr (high level) or partd (low level).
  • Switch to a Mapping interface, combining zict interfaces with fsspec mappers to pipe data to and from arbitrary destinations through a consistent API.
  • Ideally, the backend should support async-, thread-, and process-safe locking options.
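
To make the Mapping idea concrete, here is a rough sketch using only the standard library. The class name `SerializedStore` and the example key are hypothetical and not part of any existing API. It wraps any bytes-valued mapping and handles serialization, which is roughly the role zict's function-wrapping mappings play. A plain dict stands in for the backend; the assumption is that a mapper returned by `fsspec.get_mapper(...)`, which also behaves like a mutable mapping, could be passed instead. The lock covers threads only; process-safe or async locking would need a different mechanism.

```python
import pickle
import threading
from collections.abc import MutableMapping


class SerializedStore(MutableMapping):
    """Store Python objects in any bytes-valued mapping (hypothetical helper)."""

    def __init__(self, backend, dumps=pickle.dumps, loads=pickle.loads):
        self._backend = backend         # any MutableMapping[str, bytes]
        self._dumps = dumps
        self._loads = loads
        self._lock = threading.RLock()  # thread safety only, not process safety

    def __getitem__(self, key):
        with self._lock:
            return self._loads(self._backend[key])

    def __setitem__(self, key, value):
        with self._lock:
            self._backend[key] = self._dumps(value)

    def __delitem__(self, key):
        with self._lock:
            del self._backend[key]

    def __iter__(self):
        with self._lock:
            # Snapshot the keys so iteration does not hold the lock.
            return iter(list(self._backend))

    def __len__(self):
        with self._lock:
            return len(self._backend)


# An in-memory dict stands in for the real backend in this sketch.
store = SerializedStore({})
store["run_001/peaks"] = {"n_chunks": 3, "dtype": "float32"}
```

Because the frontend only sees a mapping, swapping the local directory for a remote or cached destination would not change calling code.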

On the frontend side:
I think switching to a distributed index over our data would help a lot. In a distributed index, there can be many copies of the index, each modified locally, with merge strategies defined for when changes are finally pushed or pulled. The simplest strategy is to merge only changes that do not overlap. This would be very similar to a git repository, which can have many branches but usually has one master branch that most people pull from and only authorized people push to.
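
The simplest merge strategy could look something like the sketch below. It assumes, purely for illustration, that an index is a plain dict from keys to metadata, and that each copy remembers the common `base` it started from. The names `merge_non_overlapping` and `MergeConflict` are hypothetical.

```python
_MISSING = object()  # sentinel for "key not present"


class MergeConflict(Exception):
    """Raised when both copies changed the same key."""


def changed_keys(base, copy):
    """Keys that were added, removed, or modified in copy relative to base."""
    keys = set(base) | set(copy)
    return {k for k in keys if base.get(k, _MISSING) != copy.get(k, _MISSING)}


def merge_non_overlapping(base, local, remote):
    """Three-way merge that accepts only non-overlapping changes."""
    local_changes = changed_keys(base, local)
    remote_changes = changed_keys(base, remote)
    overlap = local_changes & remote_changes
    if overlap:
        raise MergeConflict(sorted(overlap))

    merged = dict(base)
    for changes, source in ((local_changes, local), (remote_changes, remote)):
        for key in changes:
            if key in source:
                merged[key] = source[key]  # added or modified
            else:
                merged.pop(key, None)      # removed
    return merged


base = {"a": 1, "b": 2}
local = {"a": 1, "b": 2, "c": 3}  # local copy added "c"
remote = {"a": 10, "b": 2}        # remote copy modified "a"
merged = merge_non_overlapping(base, local, remote)
```

Here `merged` contains both changes because they touch different keys. If both copies had modified `"a"`, the function would raise `MergeConflict` instead of guessing which value wins.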
