Add a roundtrip test for writing/reading a manually constructed virtual DataTree using Icechunk #97

Open
maxrjones opened this issue Feb 24, 2025 · 1 comment

maxrjones commented Feb 24, 2025

zarr-developers/VirtualiZarr#244

@maxrjones

This test should be the easiest to adapt for a roundtrip DataTree test - https://github.com/zarr-developers/VirtualiZarr/blob/4906477415e74346703e970fbd81fe7ec0e12e81/virtualizarr/tests/test_integration.py#L167-L220.

We could remove the parameterization for now for simplicity, just using decode_times=False and time_vars=[]. Then, rather than concatenating vds1 and vds2, we could construct a DataTree (dt = DataTree.from_dict({"/vds1": vds1, "/vds2": vds2})). We would also only want to use roundtrip_as_in_memory_icechunk as the roundtrip_func.

https://github.com/zarr-developers/VirtualiZarr/blob/4906477415e74346703e970fbd81fe7ec0e12e81/virtualizarr/tests/test_integration.py#L118-L130 should be able to stay the same if we add the following in https://github.com/zarr-developers/VirtualiZarr/blob/main/virtualizarr/accessor.py:

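# Assumes `datetime` and `IcechunkStore` are already imported in accessor.py.
from xarray import DataTree, register_datatree_accessor


@register_datatree_accessor("virtualize")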
class VirtualiZarrDataTreeAccessor:
    """
    Xarray accessor for writing out virtual datatrees to disk.

    Methods on this object are called via `dt.virtualize.{method}`.
    """

    def __init__(self, dt: DataTree):
        self.dt: DataTree = dt

    def to_icechunk(
        self,
        store: "IcechunkStore",
        *,
        group: str | None = None,
        last_updated_at: datetime | None = None,
    ) -> None:
        """
        Write an xarray DataTree to an Icechunk store.

        Any variables backed by ManifestArray objects will be written as virtual
        references. Any other variables will be loaded into memory before their binary
        chunk data is written into the store.

        If `last_updated_at` is provided, it will be used as a checksum for any virtual
        chunks written to the store with this operation.  At read time, if any of the
        virtual chunks have been updated since this provided datetime, an error will be
        raised.  This protects against reading outdated virtual chunks that have been
        updated since the last read.  When not provided, no check is performed.  This
        value is stored in Icechunk with seconds precision, so be sure to take that into
        account when providing this value.

        Parameters
        ----------
        store: IcechunkStore
            Store to write the datatree into.
        group: str, optional
            Path of the group to write the datatree into (default: the root group).
        last_updated_at: datetime, optional
            Datetime to use as a checksum for any virtual chunks written to the store
            with this operation.  When not provided, no check is performed.

        Raises
        ------
        ValueError
            If the store is read-only.

        Examples
        --------
        To ensure an error is raised if the files containing referenced virtual chunks
        are modified at any time from now on, pass the current time to
        ``last_updated_at``.

        >>> from datetime import datetime
        >>> dt.virtualize.to_icechunk(  # doctest: +SKIP
        ...     icechunkstore,
        ...     last_updated_at=datetime.now(),
        ... )
        """
        from virtualizarr.writers.icechunk import datatree_to_icechunk

        datatree_to_icechunk(
            self.dt,
            store,
            group=group,
            last_updated_at=last_updated_at,
        )

Then the bulk of the new functionality would fall under #91.
