@sophiamaedler

I was trying to initialize a SpatialData object directly from S3, as done in the tests here:

from upath import UPath
import spatialdata as sd
test = UPath( "s3://spatialdata/spatialdata-sandbox/merfish.zarr", endpoint_url="https://s3.embl.de", anon=True )
sd.read_zarr(test)

This was failing with:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[1], line 4
      2 import spatialdata as sd
      3 test = UPath( "s3://spatialdata/spatialdata-sandbox/merfish.zarr", endpoint_url="https://s3.embl.de/", anon=True )
----> 4 sd.read_zarr(test)

File ~/src/spatialdata/_io/io_zarr.py:282, in read_zarr(store, selection, on_bad_files)
    272     attrs = None
    274 sdata = SpatialData(
    275     images=images,
    276     labels=labels,
   (...)    
    281 )
--> 282 sdata.path = _create_upath(_store)
    283 return sdata

File ~/src/spatialdata/_core/spatialdata.py:590, in SpatialData.path(self, value)
    588     self._path = value
    589 else:
--> 590     raise TypeError("Path must be `None`, a `str` or a `Path` object.")
    592 if not self.is_self_contained():
    593     logger.info(
    594         "The SpatialData object is not self-contained "
    595         "(i.e. it contains some elements that are Dask-backed "
    596         "from locations outside {self.path})."
    597     )

TypeError: Path must be `None`, a `str` or a `Path` object.
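
Judging from the traceback, the `path` setter only accepts `None`, a `str` or a `Path`, so a `UPath` is rejected. As an illustration only, here is a minimal sketch of a setter that also accepts `UPath`. It is not the actual diff of this PR: `PathHolder` and `_PATH_TYPES` are hypothetical names, and the `upath` import is treated as optional so the sketch runs without it.

```python
from __future__ import annotations

from pathlib import Path

try:
    # universal_pathlib is optional here; with it installed, UPath is accepted too.
    from upath import UPath

    _PATH_TYPES: tuple[type, ...] = (str, Path, UPath)
except ImportError:
    _PATH_TYPES = (str, Path)


class PathHolder:
    """Hypothetical stand-in for the object that owns the `path` property."""

    def __init__(self) -> None:
        self._path = None

    @property
    def path(self):
        return self._path

    @path.setter
    def path(self, value) -> None:
        # Accept None or any of the allowed path types; reject everything else.
        if value is None or isinstance(value, _PATH_TYPES):
            self._path = value
        else:
            raise TypeError("Path must be `None`, a `str`, a `Path` or a `UPath` object.")


holder = PathHolder()
holder.path = "s3://spatialdata/spatialdata-sandbox/merfish.zarr"  # str is accepted
holder.path = None  # None is accepted
```

Anything outside the allowed types, for example an integer, still raises the `TypeError`.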

The changes in this PR fix the issue, and the sdata object is now read successfully from S3.

The code now returns:

SpatialData object, with associated Zarr store: s3://spatialdata/spatialdata-sandbox/merfish.zarr
├── Images
│     └── 'rasterized': DataArray[cyx] (1, 522, 575)
├── Points
│     └── 'single_molecule': DataFrame with shape: (<Delayed>, 3) (2D points)
├── Shapes
│     ├── 'anatomical': GeoDataFrame shape: (6, 1) (2D shapes)
│     └── 'cells': GeoDataFrame shape: (2389, 2) (2D shapes)
└── Tables
      └── 'table': AnnData (2389, 268)
with coordinate systems:
    ▸ 'global', with elements:
        rasterized (Images), single_molecule (Points), anatomical (Shapes), cells (Shapes)
with the following Dask-backed elements not being self-contained:
    ▸ rasterized: [path/spatialdata/spatialdata-sandbox/merfish.zarr/images/rasterized]
    ▸ single_molecule: [path/spatialdata/spatialdata-sandbox/merfish.zarr/points/single_molecule/points.parquet/part.0.parquet]

@sophiamaedler
Author

The actions are failing due to changes introduced in a previous commit: c514a0b
I can take a look and try to figure out the problem.

@melonora
Collaborator

There is PR #971, which fixes remote storage completely; however, zarrv3 got merged before that PR, so #971 would require an update.

@melonora
Collaborator

melonora commented Nov 5, 2025

With the current dask unpinning, this PR would not work anymore, and neither would the other PR. I am implementing some fixes at the moment. The main problem is that FsspecStore now requires an async file system.
