
Missing assets are deserialized as None #77

Open
gadomski opened this issue Aug 27, 2024 · 2 comments


@gadomski
Member

If several items are converted to an Arrow table and then converted back to dictionaries, any asset that is missing from one item (but present in another) comes back as None, which is invalid STAC:

import pystac
import stac_geoparquet
from pystac import Item

item: Item = pystac.read_file(
    "https://raw.githubusercontent.com/radiantearth/stac-spec/v1.0.0/examples/simple-item.json"
)
reduced_item = item.full_copy()
del reduced_item.assets["thumbnail"]

table = stac_geoparquet.arrow.parse_stac_items_to_arrow([item, reduced_item])
items = list(stac_geoparquet.arrow.stac_table_to_items(table))
assert items[1]["assets"][
    "thumbnail"
], f"the thumbnail asset is {items[1]['assets']['thumbnail']}"

Output:

Traceback (most recent call last):
  File "check.py", line 13, in <module>
    assert items[1]["assets"][
           ^^^^^^^^^^^^^^^^^^^
AssertionError: the thumbnail asset is None
gadomski added a commit to stac-utils/stac-rs that referenced this issue Aug 27, 2024
Here's an issue that describes what's going on:
stac-utils/stac-geoparquet#77
@kylebarron
Collaborator

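JSON is more descriptive than Arrow around null and undefined: a JSON object can either set a key to null or omit the key entirely. Because Arrow is columnar, we essentially only preserve null and not undefined; the column is defined for every row, so a value that is missing from one row can only be stored as null.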
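To make the null/undefined point concrete, here is a toy, pure-Python model of a columnar round trip. It is not pyarrow or stac-geoparquet code: `to_columns` and `to_rows` are hypothetical helpers and the asset hrefs are made up. Because every row gets a slot in every column, a key that was absent from one row comes back as an explicit None.

```python
from typing import Any


def to_columns(rows: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Toy columnar encoding (hypothetical): one column per key seen in any row."""
    keys: list[str] = []
    for row in rows:
        for key in row:
            if key not in keys:
                keys.append(key)
    # A row that lacks a key still needs a slot in that column, so it gets None.
    return {key: [row.get(key) for row in rows] for key in keys}


def to_rows(columns: dict[str, list[Any]]) -> list[dict[str, Any]]:
    """Decode back to row dicts: every column becomes a key in every row."""
    n = len(next(iter(columns.values()), []))
    return [{key: values[i] for key, values in columns.items()} for i in range(n)]


assets = [
    {"visual": {"href": "a.tif"}, "thumbnail": {"href": "a.jpg"}},
    {"visual": {"href": "b.tif"}},  # no thumbnail: "undefined" in JSON terms
]
round_tripped = to_rows(to_columns(assets))
print(round_tripped[1])
# {'visual': {'href': 'b.tif'}, 'thumbnail': None}
```

The information that "thumbnail" was never there is simply not representable once the data is stored column by column.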

I believe pyarrow serializes all Arrow null values as None by default. But I agree we shouldn't be able to construct invalid STAC items, so perhaps we should manually remove None from asset values? Anywhere else that None is invalid? Or should we be coercing None to undefined everywhere?
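One reading of "coercing None to undefined everywhere" is a recursive pass that drops every None-valued key. The sketch below (the `drop_nones` helper and the sample item are hypothetical) illustrates the tradeoff: it removes the null asset, but it also drops `properties.datetime`, which the STAC item spec requires to be present (null is allowed when `start_datetime` and `end_datetime` are set).

```python
from typing import Any


def drop_nones(value: Any) -> Any:
    """Hypothetical helper: recursively remove dict keys whose value is None.

    List elements are kept as-is (only recursed into), so a None inside a
    list survives.
    """
    if isinstance(value, dict):
        return {k: drop_nones(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [drop_nones(v) for v in value]
    return value


item = {
    "type": "Feature",
    "properties": {
        # Valid STAC: datetime may be null when start/end datetimes are given.
        "datetime": None,
        "start_datetime": "2020-12-11T00:00:00Z",
        "end_datetime": "2020-12-12T00:00:00Z",
    },
    "assets": {"visual": {"href": "b.tif"}, "thumbnail": None},
}
cleaned = drop_nones(item)
print(cleaned["assets"])  # {'visual': {'href': 'b.tif'}}
print("datetime" in cleaned["properties"])  # False: the required key is gone
```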

@gadomski
Member Author

perhaps we should manually remove None from asset values?

Yup, that's what I did here. I ran across the issue while doing some test translations of 1000 Sentinel-2 items from the Planetary Computer (PC); some of the items were missing a preview asset, which in my experience is a fairly common thing to happen in real-world systems.
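A minimal sketch of the assets-only cleanup, assuming the items are plain dicts as returned by `stac_table_to_items` in the reproduction. `drop_missing_assets` is a hypothetical helper, not the actual stac-rs change or a stac-geoparquet API, and the sample items are made up.

```python
from typing import Any


def drop_missing_assets(item: dict[str, Any]) -> dict[str, Any]:
    """Return a shallow copy of a STAC item dict without None-valued assets.

    Only the top-level "assets" mapping is filtered, so nulls elsewhere
    (e.g. properties.datetime) are left untouched.
    """
    assets = item.get("assets")
    if not isinstance(assets, dict):
        return dict(item)  # nothing to clean
    cleaned = {key: asset for key, asset in assets.items() if asset is not None}
    return {**item, "assets": cleaned}


# Made-up dicts shaped like the round-tripped items from the reproduction.
round_tripped = [
    {"properties": {"datetime": None},
     "assets": {"visual": {"href": "a.tif"}, "thumbnail": {"href": "a.jpg"}}},
    {"properties": {"datetime": None},
     "assets": {"visual": {"href": "b.tif"}, "thumbnail": None}},
]
fixed = [drop_missing_assets(item) for item in round_tripped]
print(fixed[1]["assets"])  # {'visual': {'href': 'b.tif'}}
```

Against the reproduction, the same helper could wrap the library output, e.g. `[drop_missing_assets(i) for i in stac_geoparquet.arrow.stac_table_to_items(table)]`. Filtering only `assets` keeps legitimately null fields intact, which is the reason for not stripping None everywhere.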

Anywhere else that None is invalid?

I think None is fine in most cases, and assets is a bit of a special case.
