Single value variable of type int32 in NetCDF becomes float64 in Kerchunk #429

rsignell · 2024-03-02T19:52:20Z

@martindurant, looks like we still have a single-value variable problem.
In these AWS Open Data NetCDF files, the variable 'spherical' has a single int32 value but it becomes a float64 after kerchunk:
https://nbviewer.org/gist/rsignell-usgs/5971951d348496229ce121b52a2fb750

(I discovered this because the xroms package designed to work with these ROMS NetCDF files bombed -- took me a while to figure out this was the reason...)

martindurant · 2024-03-05T18:11:47Z

I am fairly puzzled; the metadata says int:

>>> fs = fsspec.filesystem("reference", fo=single_json, remote_protocol="s3", remote_options=so)
>>> fs.cat("spherical/.zarray")
b'{"chunks":[],"compressor":null,"dtype":"<i4","fill_value":-2147483647,"filters":null,"order":"C","shape":[],"zarr_format":2}'

and zarr agrees:

>>> g = zarr.open(fs.get_mapper())
>>> g.spherical.dtype
dtype('int32')

xarray has a bunch of "decode*" flags in open_dataset, but I can't immediately see one that might do the right thing here.

The value, by the way, is just 1. This is actually a boolean?

keewis · 2024-03-06T14:30:00Z

I believe the reason is the fill_value. At the moment, float* is one of the few data types that can have missing values (using nan), while int* can't represent missing values. mask_and_scale=False should be what you're looking for, and I believe you can convert only the ones you need using:

In [20]: import xarray as xr
    ...: 
    ...: ds = xr.Dataset(
    ...:     {
    ...:         "a": ("x", [0, 1, 2], {"_FillValue": 1}),
    ...:         "b": ("x", [0.1, 0.2, 1.0], {"_FillValue": 1.0}),
    ...:     }
    ...: )
    ...: skipped_variables = [
    ...:     name
    ...:     for name, var in ds.variables.items()
    ...:     if "_FillValue" in var.attrs and var.dtype.kind not in "cfmMO"
    ...: ]
    ...: 
    ...: 
    ...: def decode_with_skip(ds, skip=None):
    ...:     if not skip:
    ...:         return xr.decode_cf(ds)
    ...: 
    ...:     return ds[skip].merge(xr.decode_cf(ds.drop_vars(skip)))
    ...: 
    ...: 
    ...: display(ds)
    ...: display(ds.pipe(decode_with_skip, skip=skipped_variables).compute())
<xarray.Dataset> Size: 48B
Dimensions:  (x: 3)
Dimensions without coordinates: x
Data variables:
    a        (x) int64 24B 0 1 2
    b        (x) float64 24B 0.1 0.2 1.0
<xarray.Dataset> Size: 48B
Dimensions:  (x: 3)
Dimensions without coordinates: x
Data variables:
    a        (x) int64 24B 0 1 2
    b        (x) float64 24B 0.1 0.2 nan

(This might change with the custom dtypes in numpy, but it will take some effort to get working "nullable integer" dtypes)

martindurant · 2024-03-06T14:46:47Z

@keewis: but the data here has an int fill_value and no _FillValue. Are you saying that having a fill value of any sort will cause an int->float cast even when there are actually no nulls?

martindurant · 2024-03-06T14:48:27Z

Ah indeed, if I set the fill_value to null in the JSON, you get an int :|
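For anyone wanting to try the same workaround, here is a minimal sketch of nulling out the fill_value in the references. It assumes the references are already loaded as a plain dict mapping keys such as "spherical/.zarray" to JSON strings (the inner "refs" mapping of a kerchunk JSON file); the helper name drop_fill_value is hypothetical and not part of kerchunk.

```python
import json


def drop_fill_value(refs, variable):
    """Return a copy of `refs` with `<variable>/.zarray` fill_value set to null.

    Hypothetical helper: assumes each metadata entry is stored as a JSON string.
    """
    key = f"{variable}/.zarray"
    patched = dict(refs)  # shallow copy so the input mapping is left untouched
    meta = json.loads(patched[key])
    meta["fill_value"] = None  # serialized as JSON null
    patched[key] = json.dumps(meta)
    return patched


# Illustrative references containing only the metadata shown earlier in the thread.
refs = {
    "spherical/.zarray": (
        '{"chunks":[],"compressor":null,"dtype":"<i4",'
        '"fill_value":-2147483647,"filters":null,"order":"C",'
        '"shape":[],"zarr_format":2}'
    )
}

patched = drop_fill_value(refs, "spherical")
print(patched["spherical/.zarray"])
```

Note that with the fill_value removed, genuinely missing values in that variable would no longer be masked, so this is probably only appropriate for variables known to contain no nulls.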

keewis · 2024-03-06T14:52:58Z

zarr's fill_value is translated to the _FillValue attribute. The masking is applied using where, without checking the actual values (which is potentially expensive); the mask value and the promoted dtype are decided in xarray.core.dtypes.maybe_promote.
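The promotion can be reproduced without kerchunk or S3. The sketch below builds a hypothetical in-memory stand-in for the 'spherical' variable (a scalar int32 carrying a _FillValue attribute) and decodes it with and without mask_and_scale. It is only an illustration of the behaviour described above, not the actual dataset from the report.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for 'spherical': a scalar int32 whose zarr fill_value
# has surfaced as the _FillValue attribute.
raw = xr.Dataset(
    {
        "spherical": (
            (),
            np.array(1, dtype="int32"),
            {"_FillValue": np.int32(-2147483647)},
        )
    }
)

# Default decoding masks on _FillValue, which promotes the integer to a float.
masked = xr.decode_cf(raw)

# Disabling mask_and_scale leaves both the values and the dtype untouched.
unmasked = xr.decode_cf(raw, mask_and_scale=False)

print(masked["spherical"].dtype)
print(unmasked["spherical"].dtype)
```

Passing mask_and_scale=False to open_dataset should have the same effect, but it applies to every variable, which is why the decode_with_skip approach above may be preferable when only some variables need to stay integer.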
