Iris won't load netCDF files from the new CDS-beta #6149

Open
leosaffin opened this issue Sep 20, 2024 · 5 comments

@leosaffin

🐛 Bug Report

Iris won't load netCDF files from the new CDS-beta. Someone has also reported this in a forum post about the updates to the CDS API (https://forum.ecmwf.int/t/changes-to-grib-to-netcdf-converter-on-cds-beta-ads-beta/4322/19). I suspect the problem stems from the converter changes described there, but I still think the failure to load is an Iris bug. If I change the offending line in Iris (see the traceback below) from

total_bytes = cf_var.size * cf_var.dtype.itemsize

to

total_bytes = cf_var.size * np.dtype(cf_var.dtype).itemsize

then the file loads fine. It is a simple fix, but I thought it worth reporting in case it leads to other issues elsewhere.
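As a minimal sketch of why the one-line change gets past the error, the snippet below uses a hypothetical stand-in class (FakeVar, not an Iris or netCDF4 class) whose dtype is the Python type str, which is what the traceback below shows for the failing variable. It assumes only that NumPy is installed.

```python
import numpy as np


class FakeVar:
    """Hypothetical stand-in for a netCDF variable; not an Iris or netCDF4 class."""

    def __init__(self, size, dtype):
        self.size = size
        self.dtype = dtype


def total_bytes_original(var):
    # Mirrors the failing line: assumes var.dtype is already a numpy dtype.
    return var.size * var.dtype.itemsize


def total_bytes_patched(var):
    # Mirrors the proposed change: np.dtype() accepts numpy dtypes and Python types.
    return var.size * np.dtype(var.dtype).itemsize


numeric = FakeVar(size=10, dtype=np.dtype("float64"))
vlen_string = FakeVar(size=10, dtype=str)  # dtype reported as the Python type `str`

print(total_bytes_original(numeric))  # 80 (10 elements x 8 bytes)
print(total_bytes_patched(numeric))   # 80, unchanged for ordinary numeric dtypes

try:
    total_bytes_original(vlen_string)
except AttributeError as err:
    print(f"original line fails: {err}")

# The patched line no longer raises for the string case.
total_bytes_patched(vlen_string)
```

The patched version behaves identically for ordinary numeric variables, so the change only affects variables whose dtype is not already a NumPy dtype.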

Running ncdump on the file shows the difference too. The variables now look like this; the "string" keyword in front of the text attributes would not have been there before. I don't know whether that is an issue.

double latitude(latitude) ;
    latitude:_FillValue = NaN ;
    string latitude:units = "degrees_north" ;
    string latitude:standard_name = "latitude" ;
    string latitude:long_name = "latitude" ;
    string latitude:stored_direction = "decreasing" ;

How To Reproduce

Steps to reproduce the behaviour:

  1. Download a netCDF file from CDS-beta. I just ticked the first box for each value here (https://cds-beta.climate.copernicus.eu/datasets/reanalysis-era5-pressure-levels?tab=download), i.e. Divergence at 1hPa, 1st January 1940
  2. Try to open it with iris

Expected behaviour

The netCDF file should load. It loads fine with xarray and cf-python.

Additional context

AttributeError Traceback (most recent call last)
Cell In[2], line 1
----> 1 cubes = iris.load("test_era5.nc")

File ~/miniforge3/envs/core/lib/python3.12/site-packages/iris/__init__.py:326, in load(uris, constraints, callback)
302 def load(uris, constraints=None, callback=None):
303 """Load any number of Cubes for each constraint.
304
305 For a full description of the arguments, please see the module
(...)
324
325 """
--> 326 return _load_collection(uris, constraints, callback).merged().cubes()

File ~/miniforge3/envs/core/lib/python3.12/site-packages/iris/__init__.py:294, in _load_collection(uris, constraints, callback)
292 try:
293 cubes = _generate_cubes(uris, callback, constraints)
--> 294 result = _CubeFilterCollection.from_cubes(cubes, constraints)
295 except EOFError as e:
296 raise iris.exceptions.TranslationError(
297 "The file appears empty or incomplete: {!r}".format(str(e))
298 )

File ~/miniforge3/envs/core/lib/python3.12/site-packages/iris/cube.py:97, in _CubeFilterCollection.from_cubes(cubes, constraints)
95 pairs = [_CubeFilter(constraint) for constraint in constraints]
96 collection = _CubeFilterCollection(pairs)
---> 97 for cube in cubes:
98 collection.add_cube(cube)
99 return collection

File ~/miniforge3/envs/core/lib/python3.12/site-packages/iris/__init__.py:275, in _generate_cubes(uris, callback, constraints)
273 if scheme == "file":
274 part_names = [x[1] for x in groups]
--> 275 for cube in iris.io.load_files(part_names, callback, constraints):
276 yield cube
277 elif scheme in ["http", "https"]:

File ~/miniforge3/envs/core/lib/python3.12/site-packages/iris/io/__init__.py:219, in load_files(filenames, callback, constraints)
217 fnames = handler_map[handling_format_spec]
218 if handling_format_spec.constraint_aware_handler:
--> 219 for cube in handling_format_spec.handler(fnames, callback, constraints):
220 yield cube
221 else:

File ~/miniforge3/envs/core/lib/python3.12/site-packages/iris/fileformats/netcdf/loader.py:641, in load_cubes(file_sources, callback, constraints)
638 if mesh is not None:
639 mesh_coords, mesh_dim = _build_mesh_coords(mesh, cf_var)
--> 641 cube = _load_cube(engine, cf, cf_var, cf.filename)
643 # Attach the mesh (if present) to the cube.
644 for mesh_coord in mesh_coords:

File ~/miniforge3/envs/core/lib/python3.12/site-packages/iris/fileformats/netcdf/loader.py:326, in _load_cube(engine, cf, cf_var, filename)
324 these_settings = CHUNK_CONTROL.var_dim_chunksizes.get(cf_var.cf_name, {})
325 with CHUNK_CONTROL.set(**these_settings):
--> 326 return _load_cube_inner(engine, cf, cf_var, filename)

File ~/miniforge3/envs/core/lib/python3.12/site-packages/iris/fileformats/netcdf/loader.py:355, in _load_cube_inner(engine, cf, cf_var, filename)
350 _assert_case_specific_facts(engine, cf, cf_var.cf_group)
352 # Run the actions engine.
353 # This creates various cube elements and attaches them to the cube.
354 # It also records various other info on the engine, to be processed later.
--> 355 engine.activate()
357 # Having run the rules, now add the "unused" attributes to each cf element.
358 def fix_attributes_all_elements(role_name):

File ~/miniforge3/envs/core/lib/python3.12/site-packages/iris/fileformats/_nc_load_rules/engine.py:95, in Engine.activate(self)
85 def activate(self):
86 """Run all the translation rules to produce a single output cube.
87
88 This implicitly references the output variable for this operation,
(...)
93
94 """
---> 95 run_actions(self)

File ~/miniforge3/envs/core/lib/python3.12/site-packages/iris/fileformats/_nc_load_rules/actions.py:575, in run_actions(engine)
573 auxcoord_facts = engine.fact_list("auxiliary_coordinate")
574 for auxcoord_fact in auxcoord_facts:
--> 575 action_build_auxiliary_coordinate(engine, auxcoord_fact)
577 # Detect + process and special 'ukmo' attributes
578 # Run on every cube : they choose themselves whether to trigger.
579 action_ukmo_stash(engine)

File ~/miniforge3/envs/core/lib/python3.12/site-packages/iris/fileformats/_nc_load_rules/actions.py:92, in action_function.<locals>.inner(engine, *args, **kwargs)
89 @wraps(func)
90 def inner(engine, *args, **kwargs):
91 # Call the original rules-func
---> 92 rule_name = func(engine, *args, **kwargs)
93 if rule_name is None:
94 # Work out the corresponding rule name, and log it.
95 # Note: an action returns a name string, which identifies it,
96 # but also may vary depending on whether it successfully
97 # triggered, and if so what it matched.
98 rule_name = _default_rulenamesfunc(func.__name__)

File ~/miniforge3/envs/core/lib/python3.12/site-packages/iris/fileformats/_nc_load_rules/actions.py:410, in action_build_auxiliary_coordinate(engine, auxcoord_fact)
407 rule_name += f"_{coord_type}"
409 cf_var = engine.cf_var.cf_group.auxiliary_coordinates[var_name]
--> 410 hh.build_auxiliary_coordinate(engine, cf_var, coord_name=coord_name)
412 return rule_name

File ~/miniforge3/envs/core/lib/python3.12/site-packages/iris/fileformats/_nc_load_rules/helpers.py:1238, in build_auxiliary_coordinate(engine, cf_coord_var, coord_name, coord_system)
1236 points_data = cf_coord_var.cf_label_data(cf_var)
1237 else:
-> 1238 points_data = _get_cf_var_data(cf_coord_var, engine.filename)
1240 # Get any coordinate bounds.
1241 cf_bounds_var, climatological = get_cf_bounds_var(cf_coord_var)

File ~/miniforge3/envs/core/lib/python3.12/site-packages/iris/fileformats/netcdf/loader.py:219, in _get_cf_var_data(cf_var, filename)
217 result = cf_var._data_array
218 else:
--> 219 total_bytes = cf_var.size * cf_var.dtype.itemsize
220 if total_bytes < _LAZYVAR_MIN_BYTES:
221 # Don't make a lazy array, as it will cost more memory AND more time to access.
222 # Instead fetch the data immediately, as a real array, and return that.
223 result = cf_var[:]

AttributeError: type object 'str' has no attribute 'itemsize'

@ukmo-ccbunney
Contributor

Hello @leosaffin
Thanks for the bug report.

So, the issue is with the expver variable, which is stored using the "variable length" string type.

If I remove the expver variable from the netCDF file you suggested downloading in the description (using ncks -x -v expver), then Iris can successfully load the data into a cube.

The string types in the attributes are not actually a problem; Iris can decode these fine (although I have admittedly not seen this in an attribute before).

However, I think your proposed solution still stands - I will test.

@ukmo-ccbunney
Contributor

I think that whilst the proposed solution:

total_bytes = cf_var.size * np.dtype(cf_var.dtype).itemsize

does allow the code to proceed, it does not work entirely as expected: the itemsize of a Unicode dtype (which is what np.dtype(str) resolves to) is returned as 0, so total_bytes will always be calculated as zero for a variable-length string.
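A short check of that behaviour, assuming only that NumPy is installed (the element count is an arbitrary illustrative value):

```python
import numpy as np

# np.dtype(str) resolves to a zero-width Unicode dtype, so its itemsize is 0.
str_dtype = np.dtype(str)
str_itemsize = str_dtype.itemsize

# For comparison, a fixed-width numeric dtype reports its real width in bytes.
float_itemsize = np.dtype("float64").itemsize

# Arbitrary illustrative element count; any size gives the same result for strings.
n_elements = 1_000_000
total_bytes = n_elements * str_itemsize

print(str_dtype.kind, str_itemsize, float_itemsize, total_bytes)
```

Because the product is always zero, a "smaller than the threshold" test would pass for every variable-length string variable, however much data it actually holds.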

@ukmo-ccbunney
Contributor

The issue is that the size of a variable-length (VLEN) array cannot be determined until the data has been read from disk, which rather defeats the purpose of the check made in the netCDF loader here:

total_bytes = cf_var.size * cf_var.dtype.itemsize
if total_bytes < _LAZYVAR_MIN_BYTES:
    # Don't make a lazy array, as it will cost more memory AND more time to access.
    # Instead fetch the data immediately, as a real array, and return that.
    result = cf_var[:]

This would presumably be the case for any variable length type, not just strings.

Perhaps in the case of VLEN datatypes, we should err on the side of caution and always load lazily?
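One way this suggestion could look is sketched below. This is not the Iris implementation: should_load_eagerly and EXAMPLE_MIN_BYTES are hypothetical names, the threshold value is made up for illustration, and only the string case seen in this issue is detected (other VLEN types would need their own check).

```python
import numpy as np

# Hypothetical threshold for illustration only; the real value of
# Iris' _LAZYVAR_MIN_BYTES is not given in this thread.
EXAMPLE_MIN_BYTES = 5000


def should_load_eagerly(size, dtype, min_bytes=EXAMPLE_MIN_BYTES):
    """Return True if a variable looks small enough to fetch as a real array."""
    if dtype is str:
        # The variable-length string case from this issue: the dtype is the
        # Python type `str`, and the true size is unknown until the data is read.
        return False
    itemsize = np.dtype(dtype).itemsize
    if itemsize == 0:
        # Zero-width dtypes give a meaningless byte count, so stay lazy.
        return False
    return size * itemsize < min_bytes


print(should_load_eagerly(10, np.dtype("float64")))      # small numeric variable
print(should_load_eagerly(10_000, np.dtype("float64")))  # large numeric variable
print(should_load_eagerly(10, str))                      # variable-length string
```

The design choice here is to treat "size unknown" as "potentially large", which keeps the existing fast path for small fixed-width variables and only changes behaviour for variable-length data.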

@trexfeathers
Contributor

Perhaps in the case of VLEN datatypes, we should err on the side of caution and always load lazily?

This should be fine. The code you reference was added in response to #5053 - I expect the Venn overlap between workflows affected by #5053 and workflows handling variable length arrays to be small, if it exists at all. And if we do make a couple of workflows slower it may be a necessary sacrifice! Just needs a sensible What's New entry.

@trexfeathers
Contributor

We've had reports that the ncks workaround doesn't always work. The team is committed to other priorities at the moment (see the Iris v3.11 backlog on GitHub), but here is a slightly more sophisticated workaround in the meantime:

from pathlib import Path

from iris.cube import CubeList
from ncdata.iris import to_iris
from ncdata.netcdf4 import from_nc4
from ncdata.threadlock_sharing import enable_lockshare

enable_lockshare(iris=True)


def load_expver_as_int(file_path: Path | str) -> CubeList:
    """Load a NetCDF file and convert the 'expver' variable to an integer.

    Iris does not yet support this sort of string variable
    (https://github.com/SciTools/iris/issues/6149), but we know that
    `expver` is a string representing an integer, so convert it to `int` before
    passing to Iris.
    """
    file_path = Path(file_path)
    dataset = from_nc4(file_path)
    if "expver" not in dataset.variables:
        raise KeyError(f"Variable 'expver' not found in {file_path}")

    new_dtype = int
    expver = dataset.variables["expver"]
    expver.data = expver.data.astype(new_dtype)
    expver.dtype = new_dtype

    return to_iris(dataset)
