Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

POC: Handle std names and aliases (#5257) #5313

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

larsbarring
Copy link
Contributor

🚀 Pull Request

Description

This is a "proof-of-concept" PR to address how to better handle standard name aliases. It consists of the following elements:

  • Reorganising the iris.std_names.py a bit to have a separate dict for aliases (via updated tools/generate_std_names.py).
    It now includes some table version information (mentioned in CF global attribute "Conventions": expose to users, and include CF standard name table version #5255), and a separate dict for the standard name descriptions (optional when generated).
  • Adding a new std_name_table.py containing the following functions:
    • get_convention -- return a tentative Conventions string
    • set_alias_processing -- define how to handle aliases:
      "keep" - current behaviour, treat aliases in the same way as currently valid standard names,
      "warn" - issue a warning (default), otherwise as "keep",
      "replace" - silently update aliases to current standard names.
    • get_description -- return the standard name description if available
    • check_valid_standard_name -- check if a name is a standard name or an alias, and do the translation if requested as defined by set_alias_processing
  • std_name_table is [naively] imported in iris.__init__
  • common/mixin._get_valid_standard_name is modified to use check_valid_std_name

No units test have been added (would be good to first get some feedback whether this POC is a reasonable approach .... )


Consult Iris pull request check list

@codecov
Copy link

codecov bot commented May 11, 2023

Codecov Report

Attention: Patch coverage is 45.00000% with 33 lines in your changes are missing coverage. Please review.

Project coverage is 89.19%. Comparing base (a3931f6) to head (d2216e7).
Report is 300 commits behind head on main.

Files Patch % Lines
lib/iris/std_name_table.py 35.29% 30 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5313      +/-   ##
==========================================
- Coverage   89.31%   89.19%   -0.13%     
==========================================
  Files          89       90       +1     
  Lines       22375    22430      +55     
  Branches     5368     5383      +15     
==========================================
+ Hits        19985    20007      +22     
- Misses       1640     1670      +30     
- Partials      750      753       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

https://github.com/cf-convention/discuss/issues/229
Standard names: hard to spot "typo" in
`surface_upward_mass_flux_of_methane_due_to_emission_from_fires`
    * new optional argument -d/--descr for including standard name descriptions
    * including the description of all standard names
    * including standard name table version and related info.

    * now contains several variables:
      - variables (starting with underscore) used in the processing
      - VERSION (dict)
      - CONVENTIONS_STRING (str)
      - STD_NAME (dict)
      - ALIASES (dict)
      - optionally DESCRIPTION (dict)

    harmonize quotation marks to single quotes in description string
@CLAassistant
Copy link

CLAassistant commented Apr 8, 2024

CLA assistant check
All committers have signed the CLA.

@pp-mo
Copy link
Member

pp-mo commented May 1, 2024

@SciTools/peloton is this still alive issue, and if so could you sign the CLA @larsbarring ?

@larsbarring
Copy link
Contributor Author

I thought I had already signed the CLA because of one or two previous minor contributions. Anyway, now done. Whether it is alive or not I am not sure. I have not made any further effort since back then as I am not sure whether it is a reasonable approach in the context of Iris.

@pp-mo
Copy link
Member

pp-mo commented May 1, 2024

I thought I had already signed the CLA because of one or two previous minor contributions. Anyway, now done. Whether it is alive or not I am not sure. I have not made any further effort since back then ...

Ok thanks!

I am not sure whether it is a reasonable approach in the context of Iris.

Well, I guess we'll take a look + see about it
@larsbarring are you at least clear that something like this would still be useful ?

@larsbarring
Copy link
Contributor Author

Yes, I think that this would still be useful. In the context of CF an aliased standard name is typically regarded as deprecated. Software should be able to read data having an aliased standard name, but new data should use the replacement name. Obviously, there are judgements to be made here regarding how to deal with this in practice. But and aliased standard name should not be considered just as an alternative at the same level as standard name.

Also, note that there will likely be some minor changes to the standard name xml file format as of next version, possibly also backported to all previous versions.

@schlunma
Copy link
Contributor

schlunma commented May 2, 2024

A proper way to handle standard name aliases would also be useful for ESMValTool, see ESMValGroup/ESMValCore#1985.

One issue we currently face is merging cubes with different standard names, which is (in the current iris version) not even allowed if the standard names are aliases of each other.

@bouweandela
Copy link
Member

@LisaBock this may be of interest to you.

@HGWright
Copy link
Contributor

@SciTools/peloton We are planning to talk about this on the 3rd July.

@pp-mo pp-mo self-assigned this Jul 5, 2024
Copy link
Contributor

@trexfeathers trexfeathers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SciTools/peloton We are planning to talk about this on the 3rd July.

@pp-mo and @trexfeathers met on 3rd July. This review is the summary of that meeting.

@larsbarring we really appreciate you taking the time to make this suggestion. It's obviously bad that it's taken us this long to give it any attention. If your priorities have shifted and you are unable to action our comments then please let us know and we'll pick it up.

To prompt more attention from the team, we're adding this to the 3.11 milestone.

Comment on lines +112 to +117
'_ACCEPT = "accept"\n'
'_WARN = "warn"\n'
'_REPLACE ="replace"\n'
'_ALTERNATIVE_MODES = [_ACCEPT, _WARN, _REPLACE]\n'
'_DEFAULT = "warn"\n'
'_MODE = _DEFAULT\n\n'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can see what you are trying to achieve here, but we would like to achieve it using Iris' existing structure instead:

FUTURE flag

Replacement is introduced as an opt-in setting. In future - typically the next major release - it will switch to being opt-out.

Always warn if replacing

Since you made the proposal, we have agreed a global approach to Iris warnings: Filtering Warnings — Iris 3.9.0 documentation (scitools-iris.readthedocs.io). The idea is that Iris should warn for EVERYTHING that might be a problem for a user, and the user applies warning filters if they don't wish to see a specific type of warning.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should consider whether the rename() method needs a switch to accept aliases, or some other equivalent granular user control.

@pp-mo I can't recall why using a FUTURE flag would not be enough user control on its own?

version["table_name"] = "USER-StdNameTable"
prettydict(outfile, "VERSION", version)
version_string = "-".join(version[k] for k in ["table_name", "version_number"])
outfile.write(f'CONVENTIONS_STRING = "{version_string}"\n\n')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this was sorted in #5423, albeit in a different way.

Presumably no longer necessary in this PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency with existing Iris code, we believe a new attribute should be introduced: alias_standard_name, used in a similar way to invalid_standard_name.

This gives users the level of information and control that they are used to.

invalid_standard_name example

if standard_name is not None:
try:
cube.standard_name = _get_valid_standard_name(standard_name)
except ValueError:
if cube.long_name is not None:
cube.attributes["invalid_standard_name"] = standard_name
else:
cube.long_name = standard_name

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

7 participants