Handling of Standard name aliases #5257

Open
larsbarring opened this issue Apr 18, 2023 · 12 comments

Comments

@larsbarring
Contributor

✨ Feature Request

When Iris is installed, the file lib/std_names.py is generated from the XML version of the standard name table. In this process all aliased standard names are "promoted" to be on par with all other standard names.

Wouldn't it be more appropriate to have them replaced by the corresponding new standard names? I.e. Iris would still read/understand the aliased standard names, but translate them to the new ones, which are then used in further processing and when writing data.

Motivation

To my understanding the idea behind aliasing a standard name is that the new standard name better/more precisely describes the quantity at hand, perhaps even taking new research results into account.

@larsbarring
Contributor Author

C.f. #5255, which mentions the creation of lib/std_names.py during Iris installation.

@bjlittle
Member

Hey @larsbarring thanks for raising this.

We need to have a think about the implications of what you're suggesting, so leave it with us and hopefully we'll get back to you asap 👍

@stephenworsley
Contributor

Replacing the names outright without user intervention seems like an extreme approach. Preferable would be a utility which promotes these names, and perhaps a warning when such names are loaded to suggest this to the user. Does this seem like a reasonable approach?

@larsbarring
Contributor Author

I agree 👍 Either a utility, or perhaps a kwarg to the iris.load functions: something like

iris.load(..., std_name_aliases= {"warn" | "keep" | "replace"} , ...)

@pp-mo
Member

pp-mo commented Apr 26, 2023

I'd say that the key place to issue warnings would be on save.
We don't have to care that much about 'correct' use of standard-names in data loaded into Iris, since Iris doesn't interpret them for almost any purposes.
But we do try to employ best practice when we write output files.

@larsbarring
Contributor Author

Well, Iris as such might not use the standard name much, but as users of the Iris API we might be interested in this and appreciate not having to deal with aliases (which might evolve over time), or getting a warning if a name is aliased. But I admit that this is somewhat speculative/forward-looking as I do not have an immediate hands-on use case.

@larsbarring
Contributor Author

Just by complete chance, when looking for something else, I came across this use case for handling aliases in a more sophisticated way than is now the case.

@pp-mo
Member

pp-mo commented May 10, 2023

@SciTools/peloton considering what is possible, according to the statement here in the CF FAQ

We may need to be pragmatic, since the suggestion is that older aliases can be ambiguous, in which case an automatic translation is simply not always possible.
But a quick inspection of the table does seem to indicate that it might be practicable, in most/many cases.

Also we have concerns that automatic standard-name translation on load is a bit "dangerous" in breaking user code, since the same code loading the same data would produce cubes with different names on an Iris / standard-name-table upgrade.
( N.B. technically you can upgrade your installation to the latest std-names at any time, though we suspect it's rarely done ! )

Another thought: in keeping with recent decisions, we might prefer to control loading via a context manager rather than add keywords (but it's only a style thing).

In view of that, and above comments, can you come up with an implementation proposal for enhanced load and save behaviours (or utility) @larsbarring ?
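
For illustration only, the context-manager style of control mentioned above could look something like the sketch below. The names `alias_handling` and `get_alias_mode` are hypothetical and not part of Iris; the three modes are the "keep" / "warn" / "replace" options suggested earlier in this thread.

```python
import contextlib
import threading

# Hypothetical per-thread setting; nothing here is an existing Iris API.
_settings = threading.local()
_VALID_MODES = ("keep", "warn", "replace")


def get_alias_mode():
    """Return the alias-handling mode currently in effect (default "keep")."""
    return getattr(_settings, "alias_mode", "keep")


@contextlib.contextmanager
def alias_handling(mode):
    """Temporarily switch the alias-handling mode, restoring it on exit."""
    if mode not in _VALID_MODES:
        raise ValueError(f"mode must be one of {_VALID_MODES}, got {mode!r}")
    previous = get_alias_mode()
    _settings.alias_mode = mode
    try:
        yield
    finally:
        # Restore the previous mode even if the body raised.
        _settings.alias_mode = previous
```

Loading code would then be wrapped as `with alias_handling("warn"): ...`, and whatever performs the name check would consult `get_alias_mode()`.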

@larsbarring
Contributor Author

In fact I have been playing with some ideas during the last couple of days (as always very much limited by pretty basic [pun intended] coding skills):

  • Reorganising the iris.std_names.py a bit to have a separate dict for aliases (done via an updated tools/generate_std_names.py). It now includes some table version information, and a separate dict for the standard name descriptions (optional when generated).
  • Adding a new std_name_table.py containing the following functions:
    get_convention -- return a tentative Conventions string
    set_alias_processing -- define how to handle aliases ("keep" - current behaviour, "warn" - warn and update, "replace" - silently update)
    get_description -- return the standard name description
    check_valid_standard_name -- check if a name is a standard name or an alias, and do the translation if requested
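
A minimal sketch of the checking function described in the list above, to make the three modes concrete. This is not the code from the PR: the two tables contain placeholder names rather than real CF entries, and the `alias_processing` argument is an assumption made for the example.

```python
import warnings

# Placeholder tables for illustration; these are NOT real CF standard names.
STD_NAMES = {"new_name_example": {"canonical_units": "K"}}
ALIASES = {"old_name_example": "new_name_example"}


def check_valid_standard_name(name, alias_processing="keep"):
    """Validate a name; translate an alias according to alias_processing.

    "keep"    -- return the alias unchanged (current behaviour)
    "warn"    -- warn, then return the new standard name
    "replace" -- silently return the new standard name
    """
    if alias_processing not in ("keep", "warn", "replace"):
        raise ValueError(f"unknown alias_processing: {alias_processing!r}")
    if name in STD_NAMES:
        return name
    if name in ALIASES:
        if alias_processing == "keep":
            return name
        new_name = ALIASES[name]
        if alias_processing == "warn":
            warnings.warn(
                f"{name!r} is an alias; using {new_name!r} instead",
                UserWarning,
            )
        return new_name
    raise ValueError(f"{name!r} is not a valid standard name")
```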

std_name_table is naively imported in iris.__init__ and std_name_table.check_valid_std_name is called in common.mixin.py. From a design point of view I think that this is as far as I can reach. If you think these ideas, which are rather un-pythonic and un-anything, are worth considering, I would need some support for taking it further. E.g. context managers are beyond my level...

@larsbarring
Contributor Author

larsbarring commented May 10, 2023

And on reading your comment again @pp-mo, I think that you hit the nail on the head when writing

Also we have concerns that automatic standard-name translation on load is a bit "dangerous" in breaking user code, since the same code loading the same data would produce cubes with different names on an Iris / standard-name-table upgrade. ( N.B. technically you can upgrade your installation to the latest std-names at any time, though we suspect it's rarely done ! )

I totally agree and this was my motivation for asking for standard name version information (#5255). And regarding upgrading the standard name table, I was asking myself whether this could be done more dynamically. I.e. if there was an iris.util.get_new CF_standard_name_table that would basically do what now is done during setup?

@rcomer
Member

rcomer commented May 10, 2023

If we had the dictionary of aliases, it would be pretty easy to provide a callback function that renames the cubes. If the dictionary was public then users could use it to create their own callback functions, which would give maximum flexibility.
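
A rough sketch of such a callback, assuming a plain dict that maps each alias to its replacement. The dict contents are placeholders rather than real CF names, and `make_alias_callback` is a hypothetical helper; the `(cube, field, filename)` signature is the one Iris load callbacks take.

```python
from types import SimpleNamespace

# Placeholder mapping of alias -> current standard name (not real CF names).
ALIASES = {"old_name_example": "new_name_example"}


def make_alias_callback(aliases):
    """Build a load callback that renames cubes whose name is an alias."""

    def rename_aliases(cube, field, filename):
        new_name = aliases.get(cube.standard_name)
        if new_name is not None:
            cube.standard_name = new_name

    return rename_aliases


# Stand-in object with a standard_name attribute, used here in place of a cube.
demo = SimpleNamespace(standard_name="old_name_example")
make_alias_callback(ALIASES)(demo, None, "example.nc")
```

With real data this would be passed as `iris.load(path, callback=make_alias_callback(ALIASES))`, and users wanting different behaviour (for example a warning instead of a rename) could write their own variant against the same dict.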

@larsbarring
Contributor Author

OK, I bit the bullet and have just made a POC PR (#5313). The dictionary @rcomer is asking for is available as iris.std_names.ALIASES.


7 participants