Type Annotations #4272

jamesp · 2021-08-05T09:15:17Z

jamesp
Aug 5, 2021
Maintainer

Should we add static types to iris?

Python has had support for "annotations" since 3.0, but it has taken a number of years for the use of the annotations to shake out. PEP 484 and PEP 560 standardised the use of annotations for static typing. See this towards data science article for a good high-level overview of the state of static typing in Python up to 3.10.

Now that the community has agreed(ish!) on a standard form of type annotation, and the tooling has been developed to make use of it, it may be the right time to start adding type information to iris.

There are two possible approaches to adding type information:

1. Add type signatures directly to the code. See the Python typing documentation for good examples of what this would look like.
2. Use *.pyi stub files that contain only the type definitions of functions. These are a bit like header files in compiled languages.

Both have pros and cons. Option 1 requires changes to all files across the codebase. Option 2 means keeping additional files in sync with the codebase.
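As a rough illustration of the two approaches, here is a minimal sketch. The function `scale` and the file names in the comments are made up for this example and are not part of iris. The stub text is held in a string only so that the snippet is self-contained.

```python
import ast

# Approach 1: annotations written inline, in the module itself
# (imagine this living in a hypothetical my_module.py).
def scale(value: float, factor: float) -> float:
    return value * factor


# Approach 2: the same signature in a stub file (a hypothetical my_module.pyi).
# A stub keeps the signature and replaces the body with "...".
STUB_SOURCE = "def scale(value: float, factor: float) -> float: ...\n"

# Stub files use ordinary Python syntax, so the standard parser accepts them.
stub_func = ast.parse(STUB_SOURCE).body[0]
stub_arg_names = [arg.arg for arg in stub_func.args.args]
print(stub_func.name, stub_arg_names)
```

With stubs, the signature exists in two places, which is the "keeping files in sync" cost mentioned above.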

Using the monkeytype and pyannotate libraries recommended by mypy, I have made a first pass at creating stub files for the public API of iris, to give a sense of how it could be useful. These libraries infer as much as they can from the existing untyped code. For example, if a class is created and then returned, mypy is clever enough to work out the signature of that function, and monkeytype will create the appropriate definition stub.

iris-stubs can be pip-installed in an environment alongside your iris code, either a release version or a development version. Tools such as mypy, and the type checkers used by the Python support of common editors such as VS Code and PyCharm, will know how to use these stubs to provide richer intelligence in the editor.

I find this especially useful for two main reasons when editing code:

1. When chaining together function calls, e.g. a.strip().split() etc. Because the checker understands the return type at each stage, your editor can provide better autocomplete for the next possible calls.
2. Warning you when your function is incomplete, e.g. imagine this pointless function:

def get_attr(obj: dict[str,str], name: str) -> str:
    if name in obj:
        return obj[name]

Your editor will warn you that this is a possible error, because the function could also return None, whereas the definition as written here guarantees that the function will return a string.
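For comparison, here is a sketch of two ways the warning could be resolved. Both variants are illustrative only, and the name `get_attr_strict` is made up for this example.

```python
from typing import Dict, Optional


def get_attr(obj: Dict[str, str], name: str) -> Optional[str]:
    # Variant A: admit in the signature that the result may be None.
    if name in obj:
        return obj[name]
    return None


def get_attr_strict(obj: Dict[str, str], name: str) -> str:
    # Variant B: keep the "-> str" promise and let a missing key raise KeyError.
    return obj[name]


attrs = {"units": "K"}
print(get_attr(attrs, "units"), get_attr(attrs, "missing"))
```

Either way, the signature and the behaviour now agree, so the checker has nothing to complain about.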

Take a look at the stubs - they are very likely to be incomplete and possibly wrong so please edit if you find a problem. If anyone wants a tutorial in reading and writing stub files let me know.

trexfeathers · 2021-08-05T14:54:03Z

trexfeathers
Aug 5, 2021
Maintainer

I'm strongly for type annotations for the reasons you have provided, and also because they make it easier to understand what unfamiliar code is doing.

For that same reason, I'm strongly for annotation via type signatures. Code is at its best when it is self describing - you look at it and you can see what it does. Type annotations within the code itself provide a great opportunity to improve this.

I'm very uncomfortable with annotations via stubs; there is already a lot of complexity around Iris to make the development experience smoother - CI, linting, setup scripts... I would much rather provide type annotations via the existing code than add a new element of complexity in the form of stubs.

4 replies

jamesp Aug 5, 2021
Maintainer Author

I agree that in the source code would be my preference too.

It's worth also noting that as types are optional in Python, this does not need to be a big-bang approach, nor complete. We can add type annotations one line at a time and everything will still work.

Similarly it's not necessary to type every function and method. We could just provide annotations to public facing API calls, for example

jamesp Aug 5, 2021
Maintainer Author

Stubs don't have to be in a separate repository - they can be placed alongside the code module. e.g. cube.pyi next to cube.py

trexfeathers Aug 5, 2021
Maintainer

Thanks for the clarification. It's still a 'thing' if you get my meaning!

jamesp Aug 5, 2021
Maintainer Author

Yep, I do and I agree :)

jonseddon · 2021-08-06T17:29:17Z

jonseddon
Aug 6, 2021
Collaborator

From my playing with them, I do think that it would be useful to slowly add annotations into the source code. They can be incrementally added as code is touched and there's no need to add all of them in immediately. I agree that stubs would be extra complexity to maintain and they're not immediately obvious to users looking at the code because they are in a separate file.

The only minor downside is that code starts to look more complex to beginners, for example:

    def assertNetcdfEqual(
        self,
        actual_path: str,
        expected_global: str,
        globals_ignore: Optional[List[str]] = ["directory"],
    ) -> None:

Could look scarier and more confusing to a beginner than the old style without annotations:

    def assertNetcdfEqual(self, actual_path, expected_global, globals_ignore=["directory"]):

Although I guess it can also be argued that the extra information conveyed should help beginners. I think the benefits of using them outweigh the cost of this complexity, but it is worth bearing beginners in mind to see if the additional complexity can be minimised.

3 replies

jamesp Aug 9, 2021
Maintainer Author

Great points @jonseddon. There is a risk of alienating new python programmers by including type annotations.

Also, the current sphinx documentation system doesn't have particularly graceful support for them: regurgitating the type annotations in the API documentation makes it very hard to read (see here, for example).

Note that xarray have type annotations in the source code, but do not have the type annotations shown in the docs. If we begin to add type hints we should consider how we want them to be handled in the API docs.
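For reference, one knob Sphinx's autodoc extension offers for this is `autodoc_typehints`. The fragment below is only a sketch of a `conf.py`; nothing in this thread confirms that xarray uses this setting, and the value Iris would want is an open question.

```python
# conf.py (fragment, illustrative only)
extensions = ["sphinx.ext.autodoc"]

# How autodoc renders type hints:
#   "signature"   - show them in the function signature (the default)
#   "description" - move them into the parameter descriptions
#   "none"        - leave them out of the rendered docs entirely
autodoc_typehints = "none"
```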

tkknight Aug 9, 2021
Maintainer

Good example: http://xarray.pydata.org/en/stable/generated/xarray.merge.html

rcomer Nov 19, 2021
Collaborator

See #4294 (comment) for my first attempt's effect on the coord docs 😬

rcomer · 2023-03-31T10:44:05Z

rcomer
Mar 31, 2023
Collaborator

For info, Matplotlib just put type stubs into main: matplotlib/matplotlib#24976. As I understand it, the stubs are a step towards eventually doing the typing inline - there's discussion in the PR OP about the advantages of the two approaches.

2 replies

rcomer Mar 31, 2023
Collaborator

That PR also links to Dask's typing guidelines - dask/community#255, and in particular quotes "if the return type is a union of things, don't actually type it, because that can be more burdensome to downstream users than just saying Any". That seems relevant for Iris: when I tried using @jamesp's stubs, vscode flagged a bunch of "problems" when I had used cube.extract and assumed the output is a cube but vscode points out that it could be None.

jamesp Mar 31, 2023
Maintainer Author

I think I can get onboard with the dask guidelines eschewing weird complex union types for Any. Not so sure about optional returns that could be None | KnownType.

Take the specific example of cube.extract, which has the return type None | Cube. If the idiomatic Iris approach is "this could be None but we don't worry about that because I always know it won't be", then I would argue a better pragmatic signature would be Cube, not Any.
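To make the trade-off concrete, here is a sketch using a hypothetical stand-in `Cube` class and a simplified `extract` that matches on a name. The real cube.extract takes a constraint, so this shows only the shape of the problem, not the Iris API.

```python
from typing import List, Optional


class Cube:
    """Hypothetical stand-in for an Iris cube, for this sketch only."""

    def __init__(self, name: str) -> None:
        self.name = name


def extract(cubes: List[Cube], name: str) -> Optional[Cube]:
    # Honest signature: callers must handle None before using the result.
    for cube in cubes:
        if cube.name == name:
            return cube
    return None


def extract_strict(cubes: List[Cube], name: str) -> Cube:
    # Pragmatic signature: always a Cube, with the None case turned into an error.
    result = extract(cubes, name)
    if result is None:
        raise ValueError(f"no cube named {name!r}")
    return result


cubes = [Cube("air_temperature"), Cube("pressure")]
print(extract_strict(cubes, "pressure").name)
```

With the `Optional[Cube]` signature, an editor flags `extract(cubes, "pressure").name` as a possible error. With the `Cube` signature it does not.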

bjlittle · 2023-06-07T10:29:10Z

bjlittle
Jun 7, 2023
Maintainer

Ping @hdyson

We've started the discussion on typing here, so wade in to the conversation with your opinion. I think the extra context would be super useful, and it would help us understand the general benefits to the wider community.

Personally, I'd love to bump up the priority on this piece of work and make it happen.

1 reply

hdyson Jun 7, 2023

So we're very early days in considering type hinting - in other words, I don't know if what we're hoping to gain from type hinting is practical. There are two potential use cases where I think type hinting would give us benefits in ANTS. (This is predicated on typing being available for iris: we rely on iris functionality to such a degree that type hinting without iris would not be a productive use of time.)

The first benefit is that we have some pretty hefty processing pipelines, followed by saving data. This final save is in a variety of file formats, which have different data requirements. If we can use type hinting to identify when someone is using an invalid save routine prior to them spending the time and resources to run the expensive processing, then we can speed up the process of fixing user errors and generally improve the user experience.

The second benefit is that we're going to soon have processing that works on mesh cubes, processing that works on regular lat/lon cubes, and processing that will work with either. If the type hinting in iris enabled us to check for this up front (and that's a big if - I'm not sure if cubes with and without meshes can be identified in this way?), then I think that will give us the opportunity to reduce end user confusion.

ETA: I forgot a third case of recurring confusion in ANTS - masked arrays. Most routines work with and without masked arrays; some require masked arrays; and some require unmasked arrays. If we can identify where applications are not using the correct masked or unmasked array up front, this will make it easier to identify and fix user errors.

pp-mo · 2023-06-09T09:24:56Z

pp-mo
Jun 9, 2023
Maintainer

"I'm not sure if cubes with and without meshes can be identified in this way?"

Just on a point of fact: I'm afraid this isn't possible as we have it. We didn't see a need to subclass the cube, but have just extended its properties and behaviours so that a cube "may" have a mesh + related properties.

Likewise with the masked array thing: I think you are hoping for too much here, as virtually anything in Iris (or numpy) that returns an array might return a masked one, and vice versa.

I think there's also a general point here about what typing can do for you + the way it works regarding subclassing...
If code requires or produces a specialised subclass at some point, you can check for that, but you can't state or require that a particular object is a "plain" one and not some specialised subclass -- e.g. to specify an "unmasked array" or a "non-mesh cube" (even if that were a subclass).
I think those requirements are logically inconsistent as, in principle, an object of a subclassed type should possess all the behaviours of (be able to function as) a parent type object. So I think that imposes a rigid limitation on what can be achieved by type hinting.
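A minimal sketch of that limitation, using two made-up classes in place of the real array types (in numpy, np.ma.MaskedArray is likewise a subclass of np.ndarray):

```python
class Array:
    """Hypothetical stand-in for a plain array type."""


class MaskedArray(Array):
    """Hypothetical stand-in for a specialised subclass."""


def needs_masked(data: MaskedArray) -> str:
    # A checker can enforce this: a plain Array is not a MaskedArray.
    return "masked accepted"


def needs_plain(data: Array) -> str:
    # A checker cannot enforce "plain only": every MaskedArray is also an Array.
    return "accepted"


print(needs_plain(MaskedArray()))
```

A static checker would reject `needs_masked(Array())` but accept `needs_plain(MaskedArray())`, which is the asymmetry described above.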

4 replies

hdyson Jun 9, 2023

"Likewise with the masked array thing : I think you are hoping for too much here, as virtually anything in Iris (or numpy) that returns an array might return a masked one, and vice versa."

So I think this does work as I'd expect for the case where we're checking for a masked array. Running mypy on the following snippet:

#!/usr/bin/env python
import numpy as np


def return_expected_masked_array() -> np.ma.masked_array:
    return np.ma.ones((2, 2))


def return_expected_unmasked_array() -> np.ndarray:
    return np.ones((2, 2))


def return_unexpected_masked_array() -> np.ndarray:
    return np.ma.ones((2, 2))


def return_unexpected_unmasked_array() -> np.ma.masked_array:
    return np.ones((2, 2))


def main() -> None:
    a = return_expected_masked_array()
    b = return_expected_unmasked_array()
    c = return_unexpected_masked_array()
    d = return_unexpected_unmasked_array()


if __name__ == '__main__':
    main()

does yield:

numpy_mask_test.py:18: error: Incompatible return value type (got "ndarray[Any, dtype[floating[_64Bit]]]", expected "MaskedArray[Any, Any]")  [return-value]
Found 1 error in 1 file (checked 1 source file)

Okay, it would be better if the unexpected masked array was also flagged, but something is better than nothing, even if it's not everything (I also wouldn't rule out user error on my part here - this was a 10 minute exploration...).

I can see the other side of this too though: because that unexpected masked array is not flagged, does type hinting introduce a false degree of confidence in the end user - e.g. "I have the typing checked, should I bother with the unit test to enforce this in code"?

jamesp Jun 13, 2023
Maintainer Author

I had an idea back in 2021 when I started this thread that it would be nice for an iris user if the type system understood the structure of their loaded cube. e.g. it would know that the cube has dimensions time x lat x lon so that you could infer the resulting cube going through e.g. collapse.

That wasn't possible at the time. But it might be in the future (Python 3.11), thanks to https://peps.python.org/pep-0646/.

With Variadic Generics you wouldn't need to subclass Cube to expose type information about what the cube contains.

pp-mo Nov 3, 2023
Maintainer

"it would be nice ... if the type system understood ... that the cube has dimensions."
"wasn't possible at the time. But it might be in the future (3.11)"

This is now a real thing in typing support. numpy will probably implement it (but it's not there yet).

jamesp Nov 3, 2023
Maintainer Author

Wow, the future comes around fast! Exciting to see the things that PEP 646 enables, especially in our world of multidim arrays.

hdyson · 2023-11-02T15:28:00Z

hdyson
Nov 2, 2023

I've recently set up a new codebase built on top of iris, with the opportunity to set new coding standards. We're implementing comprehensive coding standards checks as part of the integration testing. We've had to delay introducing type annotations to the testing for the new project due to the lack of type annotations in iris.

From my point of view of wanting to create a robust and easy to maintain codebase, the selling point for type annotations is that it enables the usage of tools like mypy to check that the type information provided to the end user is accurate. Because iris does not have type annotations, mypy falls back to assuming anything that touches iris is the Any type, and hence does not get effectively checked.
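A small sketch of why the Any fallback matters. Here `load_untyped` stands in for a call into an un-annotated library; it is a made-up function, not an iris one.

```python
from typing import Any


def load_untyped(path: str) -> Any:
    # Stand-in for a call into a library without type information:
    # to a checker, the result is simply Any.
    return "3"


def add_one(x: int) -> int:
    return x + 1


value = load_untyped("data.nc")

# A type checker accepts this call, because Any is compatible with int.
# At runtime the value is a str, so the addition fails.
try:
    add_one(value)
    failed_at_runtime = False
except TypeError:
    failed_at_runtime = True

print(failed_at_runtime)
```

The mistake is only discovered when the code runs, which is exactly the silent gap described above.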

This creates the awkward scenario where developers would see useful error messages when giving incorrect type information for methods that worked purely on numpy arrays, but would not get errors for methods that used an iris data structure. This leads into the case where developers assume the type information provided is accurate, because there's no error message generated. Given we're leaning very heavily on iris, this effectively bars us from taking advantage of type annotations.

The iris code base is large. Annotating everything would be a gargantuan undertaking - instead, could I propose breaking the task into smaller pieces? Type annotations for the public methods of a Cube would be a more manageable task, and would be a substantial benefit for end users.

I will also flag that Napoleon support for combining numpydoc style docstrings and type annotations isn't in a great state at the moment. So even though I'm advocating for more type annotations in iris, there is definitely a pragmatic argument that it makes sense to wait for the infrastructure to catch up first.

4 replies

pp-mo Nov 3, 2023
Maintainer

"Napoleon support for combining numpydoc style docstrings and type annotations isn't in a great state at the moment"

Well from my experiments I think it's okay ...
Example (see here for sourcecode that produces it) :

Except for some remaining problems / exceptions ...
Notably (1): that 'np.typing.ArrayLike' is rendered as a horrific expression instead of a link
(but for the little that I know, there may well be a way to fix that)
-- that is also fixable by adding : :class:'numpy.typing.ArrayLike' in the docstring, which seems to act as an override

Notably (2): the results section is a bit weird. It doesn't combine the placeholder name and type info as it does for the parameter sections, but produces separate "Returns" and "Return type" sections.
This is despite my having napoleon_use_rtype = False, which this doc seems to indicate should fix that
-- the problem is unresolved for me (but I personally can live with it)
( UPDATE: this config setting might be of use here, but it seems a bit naff to have a global definition for these )

hdyson Nov 3, 2023

I don't think ndarray is the right option here (e.g. masked arrays, dask arrays; and it fails when used with mypy --strict).

But that's getting into the details more than I was wanting to. I wanted to raise a concrete use case where type annotations within a part of iris could provide a tangible benefit to end users, to potentially inform future iris discussions.

rcomer Nov 3, 2023
Collaborator

@hdyson have you tried @jamesp's stubs? https://github.com/jamesp/iris-stubs

hdyson Nov 3, 2023

@rcomer That's a really good shout - from a little experimentation, it looks like the stubs do give us helpful error cases. I'll experiment a bit further and see if that's something we want to run with.

pp-mo · 2023-11-03T09:15:17Z

pp-mo
Nov 3, 2023
Maintainer

Just to cross-link ideas, we previously concluded that we couldn't make good use of type hinting in API docs, but I think that the ground may have shifted on that : see comment for a suggestion that now seems to work a lot better

As such, I wonder about now reversing #4510 .
IMO it now looks good + the DRY aspect is attractive as ever --- i.e. use type hints instead of numpy-docs type language in docstrings, thus definitely not both (as some of Iris now has)

Support for that, anyone ?
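As a sketch of the DRY idea, here is a function whose types appear only in the signature, with a numpydoc-style docstring that describes the parameters without repeating their types. The function itself is made up for illustration, and how a given Sphinx setup renders it depends on its configuration.

```python
from typing import List


def running_mean(values: List[float], window: int = 2) -> List[float]:
    """Return the running mean of a list of numbers.

    Parameters
    ----------
    values
        Input numbers.
    window
        Number of consecutive elements averaged together.
    """
    last_start = len(values) - window + 1
    return [sum(values[i:i + window]) / window for i in range(last_start)]


# The type information is still available to tools, via the annotations.
print(running_mean([1.0, 3.0, 5.0]), running_mean.__annotations__["window"])
```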

0 replies

pp-mo · 2023-11-08T10:19:33Z

pp-mo
Nov 8, 2023
Maintainer

I did a quick search for advice on the motivation + utility of type hinting generally.

Here are some links I found which I thought were useful on the type hinting question:
General discussion on purposes + value : https://realpython.com/python-type-checking/
4 different tools discussed : https://www.infoworld.com/article/3575079/4-python-type-checkers-to-keep-your-code-clean.html

0 replies

trexfeathers · 2024-04-25T13:04:02Z

trexfeathers
Apr 25, 2024
Maintainer

Closing following the creation of #5924 - we do want to do this one way or another

0 replies
