Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More precisely type pipe methods #10038

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

chuckwondo
Copy link

@chuckwondo chuckwondo commented Feb 7, 2025

Improved precision of type annotations on pipe methods to address shortcomings described in #9997.

  • Added pytest plugin that supports testing type annotations, along with relevant tests that use the plugin

  • Bumped mypy version, which includes a fix to something that previously required a few "type: ignore" comments, which I was able to remove

  • Combined mypy and mypy-min jobs into a single matrix job to avoid duplication

  • Added pytest mypy tests to mypy job

  • Improved mypy pre-commit hook, by reducing nearly 100 errors/warnings (when hook was run manually) to 6, but didn't manage to resolve the final 6 errors, as I was spending too much time on it. I suspect this hook has been "broken" for quite some time, but unnoticed, given that the hook must be run manually (and is not triggered in CI), so I'm wondering if this hook should be removed, particularly since 'act' can now be used to locally run the mypy github job (per changes added to this PR)

  • Closes Improve type signatures of pipe methods to enable type checkers to flag erroneous usages #9997

  • Tests added

Copy link

welcome bot commented Feb 7, 2025

Thank you for opening this pull request! It may take us a few days to respond here, so thank you for being patient.
If you have questions, some answers may be found in our contributing guidelines.

Copy link
Collaborator

@max-sixty max-sixty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really good, new testing approach looks super; thank you!

Left a couple of comments

@@ -33,6 +33,15 @@ jobs:
with:
keyword: "[skip-ci]"

detect-act:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this for?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to detect whether or not act is running. When act is running, it sets the ACT env var to true. In this case, it is so that we can get past the detect-ci-trigger gate on the mypy job. When running locally via act, we cannot otherwise run the mypy job.

I'm a big proponent of using act to be able to run github workflow jobs locally, so I'll update this bit of the file to include some clarifying comments.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, thanks. I wasn't familiar with act so some more context for those in my shoes would be great

(I also think we could consider this in a different PR, there is a lot going on here — all of it seems good! — but also a bit tougher to review)

@@ -116,68 +133,23 @@ jobs:
python xarray/util/print_versions.py
- name: Install mypy
run: |
python -m pip install "mypy==1.13" --force-reinstall
python -m pip install mypy pytest-mypy-plugins
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add required plugins to our default environment, so installing that lets folks run the tests locally

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to move this wherever makes the most sense. Since I'm a new contributor, and there are multiple dependencies files, it is not clear to me where this should go, if not here, so I started out by putting it here to avoid possibly interfering with other things.

Do you want me to move this dep to ci/requirements/environment.yml and eliminate the additional line here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want me to move this dep to ci/requirements/environment.yml and eliminate the additional line here?

Thanks! We can add it wherever mypy currently is!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main reason I put things into CI is because running the new mypy tests via pytest requires that mypy installs missing types. However, in the context of running pytest, the missing types are not installed, and thus the new mypy tests all fail (due to the missing types not having been installed).

In other words, we must run python -m mypy --install-types --non-interactive in order for the missing types to be installed before we can then successfully run pytest with the new tests I've added that rely upon pytest-mypy-plugins. Otherwise, when the tests run, the new tests will fail and will break the build.

Alternatively, I can explicitly add the missing type dependencies to ci/requirements/environment.yml and also to the dev dependencies in pyproject.toml so that we do not need to run mypy --install-types as a prerequisite for the tests to pass.

Would adding the type dependencies be acceptable?

- name: Run mypy
run: |
python -m mypy --install-types --non-interactive --cobertura-xml-report mypy_report
python -m pytest -v --mypy-only-local-stub --mypy-pyproject-toml-file=pyproject.toml xarray/**/test_*.yml
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move these args into a config file, so it's extremely easy to run locally? Ideally when just running pytest...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally planned to add these to the pytest addopts setting in pyproject.toml, but wasn't sure if that would end up interfering with something I'm not yet aware of. Is that where you'd like me to move these?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes please!

.github/workflows/ci-additional.yaml Outdated Show resolved Hide resolved
.github/workflows/ci-additional.yaml Outdated Show resolved Hide resolved
@@ -1607,7 +1607,7 @@ def reduce_array(ar: DataArray) -> DataArray:


# https://github.com/python/mypy/issues/9031
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# https://github.com/python/mypy/issues/9031

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, thanks. Missed one :-)

Copy link
Collaborator

@headtr1ck headtr1ck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bit of a scope creep going on, but it's a good direction, thanks!

def pipe(
self,
func: Callable[..., T] | tuple[Callable[..., T], str],
func: tuple[Callable[..., T], str],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not this as well?

Suggested change
func: tuple[Callable[..., T], str],
func: tuple[Callable[P, T], str]

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot do that because when we pass the function as part of a tuple we don't have enough information to precisely type the function's parameters.

In your suggested change, the ParamSpec P represents all of the function's parameters, but we need to know all of the parameters excluding one (the one that takes the data value, as identified by the name given in the second value of the tuple).

If that's as clear as mud, let me try to clarify with more detail.

In the first form, where we pass only a function as the first argument to pipe, we expect the function to take the data value as the first argument, meaning that we know exactly where in the function's list of parameters the data parameter is: the first position.

This means we can type the function more precisely, like so, indicating that the data parameter is first, concatenated with zero or more positional and keyword parameters (and returning a value of some type T):

# Self is a DataWithCoords (DataArray, Dataset, others?), or DataTree
# P represents all parameters to func *excluding* Self
func: Callable[Concatenate[Self, P], T]

Therefore, after passing func to pipe, we must pass all arguments except for the data (self) argument, and this is represented by P, which excludes Self:

# pipe expects a function followed by all arguments to pass to the function,
# *except* for the data argument, which pipe will *implicitly* pass.
def pipe(f: Callable[Concatenate[Self, P], T], *args: P.args, **kwargs: P.kwargs) -> T: ...

However, when we pass a function/keyword 2-tuple as the first argument to pipe, we have no idea what position the keyword parameter is in func's type signature. We only know that it's not the first parameter.

This means, we cannot do the following, as you suggest, because in this case P includes the data parameter, but it must not, and there's no way of omitting it without knowing precisely what position it's in:

# We don't know where Self is within P, so we have no way of defining P
# as meaning all of func's parameters *except* for Self. Therefore, this
# signature indicates that we can *explicity* pass a data argument, but that's
# not correct.
def pipe(func: tuple[Callable[P, T], str], *args: P.args, **kwargs: P.kwargs) -> T: ...

To clarify, although Callable[P, T] is valid for the function itself within the tuple, using *args: P.args, **kwargs: P.kwargs for the rest of the parameters to pipe in this case is incorrect, because it means that we can explicitly pass the data argument in the mix there (because P includes Self), but the whole point of pipe, of course, is to implicitly pass the data argument, and thus not allow it to be passed explicitly.

Technically speaking, as far as mypy is concerned, your suggestion probably make no difference from what I propose, but in terms of the information it conveys to the reader, it is incorrect.

This is why I discourage the use of the tuple form, and instead recommend the use of a lambda (or another function def with args reordered such that the data arg is first). Even using a lambda to reorder things allows mypy to be more helpful than is possible with the tuple form.

For example (taken from the test cases in xarray/tests/test_dataset_typing.yml in this PR):

    from xarray import Dataset

    def f(arg: int, ds: Dataset) -> Dataset:
        return ds

    # Since we cannot provide a precise type annotation when passing a tuple to
    # pipe, there's not enough information for type analysis to indicate that
    # we are missing an argument for parameter `arg`, so we get no error here.

    ds = Dataset().pipe((f, "ds"))
    reveal_type(ds)  # N: Revealed type is "xarray.core.dataset.Dataset"

    # Rather than passing a tuple, passing a lambda that calls `f` with args in
    # the correct order allows for proper type analysis, indicating (perhaps
    # somewhat cryptically) that we failed to pass an argument for `arg`.

    ds = Dataset().pipe(lambda data, arg: f(arg, data))

    # mypy produces the following output for the line above, as it should,
    # indicating that we forgot to pass an argument to pipe, which pipe needs
    # to pass to `f` in the 2nd position.
    error: No overload variant of "pipe" of "DataWithCoords" matches argument type "Callable[[Any, Any], Dataset]"  [call-overload]
    note: Possible overload variants:
    note:     def [P`9, T] pipe(self, func: Callable[[Dataset, **P], T], *args: P.args, **kwargs: P.kwargs) -> T
    note:     def [T] pipe(self, func: tuple[Callable[..., T], str], *args: Any, **kwargs: Any) -> T

IMHO, the tuple form should not be supported, for this very reason, but I don't expect that deprecating that form would get much traction from anybody else.

@chuckwondo
Copy link
Author

Bit of a scope creep going on, but it's a good direction, thanks!

Sorry, I'm not following. Would you mind clarifying?

@headtr1ck
Copy link
Collaborator

Bit of a scope creep going on, but it's a good direction, thanks!

Sorry, I'm not following. Would you mind clarifying?

Ah sorry, just that the initial issue was only about the pipe typing and now mainly ci stuff was changed that probably should go into it's own PR for easier reviewing.

@headtr1ck
Copy link
Collaborator

Are we sure about pytest-mypy? There has not been a release for over 2 years. Is there any other larger project using this?

@chuckwondo
Copy link
Author

Bit of a scope creep going on, but it's a good direction, thanks!

Sorry, I'm not following. Would you mind clarifying?

Ah sorry, just that the initial issue was only about the pipe typing and now mainly ci stuff was changed that probably should go into it's own PR for easier reviewing.

Thanks for clarifying. In part, that was because I wasn't sure how else to wire up these changes and related tests, and didn't want to write duplicate lines in the ci file.

@chuckwondo
Copy link
Author

Are we sure about pytest-mypy? There has not been a release for over 2 years. Is there any other larger project using this?

That's a different one. I've pulled in this one, which was most recently released Dec 2024: https://pypi.org/project/pytest-mypy-plugins/

Copy link
Collaborator

@max-sixty max-sixty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really good. I would still like to hold the line on ensuring that running basic commands works, I left one comment. But otherwise, great to merge from me!

Comment on lines +157 to +164
# As noted in the comment on the previous step, we must run type annotation tests
# separately, and we will run them even if the preceding tests failed. Further,
# we must restrict these tests to run only when matrix.env is empty, as this is
# the only case when all of the necessary dependencies are included such that
# spurious mypy errors due to missing packages are eliminated.
- name: Run mypy tests
if: ${{ always() && matrix.env == '' }}
run: python -m pytest xarray/tests/test_*.yml
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm quite committed to ensuring that running a simple pytest works! I really think it's important that running tests doesn't require looking up various commands in .github/workflows/ci.yaml

If we want to have tests that don't run by default, then we can use pytest marks and run them with an additional options; check out --run-nightly for an example...

numpy,
]
additional_dependencies:
# Type stubs plus additional dependencies from ci/requirements/environment.yml
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding these, that's helpful

Comment on lines 47 to +49
- id: mypy
# Copied from setup.cfg
exclude: "properties|asv_bench"
files: "^xarray"
exclude: "^xarray/util/generate_.*\\.py$"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we keep # Copied from pyproject.toml? Ideally this would not be on different settings.

But pre-commit oddities are slowing you down, it might be OK fine to turn this off / push to another PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve type signatures of pipe methods to enable type checkers to flag erroneous usages
4 participants