-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
More precisely type pipe
methods
#10038
base: main
Are you sure you want to change the base?
Conversation
Thank you for opening this pull request! It may take us a few days to respond here, so thank you for being patient. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks really good, new testing approach looks super; thank you!
Left a couple of comments
.github/workflows/ci-additional.yaml
Outdated
@@ -33,6 +33,15 @@ jobs: | |||
with: | |||
keyword: "[skip-ci]" | |||
|
|||
detect-act: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to detect whether or not act is running. When act
is running, it sets the ACT
env var to true
. In this case, it is so that we can get past the detect-ci-trigger
gate on the mypy
job. When running locally via act
, we cannot otherwise run the mypy
job.
I'm a big proponent of using act
to be able to run github workflow jobs locally, so I'll update this bit of the file to include some clarifying comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, thanks. I wasn't familiar with act
so some more context for those in my shoes would be great
(I also think we could consider this in a different PR, there is a lot going on here — all of it seems good! — but also a bit tougher to review)
.github/workflows/ci-additional.yaml
Outdated
@@ -116,68 +133,23 @@ jobs: | |||
python xarray/util/print_versions.py | |||
- name: Install mypy | |||
run: | | |||
python -m pip install "mypy==1.13" --force-reinstall | |||
python -m pip install mypy pytest-mypy-plugins |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add required plugins to our default environment, so installing that lets folks run the tests locally
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to move this wherever makes the most sense. Since I'm a new contributor, and there are multiple dependencies files, it is not clear to me where this should go, if not here, so I started out by putting it here to avoid possibly interfering with other things.
Do you want me to move this dep to ci/requirements/environment.yml
and eliminate the additional line here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want me to move this dep to
ci/requirements/environment.yml
and eliminate the additional line here?
Thanks! We can add it wherever mypy
currently is!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main reason I put things into CI is because running the new mypy tests via pytest requires that mypy installs missing types. However, in the context of running pytest, the missing types are not installed, and thus the new mypy tests all fail (due to the missing types not having been installed).
In other words, we must run python -m mypy --install-types --non-interactive
in order for the missing types to be installed before we can then successfully run pytest
with the new tests I've added that rely upon pytest-mypy-plugins
. Otherwise, when the tests run, the new tests will fail and will break the build.
Alternatively, I can explicitly add the missing type dependencies to ci/requirements/environment.yml
and also to the dev
dependencies in pyproject.toml
so that we do not need to run mypy --install-types
as a prerequisite for the tests to pass.
Would adding the type dependencies be acceptable?
.github/workflows/ci-additional.yaml
Outdated
- name: Run mypy | ||
run: | | ||
python -m mypy --install-types --non-interactive --cobertura-xml-report mypy_report | ||
python -m pytest -v --mypy-only-local-stub --mypy-pyproject-toml-file=pyproject.toml xarray/**/test_*.yml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we move these args into a config file, so it's extremely easy to run locally? Ideally when just running pytest
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I originally planned to add these to the pytest addopts setting in pyproject.toml, but wasn't sure if that would end up interfering with something I'm not yet aware of. Is that where you'd like me to move these?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes please!
xarray/core/groupby.py
Outdated
@@ -1607,7 +1607,7 @@ def reduce_array(ar: DataArray) -> DataArray: | |||
|
|||
|
|||
# https://github.com/python/mypy/issues/9031 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# https://github.com/python/mypy/issues/9031 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, thanks. Missed one :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bit of a scope creep going on, but it's a good direction, thanks!
def pipe( | ||
self, | ||
func: Callable[..., T] | tuple[Callable[..., T], str], | ||
func: tuple[Callable[..., T], str], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not this as well?
func: tuple[Callable[..., T], str], | |
func: tuple[Callable[P, T], str] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We cannot do that because when we pass the function as part of a tuple we don't have enough information to precisely type the function's parameters.
In your suggested change, the ParamSpec P
represents all of the function's parameters, but we need to know all of the parameters excluding one (the one that takes the data value, as identified by the name given in the second value of the tuple).
If that's as clear as mud, let me try to clarify with more detail.
In the first form, where we pass only a function as the first argument to pipe
, we expect the function to take the data value as the first argument, meaning that we know exactly where in the function's list of parameters the data parameter is: the first position.
This means we can type the function more precisely, like so, indicating that the data parameter is first, concatenated with zero or more positional and keyword parameters (and returning a value of some type T
):
# Self is a DataWithCoords (DataArray, Dataset, others?), or DataTree
# P represents all parameters to func *excluding* Self
func: Callable[Concatenate[Self, P], T]
Therefore, after passing func
to pipe
, we must pass all arguments except for the data (self) argument, and this is represented by P
, which excludes Self
:
# pipe expects a function followed by all arguments to pass to the function,
# *except* for the data argument, which pipe will *implicitly* pass.
def pipe(f: Callable[Concatenate[Self, P], T], *args: P.args, **kwargs: P.kwargs) -> T: ...
However, when we pass a function/keyword 2-tuple as the first argument to pipe
, we have no idea what position the keyword parameter is in func
's type signature. We only know that it's not the first parameter.
This means, we cannot do the following, as you suggest, because in this case P
includes the data parameter, but it must not, and there's no way of omitting it without knowing precisely what position it's in:
# We don't know where Self is within P, so we have no way of defining P
# as meaning all of func's parameters *except* for Self. Therefore, this
# signature indicates that we can *explicity* pass a data argument, but that's
# not correct.
def pipe(func: tuple[Callable[P, T], str], *args: P.args, **kwargs: P.kwargs) -> T: ...
To clarify, although Callable[P, T]
is valid for the function itself within the tuple, using *args: P.args, **kwargs: P.kwargs
for the rest of the parameters to pipe
in this case is incorrect, because it means that we can explicitly pass the data argument in the mix there (because P
includes Self
), but the whole point of pipe
, of course, is to implicitly pass the data argument, and thus not allow it to be passed explicitly.
Technically speaking, as far as mypy is concerned, your suggestion probably make no difference from what I propose, but in terms of the information it conveys to the reader, it is incorrect.
This is why I discourage the use of the tuple form, and instead recommend the use of a lambda
(or another function def with args reordered such that the data arg is first). Even using a lambda
to reorder things allows mypy to be more helpful than is possible with the tuple form.
For example (taken from the test cases in xarray/tests/test_dataset_typing.yml
in this PR):
from xarray import Dataset
def f(arg: int, ds: Dataset) -> Dataset:
return ds
# Since we cannot provide a precise type annotation when passing a tuple to
# pipe, there's not enough information for type analysis to indicate that
# we are missing an argument for parameter `arg`, so we get no error here.
ds = Dataset().pipe((f, "ds"))
reveal_type(ds) # N: Revealed type is "xarray.core.dataset.Dataset"
# Rather than passing a tuple, passing a lambda that calls `f` with args in
# the correct order allows for proper type analysis, indicating (perhaps
# somewhat cryptically) that we failed to pass an argument for `arg`.
ds = Dataset().pipe(lambda data, arg: f(arg, data))
# mypy produces the following output for the line above, as it should,
# indicating that we forgot to pass an argument to pipe, which pipe needs
# to pass to `f` in the 2nd position.
error: No overload variant of "pipe" of "DataWithCoords" matches argument type "Callable[[Any, Any], Dataset]" [call-overload]
note: Possible overload variants:
note: def [P`9, T] pipe(self, func: Callable[[Dataset, **P], T], *args: P.args, **kwargs: P.kwargs) -> T
note: def [T] pipe(self, func: tuple[Callable[..., T], str], *args: Any, **kwargs: Any) -> T
IMHO, the tuple form should not be supported, for this very reason, but I don't expect that deprecating that form would get much traction from anybody else.
Sorry, I'm not following. Would you mind clarifying? |
Ah sorry, just that the initial issue was only about the pipe typing and now mainly ci stuff was changed that probably should go into it's own PR for easier reviewing. |
Are we sure about |
Thanks for clarifying. In part, that was because I wasn't sure how else to wire up these changes and related tests, and didn't want to write duplicate lines in the ci file. |
That's a different one. I've pulled in this one, which was most recently released Dec 2024: https://pypi.org/project/pytest-mypy-plugins/ |
Mypy 1.15 includes fix for <python/mypy#9031>, allowing several "type: ignore" comments to be removed.
In addition, enhance mypy job configuration to support running it locally via `act`. Fixes pydata#9997
9a3f95e
to
95084f1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks really good. I would still like to hold the line on ensuring that running basic commands works, I left one comment. But otherwise, great to merge from me!
# As noted in the comment on the previous step, we must run type annotation tests | ||
# separately, and we will run them even if the preceding tests failed. Further, | ||
# we must restrict these tests to run only when matrix.env is empty, as this is | ||
# the only case when all of the necessary dependencies are included such that | ||
# spurious mypy errors due to missing packages are eliminated. | ||
- name: Run mypy tests | ||
if: ${{ always() && matrix.env == '' }} | ||
run: python -m pytest xarray/tests/test_*.yml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm quite committed to ensuring that running a simple pytest
works! I really think it's important that running tests doesn't require looking up various commands in .github/workflows/ci.yaml
If we want to have tests that don't run by default, then we can use pytest marks and run them with an additional options; check out --run-nightly
for an example...
numpy, | ||
] | ||
additional_dependencies: | ||
# Type stubs plus additional dependencies from ci/requirements/environment.yml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding these, that's helpful
- id: mypy | ||
# Copied from setup.cfg | ||
exclude: "properties|asv_bench" | ||
files: "^xarray" | ||
exclude: "^xarray/util/generate_.*\\.py$" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we keep # Copied from pyproject.toml
? Ideally this would not be on different settings.
But pre-commit
oddities are slowing you down, it might be OK fine to turn this off / push to another PR
Improved precision of type annotations on
pipe
methods to address shortcomings described in #9997.Added pytest plugin that supports testing type annotations, along with relevant tests that use the plugin
Bumped mypy version, which includes a fix to something that previously required a few "type: ignore" comments, which I was able to remove
Combined
mypy
andmypy-min
jobs into a single matrix job to avoid duplicationAdded pytest mypy tests to
mypy
jobImproved mypy pre-commit hook, by reducing nearly 100 errors/warnings (when hook was run manually) to 6, but didn't manage to resolve the final 6 errors, as I was spending too much time on it. I suspect this hook has been "broken" for quite some time, but unnoticed, given that the hook must be run manually (and is not triggered in CI), so I'm wondering if this hook should be removed, particularly since 'act' can now be used to locally run the mypy github job (per changes added to this PR)
Closes Improve type signatures of
pipe
methods to enable type checkers to flag erroneous usages #9997Tests added