Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add deduplication of types #1004

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

walter9388
Copy link

@walter9388 walter9388 commented Feb 20, 2025

Relating to #982

Took me a while to get round to this, but here we are....

I think there are three different levels of approaching this issue:

  1. Remove extra None types only:
    -def f(x: Optional[Union[int, None]]): pass
    +def f(x: int | None): pass
    -def g(x: Union[Optional[int], None]): pass
    +def g(x: int | None): pass
  2. Remove any duplicated scalar types at the same depth by name in Union blocks:
    -def f(x: Union[Union[Union[Union[a, b], c], d], a]): pass
    +def f(x: a | b | c | d): pass
    -def g(x: Union[a.b | a.c, a.b, list[str], str]): pass
    +def g(x: a.b | a.c | list[str] | str): pass
  3. General deduplication at any depth on any block:
    -def f(x: Union[list[Union[int, str]], list[Union[str, int]]]): pass
    +def f(x: list[int | str]): pass

I settled on level 2 as this was still possible with a single pass and seemed more useful than just focusing on None types.
I couldn't see a way of approaching the general problem (level 3) without recursively making a tree structure and then assessing the leaf nodes. However, I am not deeply familiar with the standard python libraries for parsing ASTs etc., so if there are simple built in methods for problems like this I would be interested to know!

I used the existing scan in _fix_union to determine the delimitators at depth==1 between the types. This seemed to work well, but I definitely ran into some interesting edge cases when it came to handling comments, whitespace and multilines.

I have managed to get this working for a variety of test cases, and I would be interested to hear your feedback.

Btw I enjoy your YouTube content! I have learnt a lot of niche things I would have struggled to pick up otherwise. So thank you for that!

Copy link
Owner

@asottile asottile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a quick first pass -- will look more closely later

Copy link
Author

@walter9388 walter9388 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just highlighting a few things to be aware of.

from pyupgrade._token_helpers import find_op
from pyupgrade._token_helpers import is_close
from pyupgrade._token_helpers import is_open


def _fix_optional(i: int, tokens: list[Token]) -> None:
j = find_op(tokens, i, '[')
k = find_closing_bracket(tokens, j)
k, contains_none = _find_closing_bracket_and_if_contains_none(tokens, j)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified the general find_closing_bracket function to also check for whether the optional block already contains None in the same pass.

Comment on lines +30 to +34
tokens[k:k + 1] = [
Token('UNIMPORTANT_WS', ' '),
Token('CODE', '| '),
Token('CODE', 'None'),
]
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for changing the single token containing | None to explicit whitespace, | and None is for the deduplication and whitespace removal functions used in _fix_union. This also applies to the multiline version a few lines below.

Comment on lines +137 to +139
to_delete += _remove_consecutive_unimportant_ws(
tokens, [x for x in range(j, k) if x not in to_delete],
)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not convinced this is the best approach to remove whitespace, but not sure about what to do in situations where lines are completely deleted other than comments. I have written a niche test for this situation in test id='duplicated types in multi-line nested unions or optionals'.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants