
[BUG] #411 introduced even worse side effect than it fixed. #433

Closed
healthmatrice opened this issue Jul 12, 2023 · 22 comments
Labels
bug Something isn't working

Comments

@healthmatrice
Contributor

healthmatrice commented Jul 12, 2023

Description

#411 tried to fix Pylance not being able to resolve to_json(), etc.

However, it breaks all auto-completion for dataclass_json-decorated classes.

I think in 99% of use cases we need the IDE to treat the decorated class as the dataclass itself rather than as DataClassJsonMixin.
We all know which methods the decorator adds, so # type: ignore is totally acceptable.

Moreover, methods like to_json are used far less frequently than the dataclass __init__ and its other methods.

I think we should prioritize IDE support for the dataclass itself over DataClassJsonMixin.

Expected type expression but received "((cls: Unknown) -> type[DataClassJsonMixin]) | type[DataClassJsonMixin]"
  "(cls: Unknown) -> type[DataClassJsonMixin]" is not a class

Expected no arguments to "DataClassJsonMixin" constructor

Can we please revert that change until we find a better solution?

The thing is, the returned class should be a subclass of both the original cls and DataClassJsonMixin, not just the ABC.
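For illustration, this is the kind of trade-off I mean (a minimal sketch using the same Person as in the snippet below; the # type: ignore only covers the runtime-added mixin method):

from dataclasses import dataclass
from typing import List

from dataclasses_json import dataclass_json


@dataclass_json
@dataclass
class Person:
    x: List[int]


p = Person([1])      # the IDE should resolve this from the generated dataclass __init__
print(p.x)           # and autocomplete the dataclass fields
print(p.to_json())   # type: ignore[attr-defined]  # mixin method added at runtime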

Code snippet that reproduces the issue

from dataclasses import dataclass
from dataclasses_json import dataclass_json, DataClassJsonMixin
from typing import List

@dataclass_json
@dataclass
class Person:
    x: List[int]

def test(a: Person):
    pass

p = Person([1,])

Describe the results you expected

Pylance should not show an error and should auto-complete the Person class.

Python version you are using

Python 3.10.6

Environment description

dataclasses-json==0.5.9
marshmallow==3.19.0
marshmallow-enum==1.5.1
mypy-extensions==1.0.0
packaging==23.1
typing-inspect==0.9.0
typing_extensions==4.7.1

@healthmatrice healthmatrice added the bug Something isn't working label Jul 12, 2023
@healthmatrice
Contributor Author

The first error message is because of these lines:

    if _cls is None:
        return wrap

So dataclass_json() can actually return a function instead of a class. This confuses static type checkers,
because the return value can be a function and the checker does not know when it will be the actual class.

The reason for the second error is that the type checker does not know that the subclass returned by _process_class() is also derived from Person.
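For reference, this is roughly how the two calling conventions could be spelled out for a type checker (a sketch only, not the library's actual annotations), following the same overload pattern typeshed uses for dataclasses.dataclass. The trade-off is that the wrapped class keeps its own type, while the mixin methods stay invisible and need # type: ignore:

from typing import Callable, Type, TypeVar, overload

T = TypeVar("T")

# bare form: @dataclass_json
@overload
def dataclass_json(_cls: Type[T]) -> Type[T]: ...

# parameterized form: @dataclass_json(letter_case=..., undefined=...)
@overload
def dataclass_json(*, letter_case=None, undefined=None) -> Callable[[Type[T]], Type[T]]: ...

def dataclass_json(_cls=None, *, letter_case=None, undefined=None):
    def wrap(cls):
        # the real library would call _process_class(cls, letter_case, undefined) here
        return cls

    if _cls is None:
        return wrap
    return wrap(_cls)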

@george-zubrienko
Collaborator

george-zubrienko commented Jul 13, 2023

Is this a problem specific to pylance?

However, this breaks all auto-completion of dataclass_json decorated classes.

Untrue; I just tested this on PyCharm with 0.5.9 and auto-complete works as expected when using either the annotation or the base class.

Also, could you please elaborate on the example: what exactly should be autocompleted? It is a valid piece of code.

@george-zubrienko george-zubrienko added triage and removed bug Something isn't working labels Jul 13, 2023
@healthmatrice
Contributor Author

Nope, it is not just Pylance; PyCharm has exactly the same problem.
Here is a file:

from dataclasses import dataclass
from dataclasses_json import dataclass_json, DataClassJsonMixin
from typing import List


@dataclass_json
@dataclass
class Person:
    x: List[int]


@dataclass
class Person2(DataClassJsonMixin):
    x: List[int]


p2 = Person2([2])


def test(a: Person):
    print(a.x)


p = Person([1])

print(p.x)
print(p.to_json())

mypy actually reports an error just like with previous dataclasses_json versions; it seems to fail to realize the class is a subclass of DataClassJsonMixin. Here is the mypy result:

bug.py:27: error: "Person" has no attribute "to_json"  [attr-defined]
Found 1 error in 1 file (checked 1 source file)

In PyCharm the code inspection did not report any issue with the code. However,
p2 gets proper auto-completion for p2.x, but p doesn't, because the inspector thinks p is just a DataClassJsonMixin.
Here are some screenshots proving it:

Screenshot from 2023-07-14 15-11-45

Screenshot from 2023-07-14 14-03-26

The only reasons why PyCharm inspection does not report an error/warning are probably:

  1. it seems to ignore errors that might originate from other imported modules;
  2. it is more relaxed about some rules;
  3. it relies less on static type checking.

So I tried to create a minimal example to show the problem; once the base class and the decorator are defined in the same module, it starts reporting problems:

Screenshot from 2023-07-14 15-47-04

from dataclasses import dataclass
from typing import List, Type
import abc


class Base1(abc.ABC):
    def foo(self, *, x: str) -> str:
        msg = "Base1:" + x
        print(msg)
        return msg


def decorator_factory(_cls=None, *, y: str = "param1"):
    print(f"{_cls=}, {y=}")

    def wrap(cls):
        print(f"wrap {cls=}, {y=}")
        return class_factory(cls, y)

    if _cls is None:
        return wrap

    return wrap(_cls)


def class_factory(cls, y: str) -> Type[Base1]:
    print(f"wrap {cls=}, {y=}")
    cls._y = y
    cls.foo = Base1.foo
    Base1.register(cls)
    return cls


@decorator_factory
@dataclass
class Person:
    x: List[int]


@decorator_factory(y="B")
@dataclass
class PersonB:
    x: List[int]


def test(a: Person):
    print(a.x)


p = Person([1])
pB = PersonB([1])

print(p.x)
print(p.foo(x="test"))

print(pB.x)
print(pB.foo(x="testB"))

Here the error on p = Person([1]) is: Type 'list[int]' doesn't have expected attributes '_y', 'foo'.
So apparently it thinks Person is the callable wrap returned by decorator_factory and that [1] is its cls argument.

The interesting part is that if we remove the -> Type[Base1] annotation, the PyCharm inspector won't report any issue, though it also does not auto-complete p.foo or pB.foo.

Pylance gives more errors; they indicate that it does not know at all what class is coming out of the decorator:

pylance-test/bug2.py
  pylance-test/bug2.py:40:2 - error: Expected no arguments to "Base1" constructor (reportGeneralTypeIssues)
  pylance-test/bug2.py:46:13 - error: Expected type expression but received "((cls: Unknown) -> type[Base1]) | type[Base1]"
    "(cls: Unknown) -> type[Base1]" is not a class (reportGeneralTypeIssues)
  pylance-test/bug2.py:50:5 - error: Expected no arguments to "Base1" constructor (reportGeneralTypeIssues)
  pylance-test/bug2.py:51:6 - error: Expected no arguments to "Base1" constructor (reportGeneralTypeIssues)
  pylance-test/bug2.py:51:6 - error: Object of type "Base1" is not callable (reportGeneralTypeIssues)
  pylance-test/bug2.py:53:9 - error: Cannot access member "x" for type "type[Base1]"
    Member "x" is unknown (reportGeneralTypeIssues)
  pylance-test/bug2.py:53:9 - error: Cannot access member "x" for type "Base1"
    Member "x" is unknown (reportGeneralTypeIssues)
  pylance-test/bug2.py:54:7 - error: Argument missing for parameter "self" (reportGeneralTypeIssues)
  pylance-test/bug2.py:56:10 - error: Cannot access member "x" for type "Base1"
    Member "x" is unknown (reportGeneralTypeIssues)
9 errors, 0 warnings, 0 informations 

If we remove the annotation, it reports on the member foo.

I think code inspection won't get very deep into dynamically generated types,
so the missing members from DataClassJsonMixin are basically unsolvable as long as we use a decorator.
If we really want this to work, we probably have to change the API: instead of using a decorator,
we would need the user to subclass DataClassJsonMixin, though the undefined-parameter handling in __init__ would still need to be addressed.

We may have an interface like this:

# in api

def DataClassJsonMixin_factory(letter_case, undefined) -> Type[DataClassJsonMixin]:
    # add dataclass_json_config and the missing-value handler to a new subclass
    ...
    return the_subclass

# in user code:

@dataclass
class Person(DataClassJsonMixin_factory(...)):  # whatever the parameters
    pass

@george-zubrienko
Collaborator

Alright, I see the problem now. The decorator function shadows the actual return type; I'll investigate what we can do about this. We don't use decorators internally, which is why our CI never saw a problem there. Thanks for the clarifications!

@george-zubrienko george-zubrienko added bug Something isn't working and removed triage labels Jul 14, 2023
@george-zubrienko george-zubrienko self-assigned this Jul 14, 2023
@george-zubrienko
Collaborator

george-zubrienko commented Jul 17, 2023

So, I researched this a bit and I'm starting to think in a slightly different direction here. It is quite clear that if you want proper IDE support, you are forced to use DataClassJsonMixin. Also, if you use the annotation, IDEs display method/field information incorrectly, because the annotation involves some runtime code that only makes sense when you are running in a console.

This resonates a bit with #243 for me. Thus, theoretically, we can make a breaking change to the library (and maybe graduate it to 1.0 afterwards? who knows :)) and rethink the way serialization methods are activated. If we go for the approach of:

  • Remove the subclass completely
  • Automatic creation of a class serializer via annotation @dataclass_json - can be a simple object we attach as a _serializer property
  • Move from_json, to_json, from_dict, to_dict from individual instances to something like a functions module and change their behaviour to accept your object, with a serializer injected by the annotation:
# current
v = MyClass.from_json("...")

# proposed
v = from_json[MyClass]("...")
# or
v = from_json(MyClass, "...")

This would resolve all IDE/static type checker errors, because the injected serializer will always return (and hint) the correct type to the caller. Moreover, this would remove the ambiguity of choosing between the annotation and the subclass, and essentially remove the need to support two APIs. I'd argue people would get less confused as well, AND we get a big plus by allowing the injected serializer to be overridden with a homebrewed one, in case somebody wants that, or if we want to support some exotic stuff.
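Roughly, a typed module-level helper could look like this (a sketch only; the _serializer attribute and its from_json method are assumed names, not an existing API). Because the target class is an explicit argument, the checker can infer the return type without knowing anything about the decorator:

from typing import Any, Type, TypeVar

T = TypeVar("T")

def from_json(cls: Type[T], data: str, **kwargs: Any) -> T:
    # `_serializer` would be attached by the @dataclass_json annotation (assumed name)
    serializer = getattr(cls, "_serializer")
    return serializer.from_json(cls, data, **kwargs)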

As a downside, this would break all code that uses the pre-1.0 library, as people would be required to either remove subclassing (like our corp team does) or replace MyClass.from_json everywhere. However, that is not that big of a change.

@healthmatrice @matt035343 @s-vitaliy @lidatong @andersk and others I forgot to tag - could you please let me know your thoughts :)

As for the commit in question, I'd put this to vote - @matt035343 , @s-vitaliy - should we revert?

@healthmatrice
Contributor Author

healthmatrice commented Jul 19, 2023

@george-zubrienko I really appreciate that you are willing to discuss a potential change to the interface.

I like the functions module idea. Actually, I think a lot of users have already done this themselves, for example:

from typing import Any, Dict, Type, TypeVar

T = TypeVar("T")

def to_dict(obj, encode_json=True, **kwargs) -> Dict[str, Any]:
    return obj.to_dict(encode_json=encode_json, **kwargs)  # type: ignore

def from_dict(cls: Type[T], dct, **kwargs) -> T:
    return cls.from_dict(dct, **kwargs)  # type: ignore

That is why I said the to/from_json/dict type errors may not hit most users, but the current annotation breaking dataclass autocompletion is a big problem.
I had actually almost forgotten about this problem, since years ago the utility module containing these functions became a fundamental part of most of our code; but of course the IDE won't tell the user what arguments are actually expected.

If you officially provide these utilities, I can simply import them directly instead of defining them myself.

We also used to hide the dataclasses-json interface completely behind a custom json encoder and object_hook.

That way we can basically forget about the dataclass_json interface completely; our users simply call json.dump and json.load without knowing the internal details.
So personally, the breaking changes you mentioned do not really affect me much.
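Roughly, the encoder side of that pattern looks like this (a simplified sketch; the class name is just for illustration, and the object_hook side for loading is omitted):

import json
from dataclasses import dataclass
from typing import List

from dataclasses_json import dataclass_json


class DataClassEncoder(json.JSONEncoder):
    def default(self, o):
        # works for both decorator- and mixin-style classes: if the object
        # exposes to_dict(), let it serialize itself
        to_dict = getattr(o, "to_dict", None)
        if callable(to_dict):
            return to_dict()
        return super().default(o)


@dataclass_json
@dataclass
class Person:
    x: List[int]


# callers only ever touch the stdlib API
print(json.dumps(Person([1, 2]), cls=DataClassEncoder))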

However, I think we should still be very careful about breaking changes in general.
Maybe people just casually call to/from_json instead of hiding everything behind the scenes like I do.
If it breaks their old code, people might just switch to other libraries (for example pydantic or mashumaro).
So it is probably better to keep the deprecated injected methods even if you remove the subclass completely.

We also need to think about our merits over other libraries (pydantic, etc.) and emphasize them.
Personally, I chose this one because:

  • it is pure Python and has a very small code base, so I can really dig into the code to see what causes behavior I find strange and also figure out the proper configuration/parameters I might need.
  • it only aims at JSON, so there is no burden of supporting other kinds of representation styles.
  • it has a straightforward interface and sane defaults for normal JSON serialization. It used to take me hours to dig through another library's documentation to figure out the proper configuration parameters for my dataclass.

@artificial-aidan

Can reverting the change be considered? This has now broken all over our codebase in a way that wasn't expected.

@george-zubrienko
Collaborator

george-zubrienko commented Jul 20, 2023

Can reverting the change be considered? This has now broken all over our codebase in a way that wasn't expected.

Yes, we'll revert this as it seems to negatively interact with certain code analysis tools. However, it would be nice if you could provide some context for 'broken all over our codebase' - it would help plan the next steps here - or maybe something else is broken and we need to take a look at that.

@george-zubrienko
Collaborator

In 0.5.13 the change in question will be reverted. I'll afterwards close this issue and open a new one for a potential 1.0 breaking API change that should in theory help with issues like this one in the future. Please take a look at it once it appears and let us know your thoughts.

@george-zubrienko
Collaborator

Re

If it breaks the old code, people might just consider to switch to another libs (for example pydantic or mashumaro).

I'm personally fine with some users switching away over v1, given that we keep the core majority happy. This is not in any way a commercial project; our main goal is to keep the lib healthy, useful and up-to-date with the latest Python features/changes. As you also pointed out, this lib provides much less fluff than others, among other things.
That being said, if we decide to make a v1 release, we want the community around the lib to be on the same page with contributors as to why/when/how.

@artificial-aidan

Can reverting the change be considered? This has now broken all over our codebase in a way that wasn't expected.

Yes, we'll revert this as it seems to negatively interact with certain code analysis tools. However, it would be nice if you could provide some context for 'broken all over our codebase' - it would help plan the next steps here - or maybe something else is broken and we need to take a look at that.

We use this pattern quite often: @dataclass_json(letter_case=LetterCase.CAMEL), together with pyright, which now reports the error: Expected no arguments to "DataClassJsonMixin" constructor
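For reference, a minimal repro of that pattern (class and field names are just illustrative):

from dataclasses import dataclass

from dataclasses_json import LetterCase, dataclass_json


@dataclass_json(letter_case=LetterCase.CAMEL)
@dataclass
class ApiConfig:
    api_key: str


# with 0.5.12, pyright resolves ApiConfig to DataClassJsonMixin and reports:
#   Expected no arguments to "DataClassJsonMixin" constructor
cfg = ApiConfig(api_key="secret")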

Happy to be involved in the breaking API change if I can, please share the issue here and I will subscribe to it.

I think there's nothing wrong with a refactor into v1, especially if it can meet all the modern typing expectations that have shown up in Python. And I know that technically it is 0ver, so breaking changes are "allowed" without a major bump. But I think it should at least bump to 0.6.0.

@george-zubrienko
Collaborator

I think there's nothing wrong with a refactor into v1, especially if it can meet all the modern typing expectations that have shown up in Python. And I know that technically it is 0ver, so breaking changes are "allowed" without a major bump. But I think it should at least bump to 0.6.0.

I think if we are going to do this, it will be 1.0 with certain API stability guarantees from that point.

We use this pattern quite often: @dataclass_json(letter_case=LetterCase.CAMEL), together with pyright, which now reports the error: Expected no arguments to "DataClassJsonMixin" constructor

Quite curious, this is different from the OP issue, but it falls under the same category. We use pylint and it doesn't complain about this. I think this is another example of why we should consider a refactor.

@george-zubrienko
Collaborator

0.5.13, with #411 reverted and one more bugfix-to-bugfix, has been released. I'll work on a new issue with a broader scope and transfer some comments from the OP and others there. Feel free to reopen if you still encounter issues with static code analysis that you didn't have before.

@george-zubrienko
Collaborator

Also, a small note @artificial-aidan and @healthmatrice, if any of you wants to contribute adding checks using pylance/pyright, so we can catch stuff like this in CI - I'd be happy to review the PR :)

@artificial-aidan

Do you have any current way of testing type checkers? I can maybe take a look at it. From a brief glance it would involve installing the type checker standalone and running it in CI?

@george-zubrienko
Collaborator

george-zubrienko commented Jul 20, 2023

I think for pyright it is pretty simple - if I google correctly, it is a bit like mypy. So, to add it you need a few simple steps:

  • Add an appropriate version here to dev dependencies
  • Run poetry update (remember to include lockfile commit in the PR)
  • Add whatever configs that tool needs: pyrightconfig.json etc.
  • Add a step to code-quality workflow similar to mypy

As an alternative, we can add it as a separate workflow, but we can figure that out once the PR is there. If you are unfamiliar with Poetry, check this part of our README.

@artificial-aidan

Ok thanks, I missed the mypy step in code-quality.

@george-zubrienko
Collaborator

george-zubrienko commented Jul 20, 2023

Ok thanks, I missed the mypy step in code-quality.

Well, to be honest, I am not very satisfied with how it works, and if you look at the annotations' code, lots of stuff is excluded from mypy checks, which is another flag indicating this is not the optimal way to do things.

@healthmatrice
Contributor Author

Also, a small note @artificial-aidan and @healthmatrice, if any of you wants to contribute adding checks using pylance/pyright, so we can catch stuff like this in CI - I'd be happy to review the PR :)

I can try when I get some free time. I already have a pyright pipeline set up, but I guess I need to add some unit tests first.

@artificial-aidan

Would it be possible to yank 0.5.12? (https://peps.python.org/pep-0592/)

I'm still having developers somehow accidentally get 0.5.12 installed

@george-zubrienko
Collaborator

Would it be possible to yank 0.5.12? (https://peps.python.org/pep-0592/)

I'm still having developers somehow accidentally get 0.5.12 installed

There will be a 0.5.14 by the end of this week :)
We don't have access to the PyPI account, but @lidatong can yank a release for sure, though I don't really see why we should bother: since we are aiming at a release in 1-2 weeks, it will become obsolete pretty soon.
How do they manage to get 0.5.12 in? Version locks?

@artificial-aidan

How do they manage to get 0.5.12 in? Version locks?

Your guess is as good as mine. I didn't bother debugging, just upgraded them and moved on. I figured if it wasn't a lot of work, it could save some confusion.
