Add dataclass_factory argument to dataclasses.make_dataclass for custom dataclass transformation support #118974

Open
XuehaiPan opened this issue May 12, 2024 · 2 comments
Labels
type-feature A feature request or enhancement

Comments

@XuehaiPan
Contributor

XuehaiPan commented May 12, 2024

Feature or enhancement

Proposal:

typing.dataclass_transform (PEP 681 – Data Class Transforms) allows users to define their own dataclass decorators that type checkers recognize.
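
For readers unfamiliar with PEP 681, a minimal sketch of such a decorator might look like the following. The names `registered_dataclass` and `REGISTRY` are hypothetical and exist only for illustration; the fallback branch is a runtime stand-in for Python versions older than 3.11 and gives type checkers nothing.

```python
import dataclasses

try:
    from typing import dataclass_transform  # available since Python 3.11
except ImportError:
    # Assumption: a runtime-only no-op stand-in for older interpreters.
    def dataclass_transform(**kwargs):
        return lambda obj: obj

REGISTRY = []  # hypothetical side effect of the custom decorator


@dataclass_transform(field_specifiers=(dataclasses.field,))
def registered_dataclass(cls=None, /, **kwargs):
    """Hypothetical decorator: acts like dataclasses.dataclass and records the class."""

    def wrap(klass):
        klass = dataclasses.dataclass(klass, **kwargs)
        REGISTRY.append(klass)
        return klass

    # Support both @registered_dataclass and @registered_dataclass(...).
    return wrap if cls is None else wrap(cls)


@registered_dataclass(frozen=True)
class Point:
    x: int
    y: int = 0
```

Because of the `dataclass_transform` marker, a type checker can treat `Point` as if it had been decorated with a dataclass-like decorator, including the synthesized `__init__`.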

Here is a real-world example use case:

Also, dataclasses.asdict and dataclasses.astuple allow users to pass an extra argument that sets the factory used to build the returned instance.

cpython/Lib/dataclasses.py

Lines 1299 to 1317 in 0fb18b0

def asdict(obj, *, dict_factory=dict):
    """Return the fields of a dataclass instance as a new dictionary mapping
    field names to field values.

    Example usage::

      @dataclass
      class C:
          x: int
          y: int

      c = C(1, 2)
      assert asdict(c) == {'x': 1, 'y': 2}

    If given, 'dict_factory' will be used instead of built-in dict.
    The function applies recursively to field values that are
    dataclass instances. This will also look into built-in containers:
    tuples, lists, and dicts. Other objects are copied with 'copy.deepcopy()'.
    """

cpython/Lib/dataclasses.py

Lines 1380 to 1397 in 0fb18b0

def astuple(obj, *, tuple_factory=tuple):
    """Return the fields of a dataclass instance as a new tuple of field values.

    Example usage::

      @dataclass
      class C:
          x: int
          y: int

      c = C(1, 2)
      assert astuple(c) == (1, 2)

    If given, 'tuple_factory' will be used instead of built-in tuple.
    The function applies recursively to field values that are
    dataclass instances. This will also look into built-in containers:
    tuples, lists, and dicts. Other objects are copied with 'copy.deepcopy()'.
    """

However, the make_dataclass function does not support third-party dataclass factories (e.g., flax.struct.dataclass):

cpython/Lib/dataclasses.py

Lines 1441 to 1528 in 0fb18b0

def make_dataclass(cls_name, fields, *, bases=(), namespace=None, init=True,
                   repr=True, eq=True, order=False, unsafe_hash=False,
                   frozen=False, match_args=True, kw_only=False, slots=False,
                   weakref_slot=False, module=None):
    """Return a new dynamically created dataclass.

    The dataclass name will be 'cls_name'. 'fields' is an iterable
    of either (name), (name, type) or (name, type, Field) objects. If type is
    omitted, use the string 'typing.Any'. Field objects are created by
    the equivalent of calling 'field(name, type [, Field-info])'.::

      C = make_dataclass('C', ['x', ('y', int), ('z', int, field(init=False))], bases=(Base,))

    is equivalent to::

      @dataclass
      class C(Base):
          x: 'typing.Any'
          y: int
          z: int = field(init=False)

    For the bases and namespace parameters, see the builtin type() function.

    The parameters init, repr, eq, order, unsafe_hash, frozen, match_args, kw_only,
    slots, and weakref_slot are passed to dataclass().

    If module parameter is defined, the '__module__' attribute of the dataclass is
    set to that value.
    """

    if namespace is None:
        namespace = {}

    # While we're looking through the field names, validate that they
    # are identifiers, are not keywords, and not duplicates.
    seen = set()
    annotations = {}
    defaults = {}
    for item in fields:
        if isinstance(item, str):
            name = item
            tp = 'typing.Any'
        elif len(item) == 2:
            name, tp, = item
        elif len(item) == 3:
            name, tp, spec = item
            defaults[name] = spec
        else:
            raise TypeError(f'Invalid field: {item!r}')

        if not isinstance(name, str) or not name.isidentifier():
            raise TypeError(f'Field names must be valid identifiers: {name!r}')
        if keyword.iskeyword(name):
            raise TypeError(f'Field names must not be keywords: {name!r}')
        if name in seen:
            raise TypeError(f'Field name duplicated: {name!r}')

        seen.add(name)
        annotations[name] = tp

    # Update 'ns' with the user-supplied namespace plus our calculated values.
    def exec_body_callback(ns):
        ns.update(namespace)
        ns.update(defaults)
        ns['__annotations__'] = annotations

    # We use `types.new_class()` instead of simply `type()` to allow dynamic creation
    # of generic dataclasses.
    cls = types.new_class(cls_name, bases, {}, exec_body_callback)

    # For pickling to work, the __module__ variable needs to be set to the frame
    # where the dataclass is created.
    if module is None:
        try:
            module = sys._getframemodulename(1) or '__main__'
        except AttributeError:
            try:
                module = sys._getframe(1).f_globals.get('__name__', '__main__')
            except (AttributeError, ValueError):
                pass
    if module is not None:
        cls.__module__ = module

    # Apply the normal decorator.
    return dataclass(cls, init=init, repr=repr, eq=eq, order=order,
                     unsafe_hash=unsafe_hash, frozen=frozen,
                     match_args=match_args, kw_only=kw_only, slots=slots,
                     weakref_slot=weakref_slot)

As the return statement above shows, it can only apply dataclasses.dataclass.

This feature request discusses the possibility of adding a new dataclass_factory argument to dataclasses.make_dataclass to support third-party dataclass transformations, similar to dict_factory for dataclasses.asdict.

# dataclasses.py

def make_dataclass(cls_name, fields, *, bases=(), namespace=None, init=True,
                   repr=True, eq=True, order=False, unsafe_hash=False,
                   frozen=False, match_args=True, kw_only=False, slots=False,
                   weakref_slot=False, module=None,
                   dataclass_factory=dataclass):
    ...

    # Apply the normal decorator.
    return dataclass_factory(cls, init=init, repr=repr, eq=eq, order=order,
                             unsafe_hash=unsafe_hash, frozen=frozen,
                             match_args=match_args, kw_only=kw_only, slots=slots,
                             weakref_slot=weakref_slot)
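
Until something like this lands, the idea can be tried out with a standalone helper. The sketch below is not the stdlib implementation: `make_dataclass_with_factory` and `tagged_dataclass` are hypothetical names, field-name validation and the `__module__` handling needed for pickling are omitted for brevity, and only the keyword arguments the caller supplies are forwarded to the factory.

```python
import dataclasses
import types


def make_dataclass_with_factory(cls_name, fields, *, bases=(), namespace=None,
                                dataclass_factory=dataclasses.dataclass,
                                **dataclass_kwargs):
    """Hypothetical helper: like make_dataclass, with a pluggable decorator."""
    namespace = dict(namespace or {})
    annotations = {}
    defaults = {}
    for item in fields:
        if isinstance(item, str):
            name, tp = item, 'typing.Any'
        elif len(item) == 2:
            name, tp = item
        elif len(item) == 3:
            name, tp, spec = item
            defaults[name] = spec
        else:
            raise TypeError(f'Invalid field: {item!r}')
        annotations[name] = tp

    def exec_body_callback(ns):
        ns.update(namespace)
        ns.update(defaults)
        ns['__annotations__'] = annotations

    cls = types.new_class(cls_name, bases, {}, exec_body_callback)
    # The only behavioural change: apply the caller's factory
    # instead of hard-coding dataclasses.dataclass.
    return dataclass_factory(cls, **dataclass_kwargs)


def tagged_dataclass(cls, **kwargs):
    """Hypothetical third-party factory: a dataclass plus a marker attribute."""
    cls = dataclasses.dataclass(cls, **kwargs)
    cls.__tagged__ = True
    return cls


Point = make_dataclass_with_factory(
    'Point',
    [('x', int), ('y', int, dataclasses.field(default=0))],
    dataclass_factory=tagged_dataclass,
    frozen=True,
)
```

With the default `dataclass_factory`, the helper behaves like a stripped-down `make_dataclass`, which is why the proposal can be backward compatible.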

Has this already been discussed elsewhere?

https://discuss.python.org/t/add-dataclass-factory-argument-to-dataclasses-make-dataclass-for-custom-dataclass-transformation-support/53188

Links to previous discussion of this feature:

No response

@XuehaiPan XuehaiPan added the type-feature A feature request or enhancement label May 12, 2024
@ericvsmith
Member

That doesn't seem unreasonable to me. But as the issue template says, please discuss this on Discourse first: https://discuss.python.org/c/ideas/6

@XuehaiPan
Contributor Author

XuehaiPan commented May 13, 2024

Thanks for the hint. I opened a new thread; let's discuss this there first.
