Skip to content

Add dataclass_factory argument to dataclasses.make_dataclass for custom dataclass transformation support #118974

Open
@XuehaiPan

Description

@XuehaiPan

Feature or enhancement

Proposal:

typing.dataclass_transform (PEP 681 – Data Class Transforms) allows users define their own dataclass decorator that can be recognized by the type checker.

Here is a real-world example use case:

Also, dataclasses.asdict and dataclasses.astuple allow users pass an extra argument for the factory of the returned instance.

cpython/Lib/dataclasses.py

Lines 1299 to 1317 in 0fb18b0

def asdict(obj, *, dict_factory=dict):
"""Return the fields of a dataclass instance as a new dictionary mapping
field names to field values.
Example usage::
@dataclass
class C:
x: int
y: int
c = C(1, 2)
assert asdict(c) == {'x': 1, 'y': 2}
If given, 'dict_factory' will be used instead of built-in dict.
The function applies recursively to field values that are
dataclass instances. This will also look into built-in containers:
tuples, lists, and dicts. Other objects are copied with 'copy.deepcopy()'.
"""

cpython/Lib/dataclasses.py

Lines 1380 to 1397 in 0fb18b0

def astuple(obj, *, tuple_factory=tuple):
"""Return the fields of a dataclass instance as a new tuple of field values.
Example usage::
@dataclass
class C:
x: int
y: int
c = C(1, 2)
assert astuple(c) == (1, 2)
If given, 'tuple_factory' will be used instead of built-in tuple.
The function applies recursively to field values that are
dataclass instances. This will also look into built-in containers:
tuples, lists, and dicts. Other objects are copied with 'copy.deepcopy()'.
"""

However, the make_dataclass function does not support third-party dataclass factory (e.g., flax.struct.dataclass):

cpython/Lib/dataclasses.py

Lines 1441 to 1528 in 0fb18b0

def make_dataclass(cls_name, fields, *, bases=(), namespace=None, init=True,
repr=True, eq=True, order=False, unsafe_hash=False,
frozen=False, match_args=True, kw_only=False, slots=False,
weakref_slot=False, module=None):
"""Return a new dynamically created dataclass.
The dataclass name will be 'cls_name'. 'fields' is an iterable
of either (name), (name, type) or (name, type, Field) objects. If type is
omitted, use the string 'typing.Any'. Field objects are created by
the equivalent of calling 'field(name, type [, Field-info])'.::
C = make_dataclass('C', ['x', ('y', int), ('z', int, field(init=False))], bases=(Base,))
is equivalent to::
@dataclass
class C(Base):
x: 'typing.Any'
y: int
z: int = field(init=False)
For the bases and namespace parameters, see the builtin type() function.
The parameters init, repr, eq, order, unsafe_hash, frozen, match_args, kw_only,
slots, and weakref_slot are passed to dataclass().
If module parameter is defined, the '__module__' attribute of the dataclass is
set to that value.
"""
if namespace is None:
namespace = {}
# While we're looking through the field names, validate that they
# are identifiers, are not keywords, and not duplicates.
seen = set()
annotations = {}
defaults = {}
for item in fields:
if isinstance(item, str):
name = item
tp = 'typing.Any'
elif len(item) == 2:
name, tp, = item
elif len(item) == 3:
name, tp, spec = item
defaults[name] = spec
else:
raise TypeError(f'Invalid field: {item!r}')
if not isinstance(name, str) or not name.isidentifier():
raise TypeError(f'Field names must be valid identifiers: {name!r}')
if keyword.iskeyword(name):
raise TypeError(f'Field names must not be keywords: {name!r}')
if name in seen:
raise TypeError(f'Field name duplicated: {name!r}')
seen.add(name)
annotations[name] = tp
# Update 'ns' with the user-supplied namespace plus our calculated values.
def exec_body_callback(ns):
ns.update(namespace)
ns.update(defaults)
ns['__annotations__'] = annotations
# We use `types.new_class()` instead of simply `type()` to allow dynamic creation
# of generic dataclasses.
cls = types.new_class(cls_name, bases, {}, exec_body_callback)
# For pickling to work, the __module__ variable needs to be set to the frame
# where the dataclass is created.
if module is None:
try:
module = sys._getframemodulename(1) or '__main__'
except AttributeError:
try:
module = sys._getframe(1).f_globals.get('__name__', '__main__')
except (AttributeError, ValueError):
pass
if module is not None:
cls.__module__ = module
# Apply the normal decorator.
return dataclass(cls, init=init, repr=repr, eq=eq, order=order,
unsafe_hash=unsafe_hash, frozen=frozen,
match_args=match_args, kw_only=kw_only, slots=slots,
weakref_slot=weakref_slot)

It can only apply dataclasses.dataclass (see the return statement above).

This feature request issue will discuss the possibility of adding a new dataclass_factory argument to the dataclasses.make_dataclass to support third-party dataclasss transformation, similar to dict_factory for dataclasses.asdict.

# dataclasses.py

def make_dataclass(cls_name, fields, *, bases=(), namespace=None, init=True,
                   repr=True, eq=True, order=False, unsafe_hash=False,
                   frozen=False, match_args=True, kw_only=False, slots=False,
                   weakref_slot=False, module=None,
                   dataclass_factory=dataclass):
    ...

    # Apply the normal decorator.
    return dataclass_factory(cls, init=init, repr=repr, eq=eq, order=order,
                             unsafe_hash=unsafe_hash, frozen=frozen,
                             match_args=match_args, kw_only=kw_only, slots=slots,
                             weakref_slot=weakref_slot)

Has this already been discussed elsewhere?

https://discuss.python.org/t/add-dataclass-factory-argument-to-dataclasses-make-dataclass-for-custom-dataclass-transformation-support/53188

Links to previous discussion of this feature:

No response

Linked PRs

Metadata

Metadata

Assignees

No one assigned

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions