
[BUG] @tensorclass-decorated dataclasses violate @beartype-based runtime type-checking 😭 #1243


Open · leycec opened this issue Mar 1, 2025 · 4 comments
Labels: enhancement (New feature or request)

leycec commented Mar 1, 2025

Hybrid runtime-static type-checking @beartype maintainer @leycec here. Because we live on the worst timeline, @beartype users like @pablovela5620 (pour one out for Austin, Texas) are now futilely attempting to integrate @beartype with @tensorclass-decorated types.

Predictably, @beartype refuses. Minimal working example or it didn't happen:

from beartype import beartype
from tensordict.tensorclass import tensorclass
from torch import Tensor

@beartype
@tensorclass
class MyData(object):
    floatdata: Tensor
    intdata: Tensor
    non_tensordata: str

...which raises the unreadable exception:

Traceback (most recent call last):
  File "/home/leycec/tmp/mopy.py", line 8, in <module>
    @beartype
     ^^^^^^^^
  File "/home/leycec/py/beartype/beartype/_decor/decorcache.py", line 74, in beartype
    return beartype_object(obj, conf)
  File "/home/leycec/py/beartype/beartype/_decor/decorcore.py", line 87, in beartype_object
    _beartype_object_fatal(obj, conf=conf, **kwargs)
    ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/leycec/py/beartype/beartype/_decor/decorcore.py", line 133, in _beartype_object_fatal
    beartype_type(obj, **kwargs)  # type: ignore[return-value]
    ~~~~~~~~~~~~~^^^^^^^^^^^^^^^
  File "/home/leycec/py/beartype/beartype/_decor/_decortype.py", line 224, in beartype_type
    attr_value_beartyped = beartype_object(  # type: ignore[type-var]
        obj=attr_value, conf=conf, cls_stack=cls_stack)
  File "/home/leycec/py/beartype/beartype/_decor/decorcore.py", line 87, in beartype_object
    _beartype_object_fatal(obj, conf=conf, **kwargs)
    ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/leycec/py/beartype/beartype/_decor/decorcore.py", line 137, in _beartype_object_fatal
    beartype_nontype(obj, **kwargs)  # type: ignore[return-value]
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^
  File "/home/leycec/py/beartype/beartype/_decor/_decornontype.py", line 249, in beartype_nontype
    return beartype_func(obj, **kwargs)  # type: ignore[return-value]
  File "/home/leycec/py/beartype/beartype/_decor/_decornontype.py", line 336, in beartype_func
    func_wrapper_code = generate_code(decor_meta)
  File "/home/leycec/py/beartype/beartype/_decor/wrap/wrapmain.py", line 126, in generate_code
    code_check_return = _code_check_return(decor_meta)
  File "/home/leycec/py/beartype/beartype/_decor/wrap/_wrapreturn.py", line 253, in code_check_return
    reraise_exception_placeholder(
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        exception=exception,
        ^^^^^^^^^^^^^^^^^^^^
    ...<3 lines>...
        ),
        ^^
    )
    ^
  File "/home/leycec/py/beartype/beartype/_util/error/utilerrraise.py", line 137, in reraise_exception_placeholder
    raise exception.with_traceback(exception.__traceback__)
  File "/home/leycec/py/beartype/beartype/_decor/wrap/_wrapreturn.py", line 135, in code_check_return
    hint_or_sane = sanify_hint_root_func(
        decor_meta=decor_meta,
    ...<2 lines>...
        exception_prefix=EXCEPTION_PLACEHOLDER,
    )
  File "/home/leycec/py/beartype/beartype/_check/convert/convsanify.py", line 200, in sanify_hint_root_func
    hint_or_sane = reduce_hint(
        hint=hint,
    ...<5 lines>...
        exception_prefix=exception_prefix,
    )
  File "/home/leycec/py/beartype/beartype/_check/convert/_reduce/redhint.py", line 379, in reduce_hint
    hint_or_sane = _reduce_hint_cached(hint, conf, exception_prefix)
  File "/home/leycec/py/beartype/beartype/_util/cache/utilcachecall.py", line 249, in _callable_cached
    raise exception
  File "/home/leycec/py/beartype/beartype/_util/cache/utilcachecall.py", line 241, in _callable_cached
    return_value = args_flat_to_return_value[args_flat] = func(
                                                          ~~~~^
        *args)
        ^^^^^^
  File "/home/leycec/py/beartype/beartype/_check/convert/_reduce/redhint.py", line 511, in _reduce_hint_cached
    hint = hint_reducer(  # type: ignore[call-arg]
        hint=hint,  # pyright: ignore[reportGeneralTypeIssues]
        conf=conf,
        exception_prefix=exception_prefix,
    )
  File "/home/leycec/py/beartype/beartype/_check/convert/_reduce/_nonpep/rednonpeptype.py", line 103, in reduce_hint_nonpep_type
    die_unless_hint(hint=hint, exception_prefix=exception_prefix)
    ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/leycec/py/beartype/beartype/_util/hint/utilhinttest.py", line 103, in die_unless_hint
    die_unless_hint_nonpep(hint=hint, exception_prefix=exception_prefix)
    ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/leycec/py/beartype/beartype/_util/hint/nonpep/utilnonpeptest.py", line 204, in die_unless_hint_nonpep
    raise exception_cls(
    ...<6 lines>...
    )
beartype.roar.BeartypeDecorHintNonpepException: Method tensordict.tensorclass._eq()
return type hint <function _wrap_td_method.<locals>.wrapped_func at 0x726f56d64e00> 
either PEP-noncompliant or PEP-compliant but currently unsupported by @beartype. You 
suddenly feel encouraged to submit a feature request for this hint to our friendly 
issue tracker at:
	https://github.com/beartype/beartype/issues

I'm... outta my depth here, guys. There's so much badness happening all at once that it's non-trivial to track all of the badness.

Let's start with the obvious.

TensorDict Return Type Hints Appear to be Madness Incarnate

The tensordict.tensorclass._eq() method appears to be annotated by a return type hint that is (...waitforit) a <function _wrap_td_method.<locals>.wrapped_func at 0x726f56d64e00> closure function.

No idea, guys. @beartype doesn't generally lie about these things. Very well. I admit it. @beartype often lies. So, that's possibly what is happening. Alternatively, @beartype could be as confused as we all are right now. Let's generously assume that what @beartype thinks is happening is actually happening. It's unclear why anybody would want that to happen, though. Also, this is why type-checking actually is important. It's not something that TensorDict should be intentionally ignoring. Type-checking could have caught issues like this earlier.

Because something definitely smells in the TensorDict codebase. Very well. It might be the @beartype codebase that actually smells here. Functions aren't valid type hints, obviously. So why is TensorDict possibly using functions as type hints? Alternatively, why does @beartype mistakenly believe that this is what is happening?

No idea, guys. It's probably just a trivial typo somewhere five layers deep in the bowels of some private submodule dynamically defined in-memory, just because. Still, this is badness.

Which leads us to...

type(MyData) is type

The type of @tensorclass-decorated dataclasses inexplicably appears to be (...waitforit) the ambiguous type superclass that conveys no meaningful semantics:

# Uhh... wat? Tensorclasses don't even have a distinguishable class?
# You've gotta be friggin' kidding me here. What is this ugly madness?
>>> print(type(MyData))
type  # <-- ...you don't say

Uhh. Wat? Because these dataclasses inherit from no TensorDict-specific superclasses or metaclasses, downstream third-party consumers like @beartype have no sane means of detecting and thus supporting these dataclasses at runtime.

Like, seriously. Do something to distinguish the method-resolution orders (MROs) of these dataclasses from standard @dataclasses.dataclass-decorated dataclasses. Doing nothing definitely isn't cutting it.
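
Even an empty marker superclass would give these dataclasses a detectable fingerprint. A hypothetical sketch of what that could look like – nothing like this exists in tensordict today, and every name below is invented:

class TensorClassMarker:
    """Hypothetical empty mixin whose only job is to tag tensorclass MROs."""

def tensorclass(cls: type) -> type:
    # Rebuild the decorated class with the marker prepended to its bases,
    # leaving behavior untouched while making the MRO distinguishable.
    namespace = {
        key: value
        for key, value in vars(cls).items()
        if key not in ('__dict__', '__weakref__')
    }
    return type(cls.__name__, (TensorClassMarker, *cls.__bases__), namespace)

def is_tensorclass(obj: object) -> bool:
    # Downstream consumers could then detect tensorclasses sanely.
    return isinstance(obj, type) and issubclass(obj, TensorClassMarker)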

Which leads us to...

That Ship Has Already Sailed

So. I get it. The @tensorclass ship has already sailed. There are already millions of lines of code in the wild using this API. @beartype can hope for a better API all it likes – but that doesn't particularly change the grim reality of the situation on the ground.

So. How should downstream third-party consumers like @beartype detect these dataclasses, then? I note a public tensordict.is_tensorclass() tester function that seems particularly pertinent. The issue there, of course, is that calling that function requires @beartype to import it, which in turn requires TensorDict as a mandatory dependency – and @beartype is absolutely not doing that. @beartype has no mandatory dependencies, and that's never changing.

So, what now? Even importing TensorDict with import tensordict and then immediately terminating the active Python process under python -v shows an extreme number of extremely costly imports. In other words, @beartype is absolutely also not doing something like this:

try:
    from tensordict import is_tensorclass
except ImportError:
    def is_tensorclass(obj: object) -> bool: return False

Although trivial, that fallback would make importing @beartype itself equally costly. The whole point of @beartype is to be "cost-free."
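One cheap compromise would be to consult sys.modules and defer to TensorDict only when the caller has already paid the price of importing it – an object can only be a tensorclass if tensordict was imported to create it in the first place. A minimal sketch, assuming tensordict keeps exporting its public is_tensorclass() tester:

import sys

def is_tensorclass(obj: object) -> bool:
    # Never import tensordict ourselves; use it only if the caller already did.
    tensordict = sys.modules.get('tensordict')
    if tensordict is None:
        return False
    return tensordict.is_tensorclass(obj)

Which leads us to...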

dir(MyData) is Truly a Nightmare on Earth

Seriously. I cannot believe how utterly intense the TensorDict API is. This is the most outrageously verbose dir() output I've ever seen Python spit out:

# My eyes and your eyes are now both bleeding.
>>> print(dir(MyData))
['__abs__', '__add__', '__and__', '__annotations__', '__bool__', '__class__',
'__dataclass_fields__', '__dataclass_params__', '__delattr__', '__dict__',
'__dir__', '__doc__', '__enter__', '__eq__', '__exit__', '__expected_keys__',
'__firstlineno__', '__format__', '__ge__', '__getattr__', '__getattribute__', 
'__getitem__', '__getitems__', '__getstate__', '__gt__', '__hash__', '__iadd__', 
'__imul__', '__init__', '__init_subclass__', '__invert__', '__ipow__', '__isub__',
'__itruediv__', '__le__', '__len__', '__lt__', '__match_args__', '__module__', 
'__mul__', '__ne__', '__neg__', '__new__', '__or__', '__pow__', '__radd__', 
'__rand__', '__reduce__', '__reduce_ex__', '__replace__', '__repr__', '__rmul__', 
'__ror__', '__rpow__', '__rsub__', '__rtruediv__', '__rxor__', '__setattr__', 
'__setitem__', '__setstate__', '__sizeof__', '__static_attributes__', '__str__', 
'__sub__', '__subclasshook__', '__torch_function__', '__truediv__', '__weakref__',
'__xor__', '_add_batch_dim', '_apply_nest', '_autocast', '_check_batch_size', 
'_check_device', '_check_dim_name', '_check_unlock', '_clone', '_clone_recurse', 
'_data', '_default_get', '_erase_names', '_exclude', '_fast_apply', 
'_flatten_keys_inplace', '_flatten_keys_outplace', '_from_dict_validated', 
'_from_module', '_from_tensordict', '_frozen', '_get_at_str', '_get_at_tuple', 
'_get_names_idx', '_get_str', '_get_sub_tensordict', '_get_tuple', 
'_get_tuple_maybe_non_tensor', '_grad', '_has_names', '_is_non_tensor', 
'_is_tensorclass', '_items_list', '_load_memmap', '_map', '_maybe_names', 
'_maybe_remove_batch_dim', '_memmap_', '_multithread_apply_flat', 
'_multithread_apply_nest', '_multithread_rebuild', '_new_unsafe', '_nocast', 
'_permute', '_propagate_lock', '_propagate_unlock', '_reduce_get_metadata', 
'_remove_batch_dim', '_repeat', '_select', '_set_at_str', '_set_at_tuple', 
'_set_str', '_set_tuple', '_shadow', '_to_module', '_type_hints', '_unbind', 
'_values_list', 'abs', 'abs_', 'acos', 'acos_', 'add', 'add_', 'addcdiv', 
'addcdiv_', 'addcmul', 'addcmul_', 'all', 'amax', 'amin', 'any', 'apply', 
'apply_', 'as_tensor', 'asin', 'asin_', 'atan', 'atan_', 'auto_batch_size_', 
'auto_device_', 'batch_dims', 'batch_size', 'bfloat16', 'bitwise_and', 'bool', 
'bytes', 'cat', 'cat_from_tensordict', 'cat_tensors', 'ceil', 'ceil_', 'chunk', 
'clamp', 'clamp_max', 'clamp_max_', 'clamp_min', 'clamp_min_', 'clear', 
'clear_device_', 'clear_refs_for_compile_', 'clone', 'complex128', 'complex32', 
'complex64', 'consolidate', 'contiguous', 'copy', 'copy_', 'copy_at_', 'cos', 
'cos_', 'cosh', 'cosh_', 'cpu', 'create_nested', 'cuda', 'cummax', 'cummin', 
'data', 'data_ptr', 'del_', 'densify', 'depth', 'detach', 'detach_', 'device', 
'dim', 'div', 'div_', 'double', 'dtype', 'dumps', 'empty', 'entry_class', 'erf', 
'erf_', 'erfc', 'erfc_', 'exclude', 'exp', 'exp_', 'expand', 'expand_as', 'expm1', 
'expm1_', 'fields', 'fill_', 'filter_empty_', 'filter_non_tensor_data', 'flatten', 
'flatten_keys', 'float', 'float16', 'float32', 'float64', 'floor', 'floor_', 
'frac', 'frac_', 'from_any', 'from_consolidated', 'from_dataclass', 'from_dict', 
'from_dict_instance', 'from_h5', 'from_module', 'from_modules', 'from_namedtuple', 
'from_pytree', 'from_struct_array', 'from_tensordict', 'from_tuple', 'fromkeys', 
'gather', 'gather_and_stack', 'get', 'get_at', 'get_item_shape', 'get_non_tensor', 
'grad', 'half', 'int', 'int16', 'int32', 'int64', 'int8', 'irecv', 
'is_consolidated', 'is_contiguous', 'is_cpu', 'is_cuda', 'is_empty', 
'is_floating_point', 'is_locked', 'is_memmap', 'is_meta', 'is_shared', 'isend', 
'isfinite', 'isnan', 'isneginf', 'isposinf', 'isreal', 'items', 'keys', 
'lazy_stack', 'lerp', 'lerp_', 'lgamma', 'lgamma_', 'load', 'load_', 
'load_memmap', 'load_memmap_', 'load_state_dict', 'lock_', 'log', 'log10', 
'log10_', 'log1p', 'log1p_', 'log2', 'log2_', 'log_', 'logical_and', 'logsumexp', 
'make_memmap', 'make_memmap_from_storage', 'make_memmap_from_tensor', 'map', 
'map_iter', 'masked_fill', 'masked_fill_', 'masked_select', 'max', 'maximum', 
'maximum_', 'maybe_dense_stack', 'mean', 'memmap', 'memmap_', 'memmap_like', 
'memmap_refresh_', 'min', 'minimum', 'minimum_', 'mul', 'mul_', 'named_apply', 
'names', 'nanmean', 'nansum', 'ndim', 'ndimension', 'neg', 'neg_', 'new_empty', 
'new_full', 'new_ones', 'new_tensor', 'new_zeros', 'non_tensor_items', 'norm', 
'numel', 'numpy', 'param_count', 'permute', 'pin_memory', 'pin_memory_', 'pop', 
'popitem', 'pow', 'pow_', 'prod', 'qint32', 'qint8', 'quint4x2', 'quint8', 
'reciprocal', 'reciprocal_', 'record_stream', 'recv', 'reduce', 'refine_names', 
'rename', 'rename_', 'rename_key_', 'repeat', 'repeat_interleave', 'replace', 
'requires_grad', 'requires_grad_', 'reshape', 'round', 'round_', 'save', 
'saved_path', 'select', 'send', 'separates', 'set', 'set_', 'set_at_', 
'set_non_tensor', 'setdefault', 'shape', 'share_memory_', 'sigmoid', 'sigmoid_', 
'sign', 'sign_', 'sin', 'sin_', 'sinh', 'sinh_', 'size', 'softmax', 'sorted_keys', 
'split', 'split_keys', 'sqrt', 'sqrt_', 'squeeze', 'stack', 
'stack_from_tensordict', 'stack_tensors', 'state_dict', 'std', 'sub', 'sub_', 
'sum', 'tan', 'tan_', 'tanh', 'tanh_', 'to', 'to_dict', 'to_h5', 'to_module', 
'to_namedtuple', 'to_padded_tensor', 'to_pytree', 'to_struct_array', 
'to_tensordict', 'transpose', 'trunc', 'trunc_', 'type', 'uint16', 'uint32', 
'uint64', 'uint8', 'unbind', 'unflatten', 'unflatten_keys', 'unlock_', 
'unsqueeze', 'update', 'update_', 'update_at_', 'values', 'var', 'view', 'where', 
'zero_', 'zero_grad']

I mean... just... WTTTTTTTTTTTTTTTTF!? There's even an attribute called cummin up there, which just raises even more questions than it answers. 🤣

I genuinely have no idea where to start with this beastly nightmare. Technically, I do note a private _is_tensorclass attribute in the above output. The existence of that attribute strongly suggests that the current type is a tensorclass. There's no guarantee of that, though. Other third-party types in the wild might very well define the same private _is_tensorclass attribute.

But... I guess that's what I gotta roll with, huh? Sucks, guys. This pretty much sucks.

So. Are You Actually Suggesting Anything Useful?

Yeah. I'm not here to just complain. That's only one of the reasons I'm here. 😉

Basically, feature request #663 really needs to happen to make TensorDict usable at runtime by everybody else in the Python ecosystem. If #663 happens, then @beartype can just trivially compare the obj.__class__.__module__ and obj.__class__.__name__ dunder attributes guaranteed to exist on any arbitrary object to decide whether that object is a TensorDict dataclass or not.

This trivial test just operates on strings and thus doesn't require @beartype to import TensorDict in a global scope: e.g.,

def is_tensorclass(obj: object) -> bool:
    return (
        obj.__class__.__module__ == 'tensordict' and  # <-- yay!
        obj.__class__.__name__ == 'TensorDict'  # <-------- yay x 2!!
    )

Until that glorious release day happens, I guess the best we can do is (as suggested above):

def is_tensorclass(obj: object) -> bool:
    return type(obj) is type and hasattr(obj, '_is_tensorclass')  # <-- yikes

That's... not great. But that's what we rollin' with. I sigh! I sigh so hard. 😩

leycec added the enhancement (New feature or request) label Mar 1, 2025
vmoens (Contributor) commented Mar 1, 2025

Hey!
Thanks for writing this.
There's the TensorClass class now that allows you to do just that.

https://pytorch.org/tensordict/stable/reference/generated/tensordict.TensorClass.html#tensordict.TensorClass
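
Per those docs, usage looks roughly like this (a sketch adapted from the MWE above – untested):

from tensordict import TensorClass
from torch import Tensor

class MyData(TensorClass):
    floatdata: Tensor
    intdata: Tensor
    non_tensordata: str

Subclassing gives these dataclasses an actual superclass that downstream consumers can detect with ordinary isinstance() and issubclass() checks.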

Can I suggest a more neutral and direct style next time?
I'm having a hard time reading through this issue. It's also slightly... offensive?

leycec (Author) commented Mar 4, 2025

There's the TensorClass class now that allows you to do just that.

Sure. That's cool, I guess? But... what about @tensorclass-decorated classes? They still exist. They violate type-checking, have no metaclass, and have no superclass. @beartype users want to use them but they play poorly with @beartype. I'm inclined to just roll with:

  1. A poor-man's detection scheme for @tensorclass-decorated classes: e.g.,

    def is_tensorclass(obj: object) -> bool:
        return type(obj) is type and hasattr(obj, '_is_tensorclass')
  2. Unconditionally ignoring all @tensorclass-decorated classes.

Obviously, ignoring stuff that should be type-checked isn't great. I suppose my next question would inevitably be:

"Why don't @tensorclass-decorated classes have a sane superclass like tensordict.TensorClass?"

As currently defined, users can't even annotate functions, attributes, or variables that accept any arbitrary @tensorclass-decorated class – because @tensorclass-decorated classes don't even have a common type: e.g.,

def use_tensorclass(tensorclass: object) -> object: ...

Ideally, the object type hints above would instead be something like tensordict.TensorClass. But... tensordict.TensorClass doesn't cover @tensorclass-decorated classes. Or does it? Is there a magical __instancecheck__() or __subclasscheck__() dunder method stuffed away in some metaclass somewhere that extends type-checking to @tensorclass-decorated classes? Probably... not, huh?
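
For the record, such a metaclass is easy to imagine. A hypothetical sketch of what tensordict could ship – nothing below is real tensordict code:

class _TensorClassMeta(type):
    def __instancecheck__(cls, instance: object) -> bool:
        return cls.__subclasscheck__(type(instance))

    def __subclasscheck__(cls, subclass: type) -> bool:
        # Honor the private marker that @tensorclass already sets...
        if getattr(subclass, '_is_tensorclass', False):
            return True
        # ...while preserving ordinary subclass semantics.
        return super().__subclasscheck__(subclass)

class TensorClass(metaclass=_TensorClassMeta):
    pass

With that in place, isinstance() and issubclass() checks against TensorClass would succeed even for @tensorclass-decorated classes that never inherit from it.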

Are @tensorclass-decorated classes just some vestigial and increasingly obsolete API that nobody should use? Should everybody use tensordict.TensorClass subclasses instead? That'd kinda be nice from my perspective – but my perspective doesn't particularly overlap with yours. I'm just the killjoy here. 😆

Can I suggest a more neutral and direct style next time?

Possibly ideal, but pragmatically infeasible. I'm autistic. Intensely autistic. I live in a remote cabin in the Canadian wilderness. Imagine the most neurodivergent INTP you've ever met (or possibly are). Then magnify and ramify that neurodivergence by the exponentiated power of Guido.

That's why I do @beartype. It's like autism personified. 😅

I'm having a hard time reading through this issue.

tl;dr:

  1. @tensorclass-decorated dataclasses violate type-checking, which is bad. Notably, the tensordict.tensorclass._eq() method is annotated by a return type hint that is a closure <function _wrap_td_method.<locals>.wrapped_func at 0x726f56d64e00> rather than a valid type hint.
  2. @tensorclass-decorated dataclasses are unusable in app stacks type-checked by runtime type-checkers like @beartype or typeguard, which is even worse. @beartype has been integrated into PyTorch. So, ignoring this issue (and type-checking in general) wouldn't be a great look for TensorDict. But... hey. It's your package. You do you, you know?
  3. @tensorclass-decorated dataclasses have no identifiable metaclass or superclass, which is even worse. The lack of a meaningful metaclass or superclass means that no other packages can even identify (let alone actually use) @tensorclass-decorated dataclasses.
  4. @tensorclass-decorated dataclasses basically seem unusable outside of TensorDict itself.

It's also slightly... offensive?

Guh! My apologies. I absolutely don't intend offense. I just have little time to attend to third-party integration issues like this and wish that @tensorclass-decorated dataclasses behaved like... well, classes. But they don't.

I mean, technically they're classes. Sure. But nobody else in the Python ecosystem can identify or use them. Oh, well. Whatevaaahs. Definitely not here to argue. 2025 is already hard enough for Planet Earth. No need to harden matters even further with petty argumentation.

I sigh. 😩

leycec (Author) commented Mar 6, 2025

Okay. I dug deep into the TensorDict codebase. The exception that @beartype currently raises when confronted with @tensorclass-decorated types is massively weird. I wanted to know more. I wanted to know whether @beartype or TensorDict is to blame for what's happening here.

It's TensorDict. @beartype is correct. There are critical unresolved issues in the tensordict package. Some of these issues are subtle; others, less so. But all of these issues are critical. Nobody wants to hear criticism like this. Thus, this is my concluding comment. The messenger usually gets thrown into the nearest pit. This is probably no exception.

Let's start with the obvious: "What is @beartype complaining about now?"

TensorDict Is Shadowing Builtin Types

Unfortunately, TensorDict has committed the cardinal sin of Pythonic API design. TensorDict is shadowing builtin types. In fact, TensorDict isn't shadowing just a single builtin type. TensorDict is shadowing many builtin types. This includes Python's core C-based scalar types: bool, int, and float.

Shadowing builtin types promotes obscure bugs that are non-trivial to identify, let alone debug. Shadowing builtin types is always an abysmal idea. This is no exception.

To demonstrate exactly why TensorDict's shadowing of builtin types is so problematic, I'd like to first exhibit a simple thought experiment: a class declaring just two methods in a manner vaguely similar to what the @tensorclass decorator does.

class OhGods(object):
    def bool(self) -> bool: return False
    def __eq__(self, other: object) -> bool: return False

@tensorclass-decorated types declare both the bool() and __eq__() methods with signatures similar to those above. Clearly, the OhGods.bool() method shadows the builtin bool type. Does this shadowing negatively interact with the bool return type hints annotating these two methods?

Yes. Tragically, it does:

# What is the return type hint annotating the bool() method defined above?
>>> OhGods.bool.__annotations__['return']
<class 'bool'>  # <-- *GOOD.* That's exactly what we expect.

# What is the return type hint annotating the __eq__() method defined above?
>>> OhGods.__eq__.__annotations__['return']
<function OhGods.bool at 0x7fd72d75c360>  # <-- *HOLY SHIT WTFFFFFFFFFFFF!*

As expected, the return type hint annotating the OhGods.bool() method is just bool. That's good. Unexpectedly, however, the return type hint annotating the __eq__() method is... wait. Isn't that the bool() method!?

Indeed, it is. Since the OhGods.bool() method shadows the builtin bool type within the lexical scope of the OhGods class body, every subsequent reference to the name bool in that body resolves to the OhGods.bool() method instead. This includes the bool return type hint annotating the OhGods.__eq__() dunder method, which now refers to the OhGods.bool() method rather than the builtin bool type.

But wait! You're now thinking:

"But none of the above applies to TensorDict. The tensordict.tensorclass submodule doesn't define a global bool attribute. The __eq__() method is defined as a global private tensordict.tensorclass._eq() function. Since that function is defined at global scope and since no bool attribute is defined at global scope, TensorDict isn't actually shadowing the bool type. Ah-hah! Checkmate. Truly, you are dumb."

Indeed, your thoughts would have been correct. Sadly...

TensorDict Enabled PEP 563, The Ultimate Typing Satan

Every TensorDict submodule leads with from __future__ import annotations, enabling PEP 563 across the entire codebase. PEP 563 is extraordinarily problematic from many perspectives. PEP 563 is so problematic it delayed CPython 3.10 by several months, had to be conditionally disabled by default, and will be permanently removed in either CPython 3.14 or 3.15.

Generally speaking, PEP 563 should never be enabled. There are no genuine use cases for doing so and innumerable use cases for not doing so. This is one of the latter.

By enabling PEP 563, TensorDict is instructing type-checkers to defer the evaluation of its type hints to their point of use rather than their point of declaration. Notably:

# PEP 563 implicitly transforms this declaration of your _eq() function...
def _eq(self, other: object) -> bool: ...

# ...into this equivalent declaration, whose type hints are all stringified.
def _eq(self, other: 'object') -> 'bool': ...

You monkey-patch the above globally scoped _eq() function into each @tensorclass-decorated type as a new class-scoped __eq__() function. Naturally, the stringified return type hint 'bool' is still a string at that point.

Type-checkers then resolve that stringified return type hint by dynamically evaluating that hint within the lexical context of the body of that @tensorclass-decorated type, which shadows the bool type with a TensorDict-specific bool() method.
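
Here's a self-contained sketch reproducing that failure mode end to end. FakeTensorClass is an invented stand-in; passing the class namespace as localns to typing.get_type_hints() emulates how a runtime type-checker resolves class-scoped stringified hints:

from __future__ import annotations  # PEP 563: stringify all annotations
import typing

def _eq(self, other: object) -> bool:  # 'bool' is stored as a *string*
    return False

class FakeTensorClass:
    def bool(self) -> bool:  # shadows the builtin bool in this class body
        return False
    __eq__ = _eq  # monkey-patched in, as @tensorclass does

# Resolving the stringified 'bool' against the class namespace finds the
# method rather than the builtin type:
hints = typing.get_type_hints(
    FakeTensorClass.__eq__, localns=dict(vars(FakeTensorClass)))
print(hints['return'])  # <function FakeTensorClass.bool at 0x...>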

Because you have enabled PEP 563, all of the above applies to TensorDict. Whoops.

Clearly, methods are not valid type hints. When confronted with invalid type hints, type-checkers raise exceptions. This is exactly what @beartype is doing when confronted with @tensorclass-decorated types – and @beartype is correct in doing so.

TensorDict has committed the cardinal sin of Pythonic API design. Can TensorDict repent of its sins? Numerous solutions exist. Let's examine a few.

Solution 1: Stop Shadowing Builtin Types

The most obvious solution is to simply stop shadowing builtin types. Consider NumPy, a comparable tensor-centric API. NumPy intentionally avoids shadowing builtin types by suffixing NumPy-specific types that share names with builtins with an underscore. This disambiguation avoids conflicts, ambiguities, and type shadowing: the numpy.bool_ and numpy.int_ types do not shadow the builtin bool and int types.
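
A quick REPL sanity check of that naming convention (assuming NumPy is installed):

>>> import numpy as np
>>> np.bool_ is bool  # distinct type: the builtin is not shadowed
False
>>> np.int_ is int
False
>>> bool(np.bool_(True))  # the builtin remains reachable and unambiguous
True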

This obvious solution is also the most painful solution from the perspective of downstream consumers and reverse dependencies, who would correctly consider this to be a backward compatibility-breaking change. One responsible approach would be to:

  • Define new bool_(), int_(), and float_() methods.
  • Emit non-fatal DeprecationWarning warnings from TensorDict's existing bool(), int(), and float() methods, encouraging users to switch to their _-suffixed alternatives (sketched after this list).
  • Remove the problematic bool(), int(), and float() methods in a future release.
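
A minimal sketch of that deprecation shim – the method names are real, but everything else below is hypothetical and untested:

import warnings

class TensorDictBase:  # stand-in for the real base class
    def bool_(self):
        ...  # the real dtype-casting logic would move here

    def bool(self):
        # Deprecated alias preserved for one release cycle.
        warnings.warn(
            'TensorDictBase.bool() shadows the builtin bool type and is '
            'deprecated; call TensorDictBase.bool_() instead.',
            DeprecationWarning,
            stacklevel=2,
        )
        return self.bool_()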

But suppose you hate the most obvious solution. Is there anything else you can do? There is plenty. Consider, for example...

Solution 2: Disable PEP 563

Removing the from __future__ import annotations statement prefixing each TensorDict submodule is another obvious solution. This might or might not be a valid long-term solution. It's certainly an easy solution. But does that solution actually hold up against PEP 649 and PEP 749 in Python 3.14?

No idea. Maybe. Maybe not. It's kinda hard to say. PEPs 649 and 749 are still moving targets subject to vitriolic (and I do mean vitriolic) ongoing debate.

Under PEPs 649 and 749, Python ≥ 3.14 will evaluate annotations lazily – just like under PEP 563. It's probably best not to assume that annotations will be evaluated in a global scope as you'd like them to be. Which leads us to...

Solution 3: Stop Annotating Methods with Shadowed Types

This is a weird one. Yet, it might actually work. Instead of annotating methods as returning the shadowed bool type, you could instead annotate methods as returning an unshadowed bool_ type: e.g.,

# Yes, this is insane. Yes, this *MIGHT* just work.
bool_ = bool

def _eq(self, other: object) -> bool_:  # <-- note "bool_" rather than "bool"
    ...

Of course, that's completely untested. Still, that's insane enough that it might just work. The disadvantages are numerous, however:

  • Fragility. You'll absolutely have to use bool_ rather than bool everywhere. If you use even a single instance of bool anywhere, then you're shadowing the builtin bool type yet again. So, right back to square one.
  • Unmaintainability. Good luck explaining to anyone why you need to use bool_ rather than bool everywhere. 🤣

Even worse solutions exist, including...

Solution 4: Just Remove All Your Type Hints, Because They're Busted

Seriously. This is an awful solution – but it's better than the status quo.

TensorDict currently resides in the Uncanny QA Valley: you've annotated a lot of different things with type hints everywhere, but you haven't actually tested any of those type hints with type-checking anywhere. The result? Massive issues.

Are you aware of just how many type-checking errors TensorDict currently has? It's bad. It's beyond bad. I couldn't believe my eyes when I saw just how bad the existing QA situation in TensorDict is. I've never seen anything like it. I don't even know how you'd begin to go about repairing this badness. Maybe just burn the entire thing down to the ground and start over fresh?

No idea. But this is sheer madness:

$ pwd
/tmp/tensordict
$ ls
_C.so		_lazy.py	_reductions.py	_unbatched.py  nn	     return_types.py  utils.py
__init__.py	_nestedkey.py	_td.py		base.py        persistent.py  tensorclass.py   version.py
__pycache__	_nestedkey.pyi	_tensordict	functional.py  prototype      tensorclass.pyi
_contextlib.py	_pytree.py	_torch_func.py	memmap.py      py.typed       tensordict.py
$ mypy .
...
Found 2148 errors in 29 files (checked 36 source files)
$ pyright
...
2701 errors, 493 warnings, 0 informations

Seriously. mypy uncovered 2,148 type-checking errors; pyright, 2,701. Although many of those so-called "errors" are just false positives, just as many or more are valid.

@beartype bailed out on the first of these errors. Perhaps TensorDict should do so, too. The only thing worse than no type hints is untested type hints. If you remove all of the type hints across the entire TensorDict codebase, then runtime type-checking issues go away.

It's magical, but also horrible. Obviously, nobody wants to remove type hints. It took you time and effort to write those type hints. You're certainly not walking that back. Right?

No worries. I agree. Still, open-source volunteerism is only finite. We can't QA everything.

We Hate All of Your Solutions

...yeah. Kinda thought you might say that. From my perspective, @beartype is at a bit of a loss as to what to do with TensorDict. We either:

  • Keep trying to type-check TensorDict like we currently do. Type-checking is what @beartype does. That's good. But type-checking TensorDict is guaranteed to fail. That's bad. If we keep doing this, then @beartype users will be unable to use TensorDict. That's also bad. But those same users might be inclined to complain to you about this. That's good. You might then be inclined to begin type-checking TensorDict yourselves and resolving these issues. That's even better.
  • Ignore TensorDict. Also, possibly emit a non-fatal warning advising @beartype users to avoid TensorDict entirely by preferring alternative third-party packages that actually play nicely with type-checking. Clearly, that's kinda hostile. But... you're kinda leaving @beartype no choice here. If I felt that TensorDict were genuinely interested in resolving its type-checking woes, @beartype would probably just silently ignore TensorDict without emitting warnings. But... I'm concerned that TensorDict's litany of outstanding type-checking issues will never be resolved. That's a compelling QA issue. @beartype users are as obsessed with QA as I am. My users would definitely be interested in learning about QA issues in TensorDict. A warning from @beartype is how they'd learn about that.

There's a lot for me to chew on here. Unsurprisingly, I'm currently leaning towards the latter: bitterly complain about TensorDict QA but otherwise avoid type-checking anything in TensorDict.

Thanks so much for reading this far. Of course, nobody read this far. In the unlikely chance that someone did read this far, I humbly apologize for squandering your entire week. We should probably have just played video games instead. At least I'd finally have defeated the Dread Lord Albalos then! 😄

vmoens (Contributor) commented Mar 6, 2025

We appreciate your feedback, but we currently have other priorities that need to be addressed. We may revisit this issue in the future.

If you want to help us solve these pain points, feel free to submit small-scale PRs! This is an open-source project; contributions are more than welcome.
