Using the latest nequip develop (private) with PyTorch 2.9, I see the following warnings in my training output.
First, TF32-related warnings:
/n/home03/skavanagh/miniconda3/envs/pytorch_2.9/lib/python3.13/site-packages/torch/__init__.py:1551: UserWarning: Please use the new API settings to control TF32 behavior, such as torch.backends.cudnn.conv.fp32_precision = 'tf32' or torch.backends.cuda.matmul.fp32_precision = 'ieee'. Old settings, e.g, torch.backends.cuda.matmul.allow_tf32 = True, torch.backends.cudnn.allow_tf32 = True, allowTF32CuDNN() and allowTF32CuBLAS() will be deprecated after Pytorch 2.9. Please see https://pytorch.org/docs/main/notes/cuda.html#tensorfloat-32-tf32-on-ampere-and-later-devices (Triggered internally at /pytorch/aten/src/ATen/Context.cpp:80.)
return _C._get_float32_matmul_precision()
You are using a CUDA device ('NVIDIA A100-SXM4-40GB MIG 3g.20gb') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
These warnings imply that TF32 is not being used, but I am using TF32Scheduler with `0: true`, which worked fine with previous versions (and did not produce any of these warnings), and I thought the latest nequip develop already included the required PyTorch 2.9 TF32 backend updates. The warnings appear after VAL RUN START and before Initializing distributed....
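For reference, my reading of the API change, pieced together from the warning text and the linked docs (a sketch only; I haven't checked which of these TF32Scheduler actually sets on PyTorch 2.9):

```python
import torch

# New API (PyTorch 2.9+), as named in the deprecation warning:
# 'tf32' enables TensorFloat-32 for fp32 matmuls/convs, 'ieee' keeps strict fp32.
torch.backends.cuda.matmul.fp32_precision = "tf32"
torch.backends.cudnn.conv.fp32_precision = "tf32"

# Old API, deprecated after PyTorch 2.9 (possibly what TF32Scheduler still sets):
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

# The Lightning "Tensor Cores" hint and the inductor warning below key off
# this separate global instead:
torch.set_float32_matmul_precision("high")  # or "medium"
```

If the scheduler sets only one of these (e.g. the old flags but not `set_float32_matmul_precision`), that might explain why these hints still fire even with TF32 enabled.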
I then also get these warnings before the first validation/training loop:
/n/home03/skavanagh/miniconda3/envs/pytorch_2.9/lib/python3.13/site-packages/pydantic/_internal/_generate_schema.py:2249: UnsupportedFieldAttributeWarning: The 'repr' attribute with value False was provided to the `Field()` function, which has no effect in the context it was used. 'repr' is field-specific metadata, and can only be attached to a model field using `Annotated` metadata or by assignment. This may have happened because an `Annotated` type alias using the `type` statement was used, or if the `Field()` function was attached to a single member of a union type.
warnings.warn(
/n/home03/skavanagh/miniconda3/envs/pytorch_2.9/lib/python3.13/site-packages/pydantic/_internal/_generate_schema.py:2249: UnsupportedFieldAttributeWarning: The 'frozen' attribute with value True was provided to the `Field()` function, which has no effect in the context it was used. 'frozen' is field-specific metadata, and can only be attached to a model field using `Annotated` metadata or by assignment. This may have happened because an `Annotated` type alias using the `type` statement was used, or if the `Field()` function was attached to a single member of a union type.
warnings.warn(
...
Validation DataLoader 0: 0%| | 0/6916 [00:00<?, ?it/s][rank0]:W1020 16:19:48.288000 3814862 site-packages/torch/fx/experimental/symbolic_shapes.py:6833] _maybe_guard_rel() was called on non-relation expression Eq(s52, s86) | Eq(s86, 1)
[rank0]:W1020 16:19:59.949000 3814862 site-packages/torch/fx/experimental/symbolic_shapes.py:6833] _maybe_guard_rel() was called on non-relation expression Eq(s52, s86) | Eq(s86, 1)
/n/home03/skavanagh/miniconda3/envs/pytorch_2.9/lib/python3.13/site-packages/torch/_inductor/compile_fx.py:312: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` for better performance.
warnings.warn(
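The pydantic warnings presumably come from a dependency rather than nequip itself, but for reference, a minimal sketch of the pattern `UnsupportedFieldAttributeWarning` flags versus the supported forms (hypothetical models and field names):

```python
from typing import Annotated
from pydantic import BaseModel, Field

class Supported(BaseModel):
    # Field() attached by assignment: repr=False takes effect
    secret: str = Field(default="hidden", repr=False)
    # Field() attached as Annotated metadata on the model field: frozen=True takes effect
    version: Annotated[str, Field(frozen=True)] = "2.9"

class Flagged(BaseModel):
    # Field() attached to a single member of a union: repr=False has no effect,
    # and pydantic warns at schema generation time (as in the logs above)
    value: Annotated[int, Field(repr=False)] | str = 0
```

So those two warnings look like an upstream pydantic-usage issue rather than anything in my config, but I'm reporting them together since they all appeared after the PyTorch 2.9 upgrade.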