
CT cast_to_fp4 torch recompile issues #734

Description

@kylesayrs

When running basic generation with NVFP4 models, I sometimes see torch recompilation warnings: `cast_to_fp4` is recompiled repeatedly until dynamo hits its recompile limit.

This occurred while running `examples/quantization_w4a4_fp4/qwen_30b_a3b.py`:

========== SAMPLE GENERATION ==============
[transformers] The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
W0612 01:46:22.922000 871313 torch/_dynamo/convert_frame.py:1853] [0/8] torch._dynamo hit config.recompile_limit (8)
W0612 01:46:22.922000 871313 torch/_dynamo/convert_frame.py:1853] [0/8]    function: 'cast_to_fp4' (/home/kylesayrs/compressed-tensors/src/compressed_tensors/quantization/quant_args.py:55)
W0612 01:46:22.922000 871313 torch/_dynamo/convert_frame.py:1853] [0/8]    last reason: 0/7: tensor 'x' rank mismatch. expected 4, actual 3
W0612 01:46:22.922000 871313 torch/_dynamo/convert_frame.py:1853] [0/8] To log all recompilation reasons, use TORCH_LOGS="recompiles".
W0612 01:46:22.922000 871313 torch/_dynamo/convert_frame.py:1853] [0/8] To diagnose recompilation issues, see https://docs.pytorch.org/docs/main/user_guide/torch_compiler/compile/programming_model.recompilation.html
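The last recompile reason is a rank guard (`expected 4, actual 3`): dynamo specializes a compiled frame on each input tensor's rank, so calling the same compiled function with 2-D, 3-D and 4-D activations creates one cache entry per rank. The sketch below reproduces that effect and tries one possible mitigation, which is to flatten to rank 1 before the compiled call and reshape afterwards. It is not the compressed-tensors code. `round_to_fp4`, `_round_flat`, `make_counting_backend` and `cast_to_fp4_rank_stable` are hypothetical names, and FP4 is assumed to mean E2M1 with round-to-nearest onto its representable magnitudes.

```python
import torch

def round_to_fp4(x):
    """Hypothetical stand-in for cast_to_fp4 (assumption: E2M1, round to nearest)."""
    grid = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0],
                        device=x.device, dtype=x.dtype)
    mag = x.abs().clamp(max=6.0)
    # Distance from every element to every grid point; argmin picks the closest one.
    idx = (mag.unsqueeze(-1) - grid).abs().argmin(dim=-1)
    return torch.sign(x) * grid[idx]

def _round_flat(x):
    # Separate function (own code object) so its dynamo cache is independent of `naive`.
    return round_to_fp4(x)

def make_counting_backend():
    """Dynamo backend that records every compilation and runs the captured graph eagerly."""
    calls = []
    def backend(gm, example_inputs):
        calls.append(len(example_inputs))
        return gm.forward
    return backend, calls

# 1) Naive: each new input rank fails the rank guard and triggers a recompile.
naive_backend, naive_compiles = make_counting_backend()
naive = torch.compile(round_to_fp4, backend=naive_backend)

# 2) Mitigation sketch: the compiled function only ever sees rank-1 input, and
#    dynamic=True keeps the element count symbolic so size changes don't recompile.
flat_backend, flat_compiles = make_counting_backend()
_compiled_flat = torch.compile(_round_flat, backend=flat_backend, dynamic=True)

def cast_to_fp4_rank_stable(x):
    """Hypothetical wrapper: flatten, round in the compiled graph, restore the shape."""
    return _compiled_flat(x.reshape(-1)).reshape(x.shape)

for shape in [(3, 3), (3, 3, 3), (3, 3, 3, 3), (3, 3, 3)]:
    x = torch.randn(shape)
    assert torch.equal(naive(x), cast_to_fp4_rank_stable(x))

print("naive compiles:", len(naive_compiles), "flattened compiles:", len(flat_compiles))
```

Flattening is safe here only because the assumed rounding is purely elementwise. If the real `cast_to_fp4` depends on the tensor layout (for example, per-group scales along the last dimension), a rank-stable reshape would need to preserve that dimension instead.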
