Support extra_state attributes in from_pretrained #38154


Open
2 of 4 tasks
pstjohn opened this issue May 15, 2025 · 1 comment · May be fixed by #38155
pstjohn (Contributor) commented May 15, 2025

System Info

transformers main branch, Python 3.12.

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Using TransformerEngine layers as an example; these layers store fp8 metadata under the _extra_state key of the state dict:

from transformers import PretrainedConfig, PreTrainedModel
from transformer_engine.pytorch import TransformerLayer


class SimpleTEConfig(PretrainedConfig):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.hidden_size = 320
        self.intermediate_size = 1024
        self.num_attention_heads = 16


class SimpleTEModel(PreTrainedModel):
    config_class = SimpleTEConfig

    def __init__(self, config: SimpleTEConfig):
        super().__init__(config)
        self.te_layer = TransformerLayer(
            hidden_size=config.hidden_size,
            ffn_hidden_size=config.intermediate_size,
            num_attention_heads=config.num_attention_heads,
        )

    def forward(self, hidden_states, attention_mask):
        return self.te_layer(hidden_states, attention_mask)


def test_simple_te_model(tmp_path):
    config = SimpleTEConfig()
    model = SimpleTEModel(config)

    model.save_pretrained(tmp_path / "simple_te_model")
    del model
    model = SimpleTEModel.from_pretrained(tmp_path / "simple_te_model")
    assert isinstance(model.te_layer, TransformerLayer)

Expected behavior

from_pretrained should pass the deserialized extra_state value through the nn.Module's _load_from_state_dict method, which in turn calls set_extra_state. See https://docs.pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.get_extra_state.
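The get_extra_state / set_extra_state contract described above can be sketched on a plain Python class, without torch, so the expected round-trip is easy to see. FakeModule and its attribute names here are illustrative stand-ins, not torch or TransformerEngine internals:

```python
# Minimal sketch of the extra-state contract documented for
# torch.nn.Module, reproduced on a plain class. All names are
# hypothetical; only the _extra_state key convention matches torch.

class FakeModule:
    def __init__(self):
        self.params = {"weight": [1.0, 2.0]}
        self.fp8_meta = {"scale": 0.5}  # stand-in for TE's fp8 metadata

    def get_extra_state(self):
        # Called when building the state dict; the result is stored
        # under the "_extra_state" key.
        return dict(self.fp8_meta)

    def set_extra_state(self, state):
        # Called during loading with the deserialized extra-state value.
        self.fp8_meta = dict(state)

    def state_dict(self):
        return {**self.params, "_extra_state": self.get_extra_state()}

    def load_state_dict(self, sd):
        self.params = {k: v for k, v in sd.items() if k != "_extra_state"}
        self.set_extra_state(sd["_extra_state"])


saved = FakeModule().state_dict()
restored = FakeModule()
restored.fp8_meta = {}  # wipe, then restore from the saved state dict
restored.load_state_dict(saved)
assert restored.fp8_meta == {"scale": 0.5}
```

The point of the report is that from_pretrained bypasses this hook: it looks the key up as a parameter or buffer instead of routing it to set_extra_state.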

Instead, loading fails in get_parameter_or_buffer:

>       raise AttributeError(f"`{target}` is neither a parameter nor a buffer.")
E       AttributeError: `te_layer.layernorm_mlp._extra_state` is neither a parameter nor a buffer.
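One way a loader could avoid this error is to separate _extra_state entries from tensor entries before attempting parameter/buffer lookups. The helper below is a hypothetical sketch of that split, not transformers' actual loading code:

```python
# Hypothetical sketch: partition a flat checkpoint dict so that
# "_extra_state" keys are routed to set_extra_state on the owning
# submodule rather than treated as parameters or buffers.

EXTRA_STATE_SUFFIX = "._extra_state"

def split_extra_state(state_dict):
    """Return (tensors, extra) where extra maps each owning submodule
    path to its deserialized extra-state payload."""
    tensors, extra = {}, {}
    for key, value in state_dict.items():
        if key.endswith(EXTRA_STATE_SUFFIX):
            extra[key[: -len(EXTRA_STATE_SUFFIX)]] = value
        else:
            tensors[key] = value
    return tensors, extra


state = {
    "te_layer.weight": "tensor-0",
    "te_layer.layernorm_mlp._extra_state": b"fp8-metadata",
}
tensors, extra = split_extra_state(state)
assert "te_layer.layernorm_mlp._extra_state" not in tensors
assert extra["te_layer.layernorm_mlp"] == b"fp8-metadata"
```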
Rocketknight1 (Member) commented:
Flagging @Cyrilvallez who worked on that file recently! See also the PR at #38155
