T5 model: There were missing keys in the checkpoint model loaded: ['encoder.embed_tokens.weight', 'decoder.embed_tokens.weight', 'lm_head.weight']. #27972

Closed

alexcoca opened this issue Dec 12, 2023 · 14 comments

System Info

  • transformers version: 4.35.2
  • Platform: Linux-5.4.0-148-generic-x86_64-with-glibc2.27
  • Python version: 3.10.11
  • Huggingface_hub version: 0.19.4
  • Safetensors version: 0.4.0
  • Accelerate version: 0.24.1
  • Accelerate config: not found
  • PyTorch version (GPU?): 1.13.1+cu117 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: yes (RTX3090)
  • Using distributed or parallel set-up in script? no

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Steps to reproduce.

  1. Run any transformers example that fine-tunes a T5 model (I am using Salesforce/codet5p-220m, but the issue can probably be reproduced with other T5 models, certainly Flan-T5)
  2. Stop the trainer
  3. Restart the training using the resume_from_checkpoint=True CLI option and setting output_dir to the checkpoint directory (i.e. where the checkpoint-[step] directories are created)
  4. Observe the warning:

[WARNING|trainer.py:2231] 2023-12-12 11:09:58,921 >> There were missing keys in the checkpoint model loaded: ['encoder.embed_tokens.weight', 'decoder.embed_tokens.weight', 'lm_head.weight'].
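
For concreteness, a minimal self-contained sketch of the flow above (the toy dataset and output_dir are hypothetical stand-ins, not the official example script; the real scripts behave the same way):

```python
# Minimal sketch of steps 1-4. The toy dataset and output_dir are
# hypothetical; any seq2seq fine-tuning script shows the same warning.
from torch.utils.data import Dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

name = "Salesforce/codet5p-220m"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSeq2SeqLM.from_pretrained(name)

class ToyDataset(Dataset):
    """A handful of identical examples, just enough for Trainer to run."""
    def __init__(self, n=64):
        src = tokenizer("def add(a, b):", return_tensors="pt")
        tgt = tokenizer("return a + b", return_tensors="pt")
        self.item = {
            "input_ids": src.input_ids[0],
            "attention_mask": src.attention_mask[0],
            "labels": tgt.input_ids[0],
        }
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        return self.item

args = Seq2SeqTrainingArguments(
    output_dir="out",  # checkpoint-[step] directories are created here
    max_steps=20,
    save_steps=10,
)
trainer = Seq2SeqTrainer(model=model, args=args, train_dataset=ToyDataset())

trainer.train()  # run 1: saves out/checkpoint-10 and out/checkpoint-20

# Run 2 (a fresh invocation of the script): resuming from the last
# checkpoint emits the missing-keys warning quoted above.
trainer.train(resume_from_checkpoint=True)
```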

Expected behavior

Either there should be no warning, or the warning message should tell the user whether it applies to them. My intuition here is that nothing is wrong: I am using T5ForConditionalGeneration out of the box (so no separate lm_head), and the encoder and decoder embeddings are tied (and hopefully loaded?!). Is this a case of extending the warning to be more explicit?

@younesbelkada
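
For what it's worth, a quick check (a sketch, using the same model as above) suggests the three "missing" tensors are all tied views of the shared embedding, so nothing should actually be lost:

```python
# Sketch: in T5, the encoder/decoder token embeddings are the same module
# as model.shared, and lm_head is tied to it when tie_word_embeddings is
# set, so these keys can be absent from the serialized state dict without
# any weights being lost.
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5p-220m")

shared = model.shared.weight
print(model.encoder.embed_tokens.weight is shared)  # True: tied
print(model.decoder.embed_tokens.weight is shared)  # True: tied
print(model.lm_head.weight is shared)  # True iff config.tie_word_embeddings
```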

wuyuhan111111 commented Dec 14, 2023

I also have this issue, and as soon as the warning is printed, training terminates immediately. Have you resolved it?

[screenshot of the warning and the terminated run]

What could the problem be? Thank you in advance.

amyeroberts (Collaborator) commented

cc @muellerzr @pacman100 as the warning seems to be coming from the Trainer

valentas-kurauskas commented

I also get this with run_summarization.py and --model_name_or_path "google/mt5-base":

.. missing keys in the checkpoint model loaded: ['encoder.embed_tokens.weight', 'decoder.embed_tokens.weight']

But fine-tuning continues from the last checkpoint rather than crashing. However, eval_loss increases for the next checkpoint after the restart, suggesting these weights are important and really are not saved/reloaded.
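
One way to check whether the embedding weights actually survive a save/reload round trip (a sketch; the directory name is hypothetical, and this exercises save_pretrained/from_pretrained rather than the Trainer's own resume path):

```python
# Sketch: save and reload the model, then compare the shared embedding.
# If the reloaded tensor differs from the original, the warning reflects a
# real loss of weights rather than harmless tied-weight deduplication.
import torch
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-base")
before = model.get_input_embeddings().weight.detach().clone()

model.save_pretrained("mt5-roundtrip-test")  # hypothetical directory
reloaded = AutoModelForSeq2SeqLM.from_pretrained("mt5-roundtrip-test")
after = reloaded.get_input_embeddings().weight

print(torch.equal(before, after))  # False => weights genuinely not restored
```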

muellerzr (Contributor) commented

Related to #27293

valentas-kurauskas commented

@muellerzr thanks for linking to that issue. But the solution mentioned there is for Accelerate, whereas in this case I have a problem with checkpoints saved by the Trainer.

NeuralNimbus commented

Facing the same issue for all T5 as well as RoBERTa models. Any solution yet?

alexcoca (Author) commented Jan 24, 2024

@muellerzr and @pacman100 - it's slightly concerning that this warning still appears. Is there any understanding of which transformers release guarantees correct checkpoint saving and loading? I have (natively) used the library to implement my next research paper, but I don't know whether I can actually use any of the models given the warning on model loading. Let's chat and see how we can get to the bottom of this.

muellerzr (Contributor) commented

@alexcoca can you give us a full clean reproducer please? That's the best way we can help. (Cc @Narsil)

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

AwaysAbdiwahid commented

I also had a similar issue when training a BART model for abstractive text summarization. I thought the problem was caused by how I initialized my model, since I configured it using the BartConfig class and used ByteLevelBPETokenizer for text tokenization. But I wonder whether this issue has an impact on model performance.

TopCoder2K commented

I'm facing a similar warning with DistilBertForMaskedLM:

There were missing keys in the checkpoint model loaded: ['vocab_projector.weight'].

I'm saving with

trainer.save_model(dir)

and loading with

AutoModelForMaskedLM.from_pretrained(dir).to(self.device)

I have found several times that setting save_safetensors=False helps. Is this the right solution if I don't need the safetensors format, @muellerzr?
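
For reference, that workaround as a sketch (output_dir is a hypothetical placeholder):

```python
# Sketch of the workaround: fall back to torch serialization so tied
# weights are written out verbatim instead of being deduplicated the way
# safetensors does on save.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    save_safetensors=False,  # write pytorch_model.bin, not model.safetensors
)
```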

thistlillo commented

+1 here on January 31, 2025.

sebastian-montero commented

Closed? I'm facing this issue as well...

JamePeng commented

+2 here on Feb. 24, 2025.
