T5 model: There were missing keys in the checkpoint model loaded: ['encoder.embed_tokens.weight', 'decoder.embed_tokens.weight', 'lm_head.weight']. #27972

Closed

alexcoca opened this issue Dec 12, 2023 · 14 comments

System Info

  • transformers version: 4.35.2
  • Platform: Linux-5.4.0-148-generic-x86_64-with-glibc2.27
  • Python version: 3.10.11
  • Huggingface_hub version: 0.19.4
  • Safetensors version: 0.4.0
  • Accelerate version: 0.24.1
  • Accelerate config: not found
  • PyTorch version (GPU?): 1.13.1+cu117 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: yes (RTX3090)
  • Using distributed or parallel set-up in script? no

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Steps to reproduce.

  1. Run any transformers example that fine-tunes a T5 model (I am using Salesforce/codet5p-220m, but the issue can probably be reproduced with other T5 models, certainly Flan-T5)
  2. Stop the trainer
  3. Restart the training using the resume_from_checkpoint=True CLI option and setting output_dir to the checkpoint directory (i.e. where the checkpoint-[step] directories are created)
  4. Observe the warning:

[WARNING|trainer.py:2231] 2023-12-12 11:09:58,921 >> There were missing keys in the checkpoint model loaded: ['encoder.embed_tokens.weight', 'decoder.embed_tokens.weight', 'lm_head.weight'].
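
For concreteness, a minimal self-contained sketch of the flow above (the toy dataset and output_dir are hypothetical stand-ins, not the official example script; the real scripts behave the same way):

```python
# Minimal sketch of steps 1-4. The toy dataset and output_dir are
# hypothetical; any seq2seq fine-tuning script shows the same warning.
from torch.utils.data import Dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

name = "Salesforce/codet5p-220m"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSeq2SeqLM.from_pretrained(name)

class ToyDataset(Dataset):
    """A handful of identical examples, just enough for Trainer to run."""
    def __init__(self, n=64):
        src = tokenizer("def add(a, b):", return_tensors="pt")
        tgt = tokenizer("return a + b", return_tensors="pt")
        self.item = {
            "input_ids": src.input_ids[0],
            "attention_mask": src.attention_mask[0],
            "labels": tgt.input_ids[0],
        }
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        return self.item

args = Seq2SeqTrainingArguments(
    output_dir="out",  # checkpoint-[step] directories are created here
    max_steps=20,
    save_steps=10,
)
trainer = Seq2SeqTrainer(model=model, args=args, train_dataset=ToyDataset())

trainer.train()  # run 1: saves out/checkpoint-10 and out/checkpoint-20

# Run 2 (a fresh invocation of the script): resuming from the last
# checkpoint emits the missing-keys warning quoted above.
trainer.train(resume_from_checkpoint=True)
```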

Expected behavior

Either there should be no warning, or the warning message should tell the user whether it applies to them. My intuition here is that nothing is wrong: I am using T5ForConditionalGeneration out of the box (so no separate lm_head), and the encoder and decoder embeddings are tied (and hopefully loaded?!). Is this a case of extending the warning to be more explicit?

@younesbelkada
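
For what it's worth, a quick check (a sketch, using the same model as above) suggests the three "missing" tensors are all tied views of the shared embedding, so nothing should actually be lost:

```python
# Sketch: in T5, the encoder/decoder token embeddings are the same module
# as model.shared, and lm_head is tied to it when tie_word_embeddings is
# set, so these keys can be absent from the serialized state dict without
# any weights being lost.
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5p-220m")

shared = model.shared.weight
print(model.encoder.embed_tokens.weight is shared)  # True: tied
print(model.decoder.embed_tokens.weight is shared)  # True: tied
print(model.lm_head.weight is shared)  # True iff config.tie_word_embeddings
```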

wuyuhan111111 commented Dec 14, 2023

I also have this issue, and as soon as the warning is printed, training terminates immediately. Have you resolved it?

[screenshot of the warning and the terminated run]

What could the problem be? Thank you in advance.

amyeroberts (Collaborator) commented

cc @muellerzr @pacman100 as the warning seems to be coming from the Trainer

valentas-kurauskas commented

I also get this with run_summarization.py and --model_name_or_path "google/mt5-base":

.. missing keys in the checkpoint model loaded: ['encoder.embed_tokens.weight', 'decoder.embed_tokens.weight']

But fine-tuning continues from the last checkpoint rather than crashing. However, eval_loss increases for the next checkpoint after the restart, suggesting these weights are important and really are not saved/reloaded.
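
One way to check whether the embedding weights actually survive a save/reload round trip (a sketch; the directory name is hypothetical, and this exercises save_pretrained/from_pretrained rather than the Trainer's own resume path):

```python
# Sketch: save and reload the model, then compare the shared embedding.
# If the reloaded tensor differs from the original, the warning reflects a
# real loss of weights rather than harmless tied-weight deduplication.
import torch
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-base")
before = model.get_input_embeddings().weight.detach().clone()

model.save_pretrained("mt5-roundtrip-test")  # hypothetical directory
reloaded = AutoModelForSeq2SeqLM.from_pretrained("mt5-roundtrip-test")
after = reloaded.get_input_embeddings().weight

print(torch.equal(before, after))  # False => weights genuinely not restored
```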

muellerzr (Contributor) commented

Related to #27293

valentas-kurauskas commented

@muellerzr thanks for linking to that issue. But the solution mentioned there is for Accelerate, whereas in this case I have a problem with checkpoints saved by the Trainer.

NeuralNimbus commented

Facing the same issue for all T5 as well as RoBERTa models. Any solution yet?

alexcoca (Author) commented Jan 24, 2024

@muellerzr and @pacman100 - it's slightly concerning that this warning still appears. Is there any understanding of which transformers release guarantees correct checkpoint saving and loading? I have (natively) used the library to implement my next research paper, but I don't know whether I can actually use any of the models given the warning on model loading. Let's chat and see how we can get to the bottom of this.

muellerzr (Contributor) commented

@alexcoca can you give us a full clean reproducer please? That's the best way we can help. (Cc @Narsil)

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

AwaysAbdiwahid commented

I also had a similar issue when training a BART model for abstractive text summarization. I thought the problem was caused by how I initialized my model, since I configured it using the BartConfig class and used ByteLevelBPETokenizer for text tokenization. But I wonder whether this issue has an impact on model performance.

TopCoder2K commented

I'm facing a similar warning with DistilBertForMaskedLM:

There were missing keys in the checkpoint model loaded: ['vocab_projector.weight'].

I'm saving with

trainer.save_model(dir)

and loading with

AutoModelForMaskedLM.from_pretrained(dir).to(self.device)

I have found several times that setting save_safetensors=False helps. Is this the right solution if I don't need the safetensors format, @muellerzr?
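
For reference, that workaround as a sketch (output_dir is a hypothetical placeholder):

```python
# Sketch of the workaround: fall back to torch serialization so tied
# weights are written out verbatim instead of being deduplicated the way
# safetensors does on save.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    save_safetensors=False,  # write pytorch_model.bin, not model.safetensors
)
```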

thistlillo commented

+1 here on January 31, 2025.

sebastian-montero commented

Closed? I'm facing this issue as well...

JamePeng commented

+2 here on Feb. 24, 2025.
