
Error during training: ValueError: The current device_map had weights offloaded to the disk. #153

Open
SKY-ZW opened this issue Aug 24, 2023 · 11 comments

SKY-ZW commented Aug 24, 2023

Traceback (most recent call last):
File "/content/zero_nlp/chatglm_v2_6b_lora/main.py", line 470, in
main()
File "/content/zero_nlp/chatglm_v2_6b_lora/main.py", line 133, in main
model = AutoModel.from_pretrained(
File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 479, in from_pretrained
return model_class.from_pretrained(
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 2881, in from_pretrained
) = cls._load_pretrained_model(
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 2980, in _load_pretrained_model
raise ValueError(
ValueError: The current device_map had weights offloaded to the disk. Please provide an offload_folder for them. Alternatively, make sure you have safetensors installed if the model you are using offers the weights in this format

@yuanzhoulvpi2017 (Owner) commented:

  1. Did you modify the code?
  2. Is your transformers version the latest? It is best to update it to the latest version.


SKY-ZW commented Aug 24, 2023

I did not modify the code.
Name: transformers
Version: 4.30.2

@yuanzhoulvpi2017 (Owner) commented:

“The current device_map had weights offloaded to the disk” means the device was not selected properly; the weights were probably allocated to disk (that is, the hard drive). Please check that.
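For reference, a quick sanity check (not part of the repository's code) of what hardware the Colab runtime actually exposes; if no GPU is visible, or it is too small, accelerate will place layers in CPU RAM and then spill them to disk, which is what produces the error above:

```python
# Hypothetical diagnostic snippet, not from main.py: inspect the devices
# and memory that the runtime actually provides.
import torch
import psutil  # installed as a dependency of accelerate/peft

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print("GPU:", props.name)
    print("GPU memory (GB):", round(props.total_memory / 1e9, 1))
print("Host RAM (GB):", round(psutil.virtual_memory().total / 1e9, 1))
```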


SKY-ZW commented Aug 24, 2023

I'm using Colab. Do I need a local GPU?

@yuanzhoulvpi2017 (Owner) commented:

In principle it should run in Colab as well, but this error is quite strange.

My guess is that it is a version problem with the transformers, accelerate, or peft packages; try updating all of them to the latest versions.

If it still fails after that, I don't know either.


SKY-ZW commented Aug 24, 2023

I ran pip install --upgrade on all of the following packages in Colab. The versions are listed below, but I still get the same error. Do I need to pin specific versions?

Name: transformers
Version: 4.30.2
Summary: State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow
Home-page: https://github.com/huggingface/transformers
Author: The Hugging Face team (past and future) with the help of all our contributors (https://github.com/huggingface/transformers/graphs/contributors)
Author-email: [email protected]
License: Apache 2.0 License
Location: /usr/local/lib/python3.10/dist-packages
Requires: filelock, huggingface-hub, numpy, packaging, pyyaml, regex, requests, safetensors, tokenizers, tqdm
Required-by: peft
---
Name: accelerate
Version: 0.22.0
Summary: Accelerate
Home-page: https://github.com/huggingface/accelerate
Author: The HuggingFace team
Author-email: [email protected]
License: Apache
Location: /usr/local/lib/python3.10/dist-packages
Requires: numpy, packaging, psutil, pyyaml, torch
Required-by: peft
---
Name: peft
Version: 0.5.0
Summary: Parameter-Efficient Fine-Tuning (PEFT)
Home-page: https://github.com/huggingface/peft
Author: The HuggingFace team
Author-email: [email protected]
License: Apache
Location: /usr/local/lib/python3.10/dist-packages
Requires: accelerate, numpy, packaging, psutil, pyyaml, safetensors, torch, tqdm, transformers
Required-by:

@yuanzhoulvpi2017 (Owner) commented:

I can't figure it out either.


SKY-ZW commented Aug 24, 2023

I only changed the --model_name_or_path /content/zero_nlp/chatglm_v2_6b_lora/chatglm2-6b path in train.sh; I didn't touch anything else.

@yuanzhoulvpi2017 (Owner) commented:

That doesn't affect it; I'm not sure what's going on.


SKY-ZW commented Aug 25, 2023

After I added offload_folder = "offload_folder" at line 133, the error above no longer appears, but now there's another problem: Loading checkpoint shards: 71% 5/7 [01:03<00:23, 11.53s/it] train.sh: line 24: 5399 Killed
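For context, a minimal sketch of what that change looks like; the exact arguments in the repository's main.py may differ, and device_map="auto" and trust_remote_code=True are assumptions based on the traceback and the ChatGLM2-6B model:

```python
# Hypothetical sketch of the from_pretrained call around line 133 of main.py,
# with an offload folder so weights that do not fit in memory can spill to disk.
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "/content/zero_nlp/chatglm_v2_6b_lora/chatglm2-6b",  # path passed via --model_name_or_path
    trust_remote_code=True,            # ChatGLM2-6B ships its own modeling code
    device_map="auto",                 # let accelerate place layers on GPU/CPU/disk
    offload_folder="offload_folder",   # directory for weights offloaded to disk
)
```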


SKY-ZW commented Aug 25, 2023

It feels like the process used too much memory and was killed by Colab.
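If it really is host RAM running out while the checkpoint shards load, one thing that sometimes helps on Colab is loading the weights in half precision and letting transformers stream the shards instead of materializing a full copy in RAM. This is a hedged sketch, not the repository's configuration; the argument choices are illustrative:

```python
# Hypothetical memory-saving variant of the model load on Colab.
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "/content/zero_nlp/chatglm_v2_6b_lora/chatglm2-6b",
    trust_remote_code=True,
    torch_dtype=torch.float16,   # load weights in fp16, roughly halving memory use
    low_cpu_mem_usage=True,      # load shard by shard instead of building a full copy first
    device_map="auto",
    offload_folder="offload_folder",
)
```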
