Training fails with ValueError: The current device_map had weights offloaded to the disk.
#153
Comments
I didn't modify the code.
"The current device_map had weights offloaded to the disk" means the device map wasn't set up properly: some weights were probably assigned to disk (i.e., offloaded to the hard drive). Check that.
I'm running this on Colab. Do I need a local GPU?
In principle it should also run on Colab, but this error is strange. My guess is it's a version issue with the transformers, accelerate, or peft packages; try updating them all to the latest versions. If it still fails after that, I'm not sure what else it could be.
In Colab I ran pip install --upgrade on all of the packages below; the resulting versions are listed here, but I still get the same error. Do I need to pin specific versions?
Name: transformers
Version: 4.30.2
Summary: State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow
Home-page: https://github.com/huggingface/transformers
Author: The Hugging Face team (past and future) with the help of all our contributors (https://github.com/huggingface/transformers/graphs/contributors)
Author-email: [email protected]
License: Apache 2.0 License
Location: /usr/local/lib/python3.10/dist-packages
Requires: filelock, huggingface-hub, numpy, packaging, pyyaml, regex, requests, safetensors, tokenizers, tqdm
Required-by: peft
---
Name: accelerate
Version: 0.22.0
Summary: Accelerate
Home-page: https://github.com/huggingface/accelerate
Author: The HuggingFace team
Author-email: [email protected]
License: Apache
Location: /usr/local/lib/python3.10/dist-packages
Requires: numpy, packaging, psutil, pyyaml, torch
Required-by: peft
---
Name: peft
Version: 0.5.0
Summary: Parameter-Efficient Fine-Tuning (PEFT)
Home-page: https://github.com/huggingface/peft
Author: The HuggingFace team
Author-email: [email protected]
License: Apache
Location: /usr/local/lib/python3.10/dist-packages
Requires: accelerate, numpy, packaging, psutil, pyyaml, safetensors, torch, tqdm, transformers
Required-by:
I can't figure it out either.
I only changed the --model_name_or_path /content/zero_nlp/chatglm_v2_6b_lora/chatglm2-6b path in train.sh; I didn't touch anything else.
That change shouldn't matter. I'm not sure what's wrong.
After I added offload_folder = "offload_folder" at line 133, the error above went away, but another problem appeared: Loading checkpoint shards: 71% 5/7 [01:03<00:23, 11.53s/it] train.sh: line 24: 5399 Killed
It looks like the process used too much memory and was killed by Colab.
Traceback (most recent call last):
  File "/content/zero_nlp/chatglm_v2_6b_lora/main.py", line 470, in <module>
    main()
  File "/content/zero_nlp/chatglm_v2_6b_lora/main.py", line 133, in main
    model = AutoModel.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 479, in from_pretrained
    return model_class.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 2881, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 2980, in _load_pretrained_model
    raise ValueError(
ValueError: The current device_map had weights offloaded to the disk. Please provide an offload_folder for them. Alternatively, make sure you have safetensors installed if the model you are using offers the weights in this format.
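The offload_folder fix discussed in the thread can be sketched as follows. This is a minimal sketch, not the repository's exact code: the model path is the one used in the thread, and `device_map="auto"` / `trust_remote_code=True` are assumptions about how the model is loaded.

```python
# Sketch of the fix: pass offload_folder to from_pretrained so accelerate
# has a directory to write disk-offloaded weight shards into.
from transformers import AutoModel

# Path from the thread; adjust for your own setup.
MODEL_PATH = "/content/zero_nlp/chatglm_v2_6b_lora/chatglm2-6b"
OFFLOAD_DIR = "offload_folder"  # any writable directory works


def load_model(path: str = MODEL_PATH):
    # device_map="auto" lets accelerate split layers across GPU, CPU RAM,
    # and disk; offload_folder names where the disk-offloaded shards go.
    return AutoModel.from_pretrained(
        path,
        trust_remote_code=True,  # ChatGLM ships custom modeling code
        device_map="auto",
        offload_folder=OFFLOAD_DIR,
    )
```

Note that disk offload only avoids the ValueError; it does not reduce peak RAM during checkpoint loading, which is consistent with the process still being killed on Colab.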