
deepseek-v3 fine-tuning error #2842

Open
elimsjxr opened this issue Jan 2, 2025 · 3 comments

Comments

@elimsjxr

elimsjxr commented Jan 2, 2025

Describe the bug
What the bug is, and how to reproduce, better with screenshots

Running DeepSeek-V3 LoRA fine-tuning on an Ascend NPU, using the v3.0.1 branch code:
Command:
swift sft \
    --model /work/share/weights/DeepSeek-V3 \
    --dataset AI-ModelScope/alpaca-gpt4-data-en \
    --train_type lora \
    --output_dir output \

Error:
[INFO:swift] Loading the model using model_dir: /work/share/weights/DeepSeek-V3
[WARNING:swift] torch_dtype: torch.bfloat16, but support_bf16: False.
[INFO:swift] model_kwargs: {'device_map': 'npu:0'}
Traceback (most recent call last):
  File "/work/share/ms-swift-3.0.1/swift/cli/sft.py", line 5, in <module>
    sft_main()
  File "/work/share/ms-swift-3.0.1/swift/llm/train/sft.py", line 272, in sft_main
    return SwiftSft(args).main()
  File "/work/share/ms-swift-3.0.1/swift/llm/train/sft.py", line 30, in __init__
    self._prepare_model_tokenizer()
  File "/work/share/ms-swift-3.0.1/swift/llm/train/sft.py", line 62, in _prepare_model_tokenizer
    self.model, self.processor = args.get_model_processor()
  File "/work/share/ms-swift-3.0.1/swift/llm/argument/base_args/base_args.py", line 255, in get_model_processor
    model, processor = get_model_tokenizer(**kwargs)
  File "/work/share/ms-swift-3.0.1/swift/llm/model/register.py", line 503, in get_model_tokenizer
    model, processor = get_function(model_dir, model_info, model_kwargs, load_model, **kwargs)
  File "/work/share/jxr/ms-swift-3.0.1/swift/llm/model/model/deepseek.py", line 53, in get_model_tokenizer_deepseek_moe
    model, tokenizer = get_model_tokenizer_with_flash_attn(model_dir, model_info, model_kwargs, load_model, **kwargs)
  File "/work/share/ms-swift-3.0.1/swift/llm/model/register.py", line 265, in get_model_tokenizer_with_flash_attn
    return get_model_tokenizer_from_local(model_dir, model_info, model_kwargs, load_model, **kwargs)
  File "/work/share/ms-swift-3.0.1/swift/llm/model/register.py", line 188, in get_model_tokenizer_from_local
    model = automodel_class.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 559, in from_pretrained
    return model_class.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 3442, in from_pretrained
    config.quantization_config = AutoHfQuantizer.merge_quantization_configs(
  File "/usr/local/lib/python3.10/dist-packages/transformers/quantizers/auto.py", line 169, in merge_quantization_configs
    quantization_config = AutoQuantizationConfig.from_dict(quantization_config)
  File "/usr/local/lib/python3.10/dist-packages/transformers/quantizers/auto.py", line 93, in from_dict
    raise ValueError(
ValueError: Unknown quantization type, got fp8 - supported types are: ['awq', 'bitsandbytes_4bit', 'bitsandbytes_8bit', 'gptq', 'aqlm', 'quanto', 'eetq', 'hqq', 'compressed-tensors', 'fbgemm_fp8', 'torchao']
[ERROR] 2025-01-02-17:23:14 (PID:6941, Device:0, RankID:-1) ERR99999 UNKNOWN application exception

Your hardware and system info

Write your system info like CUDA version/system/GPU/torch version here
python 3.10
torch 2.1.0
ms-swift 3.0.1
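
For context: the released DeepSeek-V3 checkpoint ships FP8 weights, and its config.json carries a quantization_config block whose quant_method is "fp8". The transformers build installed here only recognizes the quantizer types listed in the ValueError above, so from_pretrained rejects the config before any NPU code runs. A minimal check, assuming the config layout of the released checkpoint:

import json, os

model_dir = "/work/share/weights/DeepSeek-V3"  # path taken from the report above
with open(os.path.join(model_dir, "config.json")) as f:
    cfg = json.load(f)

# Expected to print something like {"quant_method": "fp8", ...}; since "fp8"
# is not in this transformers version's supported list, loading fails.
print(cfg.get("quantization_config"))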

@Jintao-Huang
Collaborator

You need to convert fp8 -> bf16 before you can train.

@elimsjxr
Author

elimsjxr commented Jan 2, 2025

You need to convert fp8 -> bf16 before you can train.

Does this mean the model config file needs to be modified?

@Jintao-Huang
Collaborator

https://modelscope.cn/models/deepseek-ai/DeepSeek-V3

There is a conversion example there.
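
For reference, a minimal sketch of what such a conversion does, modeled on the FP8-to-BF16 cast script shipped with the DeepSeek-V3 repository; the 128x128 block size and the "_scale_inv" tensor naming are assumptions based on the released checkpoint, so prefer the official example linked above:

import glob, json, os
import torch
from safetensors.torch import load_file, save_file

BLOCK = 128  # assumed from "weight_block_size": [128, 128] in config.json

def dequant_fp8(weight: torch.Tensor, scale_inv: torch.Tensor) -> torch.Tensor:
    # Expand the per-block inverse scales to the full weight shape,
    # rescale, and cast the result to bf16.
    w = weight.to(torch.float32)
    m, n = w.shape
    s = scale_inv.to(torch.float32)
    s = s.repeat_interleave(BLOCK, dim=0)[:m].repeat_interleave(BLOCK, dim=1)[:, :n]
    return (w * s).to(torch.bfloat16)

def convert_dir(src: str, dst: str) -> None:
    os.makedirs(dst, exist_ok=True)
    for shard in sorted(glob.glob(os.path.join(src, "*.safetensors"))):
        tensors = load_file(shard)
        out = {}
        for name, t in tensors.items():
            if name.endswith("_scale_inv"):
                continue  # consumed together with its weight below
            scale = tensors.get(name + "_scale_inv")
            if t.dtype == torch.float8_e4m3fn and scale is not None:
                out[name] = dequant_fp8(t, scale)
            else:
                out[name] = t
        save_file(out, os.path.join(dst, os.path.basename(shard)))
    # Answering the config question above: the converted checkpoint must not
    # carry the fp8 quantization_config, or transformers will fail again.
    with open(os.path.join(src, "config.json")) as f:
        cfg = json.load(f)
    cfg.pop("quantization_config", None)
    with open(os.path.join(dst, "config.json"), "w") as f:
        json.dump(cfg, f, indent=2)

A real conversion would also need to rewrite model.safetensors.index.json (the *_scale_inv entries disappear) and copy the tokenizer files into the new directory; see the example linked above for the full procedure.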
