System Info
Following the recently merged PR and the release notes, I tried to load the DeepSeek R1 model with the code snippet below on a single P5EN instance (8× H200 GPUs).
- The first issue is that loading takes a very long time; the estimate was ~10 hours.
- Then I modified `config.json` and `model.safetensors.index.json` to load only the first 10 layers plus the `embed_tokens` and `lm_head` modules (see the sketch below). However, this hit the error shown below. The issue goes away if I use the DeepSeek conversion script to convert the checkpoint from FP8 to BF16.
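For reference, the truncation edit was along the lines of the sketch below (the paths, the `KEEP_LAYERS` constant, and the `keep` helper are illustrative, not the exact code I ran): shrink `num_hidden_layers` in `config.json` and drop every `model.layers.N.*` entry with `N >= 10` from the index's `weight_map`.

```python
# Illustrative sketch: truncate the checkpoint to its first 10 decoder layers,
# keeping embed_tokens, the final norm, and lm_head. Paths are assumptions.
import json

MODEL_DIR = "MYMODEL_PATH"  # assumed local checkpoint directory
KEEP_LAYERS = 10

# 1) Shrink the layer count in config.json.
with open(f"{MODEL_DIR}/config.json") as f:
    config = json.load(f)
config["num_hidden_layers"] = KEEP_LAYERS
with open(f"{MODEL_DIR}/config.json", "w") as f:
    json.dump(config, f, indent=2)

# 2) Drop weight_map entries for the removed layers.
def keep(name: str) -> bool:
    # Non-layer tensors (model.embed_tokens.*, model.norm.*, lm_head.*) stay.
    if not name.startswith("model.layers."):
        return True
    return int(name.split(".")[2]) < KEEP_LAYERS

with open(f"{MODEL_DIR}/model.safetensors.index.json") as f:
    index = json.load(f)
index["weight_map"] = {k: v for k, v in index["weight_map"].items() if keep(k)}
with open(f"{MODEL_DIR}/model.safetensors.index.json", "w") as f:
    json.dump(index, f, indent=2)
```

With the truncated checkpoint, generation fails as follows: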
```
Some parameters are on the meta device because they were offloaded to the cpu.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:1 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Traceback (most recent call last):
  File "/iofsx/sds3/models/DeepSeekV3/test.py", line 18, in <module>
    outputs = model.generate(inputs, max_new_tokens=50)
  File "/opt/conda/envs/fix/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/envs/fix/lib/python3.10/site-packages/transformers/generation/utils.py", line 2370, in generate
    result = self._sample(
  File "/opt/conda/envs/fix/lib/python3.10/site-packages/transformers/generation/utils.py", line 3331, in _sample
    outputs = self(**model_inputs, return_dict=True)
  File "/opt/conda/envs/fix/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/envs/fix/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/envs/fix/lib/python3.10/site-packages/accelerate/hooks.py", line 176, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/opt/conda/envs/fix/lib/python3.10/site-packages/transformers/utils/deprecation.py", line 172, in wrapped_func
    return func(*args, **kwargs)
  File "/opt/conda/envs/fix/lib/python3.10/site-packages/transformers/models/deepseek_v3/modeling_deepseek_v3.py", line 1025, in forward
    outputs = self.model(
  File "/opt/conda/envs/fix/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/envs/fix/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/envs/fix/lib/python3.10/site-packages/transformers/models/deepseek_v3/modeling_deepseek_v3.py", line 773, in forward
    layer_outputs = decoder_layer(
  File "/opt/conda/envs/fix/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/envs/fix/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/envs/fix/lib/python3.10/site-packages/accelerate/hooks.py", line 176, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/opt/conda/envs/fix/lib/python3.10/site-packages/transformers/models/deepseek_v3/modeling_deepseek_v3.py", line 513, in forward
    hidden_states, self_attn_weights = self.self_attn(
  File "/opt/conda/envs/fix/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/envs/fix/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/envs/fix/lib/python3.10/site-packages/accelerate/hooks.py", line 176, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/opt/conda/envs/fix/lib/python3.10/site-packages/transformers/models/deepseek_v3/modeling_deepseek_v3.py", line 423, in forward
    q_states = self.q_b_proj(self.q_a_layernorm(self.q_a_proj(hidden_states))).view(query_shape).transpose(1, 2)
  File "/opt/conda/envs/fix/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/envs/fix/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/envs/fix/lib/python3.10/site-packages/accelerate/hooks.py", line 176, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/opt/conda/envs/fix/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 125, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: expected mat1 and mat2 to have the same dtype, but got: c10::BFloat16 != c10::Float8_e4m3fn
```
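The RuntimeError pins the failure to the final `F.linear` call: the BF16 activations (from `torch_dtype=torch.bfloat16`) are multiplied against a weight that is still stored in FP8 (`float8_e4m3fn`), so the matmul operands disagree. A minimal standalone illustration (shapes are arbitrary):

```python
# Minimal illustration (assumed shapes) of the dtype mismatch in the
# traceback above: BF16 activations against a weight left in FP8.
import torch
import torch.nn.functional as F

x = torch.randn(1, 4, dtype=torch.bfloat16)    # BF16 activations
w = torch.randn(8, 4).to(torch.float8_e4m3fn)  # weight as stored in the FP8 checkpoint
F.linear(x, w)  # raises the same kind of dtype-mismatch RuntimeError
```

Converting the checkpoint to BF16 first (e.g. with DeepSeek's FP8-to-BF16 conversion script) makes the weight dtype match the activations, which is consistent with the observation above that the conversion makes the error go away.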
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Code snippet to load the model and generate text:
```python
# `run_deepseek_v1.py`
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

torch.manual_seed(30)

model_path = "MYMODEL_PATH"
tokenizer = AutoTokenizer.from_pretrained(model_path)
chat = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
    {"role": "user", "content": "I'd like to show off how chat templating works!"},
]

model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto", torch_dtype=torch.bfloat16)
inputs = tokenizer.apply_chat_template(chat, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=50)
print(tokenizer.batch_decode(outputs))
```
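As an aside, the attention-mask warnings in the log can be avoided by letting `apply_chat_template` return a dict (so the mask is built alongside the input ids) and forwarding it to `generate`; this is orthogonal to the dtype error:

```python
# Variant of the snippet above that also passes attention_mask, which
# silences the generation warnings (it does not affect the dtype error).
inputs = tokenizer.apply_chat_template(
    chat,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,  # returns input_ids and attention_mask together
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
```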
Expected behavior
NA