
Question about the shape of lm_head.weight #1850

Open
Jay-zzcoder opened this issue Mar 12, 2025 · 0 comments
Jay-zzcoder commented Mar 12, 2025

Question

I found that if I don't pass the BitsAndBytesConfig args to LlavaLlamaForCausalLM.from_pretrained(), like this:

```python
model = LlavaLlamaForCausalLM.from_pretrained(
    tokenizer_path,
    torch_dtype=torch.bfloat16,
    # **bnb_model_from_pretrained_args
)
```

the shape of lm_head.weight is torch.Size([32000, 5120]).

But if I do pass the BitsAndBytesConfig args:

```python
model = LlavaLlamaForCausalLM.from_pretrained(
    tokenizer_path,
    torch_dtype=torch.bfloat16,
    **bnb_model_from_pretrained_args
)
```

the shape of lm_head.weight changes to torch.Size([81920000, 1]).

Why does the shape of lm_head.weight change?
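
For what it's worth, the numbers are consistent with bitsandbytes 4-bit quantization: assuming bnb_model_from_pretrained_args enables 4-bit loading (load_in_4bit=True), bitsandbytes replaces each linear weight with a Params4bit tensor that packs two 4-bit values into one uint8 byte and stores them as a flattened column vector. A minimal sketch of the arithmetic (variable names are mine, not from the repo):

```python
import torch

# Logical bf16 weight of the 13B LLaMA head: vocab_size x hidden_size.
vocab_size, hidden_size = 32000, 5120
n_params = vocab_size * hidden_size      # 163_840_000 elements

# 4-bit packing: two values per uint8 byte, flattened to a column vector.
packed = torch.empty(n_params // 2, 1, dtype=torch.uint8)
print(packed.shape)                      # torch.Size([81920000, 1])
```

So the 32000 * 5120 = 163,840,000 bf16 parameters become 81,920,000 packed bytes, which matches the torch.Size([81920000, 1]) you observed; as far as I can tell, the original shape is tracked in the parameter's quantization state rather than in the storage tensor itself.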
