@ebsmothers should work for you in this case. You can make this new checkpoint using the code below and then finetune it with any llama3_2_vision config you want.

import torch

from torchtune import models, training

# Instantiate the model in bf16 on CPU (torch.device works as a context manager in PyTorch >= 2.0)
with training.set_default_dtype(torch.bfloat16), torch.device("cpu"):
    model = models.llama3_2_vision.llama3_2_vision_11b(...)  # same args as in the config

# Load the 3.2 vision checkpoint (use HF or Meta checkpointer depending on your checkpoint)
model_checkpointer = training.FullModelMetaCheckpointer(...)  # same args as in the config
state_dict = model_checkpointer.load_checkpoint()
model.load_state_dict(state_dict[training.MODEL_KEY])  # checkpointers nest weights under the "model" key

# Load the 3.1 text checkpoint (use HF or Meta checkpointer depending on your checkpoint)
t…
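The truncated remainder presumably loads the 3.1 text checkpoint and copies its weights into the fused model's text decoder before saving. A minimal, library-free sketch of that merge step (plain dicts stand in for real state dicts, and the `decoder.` key prefix is an assumption about how the fused model names its text-decoder parameters):

```python
# Hypothetical sketch: overwrite the fused vision model's decoder weights
# with weights from a standalone text model's state dict. The "decoder."
# prefix is an assumption, not a verified torchtune key layout.

def merge_text_into_vision(vision_sd: dict, text_sd: dict, prefix: str = "decoder.") -> dict:
    """Return a copy of vision_sd with matching decoder weights replaced by text_sd's."""
    merged = dict(vision_sd)
    for key, value in text_sd.items():
        fused_key = prefix + key
        if fused_key in merged:  # only overwrite keys the fused model actually has
            merged[fused_key] = value
    return merged

# Toy example with scalar "weights":
vision_sd = {"encoder.w": 1.0, "decoder.layers.0.w": 2.0}
text_sd = {"layers.0.w": 9.0, "lm_head.w": 5.0}  # lm_head.w has no fused match, so it is skipped
merged = merge_text_into_vision(vision_sd, text_sd)
print(merged)  # {'encoder.w': 1.0, 'decoder.layers.0.w': 9.0}
```

With the real models you would merge the two loaded state dicts this way, call `model.load_state_dict(...)` to sanity-check the result, and then save it with the checkpointer so any llama3_2_vision config can fine-tune from it.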

Replies: 1 comment, 2 replies

@Hyperakan

@pbontrager

Answer selected by Hyperakan
Category: Q&A
Labels: none yet
3 participants