Train on completions only by fixing the collator inquiry #1396

@hessaAlawwad

Hello,

I was wondering whether I can use DataCollatorForCompletionOnlyLM to train the Llama 3.2 Vision model so that the loss covers only the generated responses, not the prompts.
For example, by passing a response template and the tokenizer like this:

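from trl import DataCollatorForCompletionOnlyLM

# `tokenizer` is the tokenizer already loaded for the model
response_template = " ### Answer:"
collator = DataCollatorForCompletionOnlyLM(response_template, tokenizer=tokenizer)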

I see that the provided code uses data_collator = UnslothVisionDataCollator(model, tokenizer) and says it must be used. Can I view that collator's source and edit it for my purpose, which is to compute the loss only on the generated tokens?
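
To make the goal concrete, here is a minimal sketch of what "loss only on the generated tokens" means at the label level. It is not the code of UnslothVisionDataCollator or DataCollatorForCompletionOnlyLM; the function name mask_prompt_tokens and the toy token ids are made up for illustration. It assumes the usual convention that label positions set to -100 are skipped by the loss (the default ignore_index of PyTorch's CrossEntropyLoss).

```python
from typing import List

# Label value skipped by PyTorch's CrossEntropyLoss by default (ignore_index=-100).
IGNORE_INDEX = -100


def mask_prompt_tokens(input_ids: List[int], response_template_ids: List[int]) -> List[int]:
    """Hypothetical helper: copy input_ids into labels and mask everything up to
    and including the last occurrence of the response template."""
    n = len(response_template_ids)
    if n == 0:
        # No template given: nothing can be located, so mask the whole sequence.
        return [IGNORE_INDEX] * len(input_ids)

    # Locate the last occurrence of the template token ids in the sequence.
    start = -1
    for i in range(len(input_ids) - n + 1):
        if input_ids[i:i + n] == response_template_ids:
            start = i

    if start == -1:
        # Template not found: mask the whole sequence so it adds nothing to the loss.
        return [IGNORE_INDEX] * len(input_ids)

    end = start + n
    labels = list(input_ids)
    labels[:end] = [IGNORE_INDEX] * end  # prompt and template are ignored
    return labels


# Toy example: ids 90, 91 stand in for the tokenized response template.
toy_input_ids = [5, 6, 7, 90, 91, 11, 12, 2]
toy_template_ids = [90, 91]
print(mask_prompt_tokens(toy_input_ids, toy_template_ids))
```

In a real collator the same masking would be applied per example to the labels tensor of the batch, and the template would be tokenized with the model's own tokenizer; whether the vision collator exposes a hook for this is exactly what I am asking about.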
