Add QLoRA and FP8 to finetuning tutorial (part 2) #2542
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2542
Note: Links to docs will display an error until the docs builds have been completed.
⏳ No Failures, 5 Pending as of commit e6c8194 with merge base 2e2ce0b.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
self.weight.requires_grad_(False)
if self.bias is not None:
    self.bias.requires_grad_(False)
nf4_weight = to_nf4(self.weight, **quantization_kwargs)
Can we extend this to support any quantization? I just added an AOBaseTensorConfig that might be able to help: #2463
Ok, I'll add a note saying it's possible to extend this to other quantization schemes, but I want to keep this example NF4 since that's what's used in the QLoRA paper.
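As a side note for readers: here's a minimal sketch of how a frozen NF4 weight like the one in the snippet above typically pairs with trainable LoRA adapters. This is illustrative only, not the tutorial's code; the class name, rank/alpha defaults, and the use of torchao's `to_nf4`/`linear_nf4` helpers are assumptions for the example.

```python
import torch
import torch.nn as nn
from torchao.dtypes.nf4tensor import to_nf4, linear_nf4


class QLoRALinearSketch(nn.Module):
    """Illustrative only: frozen NF4 base weight plus trainable LoRA adapters."""

    def __init__(self, in_features: int, out_features: int, rank: int = 8, alpha: int = 16):
        super().__init__()
        base = nn.Linear(in_features, out_features, bias=False)
        base.weight.requires_grad_(False)  # freeze the base weight, as in the snippet above
        # Quantize the frozen weight to NF4 and keep it as a non-trainable parameter
        self.weight = nn.Parameter(to_nf4(base.weight), requires_grad=False)
        # Trainable low-rank adapters stay in the compute dtype
        self.lora_a = nn.Linear(in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)  # standard LoRA init: B starts at zero
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # linear_nf4 dequantizes the NF4 weight on the fly for the matmul
        return linear_nf4(x, self.weight) + self.scaling * self.lora_b(self.lora_a(x))
```

Only the LoRA adapter weights receive gradients; the NF4 base weight stays frozen and quantized, which is where QLoRA's memory savings come from.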
.. code::

    tune run lora_finetune_single_device --config llama3_2/3B_qlora_single_device.yaml
What about integration with the Hugging Face PEFT library?
Also, isn't tune deprecated?
Wow, I didn't realize we had an integration with PEFT; I don't think this was documented in any of our docs? Will add a note here.
For torchtune, I don't think there's a mature replacement from PyTorch yet, so I feel it's OK. Also, that's where most of our fine-tuning integrations live today.
Yeah: https://huggingface.co/docs/peft/en/developer_guides/quantization#torchao-pytorch-architecture-optimization. I'll test this path a bit soon as well.
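For anyone who wants to try that path outside torchtune, here's a rough sketch of what the PEFT + torchao flow looks like based on that doc page. Treat the checkpoint name, quantization settings, and target modules as placeholders, and double-check the current transformers/peft APIs before relying on this.

```python
import torch
from transformers import AutoModelForCausalLM, TorchAoConfig
from peft import LoraConfig, get_peft_model

# Quantize the base model's weights with torchao at load time (int4 weight-only as an example)
quant_config = TorchAoConfig("int4_weight_only", group_size=128)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B",  # placeholder checkpoint
    torch_dtype=torch.bfloat16,
    quantization_config=quant_config,
)

# Attach trainable LoRA adapters on top of the frozen, quantized base model
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```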
looks great, thanks!
Force-pushed from 6ccc6b1 to d04acdd (Compare)
This is part 2 of the end-to-end fine-tuning tutorial. Part 1 already covered QAT; this commit adds QLoRA and FP8. To preview, visit https://docs-preview.pytorch.org/pytorch/ao/2542/finetuning.html
Force-pushed from d04acdd to e6c8194 (Compare)