[WIP] enable QLoRA + FSDP2 #909

weifengpy · 2024-05-01T06:44:34Z

This PR is stacked on:

TorchTune: LoRA + FSDP2 (enable LoRA + FSDP2, #855)
TorchAO: NF4Tensor with torch.chunk and ops needed by FSDP ([FSDP2][NF4Tensor][2/n] implement torch.chunk and other ops, ao#150)
PyTorch: meta init + cpu offloading ([FSDP2] support fully_shard(model_on_meta, cpu_offload), pytorch#126305)
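For context on the TorchAO dependency: FSDP2 shards each parameter along dim 0 with torch.chunk-style splitting and pads the trailing shards so every rank holds an equally sized piece, which is why NF4Tensor has to support torch.chunk and related ops. The pure-Python sketch below only mimics that row-splitting arithmetic on a list; `chunk_sizes` and `shard_rows` are hypothetical helpers, not FSDP2 or torchao APIs.

```python
import math
from typing import List, Sequence


def chunk_sizes(num_rows: int, num_chunks: int) -> List[int]:
    """Sizes torch.chunk would produce along dim 0 (it may return fewer than num_chunks)."""
    size = math.ceil(num_rows / num_chunks)  # every chunk but the last has this size
    sizes = []
    remaining = num_rows
    while remaining > 0:
        sizes.append(min(size, remaining))
        remaining -= size
    return sizes


def shard_rows(rows: Sequence, world_size: int, rank: int, pad) -> List:
    """Hypothetical helper: this rank's dim-0 shard, padded to the common chunk size."""
    size = math.ceil(len(rows) / world_size)
    shard = list(rows[rank * size:(rank + 1) * size])  # may be short or empty at the tail
    return shard + [pad] * (size - len(shard))  # pad so all ranks hold equal-sized shards
```

With 5 rows on 4 ranks, the chunk size is 2, so rank 2 holds one real row plus padding and rank 3 holds only padding. Concatenating all shards and trimming back to the original row count recovers the full parameter, which corresponds to the all-gather side of the same scheme.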

Command:
tune run --nnodes 1 --nproc_per_node 8 lora_finetune_distributed --config recipes/configs/llama2/7B_qlora_single_device.yaml

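QLoRA differs from LoRA in the config:

model._component_ is torchtune.models.llama2.qlora_llama2_7b instead of torchtune.models.llama2.lora_llama2_7b; at the module level, the difference is LoRALinear(quantize_base=True) vs. LoRALinear(quantize_base=False).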
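A minimal sketch of what `quantize_base` toggles, assuming the standard LoRA formulation y = W x + (alpha / rank) * B (A x) with a frozen base weight W and trainable low-rank factors A and B. The class `LoRALinearSketch` and the rounding helper are hypothetical; real QLoRA stores the base weight as a torchao NF4Tensor, and the symmetric 4-bit absmax rounding here is only a stand-in for that format.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Compute w @ x for a row-major matrix."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def quantize_dequantize_row(row: Vector, levels: int = 7) -> Vector:
    """Stand-in for NF4 (NOT the real format): round to 2*levels+1 values scaled by the row absmax."""
    scale = max(abs(v) for v in row) or 1.0  # avoid dividing by zero for all-zero rows
    return [round(v / scale * levels) / levels * scale for v in row]


class LoRALinearSketch:
    """Hypothetical LoRA linear layer: y = W x + (alpha / rank) * B (A x)."""

    def __init__(self, weight: Matrix, lora_a: Matrix, lora_b: Matrix,
                 alpha: float, quantize_base: bool = False) -> None:
        self.rank = len(lora_a)  # A has shape (rank, in_dim)
        self.scaling = alpha / self.rank
        # quantize_base=True keeps the frozen base weight in low precision; we simulate
        # that by storing only its quantize -> dequantize round trip.
        self.weight = ([quantize_dequantize_row(r) for r in weight]
                       if quantize_base else [list(r) for r in weight])
        self.lora_a = lora_a  # trainable, shape (rank, in_dim)
        self.lora_b = lora_b  # trainable, shape (out_dim, rank)

    def forward(self, x: Vector) -> Vector:
        base_out = matvec(self.weight, x)  # frozen (possibly quantized) path
        lora_out = matvec(self.lora_b, matvec(self.lora_a, x))  # low-rank update
        return [b + self.scaling * l for b, l in zip(base_out, lora_out)]
```

Only the base-weight storage changes between the two settings; the LoRA factors and the forward computation are identical, which matches the config difference being a single flag.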

use torchao copy_

enable saving checkpoint

pytorch-bot · 2024-05-01T06:44:37Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/909

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 8 New Failures

As of commit b2fd531 with merge base 30c75d4:

NEW FAILURES - The following jobs have failed:

Multi-GPU Recipe Tests / recipe_test_multi_gpu (3.10, stable) (gh)
tests/recipes/test_lora_finetune_distributed.py::TestLoRAFinetuneDistributedRecipe::test_save_and_load_merged_weights
Multi-GPU Recipe Tests / recipe_test_multi_gpu (3.11, stable) (gh)
tests/recipes/test_lora_finetune_distributed.py::TestLoRAFinetuneDistributedRecipe::test_save_and_load_merged_weights
Multi-GPU Recipe Tests / recipe_test_multi_gpu (3.8, stable) (gh)
##[error]The operation was canceled.
Multi-GPU Recipe Tests / recipe_test_multi_gpu (3.9, stable) (gh)
Unit Test / unit_tests (3.10) (gh)
tests/torchtune/utils/test_seed.py::TestSeed::test_deterministic_false
Unit Test / unit_tests (3.11) (gh)
tests/torchtune/utils/test_seed.py::TestSeed::test_deterministic_false
Unit Test / unit_tests (3.8) (gh)
tests/torchtune/utils/test_seed.py::TestSeed::test_deterministic_false
Unit Test / unit_tests (3.9) (gh)
tests/torchtune/utils/test_seed.py::TestSeed::test_deterministic_false

This comment was automatically generated by Dr. CI and updates every 15 minutes.

awgu · 2024-05-16T14:16:45Z

tests/torchtune/utils/test_distributed.py

+ inp = torch.randn((2, mlp_dim), device="cuda")
+ base_model(inp).sum().backward()
+ for param in base_model.parameters():
+ torch.distributed.all_reduce(param.grad)


nit: 😄 you can do the division as part of the all-reduce:

Suggested change

- torch.distributed.all_reduce(param.grad)
+ torch.distributed.all_reduce(param.grad, op=ReduceOp.AVG)

from torch.distributed.distributed_c10d import ReduceOp

weifengpy · 2024-05-17

Updated the LoRA PR to use ReduceOp.AVG. This PR will be stacked on it when landing.
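The reason for `ReduceOp.AVG`: FSDP averages gradients across ranks by default, so the non-sharded reference model in the test has to average as well; a plain SUM all-reduce would be off by a factor of the world size. The sketch below simulates the two reductions on plain lists instead of a real process group; the `all_reduce` helper is hypothetical. Per the PyTorch docs, `ReduceOp.AVG` is only available with the NCCL backend.

```python
from typing import List


def all_reduce(per_rank: List[List[float]], op: str = "sum") -> List[List[float]]:
    """Hypothetical simulation: what every rank holds after an all-reduce with SUM or AVG."""
    world_size = len(per_rank)
    reduced = [sum(vals) for vals in zip(*per_rank)]  # element-wise sum across ranks
    if op == "avg":
        reduced = [v / world_size for v in reduced]  # AVG = SUM divided by world size
    elif op != "sum":
        raise ValueError(f"unsupported op: {op}")
    # After an all-reduce, every rank holds an identical copy of the result.
    return [list(reduced) for _ in range(world_size)]
```

Summing and then dividing by the world size gives the same values; AVG just folds the division into the collective, which is what the nit suggests.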

weifengpy and others added 15 commits April 23, 2024 17:45

enable LoRA + FSDP2

e5826a1

reset params for lora weights and rope

64fc870

support lora weights checkpoint and checkpoint utils

0cd21c6

fix lora meta device bug

589191e

save optim state dict

c801f26

mark TODO

19a2d70

optimizer foreach=True for DTensor

441da10

clip grad norm

750b9e5

switch to ptd state dict api

3d632d5

add profiler

cb3abb3

qlora 7b config

dfcdde3

use torchao copy_

e68804a

Merge pull request #1 from weifengpy/fsdp2

b6fad93

use torchao copy_

enable saving checkpoint

d6af9a2

Merge pull request #2 from weifengpy/fsdp2

7bbe522

enable saving checkpoint

facebook-github-bot added the CLA Signed label May 1, 2024

weifengpy marked this pull request as draft May 1, 2024 06:45

weifengpy mentioned this pull request May 1, 2024

[In Progress] FSDP2 + NF4Tensor #651

Closed

weifengpy and others added 11 commits May 1, 2024 00:33

optimizer state dict: load on rank0 and broadcast

b616394

import Optimizer

a400497

resume training

e9de63c

prepare for full test

05d3895

prepare for full test

7a5bb80

remove profiler

64bf49c

passed integration test

cb1bba4

remove unnecessary change

ac516e9

Merge branch 'main' into fsdp2

bfde704

bring back state dict validation

102db31

align indent on comment

0b66651

weifengpy and others added 22 commits May 3, 2024 18:17

remove unused import

672aabb

switch to ptd state dict and keep self implemented in record

6af2723

clean unused code

42ad99c

remove cuda value error

74f6175

comment on to_empty

f1b8a5e

fix memory issues by switching model state dict api

36e6829

clean for review

08cd1fd

Merge branch 'main' into fsdp2

559bc4d

fix linter

2333134

fix checkpoint loading

49a0364

expecttest CI dependency

dc2ce02

ci dependency

0a604aa

fix CI issue

fa83140

Merge branch 'main' into qlora

6203a1f

Merge branch 'pytorch:main' into fsdp2

4b5a895

Merge branch 'fsdp2' into qlora

1080e2c

rebase qlora

1a70498

rebase qlora

cb862e9

sync lora changes

21f5458

push qlora for perf measurement

33773bd

fix meta init + cpu offloading

483028b

init RotaryPositionalEmbeddings in both fresh training and resume

cf42618

awgu reviewed May 16, 2024

weifengpy added 3 commits May 16, 2024 20:02

import cpu offloading when needed

b519d50

FSDP(CheckpointWrapper(Model))

8600ced

bring back cpu offloading

b2fd531
