[TorchAcc][Experimental] Integrate more model in torchacc #683
Merged
49 commits
e2d3b44  [TorchAcc] Integrate TorchAcc and provide a sft example of qwen-72b-c… (baoleai)
6c899c9  Enhance TorchAcc support for dynamic sequence. (#382) (baoleai)
1321592  [TorchAcc] Add support for save/load checkpoint. (#444) (baoleai)
dba0c65  baichuan_patch (Zhikaiiii)
ef6e2d6  patch baichuan (dika-hhh)
432c070  modify baichuan (Zhikaiiii)
1d6f719  Merge branch 'torchacc' into torchacc2 (Zhikaiiii)
a5a6fdc  [TorchAcc] Fix batch split when padding_to is not None. (#480) (baoleai)
37c4787  Merge branch 'torchacc' of https://github.com/modelscope/swift into t… (Zhikaiiii)
32cc090  metric warmup calculate (Zhikaiiii)
64246d3  fix conflict (Zhikaiiii)
290c2dd  fix (Zhikaiiii)
7e6b197  model patch (Zhikaiiii)
f0c7c8a  add profiler (Zhikaiiii)
c8dbfc6  add yi (Zhikaiiii)
669f5d9  [TorchAcc] Integrate TorchAcc and provide a sft example of qwen-72b-c… (baoleai)
b140759  Enhance TorchAcc support for dynamic sequence. (#382) (baoleai)
6faa7b3  [TorchAcc] Add support for save/load checkpoint. (#444) (baoleai)
1c3a258  fix patch (baoleai)
160a9d5  fix lint (baoleai)
e0fe1d4  code clean (baoleai)
0bb1797  add argument: fsdp num (Zhikaiiii)
f03aa00  [TorchAcc] rebase master (Zhikaiiii)
661def1  [TorchAcc] Integrate TorchAcc and provide a sft example of qwen-72b-c… (baoleai)
da6c94a  Enhance TorchAcc support for dynamic sequence. (#382) (baoleai)
73a843a  [TorchAcc] Add support for save/load checkpoint. (#444) (baoleai)
ee012b1  fix patch (baoleai)
f1b19a6  fix lint (baoleai)
0457fa4  code clean (baoleai)
d10901f  fix comments (baoleai)
30ad8c8  rebase (baoleai)
cd6e799  clean code (Zhikaiiii)
4400ea5  Merge remote-tracking branch 'origin_balole/features/rebase_0401' int… (Zhikaiiii)
f92274c  clean code (Zhikaiiii)
8e3cf24  Merge remote-tracking branch 'origin/main' into rebase_acc (Zhikaiiii)
8ee4bbf  format code (Zhikaiiii)
c3284ed  [fix] add mark_step to optimize speed (Zhikaiiii)
e38fc2e  add script (Zhikaiiii)
aa61d6f  add torchacc trim graph (Zhikaiiii)
40d18e9  remove useless code (Zhikaiiii)
0a173f1  remove useless files (Zhikaiiii)
6226edb  add qwen72b full script (Zhikaiiii)
5da5649  Merge branch 'main' into rebase_acc (Zhikaiiii)
6d68d29  fix bugs (Zhikaiiii)
a508e21  qwen15 and llama3 support (Zhikaiiii)
bf2a440  Merge branch 'main' into rebase_acc (Zhikaiiii)
c5c310a  remove prof callback (Zhikaiiii)
bd8d072  fix default value and add switch (Zhikaiiii)
df84f3f  update script (Zhikaiiii)
examples/pytorch/llm/scripts/torchacc/baichuan2_13b_chat/acc_lora_dp_sft.sh (35 additions, 0 deletions)

```shell
# Experimental environment: 4 * A800
# 80GB GPU memory
# Note: TorchAcc is currently only available internally.
# torchacc dp
export USE_TORCHACC=1
export XLA_FLAGS='--xla_gpu_force_compilation_parallelism=32 --xla_multiheap_size_constraint_per_heap=4831838208 --xla_disable_hlo_passes=all-gather-combiner,all-reduce-combiner,reduce-scatter-combiner,gpu-convert-async-collectives-to-sync,rematerialization'
export XLA_IR_SHAPE_CACHE_SIZE=100000000
export XLA_ALLOCATOR_FRACTION=0.95
export XLA_EXPERIMENTAL=nonzero:masked_select

NPROC_PER_NODE=2 \
CUDA_VISIBLE_DEVICES=2,3 \
MASTER_PORT=27829 \
swift sft \
    --model_id_or_path baichuan-inc/Baichuan2-13B-Chat \
    --model_layer_cls_name BaichuanLayer \
    --dataset codefuse-python-en \
    --sft_type lora \
    --output_dir output \
    --num_train_epochs 1 \
    --max_length 2048 \
    --batch_size 12 \
    --use_flash_attn true \
    --gradient_accumulation_steps 1 \
    --gradient_checkpointing no \
    --tuner_backend 'peft' \
    --dataset_test_ratio 0 \
    --save_strategy no \
    --eval_steps 2000000 \
    --save_steps 2000000 \
    --logging_steps 100 \
    --preprocess_num_proc 1 \
    --metric_warmup_step 0.1 \
    --use_profiler false \
    --report_to 'none'
```
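With `--gradient_accumulation_steps 1`, the optimizer-step batch is just the per-process `--batch_size` multiplied across the launched processes. A quick illustrative calculation for the data-parallel launch above (the variable names here are for the sketch only, not flags consumed by swift):

```shell
# Illustrative: effective global batch size of the dp launch above.
NPROC_PER_NODE=2   # one process per GPU in CUDA_VISIBLE_DEVICES=2,3
BATCH_SIZE=12      # --batch_size passed to swift sft
GRAD_ACCUM=1       # --gradient_accumulation_steps

GLOBAL_BATCH=$((NPROC_PER_NODE * BATCH_SIZE * GRAD_ACCUM))
echo "global batch size: ${GLOBAL_BATCH}"   # prints: global batch size: 24
```

This is worth keeping in mind when comparing throughput against the non-TorchAcc `swift_lora_sft.sh` baselines, which use much smaller per-process batch sizes.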
examples/pytorch/llm/scripts/torchacc/baichuan2_13b_chat/acc_lora_fsdp_sft.sh (35 additions, 0 deletions)

```shell
# Experimental environment: 4 * A800
# 80GB GPU memory
# Note: TorchAcc is currently only available internally.
# torchacc fsdp
export USE_TORCHACC=1
export XLA_FLAGS='--xla_gpu_force_compilation_parallelism=32 --xla_multiheap_size_constraint_per_heap=4831838208 --xla_disable_hlo_passes=all-gather-combiner,all-reduce-combiner,reduce-scatter-combiner,gpu-convert-async-collectives-to-sync,rematerialization'
export XLA_IR_SHAPE_CACHE_SIZE=100000000
export XLA_ALLOCATOR_FRACTION=0.95
export XLA_EXPERIMENTAL=nonzero:masked_select

NPROC_PER_NODE=2 \
CUDA_VISIBLE_DEVICES=0,1 \
swift sft \
    --model_id_or_path baichuan-inc/Baichuan2-13B-Chat \
    --model_layer_cls_name BaichuanLayer \
    --dataset codefuse-python-en \
    --sft_type lora \
    --output_dir output \
    --num_train_epochs 1 \
    --max_length 2048 \
    --batch_size 16 \
    --use_flash_attn true \
    --gradient_accumulation_steps 1 \
    --gradient_checkpointing no \
    --tuner_backend 'peft' \
    --dataset_test_ratio 0 \
    --save_strategy no \
    --eval_steps 2000000 \
    --save_steps 2000000 \
    --logging_steps 100 \
    --preprocess_num_proc 1 \
    --metric_warmup_step 0.1 \
    --use_profiler false \
    --fsdp_num 2 \
    --report_to 'none'
```
examples/pytorch/llm/scripts/torchacc/baichuan2_13b_chat/swift_lora_sft.sh (28 additions, 0 deletions)

```shell
# Experimental environment: 4 * A800
# 80GB GPU memory
# Note: TorchAcc is currently only available internally.

# MASTER_ADDR=127.0.0.1 \
NPROC_PER_NODE=2 \
CUDA_VISIBLE_DEVICES=0,1 \
swift sft \
    --model_id_or_path baichuan-inc/Baichuan2-13B-Chat \
    --dataset codefuse-python-en \
    --sft_type lora \
    --dtype AUTO \
    --output_dir output \
    --num_train_epochs 1 \
    --max_length 2048 \
    --batch_size 2 \
    --use_flash_attn true \
    --gradient_accumulation_steps 1 \
    --dataset_test_ratio 0 \
    --save_strategy no \
    --eval_steps 2000000 \
    --save_steps 2000000 \
    --logging_steps 100 \
    --preprocess_num_proc 1 \
    --metric_warmup_step 0.1 \
    --use_profiler false \
    --report_to 'none'
```
examples/pytorch/llm/scripts/torchacc/chatglm3_6b/acc_lora_dp_sft.sh (36 additions, 0 deletions)

```shell
# Experimental environment: 4 * A800
# 80GB GPU memory
# Note: TorchAcc is currently only available internally.
# torchacc dp
export USE_TORCHACC=1
export XLA_FLAGS='--xla_gpu_force_compilation_parallelism=32 --xla_multiheap_size_constraint_per_heap=4831838208 --xla_disable_hlo_passes=all-gather-combiner,all-reduce-combiner,reduce-scatter-combiner,gpu-convert-async-collectives-to-sync,rematerialization'
export XLA_IR_SHAPE_CACHE_SIZE=100000000
export XLA_ALLOCATOR_FRACTION=0.95
export XLA_EXPERIMENTAL=nonzero:masked_select

NPROC_PER_NODE=2 \
CUDA_VISIBLE_DEVICES=0,1 \
MASTER_PORT=27829 \
swift sft \
    --model_id_or_path ZhipuAI/chatglm3-6b \
    --model_layer_cls_name GLMBlock \
    --dataset codefuse-python-en \
    --sft_type lora \
    --output_dir output \
    --num_train_epochs 1 \
    --max_length 2048 \
    --batch_size 16 \
    --use_flash_attn true \
    --gradient_accumulation_steps 1 \
    --gradient_checkpointing no \
    --tuner_backend 'peft' \
    --dataset_test_ratio 0 \
    --save_strategy no \
    --eval_steps 2000000 \
    --save_steps 2000000 \
    --logging_steps 100 \
    --preprocess_num_proc 1 \
    --metric_warmup_step 0.1 \
    --use_profiler false \
    --report_to 'none'
```
examples/pytorch/llm/scripts/torchacc/chatglm3_6b/acc_lora_fsdp_sft.sh (36 additions, 0 deletions)

```shell
# Experimental environment: 4 * A800
# 80GB GPU memory
# Note: TorchAcc is currently only available internally.
# torchacc fsdp
export USE_TORCHACC=1
export XLA_FLAGS='--xla_gpu_force_compilation_parallelism=32 --xla_multiheap_size_constraint_per_heap=4831838208 --xla_disable_hlo_passes=all-gather-combiner,all-reduce-combiner,reduce-scatter-combiner,gpu-convert-async-collectives-to-sync,rematerialization'
export XLA_IR_SHAPE_CACHE_SIZE=100000000
export XLA_ALLOCATOR_FRACTION=0.95
export XLA_EXPERIMENTAL=nonzero:masked_select

NPROC_PER_NODE=2 \
CUDA_VISIBLE_DEVICES=2,3 \
swift sft \
    --model_id_or_path ZhipuAI/chatglm3-6b \
    --model_layer_cls_name GLMBlock \
    --dataset codefuse-python-en \
    --sft_type lora \
    --output_dir output \
    --num_train_epochs 1 \
    --max_length 2048 \
    --batch_size 16 \
    --use_flash_attn true \
    --gradient_accumulation_steps 1 \
    --gradient_checkpointing no \
    --tuner_backend 'peft' \
    --dataset_test_ratio 0 \
    --save_strategy no \
    --eval_steps 2000000 \
    --save_steps 2000000 \
    --logging_steps 100 \
    --preprocess_num_proc 1 \
    --metric_warmup_step 0.1 \
    --use_profiler false \
    --fsdp_num 2 \
    --report_to 'none'
```
examples/pytorch/llm/scripts/torchacc/chatglm3_6b/swift_lora_sft.sh (27 additions, 0 deletions)

```shell
# Experimental environment: 4 * A800
# 80GB GPU memory
# Note: TorchAcc is currently only available internally.

# MASTER_ADDR=127.0.0.1 \
# MASTER_PORT=12356 \
NPROC_PER_NODE=2 \
CUDA_VISIBLE_DEVICES=2,3 \
swift sft \
    --model_id_or_path ZhipuAI/chatglm3-6b \
    --dataset codefuse-python-en \
    --sft_type lora \
    --dtype AUTO \
    --output_dir output \
    --num_train_epochs 1 \
    --max_length 2048 \
    --batch_size 4 \
    --use_flash_attn true \
    --gradient_accumulation_steps 1 \
    --dataset_test_ratio 0 \
    --save_strategy no \
    --eval_steps 2000000 \
    --save_steps 2000000 \
    --logging_steps 100 \
    --preprocess_num_proc 1 \
    --metric_warmup_step 0.1 \
    --report_to 'none'
```
examples/pytorch/llm/scripts/torchacc/llama2_13b_chat/acc_lora_dp_sft.sh (36 additions, 0 deletions)

```shell
# Experimental environment: 4 * A800
# 80GB GPU memory
# Note: TorchAcc is currently only available internally.

export USE_TORCHACC=1
export TORCHACC_TRIM_GRAPH=1
export XLA_FLAGS='--xla_gpu_force_compilation_parallelism=32 --xla_multiheap_size_constraint_per_heap=4831838208 --xla_disable_hlo_passes=all-gather-combiner,all-reduce-combiner,reduce-scatter-combiner,gpu-convert-async-collectives-to-sync,rematerialization'
export XLA_IR_SHAPE_CACHE_SIZE=100000000
export XLA_ALLOCATOR_FRACTION=0.95
export XLA_EXPERIMENTAL=nonzero:masked_select

NPROC_PER_NODE=2 \
CUDA_VISIBLE_DEVICES=0,1 \
swift sft \
    --model_id_or_path modelscope/Llama-2-13b-chat-ms \
    --model_layer_cls_name LlamaDecoderLayer \
    --dataset codefuse-python-en \
    --template_type llama \
    --sft_type lora \
    --output_dir output \
    --num_train_epochs 1 \
    --max_length 2048 \
    --batch_size 16 \
    --use_flash_attn true \
    --gradient_accumulation_steps 1 \
    --gradient_checkpointing no \
    --tuner_backend 'peft' \
    --dataset_test_ratio 0 \
    --save_strategy no \
    --eval_steps 2000000 \
    --save_steps 2000000 \
    --logging_steps 100 \
    --preprocess_num_proc 1 \
    --metric_warmup_step 0.1 \
    --use_profiler false \
    --report_to 'none'
```
examples/pytorch/llm/scripts/torchacc/llama2_13b_chat/acc_lora_fsdp_sft.sh (37 additions, 0 deletions)

```shell
# Experimental environment: 4 * A800
# 80GB GPU memory
# Note: TorchAcc is currently only available internally.
export USE_TORCHACC=1
export TORCHACC_TRIM_GRAPH=1
export XLA_FLAGS='--xla_gpu_force_compilation_parallelism=32 --xla_multiheap_size_constraint_per_heap=4831838208 --xla_disable_hlo_passes=all-gather-combiner,all-reduce-combiner,reduce-scatter-combiner,gpu-convert-async-collectives-to-sync,rematerialization'
export XLA_IR_SHAPE_CACHE_SIZE=100000000
export XLA_ALLOCATOR_FRACTION=0.95
export XLA_EXPERIMENTAL=nonzero:masked_select

NPROC_PER_NODE=2 \
CUDA_VISIBLE_DEVICES=2,3 \
MASTER_PORT=27829 \
swift sft \
    --model_id_or_path modelscope/Llama-2-13b-chat-ms \
    --model_layer_cls_name LlamaDecoderLayer \
    --dataset codefuse-python-en \
    --template_type llama \
    --sft_type lora \
    --output_dir output \
    --num_train_epochs 1 \
    --max_length 2048 \
    --batch_size 24 \
    --use_flash_attn true \
    --gradient_accumulation_steps 1 \
    --gradient_checkpointing no \
    --tuner_backend 'peft' \
    --dataset_test_ratio 0 \
    --save_strategy no \
    --eval_steps 2000000 \
    --save_steps 2000000 \
    --logging_steps 100 \
    --preprocess_num_proc 1 \
    --metric_warmup_step 0.1 \
    --use_profiler false \
    --fsdp_num 2 \
    --report_to 'none'
```
examples/pytorch/llm/scripts/torchacc/llama2_13b_chat/swift_lora_sft.sh (28 additions, 0 deletions)

```shell
# Experimental environment: 4 * A800
# 80GB GPU memory
# Note: TorchAcc is currently only available internally.

# MASTER_ADDR=127.0.0.1 \
NPROC_PER_NODE=2 \
CUDA_VISIBLE_DEVICES=2,3 \
swift sft \
    --model_id_or_path modelscope/Llama-2-13b-chat-ms \
    --dataset codefuse-python-en \
    --sft_type lora \
    --dtype AUTO \
    --output_dir output \
    --num_train_epochs 1 \
    --max_length 2048 \
    --batch_size 16 \
    --use_flash_attn true \
    --gradient_accumulation_steps 1 \
    --dataset_test_ratio 0 \
    --save_strategy no \
    --eval_steps 2000000 \
    --save_steps 2000000 \
    --logging_steps 100 \
    --preprocess_num_proc 1 \
    --metric_warmup_step 0.1 \
    --use_profiler false \
    --report_to 'none'
```
examples/pytorch/llm/scripts/torchacc/qwen_72b_chat/acc_lora_fsdp_sft.sh (32 additions, 0 deletions)

Note: the original diff ended with `--fsdp_num 4 \`, leaving a dangling line continuation; the flag is moved before `--report_to` here so the command terminates cleanly.

```shell
# Experimental environment: 4 * A800
# 80GB GPU memory
# Note: TorchAcc is currently only available internally.

export USE_TORCHACC=1
export XLA_FLAGS='--xla_gpu_force_compilation_parallelism=32 --xla_multiheap_size_constraint_per_heap=4831838208 --xla_disable_hlo_passes=all-gather-combiner,all-reduce-combiner,reduce-scatter-combiner,gpu-convert-async-collectives-to-sync,rematerialization'
export XLA_IR_SHAPE_CACHE_SIZE=100000000
export XLA_ALLOCATOR_FRACTION=0.95
export XLA_EXPERIMENTAL=nonzero:masked_select

NPROC_PER_NODE=4 \
CUDA_VISIBLE_DEVICES=0,1,2,3 \
swift sft \
    --model_type qwen-72b-chat \
    --model_layer_cls_name QWenBlock \
    --dataset codefuse-python-en \
    --sft_type lora \
    --output_dir output_qwen_72b \
    --num_train_epochs 1 \
    --max_length 2048 \
    --batch_size 4 \
    --use_flash_attn true \
    --gradient_accumulation_steps 1 \
    --gradient_checkpointing no \
    --tuner_backend 'peft' \
    --eval_steps 200 \
    --save_steps 200 \
    --logging_steps 100 \
    --metric_warmup_step 0.1 \
    --use_profiler false \
    --fsdp_num 4 \
    --report_to 'none'
```
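In this launch, `--fsdp_num 4` matches `NPROC_PER_NODE=4`, so all four processes form a single FSDP shard group. Assuming the shard count has to divide the number of launched processes evenly (a plausible constraint for this kind of FSDP setup, not something the PR states), a launcher could sanity-check the pairing up front:

```shell
# Illustrative sanity check (assumption: fsdp_num must evenly divide
# the number of launched processes for the shard groups to be uniform).
NPROC_PER_NODE=4
FSDP_NUM=4

if [ $((NPROC_PER_NODE % FSDP_NUM)) -eq 0 ]; then
  echo "ok: ${FSDP_NUM}-way FSDP across ${NPROC_PER_NODE} processes"
else
  echo "error: fsdp_num ${FSDP_NUM} does not divide ${NPROC_PER_NODE} processes" >&2
  exit 1
fi
```

With `FSDP_NUM=4` and `NPROC_PER_NODE=4` the check passes and the script prints the "ok" line.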
examples/pytorch/llm/scripts/torchacc/qwen_72b_chat/swift_lora_sft.sh (27 additions, 0 deletions)

Note: the original diff ended with `--use_profiler false \`, a dangling line continuation; the trailing backslash is dropped here so the command terminates cleanly.

```shell
# Experimental environment: 4 * A800
# 80GB GPU memory
# Note: TorchAcc is currently only available internally.

# MASTER_ADDR=127.0.0.1 \
NPROC_PER_NODE=2 \
CUDA_VISIBLE_DEVICES=0,1,2,3 \
swift sft \
    --model_id_or_path qwen/Qwen-72B-Chat \
    --dataset codefuse-python-en \
    --sft_type lora \
    --dtype AUTO \
    --output_dir output \
    --num_train_epochs 1 \
    --max_length 2048 \
    --batch_size 1 \
    --use_flash_attn true \
    --gradient_accumulation_steps 1 \
    --dataset_test_ratio 0 \
    --save_strategy no \
    --eval_steps 2000000 \
    --save_steps 2000000 \
    --logging_steps 100 \
    --preprocess_num_proc 1 \
    --metric_warmup_step 0.1 \
    --use_profiler false
```