
Commit ae3f44f

Add Smollm2 pipeline (#205)

* add smollm2 pipeline
* update readme

1 parent e057d7f commit ae3f44f

File tree

6 files changed: +210 -0 lines changed


README.md

Lines changed: 1 addition & 0 deletions
@@ -19,6 +19,7 @@ However, we know from the [InstructGPT](https://huggingface.co/papers/2203.02155
The Alignment Handbook aims to fill that gap by providing the community with a series of robust training recipes that span the whole pipeline.

## News 🗞️
+ * **November 21, 2024**: We release the [recipe](recipes/smollm2/README.md) for fine-tuning SmolLM2-Instruct.
* **August 18, 2024**: We release SmolLM-Instruct v0.2, along with the [recipe](recipes/smollm/README.md) to fine-tune small LLMs 💻
* **April 12, 2024**: We release Zephyr 141B (A35B), in collaboration with Argilla and Kaist AI, along with the recipe to fine-tune Mixtral 8x22B with ORPO 🪁
* **March 12, 2024**: We release StarChat2 15B, along with the recipe to train capable coding assistants 🌟

recipes/smollm2/README.md

Lines changed: 28 additions & 0 deletions
# Instructions to train SmolLM2-1.7B-Instruct

We build [SmolLM2-Instruct](https://huggingface.co/collections/HuggingFaceTB/smollm2-6723884218bcda64b34d7db9) by doing SFT on [SmolTalk](https://huggingface.co/datasets/HuggingFaceTB/smoltalk) and then DPO on [UltraFeedback](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized).

## Setup

Follow the installation instructions at https://github.com/huggingface/alignment-handbook/tree/main?tab=readme-ov-file#installation-instructions.
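For reference, a minimal sketch of those steps (assuming a recent Python environment with a CUDA build of PyTorch already installed):

```shell
# Clone the handbook and install it together with its dependencies
git clone https://github.com/huggingface/alignment-handbook.git
cd alignment-handbook
python -m pip install .

# The SFT configs set use_flash_attention_2: true, so FlashAttention is needed too
python -m pip install flash-attn --no-build-isolation
```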
## Training

We train the 1.7B model on 8 GPUs using the following commands:

```shell
# SFT
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml scripts/run_sft.py recipes/smollm2/sft/config.yaml

# DPO
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml scripts/run_dpo.py recipes/smollm2/dpo/config.yaml
```

For the 135M and 360M models, we use the [smol-smoltalk](https://huggingface.co/datasets/HuggingFaceTB/smol-smoltalk) dataset for SFT and UltraFeedback for DPO:

```shell
# SFT
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml scripts/run_sft.py recipes/smollm2/sft/config_smol.yaml

# DPO
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml scripts/run_dpo.py recipes/smollm2/dpo/config_smol.yaml
```
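If you have fewer than 8 GPUs, a reasonable adaptation (a sketch, assuming `accelerate launch`'s `--num_processes` flag and that the handbook's scripts accept command-line overrides of config values) is to scale `gradient_accumulation_steps` so the global batch size stays the same:

```shell
# Hypothetical 4-GPU SFT launch: half the GPUs, double the gradient accumulation,
# keeping the global batch at 4 (per-device) x 8 (accumulation) x 4 (GPUs) = 128
ACCELERATE_LOG_LEVEL=info accelerate launch --num_processes=4 \
    --config_file recipes/accelerate_configs/deepspeed_zero3.yaml \
    scripts/run_sft.py recipes/smollm2/sft/config.yaml \
    --gradient_accumulation_steps=8
```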

recipes/smollm2/dpo/config.yaml

Lines changed: 43 additions & 0 deletions
# Model arguments
model_name_or_path: loubnabnl/smollm2-1.7B-sft
torch_dtype: bfloat16

# Data training arguments
dataset_mixer:
  HuggingFaceH4/ultrafeedback_binarized: 1.0

dataset_splits:
- train_prefs
- test_prefs
preprocessing_num_workers: 12

# DPOTrainer arguments
bf16: true
beta: 0.5
do_eval: true
hub_private_repo: true
eval_strategy: steps
eval_steps: 100
gradient_accumulation_steps: 8
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: False
hub_model_id: smollm2-1.7B-dpo
learning_rate: 1.0e-6
log_level: info
logging_steps: 10
lr_scheduler_type: cosine
max_length: 1024
max_prompt_length: 512
num_train_epochs: 3
optim: adamw_torch
output_dir: data/smollm2-1.7B-dpo
per_device_train_batch_size: 2
per_device_eval_batch_size: 4
push_to_hub: true
report_to:
- tensorboard
- wandb
save_strategy: "no"
seed: 42
warmup_ratio: 0.1
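Assuming the 8-GPU launch from the recipe README above, these settings amount to a global batch of per_device_train_batch_size × gradient_accumulation_steps × num_gpus = 2 × 8 × 8 = 128 preference pairs per optimizer step, with prompts truncated to 512 tokens inside a 1024-token total sequence budget.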

recipes/smollm2/dpo/config_smol.yaml

Lines changed: 43 additions & 0 deletions
# Model arguments
model_name_or_path: loubnabnl/smollm2-360M-sft # we also use this config for the 135M model
torch_dtype: bfloat16

# Data training arguments
dataset_mixer:
  HuggingFaceH4/ultrafeedback_binarized: 1.0

dataset_splits:
- train_prefs
- test_prefs
preprocessing_num_workers: 12

# DPOTrainer arguments
bf16: true
beta: 0.5
do_eval: true
hub_private_repo: true
eval_strategy: steps
eval_steps: 100
gradient_accumulation_steps: 8
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: False
hub_model_id: smollm2-360M-dpo
learning_rate: 1.0e-6
log_level: info
logging_steps: 10
lr_scheduler_type: cosine
max_length: 1024
max_prompt_length: 512
num_train_epochs: 2
optim: adamw_torch
output_dir: data/smollm2-360M-dpo
per_device_train_batch_size: 2
per_device_eval_batch_size: 4
push_to_hub: true
report_to:
- tensorboard
- wandb
save_strategy: "no"
seed: 42
warmup_ratio: 0.1
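This mirrors the 1.7B DPO config above: the beta, learning rate, and batch settings are identical (so the same 2 × 8 × 8 = 128 global batch on 8 GPUs); only the model and output names change, and training runs for 2 epochs instead of 3.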

recipes/smollm2/sft/config.yaml

Lines changed: 49 additions & 0 deletions
# Model arguments
model_name_or_path: HuggingFaceTB/SmolLM2-1.7B
model_revision: main
tokenizer_name_or_path: HuggingFaceTB/SmolLM2-1.7B-Instruct # Custom tokenizer with <|im_start|> and <|im_end|> tokens
torch_dtype: bfloat16
use_flash_attention_2: true

# Data training arguments
dataset_mixer:
  HuggingFaceTB/smoltalk: 1.0

dataset_configs:
- all

dataset_splits:
- train
- test
preprocessing_num_workers: 36

# SFT trainer config
bf16: true
do_eval: true
evaluation_strategy: epoch
gradient_accumulation_steps: 4
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
hub_model_id: smollm2-1.7B-sft
hub_strategy: every_save
learning_rate: 3.0e-04
log_level: info
logging_steps: 5
logging_strategy: steps
lr_scheduler_type: cosine
max_seq_length: 8192
max_steps: -1
num_train_epochs: 2
output_dir: data/smollm2-1.7B-sft
overwrite_output_dir: true
per_device_eval_batch_size: 4
per_device_train_batch_size: 4
push_to_hub: true
remove_unused_columns: true
report_to:
- tensorboard
- wandb
save_strategy: "no"
seed: 42
warmup_ratio: 0.1
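On the 8-GPU setup from the recipe README, this corresponds to a global batch of per_device_train_batch_size × gradient_accumulation_steps × num_gpus = 4 × 4 × 8 = 128 sequences of up to max_seq_length = 8192 tokens per optimizer step; max_steps: -1 leaves the run length to num_train_epochs.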

recipes/smollm2/sft/config_smol.yaml

Lines changed: 46 additions & 0 deletions
# Model arguments
model_name_or_path: HuggingFaceTB/SmolLM2-360M # we also use this config for the 135M model
model_revision: main
tokenizer_name_or_path: HuggingFaceTB/SmolLM2-360M-Instruct # Custom tokenizer with <|im_start|> and <|im_end|> tokens
torch_dtype: bfloat16
use_flash_attention_2: true

# Data training arguments
dataset_mixer:
  HuggingFaceTB/smol-smoltalk: 1.0

dataset_splits:
- train
- test
preprocessing_num_workers: 36

# SFT trainer config
bf16: true
do_eval: true
evaluation_strategy: epoch
gradient_accumulation_steps: 4
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
hub_model_id: smollm2-360M-sft
hub_strategy: every_save
learning_rate: 1.0e-03 # 3e-4 was used for the 1.7B model
log_level: info
logging_steps: 5
logging_strategy: steps
lr_scheduler_type: cosine
max_seq_length: 8192
max_steps: -1
num_train_epochs: 2
output_dir: data/smollm2-360M-sft
overwrite_output_dir: true
per_device_eval_batch_size: 4
per_device_train_batch_size: 4
push_to_hub: true
remove_unused_columns: true
report_to:
- tensorboard
- wandb
save_strategy: "no"
seed: 42
warmup_ratio: 0.1
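As the comment on model_name_or_path notes, this config is reused for the 135M model. A sketch of that run (assuming, as above, that config values can be overridden from the command line; the 135M checkpoint names below are the corresponding entries from the SmolLM2 collection):

```shell
# Hypothetical 135M SFT run: reuse config_smol.yaml and override the model-specific fields
ACCELERATE_LOG_LEVEL=info accelerate launch \
    --config_file recipes/accelerate_configs/deepspeed_zero3.yaml \
    scripts/run_sft.py recipes/smollm2/sft/config_smol.yaml \
    --model_name_or_path=HuggingFaceTB/SmolLM2-135M \
    --tokenizer_name_or_path=HuggingFaceTB/SmolLM2-135M-Instruct \
    --hub_model_id=smollm2-135M-sft \
    --output_dir=data/smollm2-135M-sft
```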
