[callback] Add flexible callback system with YAML configuration, HuggingFace Trainer support, and usage examples #5 #9024
What does this PR do?
Fixes # (issue)
This PR introduces a robust callback plugin system for LLaMA-Factory, including:
Support for registering custom and built-in callbacks via YAML configuration
Callback argument injection (including environment variable substitution)
Seamless integration with HuggingFace Trainer callbacks
Example YAML and Python files demonstrating callback usage (a rough sketch of what such a callback looks like follows this list)
End-to-end test cases for custom callback registration and execution
Documentation updates for callback development and usage (to be completed in the main repo docs)
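The following is a minimal sketch, not the PR's actual implementation, of the kind of callback this system registers. The `TrainerCallback` hook signatures are the standard HuggingFace API; the class name `LossLoggerCallback`, the `log_dir` argument, and the `resolve_callback_args` helper are hypothetical, used only to illustrate YAML-driven argument injection with environment-variable substitution.

```python
# Illustrative only: a user callback that the YAML config could reference,
# plus a hypothetical helper showing ${VAR}-style substitution of callback args.
import os

from transformers import TrainerCallback, TrainerControl, TrainerState, TrainingArguments


class LossLoggerCallback(TrainerCallback):
    """Example custom callback; the class path would be referenced from the YAML config."""

    def __init__(self, log_dir: str = "./callback_logs"):
        self.log_dir = log_dir

    def on_train_begin(self, args: TrainingArguments, state: TrainerState, control: TrainerControl, **kwargs):
        # Create the log directory once at the start of training.
        os.makedirs(self.log_dir, exist_ok=True)

    def on_log(self, args: TrainingArguments, state: TrainerState, control: TrainerControl, logs=None, **kwargs):
        # Append the Trainer's log dict (loss, lr, etc.) on the main process only.
        if logs and state.is_world_process_zero:
            with open(os.path.join(self.log_dir, "loss.log"), "a") as f:
                f.write(f"step={state.global_step} {logs}\n")


def resolve_callback_args(raw_args: dict) -> dict:
    """Hypothetical helper: expand $VAR / ${VAR} placeholders in YAML-provided args."""
    return {k: os.path.expandvars(v) if isinstance(v, str) else v for k, v in raw_args.items()}


# e.g. args taken from YAML: {"log_dir": "${HOME}/llamafactory_callback_logs"}
callback = LossLoggerCallback(**resolve_callback_args({"log_dir": "${HOME}/llamafactory_callback_logs"}))
```

In practice, an instance built this way would be passed to the HuggingFace Trainer through its `callbacks` argument; the actual registration keys and injection logic are defined by this PR's example files (e.g. examples/callback/llama3_lora_sft_callback.yaml).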
Before submitting
Did you read the contributor guideline?
Did you write any new necessary tests?
Ran the following commands/code:
a) llamafactory-cli train examples/callback/llama3_lora_sft_callback.yaml
b) llamafactory-cli train --stage sft --do_train True --model_name_or_path meta-llama/Llama-3.2-1B-Instruct --preprocessing_num_workers 16 --finetuning_type lora --template llama3 --flash_attn auto --dataset_dir data --dataset identity --cutoff_len 2048 --learning_rate 5e-05 --num_train_epochs 3.0 --max_samples 100000 --per_device_train_batch_size 2 --gradient_accumulation_steps 8 --lr_scheduler_type cosine --max_grad_norm 1.0 --logging_steps 5 --save_steps 100 --warmup_steps 0 --packing False --enable_thinking True --report_to none --output_dir saves/Llama-3.2-1B-Instruct/lora/train_2025-08-25-09-11-512 --plot_loss True --trust_remote_code True --ddp_timeout 180000000 --include_num_input_tokens_seen True --optim adamw_torch --lora_rank 8 --lora_alpha 16 --lora_dropout 0 --lora_target all
c) llamafactory-cli webui
(WebUI run configured with fp32, the Llama-3.2-1B-Instruct model, and the identity dataset)
d) The command generated by the WebUI also runs:
```bash
llamafactory-cli train \
    --stage sft \
    --do_train True \
    --model_name_or_path meta-llama/Llama-3.2-1B-Instruct \
    --preprocessing_num_workers 16 \
    --finetuning_type lora \
    --template llama3 \
    --flash_attn auto \
    --dataset_dir data \
    --dataset identity \
    --cutoff_len 2048 \
    --learning_rate 5e-05 \
    --num_train_epochs 3.0 \
    --max_samples 100000 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 8 \
    --lr_scheduler_type cosine \
    --max_grad_norm 1.0 \
    --logging_steps 5 \
    --save_steps 100 \
    --warmup_steps 0 \
    --packing False \
    --enable_thinking True \
    --report_to none \
    --output_dir saves/Llama-3.2-1B-Instruct/lora/train_2025-08-25-09-11-51 \
    --plot_loss True \
    --trust_remote_code True \
    --ddp_timeout 180000000 \
    --include_num_input_tokens_seen True \
    --optim adamw_torch \
    --lora_rank 8 \
    --lora_alpha 16 \
    --lora_dropout 0 \
    --lora_target all
```