[Feature] Update loss_scale method call to pass through inputs.extra_kwargs #6159
+2
−1
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
PR type
PR information
问题描述
在
swift/llm/template/base.py
中调用self.loss_scale
时,没有将额外的参数传递给底层的get_loss_scale
函数,导致用户无法通过数据集样本自定义参数来灵活控制loss_scale
行为。修改内容
文件:
swift/llm/template/base.py
修改前:
修改后:
get_loss_scale 函数签名原本就支持额外的 **kwargs 参数:
但在template base.py的调用处没有利用这一特性,导致额外的参数无法传递
用户现在可以在数据集的样本中定义 extra_kwargs,通过这些参数来自定义loss_scale
无破坏性修改:完全向后兼容,不影响现有代码的正常运行
支持高级用法:例如:
基于样本难度动态调整损失权重
根据任务类型应用不同的缩放策略
Experiment results
已在qwen3-omni模型的megatron sft训练和qwen2.5-omni模型的deepspeed sft 训练中试验了该功能。具体为:在样本中增加extra_kwargs ,自定义loss_scale根据这些extra_kwargs 调整 loss_scale , 模型性能提升。
该修改为框架基础设施优化,不直接影响模型性能指标,但显著提升了框架的灵活性和可定制性,为用户实现更精细化的训练控制提供了可能。