feat(training): add freeze_router_bias and freeze_expert_bias configs… #2215
base: main
Conversation
… for MoE fine-tuning

Add config options to freeze router and/or expert biases during MoE fine-tuning, preserving pretrained routing behavior and expert bias values.

Changes to job_config.py:
- Add `freeze_router_bias: bool = False`
- Add `freeze_expert_bias: bool = False`
- Document dependency on `use_router_bias`/`use_expert_bias` in MoEArgs

Changes to parallelize.py:
- Add `freeze_moe_biases()` function
- Apply freezing before parallelization in `parallelize_gptoss()`
- Add warnings when freeze options are enabled but no biases are found

Note: These options require the model config to have `use_router_bias=True` and/or `use_expert_bias=True` in MoEArgs (e.g., GPT-OSS models).
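For reference, a minimal sketch of how the two new fields on the Training config could look; the field names, defaults, and the MoEArgs dependency come from the PR description, while the surrounding dataclass layout and docstring wording are assumptions:

```python
from dataclasses import dataclass


@dataclass
class Training:
    # ... existing training fields ...

    freeze_router_bias: bool = False
    """Freeze MoE router gate biases during fine-tuning.
    Requires use_router_bias=True in MoEArgs (e.g., GPT-OSS models)."""

    freeze_expert_bias: bool = False
    """Freeze MoE expert biases (mlp1_bias, mlp2_bias) during fine-tuning.
    Requires use_expert_bias=True in MoEArgs."""
```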
Pull request overview
This PR adds configuration options to freeze router and expert biases during MoE (Mixture of Experts) fine-tuning, which helps preserve pretrained routing behavior and prevent instability from bias updates.
Key changes:
- Added `freeze_router_bias` and `freeze_expert_bias` boolean configuration fields to the Training class
- Implemented the `freeze_moe_biases()` function to freeze MoE bias parameters
- Integrated the freezing logic into `parallelize_gptoss()` before model parallelization
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| torchtitan/config/job_config.py | Added two new boolean config fields with comprehensive documentation explaining their purpose and dependencies on MoEArgs |
| torchtitan/models/gpt_oss/infra/parallelize.py | Implemented freeze_moe_biases function and integrated it into parallelize_gptoss with appropriate logging and warnings |
```python
        if freeze_router and "moe.router.gate.bias" in name:
            param.requires_grad = False
            router_frozen += 1
        elif freeze_expert and ("experts.mlp1_bias" in name or "experts.mlp2_bias" in name):
```
Copilot AI · Jan 8, 2026
The logic here will miss freezing expert biases when both freeze_router and freeze_expert are True. The elif condition means that if a parameter name contains "moe.router.gate.bias", it will never check for expert bias patterns even if freeze_expert is True. This should be two separate if statements to allow both types of parameters to be checked independently.
Suggested change:

```diff
-        elif freeze_expert and ("experts.mlp1_bias" in name or "experts.mlp2_bias" in name):
+        if freeze_expert and ("experts.mlp1_bias" in name or "experts.mlp2_bias" in name):
```
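Applied to the surrounding loop, the reviewer's suggestion would make the two checks independent; a rough sketch of that shape (not the PR's actual diff):

```python
    for name, param in model.named_parameters():
        # Two independent checks so one pass over the parameters can freeze
        # both router and expert biases when both options are enabled.
        if freeze_router and "moe.router.gate.bias" in name:
            param.requires_grad = False
            router_frozen += 1
        if freeze_expert and ("experts.mlp1_bias" in name or "experts.mlp2_bias" in name):
            param.requires_grad = False
            expert_frozen += 1
```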
```python
def freeze_moe_biases(
    model: nn.Module,
    freeze_router: bool = False,
    freeze_expert: bool = False,
) -> tuple[int, int]:
    """
    Freeze router gate biases and/or expert biases in all MoE layers.

    This is recommended for fine-tuning MoE models to preserve pretrained
    routing behavior and prevent instability from bias updates.

    Args:
        model: The model containing MoE layers with router gates and experts.
        freeze_router: Whether to freeze router gate biases.
        freeze_expert: Whether to freeze expert biases (mlp1_bias, mlp2_bias).

    Returns:
        Tuple of (router_frozen_count, expert_frozen_count).
    """
    router_frozen = 0
    expert_frozen = 0
    for name, param in model.named_parameters():
        if freeze_router and "moe.router.gate.bias" in name:
            param.requires_grad = False
            router_frozen += 1
        elif freeze_expert and ("experts.mlp1_bias" in name or "experts.mlp2_bias" in name):
            param.requires_grad = False
            expert_frozen += 1
    return router_frozen, expert_frozen
```
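The PR description says this function is applied in `parallelize_gptoss()` before parallelization, with warnings when freezing is requested but no matching biases are found. A hedged sketch of that call site; the `job_config` attribute paths and the `logger` are assumptions, not the PR's exact code:

```python
# Hypothetical sketch of the call site in parallelize_gptoss; the config
# attribute names and the logger are assumptions, not the PR's exact code.
if job_config.training.freeze_router_bias or job_config.training.freeze_expert_bias:
    router_frozen, expert_frozen = freeze_moe_biases(
        model,
        freeze_router=job_config.training.freeze_router_bias,
        freeze_expert=job_config.training.freeze_expert_bias,
    )
    if job_config.training.freeze_router_bias and router_frozen == 0:
        logger.warning(
            "freeze_router_bias is set but no router gate biases were found; "
            "check that use_router_bias=True in MoEArgs."
        )
    if job_config.training.freeze_expert_bias and expert_frozen == 0:
        logger.warning(
            "freeze_expert_bias is set but no expert biases were found; "
            "check that use_expert_bias=True in MoEArgs."
        )
    logger.info(
        "Froze %d router gate bias(es) and %d expert bias(es) for MoE fine-tuning",
        router_frozen,
        expert_frozen,
    )
```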
Copilot AI · Jan 8, 2026
The repository has comprehensive unit and integration test coverage. The new freeze_moe_biases function and the integration in parallelize_gptoss should have test coverage to verify:
- The function correctly freezes router biases when freeze_router=True
- The function correctly freezes expert biases when freeze_expert=True
- Both types can be frozen simultaneously
- Warnings are logged when biases are not found
- The counts returned are accurate
Consider adding a unit test for freeze_moe_biases and an integration test that validates the freezing behavior in a training scenario.
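As a starting point, a unit test along these lines could check the returned counts and the requires_grad flags with a tiny stand-in module. The fixture below is hypothetical; its parameter names merely contain the substrings the function matches, and freeze_moe_biases is assumed to be importable from the module under test:

```python
import torch
import torch.nn as nn

# freeze_moe_biases is assumed to be importable from the module under test
# (import path omitted here on purpose).


class _TinyMoE(nn.Module):
    """Minimal stand-in whose parameter names contain the matched substrings."""

    def __init__(self):
        super().__init__()
        gate = nn.Linear(4, 2, bias=True)
        # Produces parameter names "moe.router.gate.weight" / "moe.router.gate.bias".
        self.moe = nn.ModuleDict({"router": nn.ModuleDict({"gate": gate})})
        # Produces parameter names "experts.mlp1_bias" / "experts.mlp2_bias".
        self.experts = nn.Module()
        self.experts.mlp1_bias = nn.Parameter(torch.zeros(2, 8))
        self.experts.mlp2_bias = nn.Parameter(torch.zeros(2, 4))


def test_freeze_moe_biases_freezes_both_when_requested():
    model = _TinyMoE()
    router_frozen, expert_frozen = freeze_moe_biases(
        model, freeze_router=True, freeze_expert=True
    )
    assert router_frozen == 1
    assert expert_frozen == 2
    assert not model.moe["router"]["gate"].bias.requires_grad
    assert not model.experts.mlp1_bias.requires_grad
    assert not model.experts.mlp2_bias.requires_grad
    # Non-bias parameters should stay trainable.
    assert model.moe["router"]["gate"].weight.requires_grad
```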