Add the FTRL (Follow The Regularized Leader) optimizer. #268

Open
wants to merge 1 commit into base: main

Conversation

copybara-service[bot] commented on May 29, 2025

Add the FTRL (Follow The Regularized Leader) optimizer.

This implementation is based on the FTRL algorithm of [McMahan et al., 2013](https://research.google.com/pubs/archive/41159.pdf).

Features / Params in FTRLOptimizerSpec (as used in the primitive; a hedged construction sketch follows this list):

  • learning_rate: The base learning rate.
  • learning_rate_power: Controls the per-coordinate learning rate decay (typically -0.5).
  • l1_regularization_strength: Applies L1 regularization, which can lead to sparsity in the model weights.
  • l2_regularization_strength: Applies L2 regularization.
  • beta: An additional smoothing term in the weight-update denominator.
  • clip_weight_min, clip_weight_max: Optional bounds for clipping the updated embedding weights.
  • weight_decay_factor: Factor for applying weight decay to the gradients.
  • multiply_weight_decay_factor_by_learning_rate: Boolean flag; if true, the weight_decay_factor is multiplied by the learning_rate before applying decay.
  • multiply_linear_by_learning_rate: Boolean flag; if true, the learning_rate is incorporated directly into the linear term update.
  • allow_zero_accumulator: Boolean flag; if true, allows the accumulator to be exactly zero. Otherwise, a small epsilon is added for numerical stability when the accumulator is zero.
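
For illustration, here is a hedged Python sketch of a spec object carrying the fields listed above. The class name, defaults, and dataclass form are assumptions for this example only; the actual FTRLOptimizerSpec in the library may be defined differently.

```python
# Hypothetical sketch: field names mirror the list above, but the class name,
# defaults, and dataclass layout are illustrative, not the library's actual API.
import dataclasses
from typing import Optional


@dataclasses.dataclass
class FTRLOptimizerSpecSketch:
    learning_rate: float = 0.001
    learning_rate_power: float = -0.5        # per-coordinate decay exponent
    l1_regularization_strength: float = 0.0  # > 0 encourages sparse weights
    l2_regularization_strength: float = 0.0
    beta: float = 0.0                        # extra smoothing in the denominator
    clip_weight_min: Optional[float] = None  # optional bounds on updated weights
    clip_weight_max: Optional[float] = None
    weight_decay_factor: Optional[float] = None
    multiply_weight_decay_factor_by_learning_rate: bool = False
    multiply_linear_by_learning_rate: bool = False
    allow_zero_accumulator: bool = False


# Example configuration: L1/L2 regularization for sparsity plus weight clipping.
spec = FTRLOptimizerSpecSketch(
    learning_rate=0.05,
    l1_regularization_strength=0.001,
    l2_regularization_strength=0.001,
    clip_weight_min=-10.0,
    clip_weight_max=10.0,
)
```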

The optimizer maintains two slot variables for each trainable embedding parameter (see the update sketch after this list):

  • accumulator: Stores the sum of squared gradients, used to adapt the learning rate on a per-coordinate basis.
  • linear: Stores a linear combination related to the gradients, which is central to the FTRL weight update rule.
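
For reference, the sketch below shows how these two slots drive the standard FTRL-Proximal per-coordinate update (McMahan et al., 2013) with learning_rate_power = -0.5. It is a minimal illustration, not the primitive's actual kernel, and it omits the clipping, weight-decay, multiply_linear_by_learning_rate, and allow_zero_accumulator options.

```python
# Minimal FTRL-Proximal step (learning_rate_power = -0.5); illustrative only.
import jax.numpy as jnp


def ftrl_update(w, accumulator, linear, grad,
                learning_rate=0.05, l1=0.001, l2=0.001, beta=0.0):
    """Returns updated (weights, accumulator, linear) for one gradient step."""
    new_accumulator = accumulator + grad ** 2
    # sigma moves the old weight into the linear term as the per-coordinate
    # learning rate shrinks.
    sigma = (jnp.sqrt(new_accumulator) - jnp.sqrt(accumulator)) / learning_rate
    new_linear = linear + grad - sigma * w
    # Denominator of the closed-form weight update; beta adds extra smoothing.
    quadratic = (beta + jnp.sqrt(new_accumulator)) / learning_rate + 2.0 * l2
    # Soft-thresholding by l1 produces exact zeros, i.e. sparse weights.
    new_w = jnp.where(
        jnp.abs(new_linear) > l1,
        (jnp.sign(new_linear) * l1 - new_linear) / quadratic,
        0.0,
    )
    return new_w, new_accumulator, new_linear


# One step for a small embedding row and its two slot variables.
w = jnp.zeros(4)
accumulator = jnp.full(4, 0.1)   # often initialized to a small positive value
linear = jnp.zeros(4)
grad = jnp.array([0.2, -0.1, 0.0, 0.3])
w, accumulator, linear = ftrl_update(w, accumulator, linear, grad)
```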

copybara-service[bot] changed the title from "Add the Adam optimizer from [Kingma et al., 2014](http://arxiv.org/abs/1412.6980)." to "Add the FTRL (Follow The Regularized Leader) optimizer." on May 29, 2025
copybara-service[bot] force-pushed the test_764794731 branch 3 times, most recently from 40d69c6 to 63f5235, on June 3, 2025 at 15:51
PiperOrigin-RevId: 764794731