Add the FTRL (Follow The Regularized Leader) optimizer.
This implementation follows the FTRL algorithm described in [McMahan et al., 2013](https://research.google.com/pubs/archive/41159.pdf).
Features / Params in `FTRLOptimizerSpec` (as used in the primitive; an illustrative sketch follows the list):
- **learning_rate**: The base learning rate.
- **learning_rate_power**: Controls the per-coordinate learning rate decay (typically -0.5).
- **l1_regularization_strength**: Applies L1 regularization, which can lead to sparsity in the model weights.
- **l2_regularization_strength**: Applies L2 regularization.
- **beta**: An additional smoothing term added to the adaptive denominator of the update.
- **clip_weight_min**, **clip_weight_max**: Optional bounds for clipping the updated embedding weights.
- **weight_decay_factor**: Factor for applying weight decay to the gradients.
- **multiply_weight_decay_factor_by_learning_rate**: Boolean flag; if true, the `weight_decay_factor` is multiplied by the `learning_rate` before applying decay.
- **multiply_linear_by_learning_rate**: Boolean flag; if true, the `learning_rate` is folded into the linear slot update rather than applied only when computing the weights.
- **allow_zero_accumulator**: Boolean flag; if true, allows the accumulator to be exactly zero. Otherwise, a small epsilon is added for numerical stability when `accumulator` is zero.
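For illustration, here is a minimal sketch of such a spec written as a plain dataclass. The field names mirror the parameters listed above, but the real `FTRLOptimizerSpec` constructor, defaults, and types may differ; this is not the library's API.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative sketch only: mirrors the parameters documented above.
# The actual FTRLOptimizerSpec may have different defaults and signature.
@dataclass
class FTRLOptimizerSpecSketch:
  learning_rate: float = 0.001
  learning_rate_power: float = -0.5
  l1_regularization_strength: float = 0.0
  l2_regularization_strength: float = 0.0
  beta: float = 0.0
  clip_weight_min: Optional[float] = None
  clip_weight_max: Optional[float] = None
  weight_decay_factor: Optional[float] = None
  multiply_weight_decay_factor_by_learning_rate: bool = False
  multiply_linear_by_learning_rate: bool = False
  allow_zero_accumulator: bool = False

# Example: settings with a small L1 strength to encourage sparse embedding weights.
spec = FTRLOptimizerSpecSketch(
    learning_rate=0.05,
    l1_regularization_strength=0.001,
    l2_regularization_strength=0.0001,
)
```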
The optimizer maintains two slot variables for each trainable embedding parameter (see the update sketch after this list):
- **accumulator**: Stores the sum of squared gradients, used to adapt the learning rate on a per-coordinate basis.
- **linear**: Stores the accumulated linear term (gradients corrected for the change in the per-coordinate learning rate), which drives the FTRL weight update rule.
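For reference, a minimal JAX sketch of how the two slots drive the weight update under the standard FTRL-Proximal rule from McMahan et al., 2013. This is an illustration of the algorithm only, not the primitive's exact kernel: the clipping, weight-decay, `multiply_linear_by_learning_rate`, and `allow_zero_accumulator` options above are omitted.

```python
import jax.numpy as jnp

def ftrl_update(w, accum, linear, grad, *,
                learning_rate=0.05, learning_rate_power=-0.5,
                l1=0.0, l2=0.0, beta=0.0):
  """One per-coordinate FTRL-Proximal step (simplified sketch)."""
  p = -learning_rate_power                 # 0.5 for the typical -0.5 power
  new_accum = accum + grad ** 2            # "accumulator" slot update
  # sigma captures the change in the per-coordinate learning rate.
  sigma = (new_accum ** p - accum ** p) / learning_rate
  new_linear = linear + grad - sigma * w   # "linear" slot update
  # Closed-form proximal step: soft-threshold by l1, then rescale.
  quadratic = (beta + new_accum ** p) / learning_rate + 2.0 * l2
  new_w = jnp.where(
      jnp.abs(new_linear) > l1,
      (jnp.sign(new_linear) * l1 - new_linear) / quadratic,
      0.0,
  )
  return new_w, new_accum, new_linear
```

The soft-thresholding by `l1` is what zeroes out coordinates whose accumulated linear term stays small, which is how the L1 regularization strength produces sparse embedding weights.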
PiperOrigin-RevId: 764794731