
Is there a plan to support fp8 inference? #19671

Open

lingzhi98 opened this issue May 6, 2024 · 6 comments

@lingzhi98

fp8 training is supported in Keras. Does Keras have a plan to support fp8 inference? Maybe a naive solution, like TransformerEngine's, would be enough.
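For context, a minimal sketch of how fp8 can be enabled in recent Keras 3 releases, assuming the `quantize("float8")` API (exact availability depends on the Keras version):

```python
import numpy as np
import keras

# Illustrative toy model; the shapes carry no significance.
model = keras.Sequential([
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1),
])
model.build((None, 8))

# Quantize to float8; this attaches scaling-factor and amax_history
# variables to the quantized layers.
model.quantize("float8")

x = np.random.rand(4, 8).astype("float32")
y = model(x)  # forward pass now runs through the fp8 path
```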

@fchollet
Member

fchollet commented May 6, 2024

@james77777778 any thoughts on this?

@sachinprasadhs added the type:feature label May 6, 2024
@james77777778
Contributor

If the model is trained with fp8, it is ready for inference. We can fix the scaling factor and drop the amax_history if we don't plan to train the model further.

If the model is not trained with fp8 and we don't plan to train it in the future, we need a mechanism to calibrate it. Calibration is similar to fp8 training, but we only need to compute the scaling factor offline with an additional calibration dataset.

I'm unsure whether we should add the calibration logic into Keras.
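A minimal sketch of what such offline calibration could look like, assuming a simple per-tensor amax-based scheme (the `calibrate_scale` helper and `F8_MAX` constant are hypothetical, not an existing Keras API):

```python
import numpy as np

# Max representable value of float8_e4m3; E5M2 would use 57344 instead.
F8_MAX = 448.0

def calibrate_scale(calibration_batches, forward_fn):
    """Hypothetical helper: derive a fixed fp8 scaling factor offline.

    Runs the calibration data through the (non-fp8) model, tracks the
    maximum absolute activation value, and maps that range onto fp8.
    """
    amax = 0.0
    for batch in calibration_batches:
        activations = forward_fn(batch)  # fp32/bf16 tensor to be quantized
        amax = max(amax, float(np.max(np.abs(activations))))
    return F8_MAX / amax if amax > 0.0 else 1.0
```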

@lingzhi98
Author

Thanks for your reply. It seems Keras needs more discussion to decide whether to support fp8 calibration. Maybe you can post the latest progress here if there are any results in the future.

@lingzhi98
Author

And for fp8 inference after fp8 training, Keras does not seem to support it well. Can we add an is_training argument to float8_call to decide whether to compute a new scale? A new amax history is also not needed.

@james77777778
Contributor

> And for fp8 inference after fp8 training, Keras does not seem to support it well. Can we add an is_training argument to float8_call to decide whether to compute a new scale? A new amax history is also not needed.

Since #19682 has been merged, you can set training=False for the layer (or model) to skip computing both the scaling factor and the amax history.
The variable for the amax history will still be retained, but it should occupy only a small amount of memory.
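For reference, a usage sketch of that behavior, again assuming the `quantize("float8")` API from the earlier sketch:

```python
import numpy as np
import keras

# Illustrative model, quantized to float8 as before.
model = keras.Sequential([keras.layers.Dense(16), keras.layers.Dense(1)])
model.build((None, 8))
model.quantize("float8")

x = np.random.rand(4, 8).astype("float32")
# training=False skips recomputing the scaling factor and the amax
# history, reusing the values fixed during fp8 training (per #19682).
y = model(x, training=False)
```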

@lingzhi98
Author

Thanks, will test it soon.
