
Is there a plan to support fp8 inference? #19671

Open

lingzhi98 opened this issue May 6, 2024 · 6 comments

@lingzhi98

fp8 training is supported in Keras. Does Keras have a plan to support fp8 inference? Maybe a naive solution, like TransformerEngine's, would be enough.
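For context, a minimal sketch of how fp8 can be enabled in recent Keras 3 releases, assuming the `quantize("float8")` API (exact availability depends on the Keras version):

```python
import numpy as np
import keras

# Illustrative toy model; the shapes carry no significance.
model = keras.Sequential([
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1),
])
model.build((None, 8))

# Quantize to float8; this attaches scaling-factor and amax_history
# variables to the quantized layers.
model.quantize("float8")

x = np.random.rand(4, 8).astype("float32")
y = model(x)  # forward pass now runs through the fp8 path
```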

@fchollet
Member

fchollet commented May 6, 2024

@james77777778 any thoughts on this?

@sachinprasadhs added the type:feature label May 6, 2024
@james77777778
Contributor

If the model is trained with fp8, it is ready for inference. We can fix the scaling factor and drop the amax_history if we don't plan to train the model further.

If the model is not trained with fp8 and we don't plan to train it in the future, we need a mechanism to calibrate it. Calibration is similar to fp8 training, but we only need to compute the scaling factor offline with an additional calibration dataset.

I'm unsure whether we should add the calibration logic into Keras.
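A minimal sketch of what such offline calibration could look like, assuming a simple per-tensor amax-based scheme (the `calibrate_scale` helper and `F8_MAX` constant are hypothetical, not an existing Keras API):

```python
import numpy as np

# Max representable value of float8_e4m3; E5M2 would use 57344 instead.
F8_MAX = 448.0

def calibrate_scale(calibration_batches, forward_fn):
    """Hypothetical helper: derive a fixed fp8 scaling factor offline.

    Runs the calibration data through the (non-fp8) model, tracks the
    maximum absolute activation value, and maps that range onto fp8.
    """
    amax = 0.0
    for batch in calibration_batches:
        activations = forward_fn(batch)  # fp32/bf16 tensor to be quantized
        amax = max(amax, float(np.max(np.abs(activations))))
    return F8_MAX / amax if amax > 0.0 else 1.0
```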

@lingzhi98
Author

Thanks for your reply. It seems Keras needs more discussion to decide whether to support fp8 calibration. Maybe you can post the latest progress here if there are any results in the future.

@lingzhi98
Author

And for fp8 inference after fp8 training, Keras does not seem to support it well. Can we add an is_training argument to float8_call to decide whether to compute a new scale? A new amax history is also not needed.

@james77777778
Contributor

> And for fp8 inference after fp8 training, Keras does not seem to support it well. Can we add an is_training argument to float8_call to decide whether to compute a new scale? A new amax history is also not needed.

Since #19682 has been merged, you can set training=False for the layer (or model) to skip computing both the scaling factor and the amax history.
The variable for the amax history will still be retained, but it should occupy only a small amount of memory.
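For reference, a usage sketch of that behavior, again assuming the `quantize("float8")` API from the earlier sketch:

```python
import numpy as np
import keras

# Illustrative model, quantized to float8 as before.
model = keras.Sequential([keras.layers.Dense(16), keras.layers.Dense(1)])
model.build((None, 8))
model.quantize("float8")

x = np.random.rand(4, 8).astype("float32")
# training=False skips recomputing the scaling factor and the amax
# history, reusing the values fixed during fp8 training (per #19682).
y = model(x, training=False)
```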

@lingzhi98
Author

Thanks, will test it soon.
