Add support for MLprogram in ort_coreml #116
Conversation
It enables FP16 computation on the ANE, instead of everything being allocated to the CPU. However, MLProgram is not well supported yet and covers far fewer operators than the regular NeuralNetwork format.
Needs onnxruntime >= 1.20.0.
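As an aside, a minimal runtime guard for that version requirement could look like the sketch below. It is not part of this PR, and it assumes the onnxruntime Python package is the build being loaded:

```python
# Minimal sketch: guard against older ONNX Runtime builds, which do not
# expose the MLProgram model format for the Core ML EP.
import onnxruntime as ort

major, minor = map(int, ort.__version__.split(".")[:2])
assert (major, minor) >= (1, 20), \
    "MLProgram in the CoreML execution provider requires onnxruntime >= 1.20.0"
```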
Interesting, and thanks for the information.
Please use snake case and place the ml_program param at the end of the param list.
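Something along these lines, as a sketch only; the surrounding field names are illustrative placeholders rather than the actual vsmlrt.Backend.ORT_COREML definition, with only fp16 and ml_program taken from this PR's usage:

```python
from dataclasses import dataclass

@dataclass
class ORT_COREML:
    # existing options (placeholder names, except fp16 which the test
    # script below uses)
    fp16: bool = False
    num_streams: int = 1  # placeholder
    # new option: snake_case, appended last so existing positional callers
    # keep working
    ml_program: int = 0
```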
Test on M2 Pro: ml_program=1 + fp16=True gives ANE at 120% usage, 10.52 fps.

Script:

import vapoursynth as vs
from vapoursynth import core
import vsmlrt
src = core.lsmas.LWLibavSource('/path/to/source').resize.Spline36(1920//2, 1080//2, format=vs.RGBS, matrix_in_s="709") # same performance if format=RGBH
fin = vsmlrt.Waifu2x(clip=src, noise=-1, scale=2, backend=vsmlrt.Backend.ORT_COREML(ml_program=1, fp16=True), model=vsmlrt.Waifu2xModel.anime_style_art_rgb)  # anime_style_art_rgb uses the simplest ops, almost all of which are supported by the tested backends
fin.set_output(0)
Thanks for your contribution!
ONNX Runtime's Core ML execution provider supports two model formats: NeuralNetwork and MLProgram. NeuralNetwork is the default choice as it supports a wider range of operators, but it does not support FP16 precision (so with an FP16 model, all nodes fall back to the CPUExecutionProvider).
The MLProgram format, while newer and currently supporting fewer operators, does support FP16 and is under active development (recent GitHub PRs suggest that it will mature rapidly, adding tens of new operators). Although it might be slower now due to limited operator support, once it achieves comprehensive coverage, the potential CPU/GPU acceleration through FP16 could make it outperform the NeuralNetwork format.
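For illustration, here is roughly how that choice shows up in ONNX Runtime's Python API. This is a sketch, not how the ort_coreml backend is wired internally, and the provider option names ("ModelFormat", "MLComputeUnits") are assumed from the ORT 1.20+ CoreML EP documentation:

```python
import onnxruntime as ort

# Select the MLProgram model format for the Core ML EP; nodes it cannot
# handle fall back to the CPU EP listed after it.
sess = ort.InferenceSession(
    "model.onnx",  # hypothetical model path
    providers=[
        ("CoreMLExecutionProvider", {
            "ModelFormat": "MLProgram",   # default is "NeuralNetwork"
            "MLComputeUnits": "ALL",      # allow CPU, GPU and ANE
        }),
        "CPUExecutionProvider",
    ],
)
print(sess.get_providers())
```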
In ONNX Runtime, the ONNX model is converted to a Core ML model and saved to disk, which is then loaded via Apple's Core ML framework. By choosing FP16 inputs with the MLProgram format, we can significantly reduce both memory and disk usage, since the Core ML model is stored in the more compact FP16 representation. While the ANE always performs computations in FP16 internally regardless of input precision (making FP16 acceleration unnecessary for the neural engine itself), the storage benefits remain valuable.
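As a rough illustration of producing an FP16 model up front, one possible approach uses the onnxconverter-common package; this is a sketch and not necessarily what fp16=True does internally:

```python
import onnx
from onnxconverter_common import float16

# Convert weights/activations to FP16 ahead of time; the resulting ONNX file,
# and the Core ML model ORT generates from it, are roughly half the size.
model = onnx.load("model_fp32.onnx")                  # hypothetical path
model_fp16 = float16.convert_float_to_float16(model)
onnx.save(model_fp16, "model_fp16.onnx")
```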
Moreover, FP16 inputs may accelerate computations on the GPU and CPU, as both support FP16 (though it is not enabled by default). However, the exact behavior of FP16 handling in ONNX Runtime remains unclear due to its complex execution flow: ORT first decides which nodes to assign to CoreML and uses the CPUExecutionProvider for the rest, and then Core ML further distributes its nodes among CPU, GPU, and ANE.
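One way to inspect the first half of that partitioning is verbose session logging, sketched below; the exact log output depends on the ORT build, and I am assuming verbose logging prints per-node EP assignment:

```python
import onnxruntime as ort

so = ort.SessionOptions()
so.log_severity_level = 0  # verbose: ORT logs which EP each node is assigned to
sess = ort.InferenceSession(
    "model.onnx",  # hypothetical model path
    sess_options=so,
    providers=["CoreMLExecutionProvider", "CPUExecutionProvider"],
)
```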
For more details on FP16 behavior, refer to this documentation: 16-bit precision in Core ML on ANE.
MLProgram-relevant PRs: microsoft/onnxruntime#19347, microsoft/onnxruntime#22068, microsoft/onnxruntime#22480, microsoft/onnxruntime#22710, and so on.