# 📤 Add export task (coreml and tflite) #174
base: main
## Conversation
… loop, undo export param, use FastModelLoader in InferenceModel
Hi, can you check if it is still able to run with this modification? Henry Tsui
`# Conflicts: # yolo/model/yolo.py`
Hi Henry,

I did try this PR. I think there is an error in:

should be:

Also, did you manage to get good performance using
Yeah, you are right about the change. I'm still looking into the slowness. When skipping the export layers in `if self.export_mode:` it uses the ANE and is super fast. If you can help me debug that, it would be great. E.g., what code do HF or Ultralytics use for those decoding layers?
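For context, here is a minimal sketch of the pattern being described, assuming a boolean `export_mode` attribute on the head module (the class and method names here are placeholders, not the repo's actual code):

```python
import torch
import torch.nn as nn

class DetectionHeadSketch(nn.Module):
    """Hypothetical head: with export_mode=True the decode step is
    skipped, so the converted graph stays simple enough for the ANE."""

    def __init__(self):
        super().__init__()
        self.export_mode = False  # toggled by the export task

    def forward(self, preds_cls, preds_box):
        if self.export_mode:
            # Return raw tensors; anchor/stride decoding happens
            # on the host after inference.
            return preds_cls, preds_box
        # Normal path: decode boxes inside the graph.
        return preds_cls, self.decode(preds_box)

    def decode(self, preds_box):
        return preds_box  # stand-in for the real anchor decode
```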
Thanks, I confirmed it after switching to that. I wish to help, but I'm afraid I'm not skilled enough to know how it works. I guess this is the final layer doing something similar to NMS, and if it were done outside the ML model (post-inference), the whole pipeline would be slow as well? I only found this repo and issue that might be helpful (including the comments and the related pocketpixels/yolov5 repo linked there). Not sure, though, whether the architecture is very different between yolov5 and yolov9.
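For what it's worth, host-side post-processing of the kind being speculated about could look roughly like this (a sketch using torchvision's NMS; the tensor shapes and thresholds are assumptions, not taken from this repo):

```python
import torch
from torchvision.ops import nms

def postprocess(preds_cls, preds_box, conf_thres=0.25, iou_thres=0.65):
    # Assumed shapes: preds_cls (1, N, num_classes), preds_box (1, N, 4) in XYXY.
    # Assuming raw logits; drop sigmoid() if scores are already probabilities.
    scores, labels = preds_cls[0].sigmoid().max(dim=-1)
    keep = scores > conf_thres
    boxes, scores, labels = preds_box[0][keep], scores[keep], labels[keep]
    idx = nms(boxes, scores, iou_thres)
    return boxes[idx], scores[idx], labels[idx]
```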
I asked Gemini 2.5 Pro about this. I cannot verify it, but it suggested that the graph is not static, and it recommended precomputing the anchors outside the model and passing them in as an input:

Full LLM markdown output here:
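If that suggestion holds, the anchors would be built once on the host and fed to the exported model as extra inputs, so the traced graph stays static. A sketch of the idea (unverified; the anchor layout is assumed to mirror what generate_anchors produces):

```python
import torch

def precompute_anchors(img_size, strides):
    """Host-side stand-in for generate_anchors: cell centers and
    per-cell stride scalers for each feature-map level."""
    grids, scalers = [], []
    for stride in strides:
        h = w = img_size // stride  # assumes img_size divisible by stride
        ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
        centers = (torch.stack([xs, ys], dim=-1).reshape(-1, 2) + 0.5) * stride
        grids.append(centers.float())
        scalers.append(torch.full((h * w,), float(stride)))
    return torch.cat(grids), torch.cat(scalers)

# At export time, forward would accept (image, anchor_grid, scaler)
# instead of regenerating them inside the graph on every call.
```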
EDIT: My bad, this is irrelevant. I benchmarked the wrong model; the change below doesn't fix it, it is still slow.

I did another test, commenting out some lines in yolo.py:

```python
preds_cls = torch.concat(preds_cls, dim=1).to(x[0][0].device)
preds_anc = torch.concat(preds_anc, dim=1).to(x[0][0].device)
preds_box = torch.concat(preds_box, dim=1).to(x[0][0].device)

strides = self.get_strides(output["Main"], input_width)
# anchor_grid, scaler = self.generate_anchors([input_width, input_height], strides)
# anchor_grid = anchor_grid.to(x[0][0].device)
# scaler = scaler.to(x[0][0].device)
# pred_LTRB = preds_box * scaler.view(1, -1, 1)
# lt, rb = pred_LTRB.chunk(2, dim=-1)
# preds_box = torch.cat([anchor_grid - lt, anchor_grid + rb], dim=-1)
return preds_cls, preds_anc, preds_box
```

and this still runs very fast, so Gemini is probably right that the bottleneck is the anchor generation and box decoding commented out above.
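Under that theory, the decode commented out above could run host-side after inference instead. A sketch that repackages the same math as a post-processing step (shapes assumed from the snippet above):

```python
import torch

def decode_boxes(preds_box, anchor_grid, scaler):
    # preds_box: (1, N, 4) raw LTRB distances; anchor_grid: (N, 2); scaler: (N,)
    pred_ltrb = preds_box * scaler.view(1, -1, 1)
    lt, rb = pred_ltrb.chunk(2, dim=-1)
    # Convert per-anchor LTRB distances to XYXY boxes.
    return torch.cat([anchor_grid - lt, anchor_grid + rb], dim=-1)
```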
I did try exporting with different combinations as well, and even made things 'static' by using

instead of

Next to that, preds_anc is not used in the post-processing code anywhere, so we can skip that. No luck yet, but I'm currently busy with other projects, so I will have a look again in a couple of weeks.
This pull request adds a new export task, including the option to export to the CoreML and TFLite formats.

Use:

Next to this, it adds the option to use the FastModelLoader again during inference.

TFLite export depends on ai_edge_torch, which requires Python 3.10.

Next steps would be to add quantization and auto-installation of missing modules.
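For reference, the core of a CoreML/TFLite export along these lines typically looks like the following (a minimal sketch, not the PR's actual code; the model argument, input size, and file names are placeholders):

```python
import torch
import coremltools as ct
import ai_edge_torch  # requires Python 3.10+

def export_models(model: torch.nn.Module, img_size: int = 640):
    model = model.eval()
    sample = torch.rand(1, 3, img_size, img_size)

    # CoreML: trace first, then convert with coremltools.
    traced = torch.jit.trace(model, sample)
    mlmodel = ct.convert(traced, inputs=[ct.TensorType(shape=sample.shape)])
    mlmodel.save("yolo.mlpackage")

    # TFLite: ai_edge_torch converts an eval-mode torch module directly.
    edge_model = ai_edge_torch.convert(model, (sample,))
    edge_model.export("yolo.tflite")
```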