🚀 Added
Qwen2.5-VL 🤝 Workflows
Thanks to @Matvezy, we've enabled the Qwen2.5-VL model in `inference` and the Workflows ecosystem, making it easier than ever to use its powerful vision-language capabilities. 🎉
💡 About Qwen2.5-VL
Qwen2.5-VL is a vision-language model which understands both images and text, allowing it to analyze documents, detect objects, and interpret videos with human-like comprehension. All of those capabilities are now available in Workflows 🤯
Take a look at the docs 📖 for more details.
🚗 Speed improvements in `inference` 🏁
@isaacrob-roboflow is not slowing down - and neither is `inference`. In this release, he added a few important changes:
- 🎯 Torch-based image pre-processing: pre-processing can now run on the GPU using PyTorch, making more efficient use of the underlying hardware.
- 💡 ONNX IO bindings enabled: a technique which minimises data round-trip time to and from memory (especially helpful when pre-processing happens directly on the GPU).
- 🕵️ Details of the change: #941
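The IO-binding idea can be sketched as follows. This is a minimal illustration, not the actual `inference` implementation: it assumes an `onnxruntime.InferenceSession` created with `CUDAExecutionProvider` and an input that already lives on the GPU as an `OrtValue` (the function name and parameter names are ours):

```python
def run_with_io_binding(session, input_name, output_name, gpu_input):
    """Run an ONNX model while keeping tensors on the device.

    `session` is assumed to be an onnxruntime.InferenceSession created with
    CUDAExecutionProvider; `gpu_input` an onnxruntime.OrtValue already on GPU.
    """
    binding = session.io_binding()
    # Bind the device-resident input directly - no host -> device copy.
    binding.bind_ortvalue_input(input_name, gpu_input)
    # Let ONNX Runtime allocate the output on the GPU as well, so results
    # can feed post-processing without a round trip through host memory.
    binding.bind_output(output_name, device_type="cuda")
    session.run_with_iobinding(binding)
    return binding.get_outputs()[0]  # OrtValue, still on the device
```

When pre-processing already happens on the GPU (the PyTorch change above), this avoids copying the pre-processed tensor back to the host just to feed it into the model.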
Want to hear about the results?
- 🏍️ For big images (for instance of size 2048 x 2048) we observe a substantial drop in inference latency - in our example case, latency dropped from 130-140ms 👉 50-60ms - that's nearly a 3x speedup 🤯
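As a rough sanity check on those numbers (our own arithmetic, not additional measurements):

```python
# Latency ranges reported above, in milliseconds
baseline_ms = (130, 140)
optimized_ms = (50, 60)

best_case = baseline_ms[1] / optimized_ms[0]   # 140 / 50 = 2.8
worst_case = baseline_ms[0] / optimized_ms[1]  # 130 / 60 ≈ 2.17

print(f"speedup between {worst_case:.2f}x and {best_case:.2f}x")
```

The best case works out to 2.8x, consistent with "nearly 3x".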
We can't wait for future optimisations 💪
💻 New home page for `inference` docs
You may have already noticed, but just to be sure, take a look at the new design of the `inference` docs home page prepared by @isoceles.
Check it out here
🚨 Deprecated
Caution
We needed to take immediate action. A vulnerability detected in the `transformers` library forced us to introduce changes into `inference` dependencies, effectively removing the components for which we could not prepare security patches. Vulnerability description (CVE-2024-11393):
Hugging Face Transformers MaskFormer Model Deserialization of Untrusted Data Remote Code Execution Vulnerability. This vulnerability allows remote attackers to execute arbitrary code on affected installations of Hugging Face Transformers. User interaction is required to exploit this vulnerability in that the target must visit a malicious page or open a malicious file. The specific flaw exists within the parsing of model files. The issue results from the lack of proper validation of user-supplied data, which can result in deserialization of untrusted data. An attacker can leverage this vulnerability to execute code in the context of the current user.
We advise all clients to migrate to `inference` 0.38.0 and stop using old builds in production.
🐳 CogVLM was removed
The first implication of CVE-2024-11393 was an upgrade to the newest `transformers` version, which conflicted with CogVLM model requirements - as a result, we decided to end support for the model in `inference`.
From what we can tell, CogVLM was not the most popular model in the ecosystem, but if anyone is looking for alternatives, we can suggest other models available in `inference` and the Workflows ecosystem - like the newly introduced Qwen2.5-VL.
Effective immediately, the model was removed from the library; we left usage examples and a stub Workflow block which fires an error with deprecation information if one tries to run it in the Execution Engine.
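The stub's behaviour is roughly as in the hypothetical sketch below (class and error names are ours for illustration, not the actual block's identifiers):

```python
class ModelDeprecatedError(RuntimeError):
    """Raised when a model removed from the library is invoked."""


class CogVLMStub:
    """Illustrative stand-in for the removed CogVLM Workflow block."""

    def run(self, *args, **kwargs):
        # Fail fast with an actionable message instead of silently
        # attempting to load a model that no longer ships.
        raise ModelDeprecatedError(
            "CogVLM was removed from `inference` 0.38.0 due to "
            "CVE-2024-11393. Consider an alternative such as Qwen2.5-VL."
        )
```

Running the block anywhere in a Workflow therefore surfaces the deprecation immediately, rather than failing with an obscure import error.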
🐍 Python 3.8 no longer supported
Python 3.8 has already reached End Of Life and, as a result, many libraries dropped support for this version of the interpreter. We tried to keep our codebase compatible for as long as possible, but we were not able to apply the security patch (as newer versions of the `transformers`-related dependencies had already dropped support). As a result, `inference` will no longer support Python 3.8.
🥡 End Of Life - Jetson with Jetpack 4.5
As a result of the Python 3.8 deprecation, we also needed to abandon builds for Jetson with Jetpack 4.5, which were bound to this Python version.
📗 Other changes
- Prepare version of Workflows EE where thread pool executor is injectable instead of created at each run by @PawelPeczek-Roboflow in #1014
- Add changes to make it possible to register WebHooks by @PawelPeczek-Roboflow in #1020
- Fix docs homepage mobile nav by @yeldarby in #1023
- Aspect ratio operation by @EmilyGavrilenko in #1025
- Workflow error message improvements by @EmilyGavrilenko in #1018
- Add --metrics-enabled and --metrics-disabled to inference server start by @grzegorz-roboflow in #1024
- Expose decoding buffer size and predictions queue size as inference pipeline manager request parameters by @grzegorz-roboflow in #1022
- Update Workflows Changelog by @PawelPeczek-Roboflow in #1027
- Extend dynamic_zones block to expose updated detections as extra output by @grzegorz-roboflow in #1029
- Loginless Builder by @yeldarby in #1030
- Bump esbuild from 0.19.12 to 0.25.0 in /theme in the npm_and_yarn group across 1 directory by @dependabot in #1021
- `gpu_speedups` code review by @grzegorz-roboflow in #1031
- Add an option to use pytorch for GPU-based image preprocessing by @isaacrob-roboflow in #941
- Handle new getWeights in RoboflowInferenceModel by @grzegorz-roboflow in #1028
- Addition of Qwen 2.5 VL to Inference and Workflows by @Matvezy in #1019
- Add rustc to OS dependencies by @grzegorz-roboflow in #975
- Fix problem with assertions by @PawelPeczek-Roboflow in #1033
- Fix broken CI by @PawelPeczek-Roboflow in #1034
- Fix broken parallel GPU CI by @PawelPeczek-Roboflow in #1035
🏅 New Contributors
Full Changelog: v0.37.1...v0.38.0