Implement MiniCPM-V-2 #637
Related: https://huggingface.co/spaces/opencompass/open_vlm_leaderboard. https://github.com/OpenGVLab/InternVL appears to surpass MiniCPM on that benchmark. However, testing suggests it does not perform as well on GUI elements. From https://internvl.opengvlab.com/ (same image as above):
The image displays a calculator with a gray background and orange buttons. The calculator shows the number 45 in its display. The buttons are arranged in a typical layout with numbers 0 through 9, and function buttons such as addition, subtraction, multiplication, division, and percentage. The calculator also has buttons for clear, memory, and a decimal point. The top left corner of the calculator has three colored dots, which could indicate different modes or functions of the calculator.
The interactable GUI elements in the image include: The display showing the number "45".
Also worth considering:
https://huggingface.co/microsoft/Phi-3-vision-128k-instruct
https://github.com/THUDM/GLM-4/blob/main/README_en.md
Feature request
https://github.com/OpenBMB/MiniCPM is among the smallest multimodal models available. The latest version, https://huggingface.co/openbmb/MiniCPM-V-2, appears to be able to understand GUI images:
This model has been integrated into https://github.com/vllm-project/vllm, which is likely how we want to implement it here.
This involves creating an openadapt.adapters.vllm module with a parameterizable model name.

Motivation
Fully offline support.
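A minimal sketch of what the proposed adapter might look like. All names here (the VLLMAdapter class, its prompt method, the default sampling settings) are assumptions for illustration, not an existing OpenAdapt API; the vLLM multimodal call shape follows vLLM's offline-inference interface and may need adjusting to the installed vLLM version. The import of vllm is deferred so the adapter can be constructed (and the model name configured) without vLLM installed.

```python
from dataclasses import dataclass

DEFAULT_MODEL = "openbmb/MiniCPM-V-2"


@dataclass
class VLLMAdapter:
    """Hypothetical openadapt.adapters.vllm adapter with a parameterizable model name."""

    model_name: str = DEFAULT_MODEL
    max_tokens: int = 512

    def prompt(self, text: str, image) -> str:
        # Lazy import: vLLM is only required when inference actually runs,
        # keeping the adapter importable for offline configuration/testing.
        from vllm import LLM, SamplingParams

        llm = LLM(model=self.model_name, trust_remote_code=True)
        outputs = llm.generate(
            {"prompt": text, "multi_modal_data": {"image": image}},
            SamplingParams(max_tokens=self.max_tokens),
        )
        # Return the generated text of the first (only) request.
        return outputs[0].outputs[0].text
```

Parameterizing the model name this way would let the same adapter serve MiniCPM-V-2 or any other vLLM-supported VLM (e.g. the InternVL or Phi-3-vision candidates above) via configuration alone.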