An open-source implementation of LLaVA-NeXT.
InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.
[ACL ARR Under Review] Dataset and Code of "ImplicitAVE: An Open-Source Dataset and Multimodal LLMs Benchmark for Implicit Attribute Value Extraction"
[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
Collection of AWESOME vision-language models for vision tasks
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4V. A commercially usable open-source multimodal dialogue model with performance approaching GPT-4V.
MLX-VLM is a package for running Vision LLMs locally on your Mac using MLX.
Microsoft Phi-3 Vision, Microsoft's first multimodal model; demo with Hugging Face.
🎉 PILOT: A Pre-trained Model-Based Continual Learning Toolbox
[CVPR 2024] Official PyTorch Code for "PromptKD: Unsupervised Prompt Distillation for Vision-Language Models"
Overview of Japanese LLMs (日本語LLMまとめ)
A curated collection of multilingual datasets and large language models for evaluating and improving LLM performance across diverse languages and tasks.
FreeVA: Offline MLLM as Training-Free Video Assistant
Official implementation of our IEEE Access paper (2024), ZEN-IQA: Zero-Shot Explainable and No-Reference Image Quality Assessment with Vision Language Model
A library for marking web pages for Set-of-Mark (SoM) prompting with vision-language models.
This study explores the vulnerabilities of the Pathology Language-Image Pretraining (PLIP) model, a vision-language foundation model for medical AI, under targeted attacks such as the PGD adversarial attack.
A Python tool to evaluate the performance of VLMs in the medical domain.
A curated list of awesome knowledge-driven autonomous driving (continually updated)
The official repo of Qwen-VL (通义千问-VL), the chat and pretrained large vision-language model proposed by Alibaba Cloud.