An open-source implementation of LLaVA-NeXT.
InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.
[ACL ARR Under Review] Dataset and Code of "ImplicitAVE: An Open-Source Dataset and Multimodal LLMs Benchmark for Implicit Attribute Value Extraction"
[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
Collection of AWESOME vision-language models for vision tasks
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4V. A commercially usable open-source multimodal dialogue model with performance approaching GPT-4V.
MLX-VLM is a package for running Vision LLMs locally on your Mac using MLX.
Microsoft Phi-3 Vision, Microsoft's first multimodal model; demo with Hugging Face.
🎉 PILOT: A Pre-trained Model-Based Continual Learning Toolbox
[CVPR 2024] Official PyTorch Code for "PromptKD: Unsupervised Prompt Distillation for Vision-Language Models"
Overview of Japanese LLMs (日本語LLMまとめ)
A curated collection of multilingual datasets and large language models for evaluating and improving LLM performance across diverse languages and tasks.
FreeVA: Offline MLLM as Training-Free Video Assistant
Official implementation of our IEEE Access paper (2024), ZEN-IQA: Zero-Shot Explainable and No-Reference Image Quality Assessment with Vision Language Model
A library for marking web pages for Set-of-Mark (SoM) prompting with vision-language models.
This study explores the vulnerabilities of the Pathology Language-Image Pretraining (PLIP) model, a vision-language foundation model for medical AI, under targeted attacks such as the PGD adversarial attack.
A Python tool to evaluate the performance of VLMs in the medical domain.
A curated list of awesome knowledge-driven autonomous driving (continually updated)
The official repo of Qwen-VL (通义千问-VL), the chat and pretrained large vision-language model proposed by Alibaba Cloud.