Finetuning Large Visual Models on Visual Question Answering
Updated May 23, 2024 - Jupyter Notebook
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting GPT-4V, Gemini, QwenVLPlus, 40+ HF models, and 20+ benchmarks
Davidsonian Scene Graph (DSG) for Text-to-Image Evaluation (ICLR 2024)
An extension of the Planner-Actor-Reporter framework applied to autonomous vehicles in Highway-Env and CARLA.
AIS: Vision, Graphics and AI for Streaming Workshop at CVPR 2024
Visual Question Answering Using CLIP + LSTM
OmniFusion, a multimodal model that communicates using text and images
Multimodal Instruction Tuning for Llama 3
Visual question answering prompting recipes for large vision-language models
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!
Official Implementation of WACV 2024 Paper "HIDRO-VQA : High Dynamic Range Oracle for Video Quality Assessment"
This package is a flexible Python implementation of the Quantum Approximate Optimization Algorithm / Quantum Alternating Operator Ansatz (QAOA), aimed at researchers who want to readily test the performance of a new ansatz, new classical optimizers, etc.
Code for the MultipanelVQA benchmark "Muffin or Chihuahua? Challenging Large Vision-Language Models with Multipanel VQA"
How well do the GPT-4V, Gemini Pro Vision, and Claude 3 Opus models perform zero-shot vision tasks on data structures?
This repository contains the data and code of the paper titled "IllusionVQA: A Challenging Optical Illusion Dataset for Vision Language Models"
LLaVA inference with multiple images at once for cross-image analysis.
An implementation of CLIP-ViL Grad-CAM for VQA tasks
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning