A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome
Oscar and VinVL
InternGPT (iGPT) is an open-source demo platform where you can easily showcase your AI models. It now supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, etc. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM)
An efficient PyTorch implementation of the winning entry of the 2017 VQA Challenge.
Visual Question Answering in PyTorch
Implementation for the paper "Compositional Attention Networks for Machine Reasoning" (Hudson and Manning, ICLR 2018)
[ICCV 2021 Oral] Official PyTorch implementation of Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, a novel method to visualize any Transformer-based network, including examples for DETR and VQA.
Strong baseline for visual question answering
A curated list of Visual Question Answering (VQA) resources, covering image/video question answering, Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.
PyTorch implementation for the Neuro-Symbolic Concept Learner (NS-CL).
[CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks.
The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
PyTorch implementation of "Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning"
A lightweight, scalable, and general framework for visual question answering research
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting GPT-4V, Gemini, QwenVLPlus, 40+ Hugging Face models, and 20+ benchmarks
TensorFlow implementation of Deeper LSTM + normalized CNN for Visual Question Answering