Vision and Language Group @ MIL

15 repositories

twigvlm
Public
Implementation of ICCV 2025 paper "Growing a Twig to Accelerate Large Vision-Language Models".
pytorch inference-acceleration token-pruning vision-language-models
Python • Apache License 2.0 • 1 • 3 • 0 • 0 • Updated Jul 29, 2025
prophet
Public
Implementation of CVPR 2023 paper "Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering".
pytorch visual-question-answering multimodal-deep-learning gpt-3 prompt-engineering okvqa a-okvqa
Python • Apache License 2.0 • 28 • 276 • 3 • 0 • Updated Jun 14, 2025
imp
Public
A family of highly capable yet efficient large multimodal models
Python • Apache License 2.0 • 16 • 187 • 3 • 3 • Updated Aug 23, 2024
mlc-imp
Public
Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.
Python • Apache License 2.0 • 1.8k • 10 • 0 • 0 • Updated May 29, 2024
anetqa
Public template
HTML • 1 • 0 • 0 • 0 • Updated Mar 15, 2024
anetqa-code
Public
Python • Apache License 2.0 • 2 • 9 • 1 • 0 • Updated Mar 7, 2024
rosita
Public
ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration
vqa vision-and-language pre-training referring-expression-comprehension image-text-retrieval
Python • Apache License 2.0 • 13 • 56 • 1 • 0 • Updated Jun 13, 2023
bst
Public
Python • Apache License 2.0 • 1 • 5 • 0 • 0 • Updated May 12, 2023
xmchat
Public
Apache License 2.0 • 2 • 30 • 3 • 0 • Updated Apr 24, 2023
bottom-up-attention.pytorch
Public
A PyTorch reimplementation of bottom-up-attention models
bottom-up-attention detectron2 pytorch
Jupyter Notebook • Apache License 2.0 • 75 • 302 • 26 • 0 • Updated Apr 7, 2022
openvqa
Public
A lightweight, scalable, and general framework for visual question answering research
benchmark deep-learning pytorch vqa visual-question-answering
Python • Apache License 2.0 • 64 • 325 • 6 • 0 • Updated Sep 3, 2021
mcan-vqa
Public
Deep Modular Co-Attention Networks for Visual Question Answering
attention visual-reasoning visual-question-answering
Python • Apache License 2.0 • 89 • 455 • 1 • 0 • Updated Dec 16, 2020
activitynet-qa
Public
A VideoQA dataset based on the videos from ActivityNet
vqa activitynet videoqa dataset
Python • Apache License 2.0 • 10 • 85 • 2 • 0 • Updated Nov 22, 2020
mmnas
Public
Deep Multimodal Neural Architecture Search
Python • Apache License 2.0 • 8 • 29 • 1 • 0 • Updated Nov 15, 2020
mt-captioning
Public
A PyTorch implementation of the paper "Multimodal Transformer with Multiview Visual Representation for Image Captioning"
pytorch image-captioning multimodal-transformer
Python • Apache License 2.0 • 7 • 25 • 1 • 1 • Updated Sep 4, 2020