[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversations about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' framework for video-based conversational models.
chatbot
llama
clip
mulit-modal
vision-language
vicuna
gpt-4
vision-language-pretraining
llava
video-chatboat
video-conversation
Updated May 20, 2024 - Python