Orchestrate Swarms of Agents From Any Framework, Such as OpenAI, LangChain, and More, for Business Operations Automation. Join our Community: https://discord.gg/DbjBMJTSWD
[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
An open-source implementation of LLaVA-NeXT.
InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.
✨✨Latest Papers and Datasets on Multimodal Large Language Models and Their Evaluation.
Simple Implementation of a Transformer in Apple's New MLX Framework
The open-source implementation of Google's Gemini, the model said to "eclipse ChatGPT"
Multi-Modal Tree of Thoughts for DALLE-3-Like Automatic Self-Improvement
Implementation of MoE-Mamba from the paper "MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts" in PyTorch and Zeta (a minimal sketch of the expert-routing idea appears after this list)
Implementation of MambaByte from the paper "MambaByte: Token-free Selective State Space Model" in PyTorch and Zeta
My implementation of Kosmos2.5 from the paper: "KOSMOS-2.5: A Multimodal Literate Model"
Implementation of the Q-Former from BLIP-2 in Zeta Lego blocks.
Algorithms and Publications on 3D Object Tracking
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Toward a Multi-Modality Language Model: An Implementation of GPT-4o/Project Astra
The official repository for the Vista dataset, a Vietnamese multimodal dataset containing more than 700,000 samples of conversations and images
Official code for NeurIPS2023 paper: CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection
[CVPR2024 Highlight] Official Code for "ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object"
PyTorch Implementation of the paper "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training"
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!
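
Several entries above build on mixture-of-experts routing, most directly the MoE-Mamba implementation. As a rough illustration of the core idea only, here is a minimal top-1 MoE feed-forward layer in PyTorch; the class name, parameters, and defaults are hypothetical and are not taken from any of the listed repositories.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top1MoE(nn.Module):
    """Minimal top-1 mixture-of-experts feed-forward layer.

    A hypothetical sketch of the routing idea behind MoE layers;
    not code from the MoE-Mamba repository or the Zeta library.
    """

    def __init__(self, dim: int, num_experts: int = 4, hidden: int = 256):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts)  # router: one logit per expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        tokens = x.reshape(-1, x.shape[-1])           # flatten (batch, seq, dim) -> (tokens, dim)
        probs = F.softmax(self.gate(tokens), dim=-1)  # routing probabilities per token
        weight, expert_idx = probs.max(dim=-1)        # top-1 expert and its gate weight
        out = torch.zeros_like(tokens)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i                    # tokens routed to expert i
            if mask.any():
                # scale each token's expert output by its gate probability
                out[mask] = weight[mask].unsqueeze(-1) * expert(tokens[mask])
        return out.reshape_as(x)

# Usage: route a dummy batch of 2 sequences, 10 tokens each, through the layer.
moe = Top1MoE(dim=64)
print(moe(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])
```

Top-1 routing keeps the per-token compute close to a single expert's feed-forward cost; production implementations typically add load-balancing losses and batched dispatch in place of the per-expert Python loop used here.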