# multimodal-deep-learning

Here are 349 public repositories matching this topic...

AI4Finance-Foundation / FinRobot

FinRobot: An Open-Source AI Agent Platform for Financial Applications using LLMs 🚀 🚀 🚀

finance multimodal-deep-learning robo-advisor large-language-models prompt-engineering chatgpt fingpt aiagent

Updated Jun 3, 2024
Jupyter Notebook

salesforce / LAVIS

LAVIS - A One-stop Library for Language-Vision Intelligence

deep-learning salesforce image-captioning deep-learning-library vision-framework vision-and-language multimodal-deep-learning multimodal-datasets vision-language-transformer vision-language-pretraining visual-question-anwsering

Updated Jun 3, 2024
Jupyter Notebook

geoaigroup / awesome-vision-language-models-for-earth-observation

A curated list of awesome vision and language resources for earth observation.

awesome remote-sensing awesome-list earth-observation vision-and-language multimodal-deep-learning

Updated Jun 3, 2024

Yuan-ManX / ai-multimodal-timeline

Here we will track the latest AI Multimodal Models, including Multimodal Foundation Models, LLM, Audio, Image, Video, Music and 3D content. 🔥

ai multi-modal deeplearning-ai multimodal multimodal-deep-learning llm

Updated Jun 3, 2024

KimMeen / Time-LLM

[ICLR 2024] Official implementation of "🦙 Time-LLM: Time Series Forecasting by Reprogramming Large Language Models"

machine-learning deep-learning time-series language-model time-series-analysis time-series-forecast time-series-forecasting multimodal-deep-learning cross-modality multimodal-time-series cross-modal-learning prompt-tuning large-language-models

Updated Jun 3, 2024
Python

Yutong-Zhou-cv / Awesome-Text-to-Image

(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.

survey generative-adversarial-network image-manipulation image-generation text-to-image image-synthesis multimodal multimodal-deep-learning awseome-list text-to-face

Updated Jun 3, 2024

ThomasHelfer / multimodal-supernovae

A codebase for exploring multimodal learning approaches that integrate images of supernova host galaxies with the corresponding light curves and spectra.

pytorch astro multimodal-deep-learning

Updated Jun 3, 2024
Jupyter Notebook

friedrichor / Awesome-Multimodal-Papers

A curated list of awesome Multimodal studies.

deep-learning multimodal-learning multimodal multimodal-deep-learning multimodal-data multimodal-dialogue multimodal-large-language-models large-multimodal-models

Updated Jun 2, 2024
HTML

jrzaurin / pytorch-widedeep

A flexible package for multimodal deep learning that combines tabular data with text and images using Wide and Deep models in PyTorch

python deep-learning text images tabular-data pytorch pytorch-cv multimodal-deep-learning pytorch-nlp pytorch-transformers model-hub pytorch-tabular-data

Updated Jun 3, 2024
Python

MMMU-Benchmark / MMMU

This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"

machine-learning natural-language-processing deep-neural-networks computer-vision deep-learning evaluation question-answering stem multimodality multimodal-learning visual-question-answering multimodal multimodal-deep-learning foundation-models large-language-models llm llms large-multimodal-models

Updated May 31, 2024
Python

nicolay-r / nicolay-r

A personal list of news updates in the Information Retrieval domain

nlp information-retrieval tensorflow torch language-model relation-extraction multimodal-deep-learning tranformers large-language-models

Updated May 31, 2024

darmangerd / vubot

Multimodal computer vision application that combines object detection, gesture recognition, and speech-to-text to help users ask questions about their environment.

computer-vision speech-recognition object-detection gesture-recognition multimodal multimodal-deep-learning

Updated May 31, 2024
Python

VisualWebBench / VisualWebBench

Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"

machine-learning natural-language-processing computer-vision deep-learning evaluation question-answering visual-question-answering multimodal multimodal-deep-learning foundation-models large-language-models llm llms mllm multimodal-large-language-models large-multimodal-models

Updated May 31, 2024
Python

darrylnurse / viewvie

Movie detection application.

react nodejs python express computer-vision ffmpeg react-router embeddings google-cloud-platform multimodal-deep-learning vector-database

Updated May 30, 2024
JavaScript

a-tabaza / fairouz_demo

Demo for Binding Text, Images, Graphs, and Audio for Music Representation Learning

music-information-retrieval multimodal-deep-learning joint-embedding

Updated May 28, 2024
Python

zhu-xlab / DOFA

Code for Neural Plasticity-Inspired Foundation Model for Observing the Earth Crossing Modalities

geospatial remote-sensing earth-science earth-observation multimodal-deep-learning foundation-models

Updated May 27, 2024
Jupyter Notebook

Anne-Andresen / Hybrid-GAN-C-and-cpp-implementation

Pure C 3D Hybrid GAN using Cross attention, attention and convolution

c cpp cuda transformers pytorch medical-imaging gan attention-mechanism 3d 3d-models low-level-programming multimodal-deep-learning transformer-pytorch gan-models cross-attention cross-attention-c transformers-c

Updated May 26, 2024
C

aehrc / cvt2distilgpt2

Improving Chest X-Ray Report Generation by Leveraging Warm-Starting

pytorch image-captioning medical-image-analysis multimodal multimodal-deep-learning mimic-cxr gpt-2 pytorch-lightning huggingface-transformers distilgpt2 chest-xray-imaging vision-transformer

Updated May 26, 2024
Python

sisinflab / Ducho

Python framework for extracting multimodal features for multimodal recommendation in a highly customizable way.

docker sentiment-analysis tensorflow transformers pytorch image-classification recommender-system convolutional-neural-networks audio-processing multimodal-deep-learning acmmm2023 thewebconf2024

Updated May 25, 2024
Python

DongmingShenDS / Multi-Modal-ML-Project

A data science project to predict online pet adoption speed using image, natural language, and tabular data with a multi-modal ML framework.

machine-learning-algorithms ordinal-classification multimodal-deep-learning bert-embeddings vision-transformer

Updated May 24, 2024
Jupyter Notebook
