    Repositories list

    • Reading list for research topics in multimodal machine learning
      MIT License
      850800 Updated Mar 14, 2023
    • iPerceive (Public)
      Applying Common-Sense Reasoning to Multi-Modal Dense Video Captioning and Video Question Answering | Python3 | PyTorch | CNNs | Causality | Reasoning | LSTMs | Transformers | Multi-Head Self Attention
      Python
      MIT License
      20100 Updated Nov 10, 2020
    • CVSE (Public)
      The official source code for the paper Consensus-Aware Visual-Semantic Embedding for Image-Text Matching (ECCV 2020)
      Python
      19000 Updated Oct 21, 2020
    • Starter code for the VMT task and challenge
      Python
      4000 Updated Jul 29, 2020
    • A real-time multimodal emotion recognition web app for text, sound, and video inputs
      Jupyter Notebook
      Apache License 2.0
      285200 Updated Jul 6, 2020
    • ACL 2020 Tutorial by Malihe Alikhani and Matthew Stone
      3000 Updated Jun 29, 2020
    • A curated list of awesome papers, datasets and tutorials within Multimodal Knowledge Graph.
      TeX
      MIT License
      43100 Updated Feb 5, 2020
    • A curated list of awesome papers, datasets and tutorials within Multimodal Machine Learning.
      TeX
      4000 Updated Feb 2, 2020
    • [ACL'19] [PyTorch] Multimodal Transformer
      Python
      151000 Updated Dec 12, 2019
    • Code for the paper "VisualBERT: A Simple and Performant Baseline for Vision and Language"
      Python
      Other
      104000 Updated Nov 2, 2019
    • MTN (Public)
      Code for the paper Multimodal Transformer Networks for End-to-End Video-Grounded Dialogue Systems (ACL19)
      Python
      MIT License
      25000 Updated Oct 19, 2019
    • This repository contains code and metadata of the How2 dataset
      Python
      17000 Updated Oct 7, 2019
    • lxmert (Public)
      PyTorch code for EMNLP 2019 paper "LXMERT: Learning Cross-Modality Encoder Representations from Transformers".
      Python
      MIT License
      158000 Updated Sep 27, 2019
    • pythia (Public)
      A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
      Python
      Other
      935000 Updated Sep 23, 2019
    • Jupyter Notebook
      95000 Updated Sep 21, 2019
    • Sequence-to-Sequence Framework in PyTorch
      Jupyter Notebook
      Other
      51000 Updated Aug 12, 2019
    • Attention-based multimodal fusion for sentiment analysis
      Python
      MIT License
      75000 Updated Jan 12, 2019
    • Contextual inter-modal attention for multimodal sentiment analysis
      Python
      MIT License
      8000 Updated Dec 13, 2018
    • Python
      MIT License
      1100 Updated Jul 20, 2018