A paper reading list on Multimodal Conversational AI that I maintain for my own research. 😇 I will tidy it up and re-organize it along the way.
P.S. Paper suggestions are always welcome! 😸
- Katharina Kann, et al. 2022. Open-domain Dialogue Generation: What We Can Do, Cannot Do, And Should Do Next. NLP4ConvAI, ACL.
- Somil Gupta, Bhanu Pratap Singh Rawat, Hong Yu. 2020. Conversational Machine Comprehension: a Literature Review. COLING.
- Anirudh Sundar, Larry Heck. 2022. Multimodal Conversational AI: A Survey of Datasets and Approaches. NLP4ConvAI, ACL.
- Aditya Mogadala, Marimuthu Kalimuthu, Dietrich Klakow. 2021. Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods. JAIR.
- Jabeen Summaira, et al. 2021. Recent Advances and Trends in Multimodal Deep Learning: A Review. arXiv.
- Chao Zhang, et al. 2020. Multimodal Intelligence: Representation Learning, Information Fusion, and Applications. IEEE Journal of Selected Topics in Signal Processing.
- Yonatan Bisk, et al. 2020. Experience Grounds Language. EMNLP.
- Tadas Baltrušaitis, Chaitanya Ahuja, Louis-Philippe Morency. 2018. Multimodal Machine Learning: A Survey and Taxonomy. IEEE Transactions on Pattern Analysis and Machine Intelligence.
- Dhanesh Ramachandram, Graham W. Taylor. 2017. Deep Multimodal Learning: A Survey on Recent Advances and Trends. IEEE Signal Processing Magazine.
- Wenzhong Guo, Jianwen Wang, Shiping Wang. 2019. Deep Multimodal Representation Learning: A Survey. IEEE Access.
- Yiqun Yao, Rada Mihalcea. 2022. Modality-specific Learning Rates for Effective Multimodal Additive Late-fusion. Findings, ACL.
- Jing Gao, et al. 2020. A Survey on Deep Learning for Multimodal Data Fusion. Neural Computation, MIT Press.
- Bei Li, et al. 2022. On Vision Features in Multimodal Machine Translation. ACL.
- Umut Sulubacak, et al. 2020. Multimodal Machine Translation through Visuals and Speech. Machine Translation, Springer.
- Shaowei Yao, Xiaojun Wan. 2020. Multimodal Transformer for Multimodal Machine Translation. ACL. (Image, Text)
- Ozan Caglayan, et al. 2019. Probing the Need for Visual Context in Multimodal Machine Translation. NAACL.
- Xichen Pan, et al. 2022. Leveraging Unimodal Self-Supervised Learning for Multimodal Audio-Visual Speech Recognition. ACL.
- Wenliang Dai, et al. 2022. Enabling Multimodal Generation on CLIP via Vision-Language Knowledge Distillation. Findings, ACL.
- Hui Su, et al. 2022. RoCBert: Robust Chinese Bert with Multimodal Contrastive Pretraining. ACL.
- Anil Rahate, et al. 2022. Multimodal Co-learning: Challenges, Applications with Datasets, Recent Advances and Future Directions. Information Fusion, Elsevier.
- Zhiyuan Ma, et al. 2022. UniTranSeR: A Unified Transformer Semantic Representation Framework for Multimodal Task-Oriented Dialog System. ACL.
- Qingfeng Sun, et al. 2022. Multimodal Dialogue Response Generation. ACL.
- Jiaxin Qi, et al. 2020. Two Causal Principles for Improving Visual Dialog. CVPR.
- Hardik Chauhan, et al. 2019. Ordinal and Attribute Aware Response Generation in a Multimodal Dialogue System. ACL.
- Lizi Liao, et al. 2018. Knowledge-aware Multimodal Dialogue Systems. MM, ACM.
- Shubham Agarwal, et al. 2018. A Knowledge-Grounded Multimodal Search-Based Conversational Agent. Workshop on Search-Oriented Conversational AI, EMNLP.
- Shubham Agarwal, et al. 2018. Improving Context Modelling in Multimodal Dialogue Generation. INLG.
- Xiaoxiao Guo, et al. 2018. Dialog-based Interactive Image Retrieval. NeurIPS. [GitHub]
- Abhishek Das, et al. 2017. Visual Dialog. CVPR. [GitHub]
- Tom Young, et al. 2020. Dialogue systems with audio context. Neurocomputing, Elsevier.
- Tatsuya Kawahara. 2019. Spoken Dialogue System for a Human-like Conversational Robot ERICA. IWSDS.
- Zekang Li, et al. 2020. Bridging Text and Video: A Universal Multimodal Transformer for Audio-Visual Scene-Aware Dialog. Dialog System Technology Challenge, AAAI. [GitHub]
- Xiangyang Mou, et al. 2020. Multimodal Dialogue State Tracking By QA Approach with Data Augmentation. Dialog System Technology Challenge, AAAI.
- Yun-Wei Chu, et al. 2020. Multi-step Joint-Modality Attention Network for Scene-Aware Dialogue System. Dialog System Technology Challenge, AAAI.
- Hung Le, et al. 2019. Multimodal Transformer Networks for End-to-End Video-Grounded Dialogue Systems. ACL. [GitHub]
- Qingxiu Dong, et al. 2022. Premise-based Multimodal Reasoning: Conditional Inference on Joint Textual and Visual Clues. ACL.
- Weifeng Zhang, et al. 2021. DMRFNet: Deep Multimodal Reasoning and Fusion for Visual Question Answering and Explanation Generation. Information Fusion, Elsevier.
- Remi Cadene, et al. 2019. MUREL: Multimodal Relational Reasoning for Visual Question Answering. CVPR.
- Sruthy Manmadhan, Binsu C. Kovoor. 2020. Visual question answering: a state-of-the-art review. Artificial Intelligence Review, Springer.
- Remi Cadene, et al. 2019. RUBi: Reducing Unimodal Biases for Visual Question Answering. NeurIPS.
- Yan Ling, Jianfei Yu, Rui Xia. 2022. Vision-Language Pre-Training for Multimodal Aspect-Based Sentiment Analysis. ACL.
- Yang Wu, et al. 2022. Sentiment Word Aware Multimodal Refinement for Multimodal Sentiment Analysis with ASR Errors. Findings, ACL.
- Jiquan Wang, et al. 2022. Multimodal Sarcasm Target Identification in Tweets. ACL.
- Huisheng Mao, et al. 2022. M-SENA: An Integrated Platform for Multimodal Sentiment Analysis. System Demonstrations, ACL.
- Wenliang Dai, et al. 2021. Weakly-supervised Multi-task Learning for Multimodal Affect Recognition. arXiv.
- Trisha Mittal, et al. 2020. M3ER: Multiplicative Multimodal Emotion Recognition using Facial, Textual, and Speech Cues. AAAI.
- Paul Pu Liang, et al. 2021. MultiBench: Multiscale Benchmarks for Multimodal Representation Learning. NeurIPS.
- Jan Deriu, et al. 2020. Survey on evaluation methods for dialogue systems. Artificial Intelligence Review, Springer.
- Masahiro Araki, et al. 2018. Collection of Multimodal Dialog Data and Analysis of the Result of Annotation of Users’ Interest Level. LREC.
- Mauajama Firdaus, et al. 2022. EmoSen: Generating Sentiment and Emotion Controlled Responses in a Multimodal Dialogue System. IEEE Transactions on Affective Computing.
- Yunlong Liang, et al. 2022. MSCTD: A Multimodal Sentiment Chat Translation Dataset. ACL.
- Yirong Chen, et al. 2022. CPED: A Large-Scale Chinese Personalized and Emotional Dialogue Dataset for Conversational AI. arXiv. (Chinese)
- Zhengcong Fei, et al. 2021. Towards Expressive Communication with Internet Memes: A New Multimodal Conversation Dataset and Benchmark. Dialog System Technology Challenge, AAAI. [GitHub] (Chinese)
- Deeksha Varshney, Asif Ekbal, Anushkha Singh. 2021. Knowledge Grounded Multimodal Dialog Generation in Task-oriented Settings. PACLIC.
- Satwik Kottur, et al. 2021. SIMMC 2.0: A Task-oriented Dialog Dataset for Immersive Multimodal Conversations. EMNLP.
- Kübra Bodur, et al. 2021. ChiCo: A Multimodal Corpus for the Study of Child Conversation. ICMI.
- Mauajama Firdaus, et al. 2020. MEISD: A Multimodal Multi-Label Emotion, Intensity and Sentiment Dialogue Dataset for Emotion Recognition and Sentiment Analysis in Conversations. COLING.
- Seungwhan Moon, et al. 2020. Situated and Interactive Multimodal Conversations. COLING.
- Darryl Hannan, Akshay Jain, Mohit Bansal. 2020. ManyModalQA: Modality Disambiguation and QA over Diverse Inputs. AAAI.
- Santiago Castro, et al. 2019. Towards Multimodal Sarcasm Detection (An Obviously Perfect Paper). ACL.
- Satwik Kottur, et al. 2019. CLEVR-Dialog: A Diagnostic Dataset for Multi-Round Reasoning in Visual Dialog. NAACL. [GitHub]
- Soujanya Poria, et al. 2019. MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversations. ACL. [Homepage]
- Asma Ben Abacha, et al. 2019. VQA-Med: Overview of the Medical Visual Question Answering Task at ImageCLEF 2019. CLEF.
- Amrita Saha, Mitesh Khapra, Karthik Sankaranarayanan. 2018. Towards Building Large Scale Multimodal Domain-Aware Conversation Systems. AAAI. [Homepage]
- Harm de Vries, et al. 2018. Talk the Walk: Navigating New York City through Grounded Dialogue. arXiv. [GitHub]
- Jialu Wang, Yang Liu, Xin Wang. 2022. Assessing Multilingual Fairness in Pre-trained Multimodal Representations. Findings, ACL.
- Victor Milewski, Miryam de Lhoneux, Marie-Francine Moens. 2022. Finding Structural Knowledge in Multimodal-BERT. ACL. [GitHub]
- Delphine Potdevin, Céline Clavel, Nicolas Sabouret. 2020. Virtual intimacy in human-embodied conversational agent interactions: the influence of multimodality on its perception. Journal on Multimodal User Interfaces, Springer.
- Stefan Schaffer, Norbert Reithinger. 2019. Conversation is Multimodal: Thus Conversational User Interfaces should be as well. Conversational User Interfaces (CUI), ACM.
- Liu Yang, Catherine Achard, and Catherine Pelachaud. 2022. Multimodal Analysis of Interruptions. International Conference on Human-Computer Interaction, Springer.
- Stephen C. Levinson, Judith Holler. 2014. The origin of human multi-modal communication. Philosophical Transactions of the Royal Society B.
- Mireille Fares. 2020. Towards Multimodal Human-Like Characteristics and Expressive Visual Prosody in Virtual Agents. Doctoral Consortium Paper, ICMI.
- Jianfeng Gao, Michel Galley, Lihong Li. 2018. Neural Approaches to Conversational AI. Tutorial, SIGIR, ACM.
- Louis-Philippe Morency, Tadas Baltrušaitis. 2017. Multimodal Machine Learning: Integrating Language, Vision and Speech. Tutorial Abstracts, ACL.
- Margaret Mitchell, John C. Platt, Kate Saenko. 2017. Guest Editorial: Image and Language Understanding. International Journal of Computer Vision, Springer.
- Desmond Elliott, Douwe Kiela and Angeliki Lazaridou. 2016. Multimodal Learning and Reasoning. Tutorial, ACL.