This repository is maintained by Tongtong Wu and Jingqi Kang.
The automation script of this repo is powered by Auto-Bibfile.
You can directly use our bibtex.bib in Overleaf via this link.
This page categorizes the literature by Published Venue.
- [Overview] -- Homepage
- [NLP] [CV] -- Summary
- [NLP] [CV] -- Application
- [NLP] [CV] -- Approach
- [NLP] [CV] -- Author
- [NLP] [CV] -- Backbone Model
- [NLP] [CV] -- Contribution
- [NLP] [CV] -- Dataset
- [NLP] [CV] -- Metrics
- [NLP] [CV] -- Research Questions
- [NLP] [CV] -- Setting
- [NLP] [CV] -- Learning Paradigm
- [NLP] [CV] -- Published Time
- [NLP] [CV] -- Published Venue
- Cross-media Structured Common Space for Multimedia Event Extraction,
by Li, Manling and Zareian, Alireza and Zeng, Qi and Whitehead, Spencer and Lu, Di and Ji, Heng and Chang, Shih-Fu [bib]
The first paper to define a multimodal event extraction task
- Joint Multimedia Event Extraction from Video and Article,
by Chen, Brian and Lin, Xudong and Thomas, Christopher and Li, Manling and Yoshida, Shoya and Chum, Lovish and Ji, Heng and Chang, Shih-Fu [bib]
This paper proposes a new task of video multimodal event extraction
- Image Enhanced Event Detection in News Articles,
by Tong, Meihan and Wang, Shuai and Cao, Yixin and Xu, Bin and Li, Juanzi and Hou, Lei and Chua, Tat-Seng [bib]
This paper proposes a multimodal fusion method with alternating dual attention
- Multimodal Relation Extraction with Efficient Graph Alignment,
by Zheng, Changmeng and Feng, Junhao and Fu, Ze and Cai, Yi and Li, Qing and Wang, Tao [bib]
This paper proposes a Multimodal Neural Network with Efficient Graph Alignment (MEGA) method for relation extraction in social media posts
- Improving Event Extraction via Multimodal Integration,
by Zhang, Tongtao and Whitehead, Spencer and Zhang, Hanwang and Li, Hongzhi and Ellis, Joseph and Huang, Lifu and Liu, Wei and Ji, Heng and Chang, Shih-Fu [bib]
The first paper to perform multimodal event extraction
- A Survey of Data Representation for Multi-Modality Event Detection and Evolution,
by Xiao, Kejing and Qian, Zhaopeng and Qin, Biao [bib]
- Good Visual Guidance Makes A Better Extractor: Hierarchical Visual Prefix for Multimodal Entity and Relation Extraction,
by Chen, Xiang and Zhang, Ningyu and Li, Lei and Yao, Yunzhi and Deng, Shumin and Tan, Chuanqi and Huang, Fei and Si, Luo and Chen, Huajun [bib]
This paper proposes visual prefix-guided fusion, concatenating object-level visual representations as a prefix to each self-attention layer in BERT (see the sketch below)
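As a rough illustration of the prefix idea (not the paper's exact architecture), the sketch below projects object-level visual features into the text hidden space and prepends them to the keys and values of a self-attention layer; the class name, dimensions, and number of objects are assumptions.

```python
import torch
import torch.nn as nn

class VisualPrefixAttention(nn.Module):
    """Toy self-attention layer whose keys/values are prefixed with
    projected object-level visual features (hypothetical sketch)."""

    def __init__(self, hidden=768, vis_dim=2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden, num_heads=12, batch_first=True)
        # project each detected object feature into the text hidden space
        self.vis_proj = nn.Linear(vis_dim, hidden)

    def forward(self, text_hidden, object_feats):
        # text_hidden: (B, T, hidden), object_feats: (B, n_objects, vis_dim)
        prefix = self.vis_proj(object_feats)           # (B, n_objects, hidden)
        kv = torch.cat([prefix, text_hidden], dim=1)   # visual prefix + text tokens
        out, _ = self.attn(query=text_hidden, key=kv, value=kv)
        return out

# usage with random tensors: 2 sentences of 16 tokens, 4 detected objects each
layer = VisualPrefixAttention()
out = layer(torch.randn(2, 16, 768), torch.randn(2, 4, 2048))  # -> (2, 16, 768)
```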
- CLIP-Event: Connecting Text and Images with Event Structures,
by Manling Li and Ruochen Xu and Shuohang Wang and Luowei Zhou and Xudong Lin and Chenguang Zhu and Michael Zeng and Heng Ji and Shih-Fu Chang [bib]
Inspired by CLIP's contrastive learning framework, this paper connects text and images through event structures
- Learning Transferable Visual Models From Natural Language Supervision,
by Alec Radford and Jong Wook Kim and Chris Hallacy and Aditya Ramesh and Gabriel Goh and Sandhini Agarwal and Girish Sastry and Amanda Askell and Pamela Mishkin and Jack Clark and Gretchen Krueger and Ilya Sutskever [bib]
This paper proposes to learn a multimodal embedding space by jointly training an image encoder and a text encoder
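Below is a minimal sketch of the symmetric contrastive objective that ties the two encoders together; the function name and temperature value are illustrative, not taken from the paper's code.

```python
import torch
import torch.nn.functional as F

def clip_style_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired image/text
    embeddings, in the spirit of CLIP (simplified sketch)."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(logits.size(0))            # matching pairs lie on the diagonal
    loss_i = F.cross_entropy(logits, targets)         # image -> text direction
    loss_t = F.cross_entropy(logits.t(), targets)     # text -> image direction
    return (loss_i + loss_t) / 2

# usage with random embeddings from the two encoders
loss = clip_style_contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
```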
- Prompting Visual-Language Models for Efficient Video Understanding,
by Chen Ju and Tengda Han and Kunhao Zheng and Ya Zhang and Weidi Xie [bib]
This paper adapts the CLIP image encoder into a video encoder
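One generic way to lift a frame-level encoder to the video level is to encode frames independently and add lightweight temporal mixing; this is an assumed sketch of that idea, not the paper's specific prompting design.

```python
import torch
import torch.nn as nn

class FrameToVideoEncoder(nn.Module):
    """Generic sketch: reuse an image encoder on each frame, then pool
    over time to obtain a video-level embedding (hypothetical design)."""

    def __init__(self, image_encoder, embed_dim=512):
        super().__init__()
        self.image_encoder = image_encoder               # e.g. a CLIP visual backbone
        self.temporal = nn.TransformerEncoderLayer(embed_dim, nhead=8, batch_first=True)

    def forward(self, frames):
        # frames: (B, T, C, H, W)
        b, t = frames.shape[:2]
        feats = self.image_encoder(frames.flatten(0, 1))  # (B*T, embed_dim)
        feats = feats.view(b, t, -1)
        feats = self.temporal(feats)                      # lightweight temporal mixing
        return feats.mean(dim=1)                          # (B, embed_dim) video embedding

# toy usage with a dummy frame encoder mapping 3x32x32 frames to 512-d vectors
dummy = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 512))
video_emb = FrameToVideoEncoder(dummy)(torch.randn(2, 8, 3, 32, 32))  # -> (2, 512)
```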
- CLIP-Adapter: Better Vision-Language Models with Feature Adapters,
by Peng Gao and Shijie Geng and Renrui Zhang and Teli Ma and Rongyao Fang and Yongfeng Zhang and Hongsheng Li and Yu Qiao [bib]
Inspired by adapter modules, this paper transfers CLIP's knowledge to few-shot classification through lightweight feature adapters
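A simplified sketch of a bottleneck feature adapter with residual blending over frozen CLIP features; the bottleneck size and mixing ratio below are placeholder values, not the paper's settings.

```python
import torch
import torch.nn as nn

class ClipFeatureAdapter(nn.Module):
    """Bottleneck adapter over frozen CLIP features with residual blending
    (simplified sketch; only the adapter is trained for few-shot tasks)."""

    def __init__(self, dim=512, reduction=4, ratio=0.2):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim // reduction), nn.ReLU(inplace=True),
            nn.Linear(dim // reduction, dim), nn.ReLU(inplace=True),
        )
        self.ratio = ratio  # how much adapted feature to mix back in

    def forward(self, clip_features):
        adapted = self.mlp(clip_features)
        return self.ratio * adapted + (1 - self.ratio) * clip_features

# usage: refine frozen CLIP image features before the classification head
refined = ClipFeatureAdapter()(torch.randn(8, 512))  # -> (8, 512)
```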
- Learning to Prompt for Vision-Language Models,
by Kaiyang Zhou and Jingkang Yang and Chen Change Loy and Ziwei Liu [bib]
This paper proposes to learn soft prompts represented by continuous context vectors, replacing CLIP's hand-crafted prompts
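The sketch below shows the general idea of learnable context vectors prepended to class-name embeddings; the class name, stand-in text encoder, and dimensions are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

class SoftPromptClassifier(nn.Module):
    """Learnable context vectors prepended to class-name embeddings,
    in the spirit of soft prompting (hypothetical sketch)."""

    def __init__(self, text_encoder, class_name_embs, n_ctx=16, dim=512):
        super().__init__()
        self.text_encoder = text_encoder           # frozen text encoder (stand-in)
        self.class_name_embs = class_name_embs     # (n_classes, n_name_tokens, dim)
        # the only trainable parameters: shared continuous context vectors
        self.ctx = nn.Parameter(torch.randn(n_ctx, dim) * 0.02)

    def class_embeddings(self):
        n_cls = self.class_name_embs.size(0)
        ctx = self.ctx.unsqueeze(0).expand(n_cls, -1, -1)       # (n_cls, n_ctx, dim)
        tokens = torch.cat([ctx, self.class_name_embs], dim=1)  # [ctx][class name]
        return self.text_encoder(tokens)                        # (n_cls, dim)

    def forward(self, image_features):
        return image_features @ self.class_embeddings().t()     # similarity logits

# toy usage: a stand-in "encoder" that mean-pools token embeddings
model = SoftPromptClassifier(lambda t: t.mean(dim=1),
                             class_name_embs=torch.randn(10, 2, 512))
logits = model(torch.randn(4, 512))   # (4, 10) image-to-class similarities
```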
- DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting,
by Yongming Rao and Wenliang Zhao and Guangyi Chen and Yansong Tang and Zheng Zhu and Guan Huang and Jie Zhou and Jiwen Lu [bib]
This paper proposes instance-level prompts, where each input corresponds to its own prompt
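A hypothetical sketch of how a shared prompt could be conditioned on each instance's visual features so that every input gets its own prompt; the cross-attention design and sizes here are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class InstanceConditionedPrompt(nn.Module):
    """Refine shared context vectors with each image's features so that
    every input receives its own prompt (hypothetical sketch)."""

    def __init__(self, n_ctx=8, dim=512):
        super().__init__()
        self.base_ctx = nn.Parameter(torch.randn(n_ctx, dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

    def forward(self, image_tokens):
        # image_tokens: (B, N, dim) visual features of one batch of images
        b = image_tokens.size(0)
        ctx = self.base_ctx.unsqueeze(0).expand(b, -1, -1)   # (B, n_ctx, dim)
        refined, _ = self.cross_attn(ctx, image_tokens, image_tokens)
        return ctx + refined                                  # per-instance prompt

# usage: 2 images, each with 49 patch tokens, yield 2 distinct prompts
prompts = InstanceConditionedPrompt()(torch.randn(2, 49, 512))  # -> (2, 8, 512)
```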