Metaphors are commonly found in advertising and internet memes. However, the free-form nature of internet memes often leads to a lack of high-quality textual data. Metaphor detection demands a deep interpretation of both textual and visual elements and requires extensive common-sense knowledge, which poses a challenge to language models. To address these challenges, we propose a compact framework called C4MMD, which utilizes a Chain-of-Thought (CoT) method for Multi-modal Metaphor Detection. Specifically, our approach designs a three-step process inspired by CoT that extracts and integrates knowledge from Multi-modal Large Language Models (MLLMs) into smaller ones. We also develop a modality fusion architecture to transform knowledge from large models into metaphor features, supplemented by auxiliary tasks to further improve model performance. Experimental results on the MET-MEME dataset demonstrate that our method not only effectively enhances the metaphor detection capabilities of small models but also outperforms existing models. To our knowledge, this is the first systematic study leveraging MLLMs for metaphor detection. The main process of our method is shown in the figure below.
You may need to download the following content in advance to use our code:
- MET-Meme Dataset
- InternLM-XComposer model and its inference demo (since our demo may be out of date, we recommend using the latest one).
- Pre-trained vision and language models used in the modality fusion structure (see the sketch below).
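If you want to cache the fusion backbones ahead of time, a minimal sketch using Hugging Face Transformers is shown below. The checkpoint names are placeholders, not necessarily the ones we use; substitute whichever vision and language encoders you configure in C4MMD_train.py.

```python
# Minimal pre-download sketch. The checkpoint names are placeholders, not
# necessarily the encoders configured in C4MMD_train.py.
from transformers import AutoFeatureExtractor, AutoModel, AutoTokenizer

VISION_CKPT = "google/vit-base-patch16-224-in21k"  # example vision encoder
TEXT_CKPT = "bert-base-uncased"                    # example language encoder

AutoFeatureExtractor.from_pretrained(VISION_CKPT)
AutoModel.from_pretrained(VISION_CKPT)
AutoTokenizer.from_pretrained(TEXT_CKPT)
AutoModel.from_pretrained(TEXT_CKPT)
```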
After that, set up the code as follows:
git clone https://github.com/xyz189411yt/C4MMD.git
cd C4MMD
pip install -r requirements.txt
Note: Torch>=1.13.0, Transformers>=4.24.0
Follow these three steps to run the model:
- Step 1: Data pre-processing -> data_divide.py
- Step 2: CoT module for MLLM -> CoT_module.py
- Step 3: Main training process -> C4MMD_train.py
The original dataset is not split into training, validation, and test sets, so you can use data_divide.py to split it. The script also merges the image-text correspondence files and the image-label correspondence files.
You only need to change the data paths in this file and then run it with the following command.
python data_divide.py
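For reference, the split itself is a standard random partition. The sketch below is only an illustration under assumed file names and an assumed 8:1:1 ratio; data_divide.py remains the authoritative implementation.

```python
# Illustrative random split -- NOT the exact logic of data_divide.py.
# The file names and the 8:1:1 ratio are assumptions.
import json
import random

with open("data/all_data.json", "r", encoding="utf-8") as f:  # merged image-text-label records
    records = json.load(f)

random.seed(42)
random.shuffle(records)

n = len(records)
splits = {
    "train": records[: int(0.8 * n)],
    "val": records[int(0.8 * n): int(0.9 * n)],
    "test": records[int(0.9 * n):],
}
for name, subset in splits.items():
    with open(f"data/{name}.json", "w", encoding="utf-8") as out:
        json.dump(subset, out, ensure_ascii=False, indent=2)
```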
Note that this file contains an example showcasing our data processing method. You can also use any data format you like, as long as the final format matches what the training process expects.
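As a rough point of reference, a pre-processed record might look like the dictionary below; the field names are illustrative assumptions, and the example inside data_divide.py is the authoritative format.

```python
# Illustrative record after pre-processing; the field names are assumptions,
# see the example in data_divide.py for the authoritative format.
example_record = {
    "image": "images/0001.jpg",              # path to the meme image
    "text": "text extracted from the meme",  # the meme's caption
    "label": 1,                              # e.g. 1 = metaphorical, 0 = literal
}
```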
CoT_module.py contains our main contribution. You need to first modify the paths around line 142 and run it:
python CoT_module.py
If you execute CoT_module.py successfully, you will find a new data file (new_xxx.json) in the data folder. In this file, each example has three additional attributes corresponding to the three modal features generated by the MLLM (a quick check of the output is sketched after this list):
- internlm_img_info: additional image information
- internlm_text_info: additional text information
- internlm_mix_info: additional information after fusing the two modalities
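One quick way to inspect the enriched file is sketched below; the path data/new_train.json is an assumption, and the actual name simply follows the new_xxx.json pattern above.

```python
# Quick sanity check of the MLLM-enriched data file.
# "data/new_train.json" is an assumed path; the real file follows the
# new_xxx.json naming pattern described above. We also assume the file
# stores a list of example dicts.
import json

with open("data/new_train.json", "r", encoding="utf-8") as f:
    examples = json.load(f)

sample = examples[0]
for key in ("internlm_img_info", "internlm_text_info", "internlm_mix_info"):
    print(key, "->", sample.get(key, "<missing>"))
```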
If you want to try other MLLMs, simply adapt them to the CoT module section (after line 137).
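To plug in a different MLLM, the idea is to run three chained queries per meme, each reusing the previous answers. The sketch below assumes a generic mllm_chat(image, prompt) helper that you would write for your own model, and the prompts are paraphrases rather than the exact ones in CoT_module.py.

```python
# Hedged sketch of the three-step CoT querying with an arbitrary MLLM.
# `mllm_chat` is a hypothetical helper you implement for your own model;
# the prompts are paraphrases, not the exact prompts in CoT_module.py.
def describe_with_cot(mllm_chat, image_path: str, meme_text: str) -> dict:
    # Step 1: describe the image on its own.
    img_info = mllm_chat(image=image_path, prompt="Describe the content of this image.")

    # Step 2: interpret the overlaid text, conditioned on the image description.
    text_info = mllm_chat(
        image=image_path,
        prompt=f"The image shows: {img_info}\nExplain the meaning of the text: '{meme_text}'.",
    )

    # Step 3: fuse both modalities into a joint explanation.
    mix_info = mllm_chat(
        image=image_path,
        prompt=(
            f"Image description: {img_info}\nText interpretation: {text_info}\n"
            "Explain what the image and text jointly convey."
        ),
    )
    return {
        "internlm_img_info": img_info,
        "internlm_text_info": text_info,
        "internlm_mix_info": mix_info,
    }
```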
Once the data is in the required format, modify the file paths and pre-trained model paths in C4MMD_train.py, then train your model with the following command.
python C4MMD_train.py
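For intuition only, the toy module below shows one way vision features, meme-text features, and encoded MLLM descriptions could be concatenated before a binary metaphor classifier. It is an assumption-laden illustration, not the actual C4MMD fusion architecture, and it omits the auxiliary tasks entirely.

```python
# Toy fusion module for intuition only -- NOT the C4MMD architecture.
# It concatenates image, text, and MLLM-description features and feeds
# them to a binary (metaphorical vs. literal) classifier.
import torch
import torch.nn as nn

class ToyFusion(nn.Module):
    def __init__(self, vision_dim=768, text_dim=768, hidden=512):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(vision_dim + 2 * text_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2),
        )

    def forward(self, img_feat, text_feat, mllm_feat):
        # img_feat: [batch, vision_dim]; text_feat / mllm_feat: [batch, text_dim]
        fused = torch.cat([img_feat, text_feat, mllm_feat], dim=-1)
        return self.classifier(fused)
```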
This project is contributed by Yanzhi Xu*, Yueying Hua*, Shichen Li, and Zhongqing Wang from the Natural Language Processing Lab at Soochow University.
@inproceedings{xu2024C4MMD,
title = "Exploring Chain-of-Thought for Multi-modal Metaphor Detection",
author = "Xu, Yanzhi and
Hua, Yueying and
Li, Shichen and
Wang, Zhongqing",
booktitle = "Proceedings of ACL",
year = "2024",
}