This work introduces a cross-modal learning method for training visual sentiment analysis models in the Twitter domain.
We used it to fine-tune Vision Transformer (ViT) models pre-trained on ImageNet-21k, which achieved strong results on manually annotated external benchmarks, surpassing the previous state of the art.
We crawled ∼3.7M pictures from social media from 1 April to 30 June and used them in our cross-modal approach. In particular, a cross-modal teacher-student learning technique (sketched below) avoids the need for human annotators, minimizing the required effort and allowing for the creation of vast training sets.
These training sets can help future research train robust visual models, as the number of parameters of current state-of-the-art models is growing exponentially, along with the amount of data they need to avoid overfitting.
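A minimal sketch of the teacher-student idea, under stated assumptions: a text sentiment classifier labels each tweet's text, and a ViT student learns to predict that label from the paired image. The teacher checkpoint shown here is a stand-in (not necessarily the one used in the paper), and `tweets` is a hypothetical iterable of (text, image path) pairs.

```python
import torch
from PIL import Image
from transformers import pipeline, ViTImageProcessor, ViTForImageClassification

# Teacher: a text sentiment model; this checkpoint is an assumption.
teacher = pipeline("sentiment-analysis",
                   model="cardiffnlp/twitter-roberta-base-sentiment")

# Student: ViT pre-trained on ImageNet-21k with a 3-class sentiment head.
processor = ViTImageProcessor.from_pretrained("google/vit-base-patch32-224-in21k")
student = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch32-224-in21k", num_labels=3)
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-4)

label_to_id = {"LABEL_0": 0, "LABEL_1": 1, "LABEL_2": 2}  # neg / neu / pos

for text, image_path in tweets:  # hypothetical (text, image path) pairs
    pseudo = teacher(text)[0]                     # teacher labels the *text*
    target = torch.tensor([label_to_id[pseudo["label"]]])
    inputs = processor(images=Image.open(image_path), return_tensors="pt")
    loss = student(**inputs, labels=target).loss  # student learns from the *image*
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```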
# Increase Git's HTTP buffer so the large repository clones reliably
$ git config --global http.postBuffer 1048576000
$ git clone --recursive https://github.com/fabiocarrara/cross-modal-visual-sentiment-analysis
$ chmod +x install_dependencies.sh
$ ./install_dependencies.sh
Evaluate a model on a benchmark (a minimal inference sketch follows the options below):
$ python3 scripts/test_benchmark.py -m <model_name> -b <benchmark_name>
Options for <model_name>: [boosted_model, ViT_L16, ViT_L32, ViT_B16, ViT_B32, merged_T4SA, bal_flat_T4SA2.0, bal_T4SA2.0, unb_T4SA2.0, B-T4SA_1.0_upd_filt, B-T4SA_1.0_upd, B-T4SA_1.0]
Options for <benchmark_name>: [5agree, 4agree, 3agree, FI_complete, emotion_ROI_test, twitter_testing_2]
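Once a checkpoint is available, running it on a single image looks roughly like the following; the checkpoint path and the class ordering are assumptions, not the repository's documented API.

```python
import torch
from PIL import Image
from transformers import ViTImageProcessor, ViTForImageClassification

processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
model = ViTForImageClassification.from_pretrained("path/to/boosted_model")  # hypothetical path
model.eval()

image = Image.open("tweet_image.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1)

# Class ordering is an assumption; check the released checkpoint's config.
print(dict(zip(["negative", "neutral", "positive"], probs[0].tolist())))
```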
Run a five-fold cross-validation on a benchmark, report the mean accuracy and standard deviation, and save the predictions (uses the boosted_model by default); the protocol is sketched after the command:
$ python3 scripts/5_fold_cross.py -b <benchmark_name>
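The protocol amounts to something like the sketch below; `train_on` and `evaluate_on` are hypothetical helpers standing in for the repository's training and evaluation code, and the script's internals may differ.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def five_fold_accuracy(images, labels):
    """Mean and std of test accuracy over 5 stratified folds (numpy arrays in)."""
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    accuracies = []
    for train_idx, test_idx in skf.split(images, labels):
        model = train_on(images[train_idx], labels[train_idx])          # hypothetical
        accuracies.append(evaluate_on(model, images[test_idx], labels[test_idx]))
    return np.mean(accuracies), np.std(accuracies)
```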
Fine-tune on the five FI splits, report the mean accuracy and standard deviation, and save the predictions (uses the boosted_model by default); a sketch of one split follows the command:
$ python3 scripts/fine_tune_FI.py
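For one split, this step amounts to a standard supervised loop over FI's human-labeled images, in contrast to the pseudo-labeled distillation above; the checkpoint path and `fi_train_loader` are hypothetical placeholders.

```python
import torch
from transformers import ViTForImageClassification

model = ViTForImageClassification.from_pretrained("path/to/boosted_model")  # hypothetical path
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()

for pixel_values, labels in fi_train_loader:  # hypothetical DataLoader over one FI split
    loss = model(pixel_values=pixel_values, labels=labels).loss  # human labels, not pseudo-labels
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```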
Confidence filter thresholds (Pos/Neu/Neg) applied to the teacher labels, and the resulting accuracy (%) on the Twitter Dataset (TD) at each annotator-agreement level:

| Model | Dataset | Pos | Neu | Neg | Student Arch | 5 agree | 4 agree | 3 agree |
|---|---|---|---|---|---|---|---|---|
| Model 3.1 | A | - | - | - | B/32 | 82.2 | 78.0 | 75.5 |
| Model 3.2 | A | 0.70 | 0.70 | 0.70 | B/32 | 84.7 | 79.7 | 76.6 |
| Model 3.3 | B | 0.70 | 0.70 | 0.70 | B/32 | 82.3 | 78.7 | 75.3 |
| Model 3.4 | B | 0.90 | 0.90 | 0.70 | B/32 | 84.4 | 80.3 | 77.1 |
| Model 3.5 | A+B | 0.90 | 0.90 | 0.70 | B/32 | 86.5 | 82.6 | 78.9 |
| Model 3.6 | A+B | 0.90 | 0.90 | 0.70 | L/32 | 85.0 | 82.4 | 79.4 |
| Model 3.7 | A+B | 0.90 | 0.90 | 0.70 | B/16 | 87.0 | 83.1 | 79.4 |
| Model 3.8 | A+B | 0.90 | 0.90 | 0.70 | L/16 | 87.8 | 84.8 | 81.9 |
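The Pos/Neu/Neg thresholds are applied per class when filtering the teacher's pseudo-labels: a sample is kept only if the teacher's confidence for its class exceeds that class's threshold. A sketch of the rule, using Model 3.5's thresholds as an example; the `samples` structure is an assumption.

```python
# Per-class confidence thresholds (here: the 0.90/0.90/0.70 setting).
THRESHOLDS = {"pos": 0.90, "neu": 0.90, "neg": 0.70}

def keep(sample):
    # sample = {"label": "pos", "confidence": 0.93}, for example (hypothetical schema)
    return sample["confidence"] >= THRESHOLDS[sample["label"]]

filtered = [s for s in samples if keep(s)]  # `samples`: teacher-labeled training set
```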
COMING SOON
@inproceedings{serra2023emotions,
author = {Serra, Alessio and Carrara, Fabio and Tesconi, Maurizio and Falchi, Fabrizio},
editor = {Kobi Gal and Ann Now{\'{e}} and Grzegorz J. Nalepa and Roy Fairstein and Roxana Radulescu},
title = {The Emotions of the Crowd: Learning Image Sentiment from Tweets via Cross-Modal Distillation},
booktitle = {{ECAI} 2023 - 26th European Conference on Artificial Intelligence, September 30 - October 4, 2023, Krak{\'{o}}w, Poland - Including 12th Conference on Prestigious Applications of Intelligent Systems ({PAIS} 2023)},
series = {Frontiers in Artificial Intelligence and Applications},
volume = {372},
pages = {2089--2096},
publisher = {{IOS} Press},
year = {2023},
url = {https://doi.org/10.3233/FAIA230503},
doi = {10.3233/FAIA230503},
}