Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities
<div class="is-size-5 publication-authors">
ECCV 2024

<span class="author-block">
Lorenzo Baraldi*¹'²,
<span class="author-block">
Federico Cocchi*¹'²,
<span class="author-block">
Marcella Cornia¹,
<span class="author-block">
Lorenzo Baraldi¹,
<span class="author-block">
Alessandro Nicolosi³,
<span class="author-block">
Rita Cucchiara¹

<div class="is-size-5 publication-authors">
¹University of Modena and Reggio Emilia,
²University of Pisa,
³Leonardo S.p.A.
<div class="is-size-5 publication-authors">
* Equal contribution

<h2 class="subtitle has-text-centered">
Each image is generated starting from a LAION-400M caption, thus referring to a realistic textual description.
A Dataset for Large-Scale Deepfake Detection
<div class="content has-text-justified">
Existing deepfake detection datasets are limited in both the diversity of their generators and the number of images they contain. We therefore create and release a new dataset that can support training deepfake detection methods from scratch.
Our <b>D</b>iffusion-generated <b>D</b>eepfake <b>D</b>etection dataset (D<sup>3</sup>) contains nearly 2.3M records and 11.5M images.
Each record in the dataset consists of a prompt, a real image, and four images generated with as many generators.
Prompts and corresponding real images are taken from <a href="">LAION-400M</a>, while fake images are generated from the same prompt using different text-to-image generators.
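As an illustration of the record layout described above, the following is a minimal sketch of how one record could be represented in memory and expanded into labelled training samples. The class name, field names, and helper method are hypothetical and are not the released dataset's actual schema; the generator keys are the abbreviations this page uses for the four generators.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Abbreviations this page uses for the four generators.
GENERATORS = ("SD-1.4", "SD-2.1", "SD-XL", "DF-IF")


@dataclass
class D3Record:
    """Hypothetical view of one record; field names are assumptions."""
    prompt: str                      # LAION-400M caption
    real_image: bytes                # encoded real image paired with the caption
    fake_images: Dict[str, bytes] = field(default_factory=dict)  # generator -> image

    def labelled_samples(self) -> List[Tuple[bytes, bool, str]]:
        """Return (image, is_fake, source) triples: one real plus one per generator."""
        samples = [(self.real_image, False, "real")]
        samples += [(self.fake_images[g], True, g) for g in GENERATORS]
        return samples


record = D3Record(
    prompt="a placeholder caption",
    real_image=b"<real bytes>",
    fake_images={g: b"<fake bytes>" for g in GENERATORS},
)
samples = record.labelled_samples()
```

Each record yields five images (one real and four generated), which is consistent with the figures above: roughly 2.3M records correspond to roughly 11.5M images.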

We employ four state-of-the-art open-source diffusion models, namely Stable Diffusion 1.4 (SD-1.4), Stable Diffusion 2.1 (SD-2.1), Stable Diffusion XL (SD-XL), and DeepFloyd IF (DF-IF).
While the first three generators are variants of the Stable Diffusion approach, DeepFloyd IF is strongly inspired by Imagen and thus represents a different generation technique.
To increase the variance of the dataset, images have been generated at different resolutions and aspect ratios: 256<sup>2</sup>, 512<sup>2</sup>, 640x480, and 640x360.
Moreover, to mimic the distribution of real images, we also employ a variety of encoding and compression methods (BMP, GIF, JPEG, TIFF, PNG).
In particular, we closely follow the distribution of encoding methods of LAION itself, therefore favoring the presence of JPEG-encoded images.
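The sampling of output resolution and encoding described above could be sketched as follows. This is only an illustration: the format weights are made-up placeholders chosen so that JPEG dominates (the actual LAION-derived proportions are not given here), the uniform choice over resolutions is an assumption, and the function name is hypothetical.

```python
import random
from collections import Counter

# Resolutions listed on this page, as (width, height).
RESOLUTIONS = [(256, 256), (512, 512), (640, 480), (640, 360)]

# PLACEHOLDER weights, not the real LAION distribution: they only
# encode the stated fact that JPEG is the most common format.
FORMAT_WEIGHTS = {"JPEG": 0.80, "PNG": 0.10, "GIF": 0.04, "TIFF": 0.03, "BMP": 0.03}


def sample_output_spec(rng: random.Random):
    """Pick a resolution (uniformly, by assumption) and an encoding (weighted)."""
    resolution = rng.choice(RESOLUTIONS)
    formats = list(FORMAT_WEIGHTS)
    weights = list(FORMAT_WEIGHTS.values())
    encoding = rng.choices(formats, weights=weights, k=1)[0]
    return resolution, encoding


rng = random.Random(0)  # fixed seed so the sketch is reproducible
specs = [sample_output_spec(rng) for _ in range(10_000)]
format_counts = Counter(encoding for _, encoding in specs)
```

Drawing the encoding from a weighted distribution rather than uniformly keeps generated images from being distinguishable from real ones merely by their file format.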
Secure and Safe AI
<div class="content has-text-justified">
This work has been done under the Multimedia use case of the European network <a href="">ELSA - European Lighthouse on Secure and Safe AI</a>.
The objective of the Multimedia use case is to develop effective solutions for detecting and mitigating the spread of deepfake images in multimedia content.
Machine-generated images are becoming increasingly common in the digital world, thanks to the spread of deep learning models that can generate visual data, such as Generative Adversarial Networks and Diffusion Models. While image generation tools can be employed for lawful goals (e.g., to assist content creators, generate simulated datasets, or enable multi-modal interactive applications), there is growing concern that they might also be used for illegal and malicious purposes, such as the forgery of natural images or the generation of images in support of fake news, misogyny, or revenge porn. While images generated in past years contained artefacts that made them easy to spot, today's results are far harder to recognize from a purely perceptual point of view. In this context, assessing the authenticity of images becomes a fundamental goal for security and for guaranteeing a degree of trustworthiness of AI algorithms. There is therefore a growing need to develop automated methods that can assess the authenticity of images (and, more generally, of multimodal content) and that can keep pace with the constant evolution of generative models, which become more realistic over time.
The Challenge on Deepfake Detection
<div class="content has-text-justified">
Join our thrilling <a href=""
class="external-link">competition</a> on deepfake detection and put your skills to the test. As the rise of deepfake technology poses unprecedented challenges, we invite individuals and teams from all backgrounds to showcase their expertise in identifying and debunking manipulated media.
