πŸ•΅πŸ»β€β™‚οΈ DLA DEEPFAKE DETECTION 2024/25 - UNICA

Deepfake examples

Deepfake Detection Project using the OpenForensics dataset


📑 Summary

  1. 🧑🏻‍🎓 Students
  2. 📌 Description
  3. 📥 Download the Dataset
  4. 📄 Documentation
  5. 🚀 Installation
  6. 🛠️ Test the DataLoader
  7. 🎯 Train the Model
  8. 📊 Evaluate the Model
  9. 📂 Project Structure
  10. 📊 Project Goals
  11. 🖥️ Hardware and Limitations
  12. 🤝 Contributions
  13. ❓ How to Cite

πŸ§‘πŸ»β€πŸŽ“ Students

Francesco Congiu

Student ID: 60/73/65300

E-Mail: [email protected]

Simone Giuffrida

Student ID: 60/73/65301

E-Mail: [email protected]

Fabio Littera

Student ID: 60/73/65310

E-Mail: [email protected]


📌 Description

This repository contains the code for training and evaluating deepfake detection models using the OpenForensics dataset. The project follows two approaches:

  1. Transfer Learning with pre-trained models (e.g., MobileNet, Xception).
  2. Training from Scratch with a custom neural network.
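
As a quick illustration of the transfer-learning approach, here is a minimal PyTorch sketch (assuming torchvision's pretrained MobileNetV2; the actual layer choices in scripts/train.py may differ):

import torch.nn as nn
from torchvision import models

# Transfer learning: reuse the pretrained ImageNet backbone and
# replace the classifier head with a binary (real/fake) output.
model = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.DEFAULT)
for param in model.features.parameters():
    param.requires_grad = False  # freeze the convolutional backbone
model.classifier[1] = nn.Linear(model.last_channel, 2)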

📥 Download the Dataset

The OpenForensics dataset required for the project can be downloaded from the following link:
🔗 OpenForensics Dataset - Zenodo


📄 Documentation

Below are links to the full project documentation:

📂 Folder with all the documentation


🚀 Installation

To run the project locally, follow these steps:

1️⃣ Clone the Repository

Open the terminal and run:

git clone git@github.com:wakaflocka17/DLA_DEEPFAKEDETECTION.git
cd DLA_DEEPFAKEDETECTION

(Or, if using HTTPS)

git clone https://github.com/wakaflocka17/DLA_DEEPFAKEDETECTION.git
cd DLA_DEEPFAKEDETECTION

2️⃣ Create and Activate a Virtual Environment

It is recommended to create a virtual environment to isolate dependencies:

python3 -m venv openforensics_env
source openforensics_env/bin/activate  # macOS/Linux

(On Windows, use: openforensics_env\Scripts\activate)

3️⃣ Install Dependencies

Install all necessary libraries:

pip install -r requirements.txt

4️⃣ Set Up the Project Structure

First, make the script executable:

chmod +x setup_folders.sh

Then run it to create the required folders:

./setup_folders.sh

This will create:

DLA_DEEPFAKEDETECTION/
│── data/
│   ├── Train/
│   ├── Val/
│   ├── Test-Dev/
│   ├── Test-Challenge/
│   ├── dataset/
│
│── processed_data/
│   ├── Train/
│   │   ├── real/
│   │   ├── fake/
│   ├── Val/
│   │   ├── real/
│   │   ├── fake/
│   ├── Test-Dev/
│   │   ├── real/
│   │   ├── fake/
│   ├── Test-Challenge/
│   │   ├── real/
│   │   ├── fake/

5️⃣ Download the Dataset

To automatically download the OpenForensics dataset, use the provided script:

python3 scripts/download_dataset.py

💡 Ensure you have a stable internet connection, as the dataset is large (60GB+).
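
For a rough idea of what such a download script does, here is a minimal sketch (the Zenodo record ID is a placeholder, not the real one; scripts/download_dataset.py handles the actual URLs):

import urllib.request
from pathlib import Path

# Hypothetical file list; the real script resolves the actual
# Zenodo URLs of the OpenForensics archives.
FILES = ["https://zenodo.org/record/<RECORD_ID>/files/Train.zip"]

dest = Path("data/dataset")
dest.mkdir(parents=True, exist_ok=True)
for url in FILES:
    target = dest / url.rsplit("/", 1)[-1]
    print(f"Downloading {url} -> {target}")
    urllib.request.urlretrieve(url, target)  # large files: expect a long wait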

6️⃣ Move Images and JSON Files to Their Correct Directories

Now that all files have been extracted, we need to organize them into the correct dataset folders (Train, Val, Test-Dev, Test-Challenge). Run:

python3 scripts/extract_dataset.py

💡 This will:

  • Move training images to data/Train/Train/ and the corresponding Train_poly.json to data/Train/.
  • Move validation images to data/Val/Val/ and Val_poly.json to data/Val/.
  • Move test-dev images to data/Test-Dev/Test-Dev/ and Test-Dev_poly.json to data/Test-Dev/.
  • Move test-challenge images to data/Test-Challenge/Test-Challenge/ and Test-Challenge_poly.json to data/Test-Challenge/.

7️⃣ Delete Unnecessary ZIP Files

After extraction and organization, the original .zip files are no longer needed. Delete them using:

python3 scripts/delete_all_zips.py

💡 This will clean up the dataset directory, saving storage space.
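
For reference, the cleanup can be as simple as this sketch (assuming the archives live under data/dataset/):

from pathlib import Path

# Remove every leftover .zip under the dataset directory.
for zip_path in Path("data/dataset").rglob("*.zip"):
    print(f"Deleting {zip_path}")
    zip_path.unlink()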

8️⃣ Verify Installation

To check if everything works correctly, run:

python3 -c "import torch; print(torch.__version__)"
python3 -c "import cv2; print(cv2.__version__)"

If no errors appear, the setup is complete! 🎯


πŸ› οΈ Test the DataLoader

Before training, verify that the dataset is correctly loaded:

python3 scripts/dataloader.py --dataset Train --batch_size 32

πŸ’‘ This should display a batch of images and labels.
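
Under the hood, a loader over the processed real/fake folders can be built with torchvision's ImageFolder; a minimal sketch (the transforms in scripts/dataloader.py may differ):

from torch.utils.data import DataLoader
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),  # a common input size for MobileNet-style models
    transforms.ToTensor(),
])

# processed_data/Train/ contains real/ and fake/ subfolders,
# so ImageFolder assigns class labels automatically.
dataset = datasets.ImageFolder("processed_data/Train", transform=transform)
loader = DataLoader(dataset, batch_size=32, shuffle=True)

images, labels = next(iter(loader))
print(images.shape, labels[:8])  # e.g. torch.Size([32, 3, 224, 224])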

🎯 Train the Model

Train the model using either MobileNet or Xception:

✅ Train with MobileNet:

python3 scripts/train.py --model mobilenet

✅ Train with Xception:

python3 scripts/train.py --model xception

✅ Train with the custom network:

python3 scripts/train.py --model custom

💡 The trained model will be saved in the models/ directory.
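
For orientation, a compact version of such a training loop (a sketch building on the model and loader from the earlier snippets; the real scripts/train.py also handles logging, checkpointing, and model selection):

import torch
import torch.nn as nn

# Prefer Apple's MPS backend on the hardware listed below, else CPU.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
model = model.to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

for epoch in range(10):  # epoch count is illustrative
    model.train()
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()

torch.save(model.state_dict(), "models/mobilenet.pth")  # filename is illustrative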

📊 Evaluate the Model

After training, evaluate the model on Test-Dev and Test-Challenge:

✅ Evaluate MobileNet on Test-Dev:

python3 scripts/evaluate.py --model mobilenet --dataset Test-Dev

✅ Evaluate MobileNet on Test-Challenge:

python3 scripts/evaluate.py --model mobilenet --dataset Test-Challenge

✅ Evaluate Xception on Test-Dev:

python3 scripts/evaluate.py --model xception --dataset Test-Dev

✅ Evaluate Xception on Test-Challenge:

python3 scripts/evaluate.py --model xception --dataset Test-Challenge

✅ Evaluate the custom network on Test-Dev:

python3 scripts/evaluate.py --model custom --dataset Test-Dev

✅ Evaluate the custom network on Test-Challenge:

python3 scripts/evaluate.py --model custom --dataset Test-Challenge

💡 The script will print Accuracy, Precision, Recall, and F1-score.
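
The metrics can be computed with scikit-learn; a minimal sketch of the evaluation step (assuming the model and device from the training sketch, and a test_loader built like the training loader):

import torch
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

model.eval()
y_true, y_pred = [], []
with torch.no_grad():
    for images, labels in test_loader:  # a DataLoader over Test-Dev or Test-Challenge
        outputs = model(images.to(device))
        y_pred.extend(outputs.argmax(dim=1).cpu().tolist())
        y_true.extend(labels.tolist())

print(f"Accuracy : {accuracy_score(y_true, y_pred):.4f}")
print(f"Precision: {precision_score(y_true, y_pred):.4f}")
print(f"Recall   : {recall_score(y_true, y_pred):.4f}")
print(f"F1-score : {f1_score(y_true, y_pred):.4f}")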


📂 Project Structure

DLA_DEEPFAKEDETECTION/
│── .github/            # Dependabot configuration
│── data/               # OpenForensics dataset (original, unmodified)
│   ├── Train/          # Training data
│   ├── Val/            # Evaluation data
│   ├── Test-Dev/       # Test-Dev data
│   ├── Test-Challenge/ # Test-Challenge data
│   ├── dataset/        # Where the original dataset archives are stored
│
│── processed_data/     # Preprocessing output (cropped faces)
│   ├── Train/
│   │   ├── real/       # Real faces extracted from the training set
│   │   ├── fake/       # Fake faces extracted from the training set
│   ├── Val/
│   │   ├── real/       # Real faces extracted for evaluation
│   │   ├── fake/       # Fake faces extracted for evaluation
│   ├── Test-Dev/
│   │   ├── real/       # Real faces extracted for Test-Dev
│   │   ├── fake/       # Fake faces extracted for Test-Dev
│   ├── Test-Challenge/
│   │   ├── real/       # Real faces extracted for Test-Challenge
│   │   ├── fake/       # Fake faces extracted for Test-Challenge
│
│── documentation/      # Documentation, reports, extra material
│── logs/               # Training logs (evaluation accuracy and training loss)
│── models/             # Saved models (e.g., .pth files)
│── scripts/            # Scripts (training, preprocessing, etc.)
│── notebooks/          # Jupyter notebooks for debugging and testing
│── utils/              # Generic utilities and support functions
│── requirements.txt    # Project dependencies
│── setup_folders.sh    # Script for automatic folder creation
│── README.md           # Project documentation

📊 Project Goals

✅ Face extraction from images using bounding boxes.
✅ Binary classification (fake/real) of extracted faces.
✅ Training with transfer learning using MobileNet or Xception.
✅ Development of a custom CNN for classification.
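
For the first goal, face crops are cut from the source images using the bounding boxes in the *_poly.json annotations; a rough sketch (the COCO-like field names used here are assumptions, check the dataset's JSON for the actual schema):

import json
from pathlib import Path
from PIL import Image

# Assumed COCO-like schema: each annotation carries an image id,
# an [x, y, width, height] bbox, and a real/fake category.
data = json.loads(Path("data/Train/Train_poly.json").read_text())
images = {img["id"]: img["file_name"] for img in data["images"]}

for ann in data["annotations"][:10]:  # first few crops as a smoke test
    x, y, w, h = ann["bbox"]
    img = Image.open(Path("data/Train/Train") / images[ann["image_id"]])
    face = img.crop((x, y, x + w, y + h))
    label = "fake" if ann["category_id"] == 1 else "real"  # mapping assumed
    face.save(f"processed_data/Train/{label}/{ann['id']}.jpg")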

🖥️ Hardware and Limitations

Note

The experiments were performed on a MacBook Pro (2024) with the following specifications:

  • Operating system: macOS Sonoma;
  • Processor: Apple M4 Pro;
  • GPU: Apple integrated GPU (M4 Pro);
  • RAM: 32 GB (unified memory);

Warning

Due to the size and computational complexity of the dataset, some experiments may run slowly or be difficult to execute on systems with fewer resources or less capable hardware.


🤝 Contributions

Feel free to contribute to the project! 💡

📌 How to Contribute

  1. Fork the repository.
  2. Create a new branch:
     git checkout -b new-feature
  3. Commit your changes:
     git commit -m "Add new feature"
  4. Push your changes:
     git push origin new-feature
  5. Open a Pull Request on GitHub.

❓ How to Cite

If you use this repository (or part of its code) for your research, a scholarly publication, or a project, please cite us. You can use the following BibTeX entry:

@misc{Deepfake-Project,
  author       = {Congiu, Francesco and Giuffrida, Simone and Littera, Fabio},
  title        = {Deepfake Detection Project using the OpenForensics dataset},
  howpublished = {\url{https://github.com/wakaflocka17/DLA_DEEPFAKEDETECTION}},
  year         = {2025}
}

Or, if you prefer not to use BibTeX, feel free to mention the authors and the link to the repository in the acknowledgments or bibliography of your paper.
