Student ID: 60/73/65300
E-Mail: [email protected]
Student ID: 60/73/65301
E-Mail: [email protected]
Student ID: 60/73/65310
E-Mail: [email protected]
This repository contains the code for training and evaluating deepfake detection models using the OpenForensics dataset. The project follows two approaches:
- Transfer Learning with pre-trained models (e.g., MobileNet, Xception).
- Training from Scratch with a custom neural network.
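To give an idea of the first approach, here is a minimal transfer-learning sketch in PyTorch (an illustration assuming torchvision's MobileNetV3; the project's actual setup lives in `scripts/train.py` and may differ):

```python
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pre-trained MobileNetV3 backbone (assumed variant;
# the project may use a different MobileNet version).
model = models.mobilenet_v3_small(weights=models.MobileNet_V3_Small_Weights.DEFAULT)

# Freeze the convolutional feature extractor so only the head is trained.
for param in model.features.parameters():
    param.requires_grad = False

# Replace the classifier head with a 2-class (real/fake) output layer.
model.classifier[-1] = nn.Linear(model.classifier[-1].in_features, 2)
```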
The OpenForensics dataset required for the project can be downloaded from the following link:
- OpenForensics Dataset (Zenodo)
Below are links to the full project documentation:
- Theoretical Background
- Feature Extraction
- Metadata Analysis
- Fine-Tuning of MobileNet and Xception
- Building a Network from Scratch
- Analysis and Experimental Results
- Folder with all the documentation
To run the project locally, follow these steps:
Open the terminal and run:

```bash
git clone [email protected]:wakaflocka17/DLA_DEEPFAKEDETECTION.git
cd DLA_DEEPFAKEDETECTION
```

Or, if using HTTPS:

```bash
git clone https://github.com/wakaflocka17/DLA_DEEPFAKEDETECTION.git
cd DLA_DEEPFAKEDETECTION
```

It is recommended to create a virtual environment to isolate dependencies:

```bash
python3 -m venv openforensics_env
source openforensics_env/bin/activate  # macOS/Linux
```

On Windows, use:

```bash
openforensics_env\Scripts\activate
```

Install all necessary libraries:

```bash
pip install -r requirements.txt
```

First, make the folder-setup script executable:

```bash
chmod +x setup_folders.sh
```

Then run it to create the required folders:

```bash
./setup_folders.sh
```

This will create:
```
DLA_DEEPFAKEDETECTION/
├── data/
│   ├── Train/
│   ├── Val/
│   ├── Test-Dev/
│   ├── Test-Challenge/
│   └── dataset/
│
└── processed_data/
    ├── Train/
    │   ├── real/
    │   └── fake/
    ├── Val/
    │   ├── real/
    │   └── fake/
    ├── Test-Dev/
    │   ├── real/
    │   └── fake/
    └── Test-Challenge/
        ├── real/
        └── fake/
```
To automatically download the OpenForensics dataset, use the provided script:
```bash
python3 scripts/download_dataset.py
```

💡 Ensure you have a stable internet connection, as the dataset is large (60 GB+).
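If you prefer to fetch the archives manually, a generic streaming download in Python looks roughly like this (the URL below is a placeholder, not the real record link; take the actual file URLs from the Zenodo page linked above):

```python
import requests

# Placeholder URL: substitute the actual file link from the Zenodo record.
URL = "https://zenodo.org/record/<record-id>/files/Train.zip"

with requests.get(URL, stream=True) as response:
    response.raise_for_status()
    with open("data/dataset/Train.zip", "wb") as fh:
        for chunk in response.iter_content(chunk_size=1 << 20):  # 1 MiB chunks
            fh.write(chunk)
```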
Once the archives have been downloaded, they need to be extracted and organized into the correct dataset folders (Train, Val, Test-Dev, Test-Challenge). Run:
```bash
python3 scripts/extract_dataset.py
```

💡 This will:

- Move training images to `data/Train/Train/` and the corresponding `Train_poly.json` to `data/Train/`.
- Move validation images to `data/Val/Val/` and `Val_poly.json` to `data/Val/`.
- Move test-dev images to `data/Test-Dev/Test-Dev/` and `Test-Dev_poly.json` to `data/Test-Dev/`.
- Move test-challenge images to `data/Test-Challenge/Test-Challenge/` and `Test-Challenge_poly.json` to `data/Test-Challenge/`.
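For reference, the moves described above roughly amount to the following sketch (a simplified, hypothetical version; the source layout under `data/dataset/` is an assumption, and the real `scripts/extract_dataset.py` may differ):

```python
import shutil
from pathlib import Path

DATA = Path("data")
SPLITS = ["Train", "Val", "Test-Dev", "Test-Challenge"]

for split in SPLITS:
    # Assumed layout: extracted images and polygon annotations sit in data/dataset/.
    src_images = DATA / "dataset" / split
    src_json = DATA / "dataset" / f"{split}_poly.json"

    shutil.move(str(src_images), str(DATA / split / split))  # e.g. data/Train/Train/
    shutil.move(str(src_json), str(DATA / split))            # e.g. data/Train/Train_poly.json
```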
After extraction and organization, the original .zip files are no longer needed. Delete them using:
```bash
python3 scripts/delete_all_zips.py
```

💡 This will clean up the dataset directory, saving storage space.
To check if everything works correctly, run:
```bash
python3 -c "import torch; print(torch.__version__)"
python3 -c "import cv2; print(cv2.__version__)"
```

If no errors appear, the setup is complete! 🎯
Before training, verify that the dataset is correctly loaded:
```bash
python3 scripts/dataloader.py --dataset Train --batch_size 32
```

💡 This should display a batch of images and labels.
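If you want to iterate over the processed crops yourself, here is a minimal loader sketch using torchvision's `ImageFolder` (assuming the `processed_data/<split>/{real,fake}` layout shown above; the project's own loader is `scripts/dataloader.py`):

```python
import torch
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),  # assumed input size for MobileNet/Xception-style models
    transforms.ToTensor(),
])

# ImageFolder maps the sub-directories (fake/, real/) to class indices automatically.
dataset = datasets.ImageFolder("processed_data/Train", transform=transform)
loader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)

images, labels = next(iter(loader))
print(images.shape, labels[:8])  # e.g. torch.Size([32, 3, 224, 224])
```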
Train the model using MobileNet, Xception, or the custom network:

✅ Train with MobileNet:

```bash
python3 scripts/train.py --model mobilenet
```

✅ Train with Xception:

```bash
python3 scripts/train.py --model xception
```

✅ Train with the custom network:

```bash
python3 scripts/train.py --model custom
```

💡 The trained model will be saved in the `models/` directory.
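Under the hood, a transfer-learning run boils down to a loop like the sketch below (a self-contained illustration, not the exact contents of `scripts/train.py`; the epoch count, learning rate, and output filename are assumptions):

```python
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Pre-trained backbone with a frozen feature extractor and a 2-class head.
model = models.mobilenet_v3_small(weights=models.MobileNet_V3_Small_Weights.DEFAULT)
for param in model.features.parameters():
    param.requires_grad = False
model.classifier[-1] = nn.Linear(model.classifier[-1].in_features, 2)
model = model.to(device)

transform = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
dataset = datasets.ImageFolder("processed_data/Train", transform=transform)
loader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)

criterion = nn.CrossEntropyLoss()
# Optimize only the trainable (unfrozen) parameters.
optimizer = torch.optim.Adam((p for p in model.parameters() if p.requires_grad), lr=1e-4)

for epoch in range(10):  # assumed epoch count
    model.train()
    running_loss = 0.0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    print(f"epoch {epoch}: loss {running_loss / len(loader):.4f}")

torch.save(model.state_dict(), "models/mobilenet.pth")  # hypothetical output path
```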
After training, evaluate the model on Test-Dev and Test-Challenge:
✅ Evaluate MobileNet on Test-Dev:

```bash
python3 scripts/evaluate.py --model mobilenet --dataset Test-Dev
```

✅ Evaluate MobileNet on Test-Challenge:

```bash
python3 scripts/evaluate.py --model mobilenet --dataset Test-Challenge
```

✅ Evaluate Xception on Test-Dev:

```bash
python3 scripts/evaluate.py --model xception --dataset Test-Dev
```

✅ Evaluate Xception on Test-Challenge:

```bash
python3 scripts/evaluate.py --model xception --dataset Test-Challenge
```

✅ Evaluate the custom network on Test-Dev:

```bash
python3 scripts/evaluate.py --model custom --dataset Test-Dev
```

✅ Evaluate the custom network on Test-Challenge:

```bash
python3 scripts/evaluate.py --model custom --dataset Test-Challenge
```

💡 The script will print Accuracy, Precision, Recall, and F1-score.
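The reported metrics can be reproduced with scikit-learn; a minimal sketch (the label lists are dummy values standing in for ground truth and model predictions collected over a test split):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# 0 = real, 1 = fake (assumed label convention); dummy values for illustration.
y_true = [0, 1, 1, 0, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 0, 1, 1]

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1-score :", f1_score(y_true, y_pred))
```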
```
DLA_DEEPFAKEDETECTION/
├── .github/               # Dependency bot configuration
├── data/                  # OpenForensics dataset (original, unmodified)
│   ├── Train/             # Training data
│   ├── Val/               # Evaluation data
│   ├── Test-Dev/          # Test-Dev data
│   ├── Test-Challenge/    # Test-Challenge data
│   └── dataset/           # Storage for the original dataset archives
│
├── processed_data/        # Preprocessing output (cropped faces)
│   ├── Train/
│   │   ├── real/          # Real faces extracted from the training set
│   │   └── fake/          # Fake faces extracted from the training set
│   ├── Val/
│   │   ├── real/          # Real faces extracted for evaluation
│   │   └── fake/          # Fake faces extracted for evaluation
│   ├── Test-Dev/
│   │   ├── real/          # Real faces extracted for Test-Dev
│   │   └── fake/          # Fake faces extracted for Test-Dev
│   └── Test-Challenge/
│       ├── real/          # Real faces extracted for Test-Challenge
│       └── fake/          # Fake faces extracted for Test-Challenge
│
├── documentation/         # Documentation, reports, extra material
├── logs/                  # Training and evaluation accuracy/loss logs
├── models/                # Saved models (e.g., .pth files)
├── scripts/               # Scripts (training, preprocessing, etc.)
├── notebooks/             # Jupyter notebooks for debugging and testing
├── utils/                 # Generic utilities and support functions
├── requirements.txt       # Project dependencies
├── setup_folders.sh       # Script for automatic folder creation
└── README.md              # Project documentation
```
✅ Face extraction from images using bounding boxes (see the sketch after this list).

✅ Binary classification (fake/real) of the extracted faces.

✅ Training with transfer learning using MobileNet or Xception.

✅ Development of a custom CNN for classification.
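As an illustration of the face-extraction step, here is a minimal crop with OpenCV (the bounding-box values, file paths, and annotation layout are hypothetical; see the preprocessing code in `scripts/` for the real logic):

```python
import cv2

# Hypothetical bounding box (x, y, width, height) read from a *_poly.json annotation.
x, y, w, h = 120, 80, 96, 96

image = cv2.imread("data/Train/Train/example.jpg")  # hypothetical input image
face = image[y:y + h, x:x + w]                      # crop the face region
face = cv2.resize(face, (224, 224))                 # assumed model input size
cv2.imwrite("processed_data/Train/real/example_face.jpg", face)
```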
> **Note**
> The experiments were performed on a MacBook Pro (2024) with the following specifications:
>
> - Operating system: macOS Sonoma;
> - Processor: Apple M4 Pro;
> - GPU: Apple integrated GPU (M4 Pro);
> - RAM: 32 GB (unified memory).
> **Warning**
> Due to the size of the dataset and the computational cost of the experiments, some runs may be slow or impractical on systems with fewer resources or less capable hardware.
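On Apple-silicon machines like the one above, PyTorch can target the Metal (MPS) backend instead of CUDA; here is a small device-selection sketch (how you might configure a run, not necessarily what the scripts do):

```python
import torch

# Prefer CUDA, then Apple's Metal (MPS) backend, then fall back to the CPU.
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

print(f"Running on: {device}")
```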
Feel free to contribute to the project! 💡

- Fork the repository.
- Create a new branch:

  ```bash
  git checkout -b new-feature
  ```

- Commit your changes:

  ```bash
  git commit -m "Add new feature"
  ```

- Push your changes:

  ```bash
  git push origin new-feature
  ```

- Open a Pull Request on GitHub.
If you use this repository (or part of its code) for your research, a scholarly publication, or a project, please kindly cite us. You can use the following BibTeX entry:
```bibtex
@misc{Deepfake-Project,
  author       = {Congiu F., Giuffrida S., Littera F.},
  title        = {Deepfake Detection Project using the OpenForensics dataset},
  howpublished = {\url{https://github.com/wakaflocka17/DLA_DEEPFAKEDETECTION}},
  year         = {2025}
}
```

Or, if you prefer not to use BibTeX, feel free to mention the authors and the link to the repository in the acknowledgments or bibliography of your paper.
