FrameX is a deep learning-based project designed to enhance low-resolution images into high-resolution ones using advanced image super-resolution techniques. This project leverages PyTorch, torchvision, and other Python libraries to create and train models for upscaling images. The project features a custom dataset pipeline, EDSR model implementation, and supports training and evaluation on GPUs.
For the purpose of this project, we worked solely with a dataset of 4K Minecraft images, linked in the download.sh file.
Training was performed on NERSC Perlmutter using NVIDIA A100 GPUs. We found that training for 30 epochs was sufficient.
Here are example images processed by FrameX:
FrameX/
|-- Examples/ # Directory for sample datasets or code examples
|-- download.sh # Script to download necessary datasets
|-- README.md # Documentation for the project
|-- requirements.txt # Python dependencies
|-- training.py # Main script for training and evaluation
- Custom Dataset Loader: Dynamically loads and preprocesses image data for super-resolution tasks.
- Super-Resolution Models: Includes implementations of the SRCNN and EDSR architectures.
- Patch-Based Training: Extracts image patches to optimize training and memory usage.
- Efficient Training Pipeline:
- Utilizes GPU acceleration for faster training.
- Implements mixed-precision training using torch.cuda.amp.
- Image Quality Metrics:
- Computes Peak Signal-to-Noise Ratio (PSNR) for evaluation.
- Real-Time Visualization: Saves intermediate input, output, and ground truth images during training.
- Supports Large-Scale Datasets: Designed to handle 4K and 8K images with efficient memory management.
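For reference, PSNR is derived from the mean squared error between a reconstructed image and its ground truth. The sketch below is a minimal NumPy version, not the project's exact implementation; it assumes 8-bit images with a peak value of 255.

```python
import numpy as np

def psnr(prediction: np.ndarray, target: np.ndarray, peak: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio in dB between two images of the same shape."""
    mse = np.mean((prediction.astype(np.float64) - target.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images: no noise
    return 10.0 * np.log10(peak ** 2 / mse)
```

Higher is better; identical images give infinite PSNR, and typical super-resolution results land in the 25-35 dB range.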
The project requires Python and the dependencies listed in requirements.txt. To set up:
- Clone the repository:
git clone <repository_url>
cd FrameX
- Install dependencies:
pip install -r requirements.txt
- Download the dataset:
bash download.sh
The training.py script supports end-to-end training and evaluation:
- Modify the dataset directory and parameters in the script:
images_dir = "./dataset/images/"
- Run the training script:
python training.py
- During training:
- Training and validation losses are printed for each epoch.
- PSNR values are calculated for quality assessment.
- Model checkpoints and intermediate results are saved to the ./runs directory.
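The mixed-precision loop inside training.py is not reproduced here, but its general shape with torch.cuda.amp looks like the following sketch (the L1 loss and function signature are illustrative assumptions, not the script's exact code):

```python
import torch

def train_one_epoch(model, dataloader, optimizer, device="cuda"):
    """One epoch of mixed-precision training; AMP is enabled only on CUDA."""
    use_amp = device == "cuda"
    scaler = torch.cuda.amp.GradScaler(enabled=use_amp)
    criterion = torch.nn.L1Loss()
    model.train()
    total_loss = 0.0
    for lr_patch, hr_patch in dataloader:
        lr_patch, hr_patch = lr_patch.to(device), hr_patch.to(device)
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(enabled=use_amp):
            output = model(lr_patch)
            loss = criterion(output, hr_patch)
        scaler.scale(loss).backward()  # scale loss to avoid fp16 gradient underflow
        scaler.step(optimizer)
        scaler.update()
        total_loss += loss.item()
    return total_loss / len(dataloader)
```

The GradScaler multiplies the loss before backprop so small fp16 gradients do not flush to zero, then unscales before the optimizer step.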
Ensure that the image dataset is placed in the following structure:
dataset/
|-- images/
|-- image1.jpg
|-- image2.jpg
...
Images should be in .jpg format. The script automatically splits the dataset into training and testing sets.
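The split itself can be as simple as shuffling the file list and slicing it. A minimal sketch, where the 80/20 ratio and fixed seed are assumptions rather than the script's exact behavior:

```python
import random
from pathlib import Path

def split_dataset(images_dir, train_fraction=0.8, seed=42):
    """Deterministically split the .jpg files in images_dir into train/test lists."""
    files = sorted(Path(images_dir).glob("*.jpg"))
    rng = random.Random(seed)  # fixed seed so the split is reproducible across runs
    rng.shuffle(files)
    cut = int(len(files) * train_fraction)
    return files[:cut], files[cut:]
```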
- A deep and robust architecture leveraging residual blocks.
- Features:
- Residual scaling for stability.
- Adjustable number of residual blocks and features.
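An EDSR-style residual block with residual scaling can be sketched as below; the 0.1 scale factor and 64-feature default are the common EDSR choices, not necessarily this project's exact settings.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """EDSR-style residual block: conv-ReLU-conv with no batch norm,
    the branch output scaled down before the skip connection."""
    def __init__(self, n_feats=64, res_scale=0.1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(n_feats, n_feats, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(n_feats, n_feats, kernel_size=3, padding=1),
        )
        self.res_scale = res_scale

    def forward(self, x):
        # Scaling the residual branch keeps deep stacks of blocks numerically stable.
        return x + self.res_scale * self.body(x)
```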
- Patch Size: Default is 96x96 for image patches.
- Learning Rate: Default is 1e-4 with the Adam optimizer.
- Epochs: Default is 30.
- Batch Size: Default is 128.
- Intermediate Outputs: Randomly selected input, output, and ground truth images are saved in the runs/ directory.
- Model Checkpoints: Saved every 5 epochs in the root directory.
- Final Model: Saved as srcnn_model.pth.
Evaluation is done on a test dataset to compute PSNR and save reconstructed high-resolution images. To test a single image:
- Modify the test dataset path.
- Run inference directly on the trained model.
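Single-image inference follows the standard PyTorch pattern: restore the checkpoint, switch to eval mode, and pass the image tensor through the model under torch.no_grad(). A sketch, where the helper name and [0, 1]-range preprocessing are assumptions:

```python
import torch

def upscale(model, image_tensor, device="cpu"):
    """Run one low-resolution image tensor (C, H, W, values in [0, 1]) through the model."""
    model.to(device).eval()
    with torch.no_grad():
        batch = image_tensor.unsqueeze(0).to(device)  # add a batch dimension
        output = model(batch)
    return output.squeeze(0).clamp(0.0, 1.0).cpu()  # drop batch dim, clip to valid range
```

In practice the model weights would be restored first, e.g. with model.load_state_dict(torch.load("srcnn_model.pth")), before calling the helper.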
Example to visualize random patches during training:
# Fetch one batch from the training dataloader and inspect the patch shapes
sample_batch = next(iter(train_dataloader))
sample_input, sample_target = sample_batch
print(sample_input.shape, sample_target.shape)
- Add SSIM (Structural Similarity Index) for quality evaluation.
- Extend support for multi-scale super-resolution.
- Implement GAN-based super-resolution techniques (e.g., SRGAN).
Ruben Alias