This repository contains a project that detects ships at sea and estimates their distance from the camera. Using YOLOv8n, the model detects ships in video frames and generates bounding boxes at the point where each ship makes contact with the water. The distance from the camera to each ship is then calculated from these bounding boxes.
- YOLOv8n Model Training: Separate code for training the YOLOv8n model using a custom dataset of ships in the sea.
- Bounding Box Creation: Detects ships in images and videos, generating bounding boxes at the point of contact with the water.
- Distance Calculation: Tools to calculate the distance of ships from the camera using the bounding box and ground measurements.
- Annotation Tools: Scripts for annotating videos with calculated distances.
- Video Processing: Tools to split videos into frames for training the model.
- Camera Calibration: YAML files storing intrinsic parameters and specifications for different cameras.
- Python 3.10.8
- OpenCV
To install dependencies, run:
pip install -r requirements.txt
For training and inference using the YOLOv8n model, the data needs to be organized in a specific folder structure. The data format and folder setup are crucial for the proper functioning of the model.
- The data should be placed in a main folder that contains two subfolders:
  - `/images/`: This folder holds all the image files (e.g., `.jpg`, `.png`) used for training, validation, and testing.
  - `/labels/`: This folder contains the corresponding label files (e.g., `.txt`), with annotations that describe the objects in the images.
- Each label file should have the same name as its corresponding image but with a `.txt` extension.
- The label files contain the following information about each object in the image:
  - Class Index: The index number for the object's class (e.g., `0` for a boat, `1` for a person).
  - Bounding Box Coordinates: The bounding box is represented by four values:
    - x_center: The x-coordinate of the center of the bounding box, as a ratio of the image width. For example, the center of the image would have an `x_center` value of `0.5`.
    - y_center: The y-coordinate of the center of the bounding box, as a ratio of the image height. Similarly, the center of the image would have a `y_center` value of `0.5`.
    - width: The width of the bounding box, as a ratio of the image width.
    - height: The height of the bounding box, as a ratio of the image height.
An example label file for an image containing three boats (one line per boat, in the order class index, x_center, y_center, width, height) might look like this:
0 0.1575 0.6317705 0.126875 0.04916699999999996
0 0.3996484375 0.7306253333333332 0.15499999999999997 0.07500000000000018
0 0.7063671875 0.6042190833333333 0.20999999999999996 0.06583299999999996
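To make the normalized format concrete, the following sketch converts one label line back into pixel corner coordinates. The `parse_label_line` helper, the `PixelBox` type, and the 1280x720 frame size are illustrative assumptions and are not part of this repository.

```python
from typing import NamedTuple


class PixelBox(NamedTuple):
    """A bounding box in pixel coordinates (hypothetical helper type)."""
    class_index: int
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def parse_label_line(line: str, image_width: int, image_height: int) -> PixelBox:
    """Convert one normalized YOLO label line into pixel corner coordinates."""
    class_index, x_center, y_center, width, height = line.split()
    # Scale the normalized center and size back to pixels.
    cx = float(x_center) * image_width
    cy = float(y_center) * image_height
    w = float(width) * image_width
    h = float(height) * image_height
    # Move from center/size form to corner form.
    return PixelBox(int(class_index), cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


# First line of the example label file, for an assumed 1280x720 frame.
box = parse_label_line("0 0.1575 0.6317705 0.126875 0.04916699999999996", 1280, 720)
print(box)
```

In this representation, `y_max` is the lower edge of the box, which is the edge closest to the waterline in a typical camera view.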
- `/best_models/`: Contains weights for the YOLOv8n model.
- `/Boat_Detection/train.py`: Contains the script for training the YOLOv8n model.
- `/Boat_Detection/draw_boxes.py`: Code for creating bounding boxes on new images and videos.
- `/DistanceCalculation/distance_inference.py`: Tools for calculating the distance of ships from their bounding boxes.
- `/DistanceCalculation/distance_tools.py`: Tools to annotate videos with ship distances.
- `/helper/split_videos.py/`: Utilities for splitting videos into frames for training purposes.
- `/camera_specs/`: YAML files storing intrinsic values and specifications for different cameras.
- We started by obtaining a publicly available boat bounding box dataset from OpenImagesV4, consisting of 5000 training, 1000 validation, and 1000 test images.
- The dataset was formatted to fit the YOLOv8n model's requirements.
- For boat detection specifically, we fine-tuned a pre-trained YOLOv8n model that had originally been trained on 100 classes from the OpenImagesV4 dataset.
- Initial results showed promise, though further optimization was attempted by cleaning some faulty data.
- However, the optimization did not lead to significant improvements.
- We trained a new model with the same YOLOv8n architecture but observed that the pre-trained model consistently yielded better results.
- To explore further, we switched to a newer version of the dataset, OpenImagesV8, downloading 4000 images (3000 train, 500 val, 500 test).
- The newly trained model using OpenImagesV8 performed slightly worse in detecting boats compared to the older model trained with OpenImagesV4.
- We experimented with using separate classes for different types of boats and one class for people.
- This adjustment resulted in a model that was a better general detector, but its performance in detecting boats was slightly worse.
- After deciding to proceed with the best-performing model, we shifted focus to developing the distance calculation algorithm.
- Initially, we completed the vertical distance calculation, then combined it with lateral distance to calculate total ship distance.
- The algorithm was first tested with short distances in an office environment.
- We collected field data to rigorously test our distance calculation algorithm and concluded that it worked with approximately 90% accuracy.
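The repository's exact formulas are not reproduced here. As a rough illustration of how a forward (vertical) distance and a lateral distance can be combined into a total distance, the sketch below assumes a pinhole camera without lens distortion, a flat water surface, a known camera height above the water, and no camera roll. The function name, its parameters, and the sample values are all hypothetical.

```python
import math


def estimate_ground_distance(u, v, fx, fy, cx, cy, camera_height_m, pitch_rad=0.0):
    """Return (forward, lateral, total) distance in meters to a waterline pixel.

    (u, v) is the pixel where the ship meets the water, for example the
    bottom-center of its bounding box. fx, fy, cx, cy are camera intrinsics
    in pixels. pitch_rad is the downward tilt of the optical axis.
    """
    # Ray through the pixel in camera coordinates (x right, y down, z forward).
    x = (u - cx) / fx
    y = (v - cy) / fy
    # Rotate the ray by the camera pitch to get its downward and forward parts.
    down = math.cos(pitch_rad) * y + math.sin(pitch_rad)
    forward_part = math.cos(pitch_rad) - math.sin(pitch_rad) * y
    if down <= 0:
        raise ValueError("pixel is at or above the horizon; the ray never meets the water")
    # Scale the ray so that it drops exactly camera_height_m to the water plane.
    scale = camera_height_m / down
    forward = forward_part * scale
    lateral = x * scale
    return forward, lateral, math.hypot(forward, lateral)


# Assumed values: 1000 px focal length, 1920x1080 frame, camera 10 m above the water.
forward, lateral, total = estimate_ground_distance(
    u=1160, v=640, fx=1000, fy=1000, cx=960, cy=540, camera_height_m=10
)
print(f"forward={forward:.1f} m, lateral={lateral:.1f} m, total={total:.1f} m")
```

Because the forward distance grows rapidly as the pixel approaches the horizon, small errors in the detected waterline row translate into large distance errors for far-away ships under these assumptions.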
- Use the video tools to split videos into frames and annotate them.
- Transform annotations into YOLOv8 format.
- Use the scripts in the `/Boat_Detection/` directory to train the YOLOv8n model on your dataset.
- Use the `/bounding_boxes/` code to generate bounding boxes for ships in new video frames.
- Run the distance calculation tools on detected bounding boxes to estimate the distance of ships from the camera.
- Annotate your videos with the calculated distances for visualization.
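For orientation, the training step could look something like the sketch below, which uses the Ultralytics Python API directly rather than this repository's `train.py`. The `images/train` and `images/val` split folders, the `datasets/boats` root, the `boats.yaml` file name, and the hyperparameters are assumptions. The sketch also starts from the COCO-pretrained `yolov8n.pt` weights, which differs from the Open Images pre-trained model mentioned in the development notes.

```python
def dataset_yaml(root: str, class_names: list[str]) -> str:
    """Build the text of an Ultralytics dataset config for the given classes."""
    lines = [
        f"path: {root}",        # dataset root folder (assumed layout)
        "train: images/train",  # training images, relative to the root
        "val: images/val",      # validation images, relative to the root
        "names:",
    ]
    # One "index: name" entry per class, matching the class indices in the label files.
    lines += [f"  {index}: {name}" for index, name in enumerate(class_names)]
    return "\n".join(lines) + "\n"


def train_boat_detector(data_yaml_path: str, epochs: int = 50, image_size: int = 640):
    """Fine-tune pretrained YOLOv8n weights on the dataset described by the config file."""
    # Imported lazily so the config helper works without the package installed.
    from ultralytics import YOLO

    model = YOLO("yolov8n.pt")  # COCO-pretrained nano weights (an assumption here)
    return model.train(data=data_yaml_path, epochs=epochs, imgsz=image_size)


config_text = dataset_yaml("datasets/boats", ["boat"])
print(config_text)
# To start training, write config_text to a file such as boats.yaml
# and call train_boat_detector("boats.yaml").
```

Ultralytics locates each label file by replacing `images` with `labels` in the image path, which is why the label files must share the image file names.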
- Camera calibration data (intrinsic parameters) is stored in the `/camera_specs/` directory, with separate YAML files for each camera model.
- The distance calculation tools automatically reference the correct YAML file based on the camera used for the video.
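How the tools select a calibration file is not documented here. A minimal sketch of one possible lookup is shown below. The file naming scheme, the `fx`, `fy`, `cx`, `cy` keys, and both function names are hypothetical, so the actual YAML files in `/camera_specs/` may be organized differently.

```python
from pathlib import Path


def camera_spec_path(camera_model: str, specs_dir: str = "camera_specs") -> Path:
    """Map a camera model name to a YAML file path (assumed naming scheme)."""
    # Example: "Example Cam 1" -> camera_specs/example_cam_1.yaml
    file_stem = camera_model.strip().lower().replace(" ", "_")
    return Path(specs_dir) / f"{file_stem}.yaml"


def load_intrinsics(camera_model: str, specs_dir: str = "camera_specs") -> dict:
    """Read the intrinsic parameters for one camera from its YAML file."""
    # PyYAML is imported lazily so the path helper works without it.
    import yaml

    with camera_spec_path(camera_model, specs_dir).open() as handle:
        spec = yaml.safe_load(handle)
    # Hypothetical keys: focal lengths and principal point, all in pixels.
    return {key: float(spec[key]) for key in ("fx", "fy", "cx", "cy")}


print(camera_spec_path("Example Cam 1"))
```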