The main goal of the project is traffic sign recognition using our own implementation of the SVM (Support Vector Machine) algorithm running on HOG features (Histograms of Oriented Gradients). The approach is implemented with the OpenCV libraries for Java. Ready-made artefacts (such as OpenCV's HOGDescriptor) and pretrained models are not used.
Building the OpenCV artefacts for Java is done according to a tutorial.
Unfortunately, documentation and community support for OpenCV in Java are not as widely available as for C++ or Python. Many details have to be figured out by combining information from the OpenCV Javadoc and the C++ documentation.
The project uses images of traffic signs. Sets of images are available online as part of the INI Benchmark project. The images are split into two datasets: GTSRB (for classification) and GTSDB (for localization).
The best reference for understanding Histograms of Oriented Gradients is the original paper by Dalal and Triggs. It gives a holistic view of HOG and explains both the why and the how.
A second, easier resource is the article by Satya Mallick. Not everything is stated explicitly, but the author gives good tips on how to visualize HOGs (this has been implemented).
The first challenge when working with Java OpenCV is that there are no native methods to display even a simple image (an OpenCV Mat object). Utilities to display images and visualize features are implemented from scratch in the DisplayUtils class.
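The internals of DisplayUtils are not shown here, so the class and method names below are hypothetical; this is only a minimal sketch of the usual workaround, assuming the pixel data has already been copied out of a grayscale Mat (e.g. with `Mat.get(0, 0, bytes)`) and is wrapped in a `BufferedImage` that Swing can draw:

```java
import java.awt.image.BufferedImage;

public class GrayDisplaySketch {
    // Wrap raw 8-bit grayscale pixel data in a BufferedImage.
    // 'pixels' is assumed to be row-major, one byte per pixel,
    // as obtained from an OpenCV CV_8UC1 Mat.
    static BufferedImage toBufferedImage(byte[] pixels, int width, int height) {
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_BYTE_GRAY);
        img.getRaster().setDataElements(0, 0, width, height, pixels);
        return img;
    }

    public static void main(String[] args) {
        byte[] pixels = new byte[64 * 64]; // placeholder data
        BufferedImage img = toBufferedImage(pixels, 64, 64);
        System.out.println(img.getWidth() + "x" + img.getHeight()); // prints 64x64
    }
}
```

The resulting `BufferedImage` can then be shown in a `JFrame`/`JLabel` or written to disk with `ImageIO.write`.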
Calculating the HOG descriptor consists of a few steps:
For now the only preprocessing is resizing input images to the same size of 64x64 pixels. It is yet to be determined whether additional preprocessing (e.g. blurring) would increase the effectiveness of the HOGs.
For every pixel in the 64x64 image, the gradient magnitude and direction (angle) are calculated.
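A minimal sketch of this step, using the simple central-difference kernel [-1, 0, 1] recommended by Dalal and Triggs (the project's actual implementation may differ; border handling by clamping is an assumption here):

```java
public class GradientSketch {
    // Per-pixel gradient via central differences.
    // Input: 8-bit grayscale image as int[row][col].
    // Output: out[y][x] = {magnitude, angle in degrees in [0, 360)}.
    static double[][][] gradients(int[][] img) {
        int h = img.length, w = img[0].length;
        double[][][] out = new double[h][w][2];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                // Clamp neighbours at the image border.
                double gx = img[y][Math.min(x + 1, w - 1)] - img[y][Math.max(x - 1, 0)];
                double gy = img[Math.min(y + 1, h - 1)][x] - img[Math.max(y - 1, 0)][x];
                out[y][x][0] = Math.sqrt(gx * gx + gy * gy);
                double angle = Math.toDegrees(Math.atan2(gy, gx));
                out[y][x][1] = (angle + 360.0) % 360.0; // map to [0, 360)
            }
        }
        return out;
    }
}
```

Keeping the full [0, 360) range matches the 18-bin signed-gradient histograms described below.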
The image is divided into cells of 8x8 pixels, which gives 8 cells in each row and column of the image.
Having calculated the gradient for every pixel, a histogram is built for every cell. The histogram consists of 18 bins for the gradient direction (360 degrees divided into intervals of 20 degrees). Based on the gradient direction at a given pixel, its magnitude is split and added to the appropriate bins.
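One common way to split the magnitude is linear interpolation between the two bins whose centers bracket the angle; whether the project uses exactly this scheme (and this bin-center convention, with centers at 10, 30, ..., 350 degrees) is an assumption, so treat this as an illustrative sketch:

```java
public class CellHistogramSketch {
    static final int BINS = 18;
    static final double BIN_WIDTH = 360.0 / BINS; // 20 degrees per bin

    // Add one pixel's vote to a cell histogram: the magnitude is split
    // linearly between the two bins nearest to the angle (angle in [0, 360)).
    static void vote(double[] histogram, double angle, double magnitude) {
        double pos = angle / BIN_WIDTH - 0.5;    // position relative to bin centers
        int lo = (int) Math.floor(pos);
        double hiShare = pos - lo;               // fraction going to the upper bin
        int loBin = ((lo % BINS) + BINS) % BINS; // wrap around 360 -> 0
        int hiBin = (loBin + 1) % BINS;
        histogram[loBin] += magnitude * (1.0 - hiShare);
        histogram[hiBin] += magnitude * hiShare;
    }
}
```

For example, an angle of 10 degrees lands entirely in bin 0, while 20 degrees is split evenly between bins 0 and 1.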
According to the original paper introducing HOG, normalization is performed in blocks of 16x16 pixels (i.e. 2x2 cells).
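A sketch of L2 block normalization, one of the schemes from the Dalal-Triggs paper (which exact norm the project uses is not stated here, so this is an assumption; the epsilon guard against flat, zero-gradient blocks is also an implementation choice):

```java
public class BlockNormSketch {
    static final double EPS = 1e-6; // avoids division by zero in flat regions

    // L2-normalize one block: concatenate the cell histograms of the block
    // (for 2x2 cells of 18 bins: 72 values) and divide by the L2 norm.
    static double[] normalizeBlock(double[][] cellHists) {
        int len = 0;
        for (double[] hist : cellHists) len += hist.length;
        double[] block = new double[len];
        int i = 0;
        double sumSq = 0.0;
        for (double[] hist : cellHists) {
            for (double v : hist) {
                block[i++] = v;
                sumSq += v * v;
            }
        }
        double norm = Math.sqrt(sumSq + EPS * EPS);
        for (int j = 0; j < block.length; j++) block[j] /= norm;
        return block;
    }
}
```

The normalized blocks are then concatenated into the final descriptor; because blocks overlap by one cell, each cell histogram contributes to up to four blocks.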
Most of the implemented methods are fully parametrized, so the HOG descriptor can be defined in many ways. With the numbers above (they seem reasonable for the given images of traffic signs), our HOG descriptor has a length of 3528 values.
Where does this number come from? In a 64x64 image we can place a 16x16 block at 7 positions vertically and 7 positions horizontally (moving by 1 cell = 8 pixels). Every block contains 4 cells, and every cell has 18 bins in its histogram.
That gives us 7 * 7 * 4 * 18 = 3528.
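The same arithmetic in parametrized form (the method name is ours, but the formula mirrors the reasoning above: blocks per side = cells per side - block size in cells + 1, with a one-cell block stride):

```java
public class DescriptorLengthSketch {
    // Descriptor length for a square imageSize x imageSize image with square
    // cells, blocks of blockCells x blockCells cells, and a one-cell stride.
    static int descriptorLength(int imageSize, int cellSize, int blockCells, int bins) {
        int cellsPerSide = imageSize / cellSize;           // 64 / 8 = 8
        int blocksPerSide = cellsPerSide - blockCells + 1; // 8 - 2 + 1 = 7
        return blocksPerSide * blocksPerSide * blockCells * blockCells * bins;
    }

    public static void main(String[] args) {
        System.out.println(descriptorLength(64, 8, 2, 18)); // prints 3528
    }
}
```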
The picture below (right) shows a visualization of the HOG. In every cell all bins are presented as lines, with the line length proportional to the bin value. It is visible that the dominant directions of the histograms capture the oval shape of the sign and the shapes of the numbers.
For comparison, here are visualizations of HOGs for the same image scaled to 128x128 and 32x32 before the computation.
Additional note: the more complicated methods are covered by unit tests (/test/eiasr/).