Guillaume Haerinck - M2 Image 2021
Class by Eva Dokladalova
The goal of this project is to implement a fast crowd counting application using a webcam video stream. The approach is not using the latest state of the art techniques such as machine learning for simplicity purposes. The image acquisition and presentation is done with last version of OpenCV, while the algorithm for crowd counting is custom and / or adapted from other algorithms :
- Each frame is converted to grayscale
- Background subtraction from the video is done with the ViBe algorithm
- Median filtering to ease the next steps
- Morphological opening to remove noise
- Morphological closing to fill the gaps
- Connected components extraction and counting
The code has been primarily tested on an Intel Core i3-4160 CPU @3.60GHz with 2 cores and 4 threads. It supports SIMD through SSE4.2 and AVX2 instructions. Source code for the application can be found here.
To measure the execution time of each method we rely on the open-source Tracy instrumenting profiler. It will track the execution with the help of the ZoneScoped
macro placed on top of each important methods. To use it, one simply has to grab the executable here and run it along with the project. More detailed step about the setup can be found here.
My choice is motivated by the fact that it is a modern, portative and non-intrusive environment. It offers advanced features such as machine-code view, threads handling and several histograms. Other approaches such as inline gettimeofday
would have unnecessarily complexified the code while being too thin in details.
We start by converting the 3 channel frame to a 1 channel grayscale to improve the performance of the background detection. We use the following formula for each pixel : Grayscale = 0.299R + 0.587G + 0.114B
. It is possible to access OpenCV matrices as contiguous array, better suited for cache coherency. In code it is expressed as follows :
for (int i = 0; i < in.rows; i++)
for (int j = 0; j < in.cols; j++)
out.data[i * out.step + j] = 0.114 * in.data[i * in.step + j * 3] + 0.587 * in.data[i * in.step + j * 3 + 1] + 0.299 * in.data[i * in.step + j * 3 + 2];
The background subtraction is the burden of the ViBe algorithm and will be better explained in their research paper. I used their C source code as the C++ version was contained in multiple files and relied on structures that weakened the understanding of the algorithm. I have made no modification in their source code for this first version. It is available in the vibe-background-sequential.cpp
file.
The median filter will replace each pixel with the median value of its 8 block neighbour. In code it is defined with an array of size 8 that will be filled by the surrounding pixels. Once filled the array is sorted and we take in the center value. In our case, the sort method is an "insertion" sort. For the previous project in tried to use a bubble sort optimisation but it turned out to be less performant than the insertion sort.
The morphological opening is used to remove the noise. I use a small 5x5 kernel of ones. I tried to implement it myself from a portion of code found online but it gave me unsatisfying results for a generous amount of time so I rely on openCV instead. The code is commented so it is still possible to test it easily. The implementation can be found in the morphology.cpp
.
The closing is used to fill the gaps still present in the body shape. This time the kernel is bigger (40x40 of ones) and it can be seen in the performance cost. As for the Opening, I rely on OpenCV.
Finally I use the OpenCV method to count the number of connected components of the image, and do so basic size filtering on the results. With the area detected, I draw a square on the final image to show the people detected.
The cost of a frame is between 500 and 600ms which is around 1.5 frame per seconds. We can see that the MedianBlur and the Closing are the two operation which are costing the most.
Method | Mean Time (ms) | Standard deviation (ms) |
---|---|---|
convertRGBToGrayscale |
38.3 | 5.68 |
libvibeModel_Sequential_Segmentation_8u_C1R |
21.27 | 5.21 |
medianFilter |
254 | 42 |
opening |
15.77 | 3.55 |
closing |
132 | 31 |
extractConnectedComponents |
27 | 9.37 |
I rely mainly on the nested loop optimisation with the directive #pragma omp parallel for collapse(2)
. I only use it when the loop have enough value or the overhead of context switching will cost more than the gains given by multithreading.
The cost of a frame vary between 300 and 400ms, which is about 3 frames per seconds. It is a 1.5x improvements over the non-threaded version as some methods such as opening and closing are still not using multiple threads. For the grayscale conversion and the median filter we reduced the cost by more than half. Then for the vibe algorithm, even though there are multiple OpenMP directives, we only gain a third of performance.
Method | Mean Time (ms) | Standard deviation (ms) |
---|---|---|
convertRGBToGrayscale |
15.5 | 3.97 |
libvibeModel_Sequential_Segmentation_8u_C1R |
14.31 | 6.35 |
medianFilter |
98.51 | 18.79 |
opening |
17.21 | 2.29 |
closing |
126.19 | 120.81 |
extractConnectedComponents |
18.15 | 2.23 |
By uncommenting portion of code it is also possible to test easily the raw performance of OpenCV.
The cost of a frame vary between 240 and 300ms which is about 4 frames per seconds. The OpenCV counterpart offer similar performance than multithread for RGB conversion while being single-threaded, and is still two times more performant for median filtering.
Method | Mean Time (ms) | Standard deviation (ms) |
---|---|---|
convertRGBToGrayscale |
14.14 | 3.6 |
libvibeModel_Sequential_Segmentation_8u_C1R |
14.4 | 3.3 |
medianFilter |
44.9 | 4.3 |
opening |
19.33 | 1.95 |
closing |
95.1 | 6.75 |
extractConnectedComponents |
12.6 | 1.6 |
My overall feeling is that this was a great exercise to learn the right mindset for optimisation. The fact that we had to follow the strict steps of code-profile-optimise for the report gave us a great workflow to reproduce at work. I regret is that I lack free time to continue with SIMD for example, and also to better understand the OpenCV implementation and why it performs so well.
Name | Type | Description |
---|---|---|
OpenCV People Counter in Python | Article | |
People Counting in Real Time in Python and OpenCV | Github | |
People counter OpenCV - Intel | Github | |
OpenCV substract background | OpenCV Doc | |
OpenCV Morphological operations | OpenCV Doc | |
OpenCV SSIM and PSNR | OpenCV Doc | |
OpenCV exemple connected components | OpenCV Doc | |
OpenCV Connected component labelling | Article | |
Vibe background substraction algorithm | Paper |