This development exercise covers the design and development of a perception module for Acme Robotics' new product line of warehouse management robots (WMRs), which carry goods from one place to another in a workspace shared with humans. It is therefore essential that WMRs do not collide with anyone. This problem is addressed by our Human Detector (HD) module. The HD module works on the principle of supervised learning, using a Haar Cascade Classifier for object detection (humans in our case). The module uses the on-board camera provided in the WMR and processes the captured images frame by frame to detect the presence of any human. It returns the coordinates of the detected human(s) in the 2D coordinate frame corresponding to the robot's field of view: the bottom-left (x, y), bottom-right (x, y), and height of the enveloping rectangular box around each detected human. This data can then be processed by other modules developed by Acme Robotics, such as real-time path planning, to modify the WMR's travel path and avoid any possibility of a collision.
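For illustration, each detection could be represented by a record like the sketch below; the struct and field names are hypothetical, not the module's actual interface:

```cpp
// Hypothetical per-detection record matching the output described above.
struct DetectionBox {
  double bottomLeftX, bottomLeftY;    // bottom-left corner (x, y)
  double bottomRightX, bottomRightY;  // bottom-right corner (x, y)
  double height;                      // height of the enveloping box
};
```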
The presence of humans in a webcam feed is detected using the Haar Cascade Classifier from the OpenCV library, which is covered under the 3-clause BSD License. The cascade functionality detects objects by importing a trained model XML file. These XML files can come from a pre-trained model or can be generated using OpenCV's command-line utilities. For this project our team created a new model from scratch, so we first highlight the steps to generate a new cascade model.
$ sudo apt-get update
$ sudo apt-get upgrade
$ sudo apt-get install build-essential cmake git libgtk2.0-dev pkg-config libavcodec-dev libavformat-dev libswscale-dev
$ sudo apt-get install python3.5-dev python3-numpy libtbb2 libtbb-dev
$ sudo apt-get install libopencv-dev libjpeg-dev libpng-dev libtiff5-dev libjasper-dev libdc1394-22-dev libeigen3-dev libtheora-dev libvorbis-dev libxvidcore-dev libx264-dev sphinx-common libtbb-dev yasm libfaac-dev libopencore-amrnb-dev libopencore-amrwb-dev libopenexr-dev libgstreamer-plugins-base1.0-dev libavutil-dev libavfilter-dev libavresample-dev
$ sudo apt-get install git
$ git clone https://github.com/opencv/opencv.git
$ git clone https://github.com/opencv/opencv_contrib.git
$ cd opencv
$ mkdir build
$ cd build
$ cmake -D BUILD_TIFF=ON -D WITH_CUDA=OFF -D ENABLE_AVX=OFF -D WITH_OPENGL=OFF -D WITH_OPENCL=OFF -D WITH_IPP=OFF -D WITH_TBB=ON -D BUILD_TBB=ON -D WITH_EIGEN=OFF -D WITH_V4L=OFF -D WITH_VTK=OFF -D BUILD_TESTS=OFF -D BUILD_PERF_TESTS=OFF -D CMAKE_BUILD_TYPE=RELEASE -D CMAKE_INSTALL_PREFIX=/usr/local -D OPENCV_EXTRA_MODULES_PATH=../../opencv_contrib/modules ..
$ make -j4
$ sudo make install
$ sudo ldconfig
$ sudo apt-get install python-opencv
Check that everything is installed correctly by running the commands below; the OpenCV version should be displayed.
$ python
>>> import cv2
>>> print(cv2.__version__)
# after the version is printed, exit Python back to the command line
>>> exit()
For any supervised learning based training and model generation, the most important requirement is a dataset of POSITIVE and NEGATIVE instances. In this case the positive instances are a set of images that contain human object(s), and the negative instances are a set of images that do not contain humans. The positive dataset is taken from Horses or Humans and the negative dataset is taken from the INRIA Person Dataset.
Follow the steps below:
- Create a directory <model> and cd into it.
- Download the images into two different folders, namely pos and neg.
- Label the positive images and create an info.lst file in the directory with an entry for each image specifying the object position: <filename> <no_of_instances> <left_top_x_coordinate> <left_top_y_coordinate> <width> <height> (see the example entries after these steps).
- In the <model> directory, create a background text file "bgNeg.txt" listing the path of each negative instance: <filepath_filename>
- Then run the command below in a terminal from the <model> directory to generate a vector file; let's name that file "positives.vec". Here -w and -h are the training model parameters (ensure that the objects in the positive instances have a similar width/height ratio for better performance).
opencv_createsamples -info pos/info.lst -num 527 -w 36 -h 110 -vec positives.vec
- Create a data directory inside <model>: mkdir data
- Lastly, train your Haar Cascade model from the <model> directory:
opencv_traincascade -data data -vec positives.vec -bg bgNeg.txt -numPos 500 -numNeg 800 -numStages 10 -w 36 -h 110
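For illustration, entries in pos/info.lst and bgNeg.txt might look like the following (filenames are hypothetical; the second info.lst line shows an image with two instances):

pos/info.lst:
pos/img001.png 1 120 45 36 110
pos/img002.png 2 80 32 36 110 210 40 36 110

bgNeg.txt:
neg/img001.png
neg/img002.png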
Training is a time-demanding and computationally expensive process. The larger the number of instances, training stages, and -w and -h parameters, the more time it takes. We used 527 positive instances with w=36 and h=110 over 10 stages, and it took approximately 16 hours for one model to train.
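Once trained, the resulting cascade XML (written by opencv_traincascade into the data directory as cascade.xml) can be loaded for detection. The sketch below shows the general pattern with OpenCV's C++ API; the exact parameters and file layout used in the HD module may differ:

```cpp
#include <opencv2/objdetect.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/videoio.hpp>
#include <vector>

int main() {
  // Load the trained cascade produced by the training step above.
  cv::CascadeClassifier cascade;
  if (!cascade.load("data/cascade.xml")) return 1;

  cv::VideoCapture capture(0);  // on-board camera / webcam feed
  cv::Mat frame, gray;
  while (capture.read(frame)) {
    // Cascades operate on grayscale images.
    cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
    cv::equalizeHist(gray, gray);

    // Each detection is a bounding rectangle around a human,
    // which can be handed to downstream modules.
    std::vector<cv::Rect> humans;
    cascade.detectMultiScale(gray, humans);
  }
  return 0;
}
```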
Eigen 3.3.7 is a dependency and has been provided in the repository.
cpplint reported errors for the header guards, so the full file path had to be added as a string for each guard.
$ cd Human-Detection-Module
$ mkdir build
$ cd build
$ cmake ..
$ make
$ make test
$ cd Human-Detection-Module
$ cd build
$ ./app/humanDetect
$ cd Human-Detection-Module
$ cd build
$ ./test/cpp-test
$ git clone https://github.com/doxygen/doxygen.git
$ cd doxygen
$ mkdir build
$ cd build
$ cmake -G "Unix Makefiles" ..
$ make
$ sudo make install
$ cd
$ doxygen -g dconfig
$ cd MidTermGroup-14
The above commands install Doxygen and generate a configuration file. The configuration file for our project has been provided in the repository as DoxyConfig.
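To build the documentation with the provided configuration, the usual invocation is to run Doxygen on it from the repository root:

$ doxygen DoxyConfig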
A homogeneous matrix transformation and the lens formula are used to convert from the image coordinates to the world coordinates.
The image origin is at the top-left corner, with the positive x axis pointing right and the positive y axis pointing down. The world coordinate frame (the robot's viewpoint) has its origin at the center of the view, with the positive x axis pointing right and the positive y axis pointing up.
This gives the rotation matrix:

R = [ 1  0  0
      0 -1  0
      0  0 -1 ]

The translation (from the world origin to the image origin) is:

T = [ -image_width / 2
       image_height / 2
       0 ]

The homogeneous matrix from the image frame to the world frame is:

H = [ R  T
      0  1 ]

world_coordinate = H * image_coordinate (with z assumed to be 0)
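A minimal sketch of this conversion using Eigen (already a project dependency) is shown below. The function name is illustrative, and the lens-formula scaling from pixels to metric units is omitted and assumed to be applied separately:

```cpp
#include <Eigen/Dense>

// Convert a pixel coordinate (u, v) to the world frame described above.
Eigen::Vector3d imageToWorld(double u, double v,
                             double imageWidth, double imageHeight) {
  // Rotation: the image y axis points down, the world y axis points up.
  Eigen::Matrix3d R;
  R << 1,  0,  0,
       0, -1,  0,
       0,  0, -1;

  // Translation from the world origin (center of view) to the image
  // origin (top-left corner).
  Eigen::Vector3d T(-imageWidth / 2.0, imageHeight / 2.0, 0.0);

  // Assemble the homogeneous transform H = [R T; 0 1].
  Eigen::Matrix4d H = Eigen::Matrix4d::Identity();
  H.topLeftCorner<3, 3>() = R;
  H.topRightCorner<3, 1>() = T;

  // Image point in homogeneous form, with z assumed to be 0.
  Eigen::Vector4d imagePoint(u, v, 0.0, 1.0);
  Eigen::Vector4d worldPoint = H * imagePoint;
  return worldPoint.head<3>();
}
```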