Vision-Based Near-Shore Wave Tracking and Recognition for High Elevation and Aerial Video Cameras (C++, OpenCV)

This repository contains a program for modeling, detecting, tracking, and recognizing near-shore ocean waves, written in C++ using the OpenCV 3+ library.

This program is demoed on video clips from several Southern California locations in a Vimeo video here.

Software and Library Requirements

  • OpenCV 3.2.0
  • a C++11 compiler
  • CMake 3.8.1 or higher if you are generating build files with the CMakeLists.txt script.

A High-Level Overview

This program implements a method of near-shore ocean wave recognition through a common Computer Vision "recognition" workflow for video sequences, and is fast enough to run in real time.

The general object recognition workflow for video sequences proceeds from detection, to tracking, and then recognition [1]. The process in this program is thus:

  1. Preprocessing of video: background modeling of the maritime environment and foreground extraction of near-shore waves.
  2. Detection of objects: Identification and localization of waves in the scene.
  3. Tracking of objects: Identity-preserving localization of detected waves through successive video frames for the capture of wave dynamics.
  4. Recognition of objects: Classification of waves based on their dynamics.

Wave recognition has uses in higher-level objectives such as automatic wave period, frequency, and size determination, as well as region-of-interest definition for human activity recognition.

Program Architecture

In accordance with the general computer vision recognition workflow for objects in videos, the program is split into four modules (preprocessing, detection, tracking, and the 'Wave' class), in addition to main(). Module functions are declared and described in their header files (*.hpp), with implementation and usage in the associated source files (*.cpp).

Code Organization

| File | Purpose |
| --- | --- |
| include/wave_objects.hpp | Declaration of the Wave class and associated data members and member functions. |
| include/preprocessing.hpp | Declaration of preprocessing functions for frames of an OpenCV VideoCapture object. |
| include/detection.hpp | Declaration of wave detection functions for preprocessed frames of an OpenCV VideoCapture object. |
| include/tracking.hpp | Declaration of wave tracking functions for preprocessed frames of an OpenCV VideoCapture object. |
| src/wave_objects.cpp | Definition and construction of the Wave object, and Wave get/set methods. |
| src/preprocessing.cpp | Definitions of the frame preprocessing functions. Preprocessing downsizes full frames, applies a Mixture-of-Gaussians mask, and denoises with morphological operators. |
| src/detection.cpp | Definitions of the wave detection functions. The detection routine searches for contours, filters them, and returns Wave objects. |
| src/tracking.cpp | Definitions of the wave tracking functions. The tracking routine defines a search region of interest for a Wave object, identifies its representation in future frames, and updates Wave data as necessary. Includes several clean-up functions. |
| main.cpp | Main implementation of the Multiple Wave Tracking program. Implements preprocessing, detection, and tracking functions, as well as input and output handling. |
| scenes/ | A directory of sample videos for the Multiple Wave Tracking program. |
| CMakeLists.txt | Helper CMake script to generate build files for compilation. |

The Multiple Wave Tracking Model, in Short

main() implements the recognition workflow from above. The following bullets list the modeling operations employed in this program; a full discussion of model choices can be found in "Model Discussion" below.

  • Preprocessing: Input frames are downsized by a factor of four for analysis. Background modeling is performed using a Mixture-of-Gaussians model with five Gaussians per pixel and a background history of 300 frames, resulting in a binary image in which background is represented by values of 0 and foreground by values of 255. A square denoising kernel of 5x5 pixels is then passed over the binary image to remove foreground features that are too small to be considered objects of interest.
  • Detection: Contour-finding is applied to the denoised image to identify all foreground objects. These contours are filtered for both area and shape using a contour's moments, resulting in the return of large, oblong shapes in the scene. These contours are converted to Wave objects and passed to the tracking routine.
  • Tracking: A region-of-interest is defined for each potential wave object in which we expect the wave to exist in successive frames. The wave's representation is captured using simple linear search through the ROI and its dynamics are updated according to center-of-mass measurements.
  • Recognition: We use two dynamics to determine whether or not the tracked object is indeed a positive instance of a wave: mass and displacement. Mass is calculated by weighting pixels equally and performing a simple count. Displacement is measured by calculating the orthogonal displacement of the wave's center-of-mass relative to its original major axis. We accept an object as a true instance of a wave if its mass and orthogonal displacement exceed user-defined thresholds.

Data and Assumptions

In order to use tracking inference in the classification of waves, we must use static video cameras (e.g. surveillance cameras) as input to the program. Included in the scenes/ directory are three videos from different scenes that can be used to test the Multiple Wave Tracking program. These videos are 1280 x 720 pixels and stored as MP4 files. Please note that if you use your own videos, you may have to re-encode them so that the OpenCV library can read them. A common tool for handling video codecs is FFmpeg.

As a vision-based project, this program performs best on scenes in which the object of interest (the wave) is sufficiently separated from other objects (i.e. there is no occlusion or superimposition). This assumption is fair for the wave object as ocean physics dictate that near-shore waves generally have consistently delineated periods of inter-arrival times, due to the physical processes of assimilation, imposition, and interference that take place a great distance from shore.

To this end, a camera is able to pronounce this periodicity on the focal plane simply by increasing its own elevation above the ocean surface plane. That is, the higher the elevation of the camera, the better the separation in the frame.

Compiling and Launching the Model

The source files for this project must be compiled prior to execution. A script is provided for generating build files using CMake as CMakeLists.txt, though it is not necessary to compile with this method. The dependency for compiling the Multiple Wave Tracking program is the OpenCV library, which can be obtained here. This project uses OpenCV version 3.2.0. You will need to place the OpenCV header and library directories in your compiler's search path. You may reference this documentation for a refresher on compiling OpenCV projects with G++ and CMake.

After compiling the Multiple Wave Tracking program, you can launch the program from the command line after navigating to the directory containing the executable. For example:

joe_bloggs build $ ./mwt_cpp some_video_with_waves.mp4

You should see output like this:

Starting analysis of 840 frames.
100 frames complete. (122 frames/sec; 0 sec/frame)
200 frames complete. (131 frames/sec; 0.005 sec/frame)

The program reports its status every 100 frames, as well as the performance of the program.
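As a rough illustration of how such a status line could be produced, here is a minimal sketch. The function name `statusLine` and the rounding of the frame rate are assumptions made for this example, and it is not stated whether the program measures rates per 100-frame interval or cumulatively.

```cpp
#include <cmath>
#include <sstream>
#include <string>

// Hypothetical helper: formats a progress line from a frame count and the
// elapsed wall-clock time in seconds. Not taken from the program's source.
std::string statusLine(int framesDone, double elapsedSeconds) {
    const double fps = elapsedSeconds > 0.0 ? framesDone / elapsedSeconds : 0.0;
    const double secPerFrame = framesDone > 0 ? elapsedSeconds / framesDone : 0.0;
    std::ostringstream out;
    out << framesDone << " frames complete. (" << std::lround(fps)
        << " frames/sec; " << secPerFrame << " sec/frame)";
    return out.str();
}
```

Calling `statusLine(200, 1.0)` yields `200 frames complete. (200 frames/sec; 0.005 sec/frame)`. Keeping the per-frame time as a floating-point value avoids it being truncated to zero.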

The program will report simple statistics at the conclusion of analysis, like the following:

Program complete.
Program took 5950 milliseconds.
Program speed: 168 frames per second.
2 wave(s) found.

Model Discussion

What exactly is a "wave"?

Examples of Near-Shore Waves

When considering the static representation of a wave in color space, we make use of the high contrast between a wave that is broken and the surrounding water environment. For our program, a wave is denoted by the presence of sea foam when it has "broken". Foam as a physical object is the trapping of air inside liquid bubbles whose refractive properties give the foam a holistic color approaching white. This is contrasted with the ocean surface that does not have such refractive properties and rather traps light such that its intensity is much lower than that of foam. Therefore, when we use computer vision to search a maritime image for a wave, we are really looking for the signature of a wave in the form of sea foam. It is important to note that one wave object can be represented by many disparate "sections" along the length of the wave.

Further, a "wave" in our case has an assumed behaviour of dynamic movement through time. Ocean waves propagating from a source travel in a direction that is orthogonal (perpendicular) to the plane tangent to their wavefront, so we can describe a wave's travel with a single 1-dimensional value (e.g. "the wave has traveled 50 feet").

This representation of a wave in the time domain allows us to abstract the near-shore wave identification problem into a recognition-through-tracking formulation.

Preprocessing

Preprocessed Frame

In our videos of maritime environments, waves are the largest objects present, and thus our downsizing of input videos by a factor of four (or greater) is acceptable. Background modeling for these environments, however, is a very difficult endeavor even with static cameras, due to the dynamic nature of liquids.

We employ an adaptive Gaussian Mixture Model (MOG) [2] with five Gaussians per pixel in order to give us the greatest flexibility in accounting for a variety of light and surface refractions. The MOG model classifies as foreground any pixel whose value lies more than 2.5 standard deviations from every one of the Gaussians that model the background. The Gaussian distributions that constitute a pixel's representation are determined using Maximum Likelihood Estimation (MLE), solved in our case with an Expectation Maximization (EM) algorithm. Although background subtraction methods have been developed specifically for maritime environments, a quick review suggests that MOG performs faster than such methods, while having slightly worse accuracy on tested datasets [3] (none of which consider a breaking wave to be foreground, however).
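The 2.5-standard-deviation test can be sketched as follows, in the form commonly used by mixture-model background subtraction: a pixel is treated as foreground when it matches none of the background components. This is a simplified grayscale illustration, not the OpenCV implementation; the `Gaussian` struct, the `isBackground` flag, and the function names are assumptions for this example.

```cpp
#include <cmath>
#include <vector>

// One component of a per-pixel mixture (grayscale intensity for simplicity).
struct Gaussian {
    double mean;
    double stddev;
    bool   isBackground;  // assumed to be decided elsewhere by the model
};

// True when `value` lies within k standard deviations of the component mean.
bool matches(const Gaussian& g, double value, double k = 2.5) {
    return std::abs(value - g.mean) <= k * g.stddev;
}

// A pixel is foreground when no background component matches its value.
bool isForeground(const std::vector<Gaussian>& mixture, double value, double k = 2.5) {
    for (const Gaussian& g : mixture) {
        if (g.isBackground && matches(g, value, k)) return false;
    }
    return true;
}
```

With background components centered on dark water intensities, a bright foam pixel falls outside every background component and is flagged as foreground.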

A background history of 300 frames (equivalent to 10 seconds in our sample videos) ensures that the background is adaptive to changes in ambient conditions that are gradual, while ensuring that a wave passing through is an infrequent-enough event to be classified as foreground.

Despite the flexibility of the MOG model, there will still be residual errors in background modeling. To account for this, we apply a denoising operation on the frame that employs a mathematical morphology operation known as 'opening', which is erosion followed by dilation. This has the effect of sending the residual foreground pixels from the MOG model to the background while retaining the general shape of salient foreground features.
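To make the effect of opening concrete, here is a minimal, dependency-free sketch on a binary mask. It assumes 0 for background and 255 for foreground; the `Mask` alias and the function names are made up for this example. In OpenCV the same operation is available through `cv::morphologyEx` with `cv::MORPH_OPEN`.

```cpp
#include <vector>

using Mask = std::vector<std::vector<unsigned char>>;  // 0 = background, 255 = foreground

// Applies a k x k square structuring element. With erode == true a pixel stays
// foreground only if every in-image pixel under the kernel is foreground;
// otherwise it becomes foreground if any pixel under the kernel is foreground.
Mask morph(const Mask& src, int k, bool erode) {
    const int rows = static_cast<int>(src.size());
    const int cols = rows ? static_cast<int>(src[0].size()) : 0;
    const int r = k / 2;
    Mask dst(rows, std::vector<unsigned char>(cols, 0));
    for (int y = 0; y < rows; ++y) {
        for (int x = 0; x < cols; ++x) {
            bool all = true, any = false;
            for (int dy = -r; dy <= r; ++dy) {
                for (int dx = -r; dx <= r; ++dx) {
                    const int yy = y + dy, xx = x + dx;
                    if (yy < 0 || yy >= rows || xx < 0 || xx >= cols) continue;
                    if (src[yy][xx]) any = true; else all = false;
                }
            }
            dst[y][x] = (erode ? all : any) ? 255 : 0;
        }
    }
    return dst;
}

// Opening: erosion followed by dilation with the same kernel.
Mask openMask(const Mask& src, int k = 5) {
    return morph(morph(src, k, true), k, false);
}
```

An isolated foreground pixel does not survive the erosion and so never reappears, while a blob larger than the kernel is shrunk and then grown back to roughly its original extent.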

We note in passing that the EM algorithm that is performed on a per-pixel basis contributes significantly to the overall expense of background modeling in the program, estimated to be about 50% of the total CPU resources.

Detection

Detection and Filtering

The resultant frame from the preprocessing module is a binary image in which the background is represented by one value while the foreground is represented by another. In our case, we are left with an image in which waves are represented as sea foam in the foreground. Detection is intended to localize these shapes and to subject them to thresholding that further eliminates false instances.

We use the contour finding method from Suzuki and Abe [4] that employs a traditional border tracing algorithm to return shapes and locations. These contours are filtered for area (to eliminate foreground objects that are too small to be considered) and inertia (to eliminate foreground objects whose shape does not match that of a breaking wave). We are left with large, oblong object contours which we convert to Wave objects and pass to the tracking routine.
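The area and inertia filter can be illustrated with a small sketch. For simplicity it computes moments over a region's pixels rather than over a border polygon (as `cv::moments` would for a contour). The struct names, function names, and threshold parameters are assumptions for this example, not the program's actual values.

```cpp
#include <cmath>
#include <vector>

struct Pixel { int x; int y; };

struct ShapeStats {
    double area;          // pixel count (zeroth moment)
    double inertiaRatio;  // minor / major second-moment eigenvalue, in [0, 1]
};

// Computes area and inertia ratio from central second moments. A ratio near 0
// indicates an elongated shape; a ratio near 1 indicates a compact one.
ShapeStats describe(const std::vector<Pixel>& region) {
    ShapeStats s{static_cast<double>(region.size()), 1.0};
    if (region.empty()) return s;
    double cx = 0.0, cy = 0.0;
    for (const Pixel& p : region) { cx += p.x; cy += p.y; }
    cx /= s.area;
    cy /= s.area;
    double mu20 = 0.0, mu02 = 0.0, mu11 = 0.0;
    for (const Pixel& p : region) {
        const double dx = p.x - cx, dy = p.y - cy;
        mu20 += dx * dx;
        mu02 += dy * dy;
        mu11 += dx * dy;
    }
    // Eigenvalues of the 2x2 matrix [[mu20, mu11], [mu11, mu02]].
    const double half = 0.5 * (mu20 + mu02);
    const double diff = std::sqrt(0.25 * (mu20 - mu02) * (mu20 - mu02) + mu11 * mu11);
    const double major = half + diff, minor = half - diff;
    s.inertiaRatio = major > 0.0 ? minor / major : 1.0;
    return s;
}

// Keeps only shapes that are both large enough and oblong enough.
bool keepAsWaveCandidate(const std::vector<Pixel>& region,
                         double minArea, double maxInertiaRatio) {
    const ShapeStats s = describe(region);
    return s.area >= minArea && s.inertiaRatio <= maxInertiaRatio;
}
```

Under these assumptions a 40x4 strip of foam passes a filter of `minArea = 100` and `maxInertiaRatio = 0.1`, while a 12x12 square (too round) and a 10x2 strip (too small) are rejected.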

Tracking

Tracking Waves

We can take advantage of two assumptions about waves that eliminate our reliance on traditional sample-based tracking methodologies and their associated probabilistic components. The first is that waves are highly periodic in arrival and therefore will not exhibit occlusion or superimposition. The second assumption is about the wave's dynamics; specifically, that a wave's movement can be described by its displacement orthogonal to the axis along which the wave was first identified in the video sequence. These two assumptions allow us to confidently define a search region in the next frame using just a center-of-mass estimate in the current frame, and reduce our search space for the wave's position in successive frames to a search along one dimension. The reduction in dimensionality of the search space allows us to cheaply and exhaustively search for a global position that describes our tracked wave in successive frames. We do not need to rely on sample-based tracking methods that are susceptible to drift and/or suboptimal identifications.
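The idea of a one-dimensional search can be sketched as below. This is a simplified stand-in for the program's region-of-interest search, not its actual implementation: the search region is modeled as a band around the line through the current center of mass, parallel to the wave's original axis. All type names, function names, and the `halfWidth` parameter are hypothetical.

```cpp
#include <vector>

struct Point2 { double x; double y; };

struct TrackedWave {
    Point2 normal;    // unit vector orthogonal to the wave's original major axis
    Point2 centroid;  // current center-of-mass estimate
    int    mass;      // foreground pixels matched in the latest frame
};

// Signed distance of p from the line through `through` whose unit normal is n.
double offsetAlong(const Point2& n, const Point2& through, const Point2& p) {
    return n.x * (p.x - through.x) + n.y * (p.y - through.y);
}

// Keeps foreground pixels lying within `halfWidth` of the line through the
// current centroid (parallel to the original axis), then re-centers on them.
// Returns false and leaves the wave untouched when the band is empty.
bool updateWave(TrackedWave& wave, const std::vector<Point2>& foreground,
                double halfWidth) {
    double sumX = 0.0, sumY = 0.0;
    int count = 0;
    for (const Point2& p : foreground) {
        const double d = offsetAlong(wave.normal, wave.centroid, p);
        if (d >= -halfWidth && d <= halfWidth) {
            sumX += p.x;
            sumY += p.y;
            ++count;
        }
    }
    if (count == 0) return false;
    wave.centroid = Point2{sumX / count, sumY / count};
    wave.mass = count;
    return true;
}
```

Because membership in the band depends only on a pixel's offset along the normal, the search is effectively one-dimensional, and every foreground pixel can be checked exhaustively at low cost.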

The tracking routine also manages multiple representations of the same wave through the merging of these multiple sections into one object, as a wave can be constructed of many disparate contours.

Recognition

Tracking allows us to incorporate dynamics into classification of waves. We use two dynamics to determine whether or not the tracked object is indeed a positive instance of a wave: mass and displacement. Mass is calculated by weighting pixels equally and performing a simple count. Displacement is measured by calculating the orthogonal displacement of the wave's center-of-mass relative to its original major axis. We accept an object as a true instance of a wave if its mass and orthogonal displacement exceed user-defined thresholds.
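A minimal sketch of this acceptance test follows. The orthogonal displacement is computed as the perpendicular distance from the current center of mass to the line along the original major axis. The names `Vec2`, `orthogonalDisplacement`, and `isWave`, and the use of strict inequalities for "exceed", are assumptions for this example.

```cpp
#include <cmath>

struct Vec2 { double x; double y; };

// Perpendicular distance from `centroid` to the line that passes through
// `axisPoint` in direction `axisDir` (the wave's original major axis).
double orthogonalDisplacement(Vec2 axisPoint, Vec2 axisDir, Vec2 centroid) {
    const double len = std::hypot(axisDir.x, axisDir.y);
    if (len == 0.0) return 0.0;  // degenerate axis: no meaningful displacement
    const double cross = axisDir.x * (centroid.y - axisPoint.y)
                       - axisDir.y * (centroid.x - axisPoint.x);
    return std::abs(cross) / len;
}

// Accepts a tracked object as a wave only if both dynamics exceed their
// user-defined thresholds. Mass is a plain count of equally weighted pixels.
bool isWave(int mass, double displacement,
            int massThreshold, double displacementThreshold) {
    return mass > massThreshold && displacement > displacementThreshold;
}
```

Requiring both conditions means a large but stationary foreground object, or a small object that drifts a long way, is rejected.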

By introducing tracking, we are able to confidently classify waves in videos by combining simple bilevel representations of the waves with cheaply-calculated dynamics. If we were to resort to bilevel detection methods for waves without employing dynamics, our methods would be susceptible to false positives from many sources. Certainly a large boat might be an example of a false positive, but harder examples that should be negatively classified include wave types that have similar contour representations to ocean waves, including "shorebreak"-type waves that break on the shore, and "whitecapping"-type waves that have the appearance of breaking due to near-shore winds. Neither of these meets our definition of an ocean wave.

Improvements

  • Wave.points_ should be an array of pointers to pixels, rather than copies of the pixels.
  • Implement object construction using separate Wave and Section classes.

Footnotes

1: Bodor, Robert, Bennett Jackson, and Nikolaos Papanikolopoulos. "Vision-based human tracking and activity recognition." Proc. of the 11th Mediterranean Conf. on Control and Automation. Vol. 1. 2003.

2: KaewTraKulPong, Pakorn, and Richard Bowden. "An improved adaptive background mixture model for real-time tracking with shadow detection." Video-based surveillance systems 1 (2002): 135-144.

3: Bloisi, Domenico D., Andrea Pennisi, and Luca Iocchi. "Background modeling in the maritime domain." Machine Vision and Applications 25.5 (2014): 1257-1269.

4: Suzuki, Satoshi, and Keiichi Abe. "Topological structural analysis of digitized binary images by border following." Computer Vision, Graphics, and Image Processing 30.1 (1985): 32-46.