The video OCR processor for Richmond Sunlight.
This downloads video from the Virginia General Assembly's floor-session video archive and subjects it to various types of analysis. At this writing, that includes OCRing the on-screen chyrons, facial recognition, and closed-caption extraction. To come: voice pitch analysis and improved facial recognition.
The video processor was put together, piece by piece, over a decade, as a series of Bash and PHP scripts. This is an effort to consolidate those, and turn them into their own project. At the moment, it's still a series of Bash and PHP scripts, lashed together with twine, but isolating them as their own project will make it easier to standardize and improve them.
It lives on a compute-optimized EC2 instance. Source updates are delivered via Travis CI to S3, and the instance pulls them from S3 on boot. (Note that the includes/ directory is pulled from the deploy branch of the richmondsunlight.com repository on each build.) The instance is stopped by default, and is only started once rs-machine identifies that a new video is available. rs-machine passes the details of the new video via SQS, but it starts the rs-video-processor EC2 instance directly. rs-video-processor takes the first entry from SQS, runs it through its processing pipeline, and keeps looping over SQS entries for as long as any remain. When the queue is empty, it shuts itself down.
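
The queue-draining behavior described above can be sketched roughly as follows. This is an illustration only, not the project's code (which is Bash and PHP): the function name `drain_queue`, the four callables, and the message bodies are all hypothetical, and the queue is an in-memory stand-in so the example runs without AWS credentials. In a real deployment the callables would presumably wrap the SQS receive and delete operations and an EC2 stop or OS shutdown command.

```python
from collections import deque
from typing import Callable, Optional


def drain_queue(
    receive: Callable[[], Optional[dict]],
    process: Callable[[dict], None],
    delete: Callable[[dict], None],
    shutdown: Callable[[], None],
) -> int:
    """Process queue entries one at a time until none remain, then shut down.

    All four callables are hypothetical hooks, not part of the project.
    """
    handled = 0
    while True:
        message = receive()
        if message is None:
            # Queue is empty: stop looping.
            break
        process(message)
        # Remove the entry only after it was processed successfully,
        # so a crash mid-processing leaves it available for a retry.
        delete(message)
        handled += 1
    shutdown()
    return handled


# In-memory stand-ins for the queue and the pipeline (assumed message shape).
pending = deque([{"Body": "video-1"}, {"Body": "video-2"}])
processed = []
events = []


def receive() -> Optional[dict]:
    # Peek at the next entry without removing it.
    return pending[0] if pending else None


def process(message: dict) -> None:
    processed.append(message["Body"])


def delete(message: dict) -> None:
    pending.popleft()


def shutdown() -> None:
    events.append("shutdown")


count = drain_queue(receive, process, delete, shutdown)
print(count, processed, events)
```

Deleting an entry only after processing succeeds is a design choice made for this sketch; whether the actual scripts delete before or after processing is not stated here.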