Extract caption data #75

waldoj · 2016-02-02T21:36:47Z

Apparently the General Assembly started closed captioning video last year. So, obviously, we need some way to both preserve and extract this data. Preserving that data for 2015 isn't gonna happen—that ship has sailed—but we ought to be able to extract it.

figure out how to extract captions from existing DVD files, without requiring reencoding as MP4s
upload the resulting SRT files to the server along with the videos
timeshift SRT files by ~10 seconds
add SRT files to the database when videos are imported
merge SRT files with MP4s
submit the SRT file to YouTube alongside the video

waldoj · 2016-02-02T22:35:25Z

I've got a start at using the CLI. The only problem that I see at the moment is that it can only handle one title per command. So I think I'll need to write a shell script to determine the number of titles, and then iterate over them with HandBrakeCLI.
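A minimal sketch of that per-title loop, written in Python rather than shell so the parsing is easy to test. It assumes the scan output lists each title on a line of the form `+ title N:`. The function names and the output naming scheme are hypothetical, and the encoder flags are copied from the command used later in this thread, minus `--subtitle`.

```python
import re

# Assumption: HandBrakeCLI scan output lists each title as "+ title N:".
TITLE_LINE = re.compile(r"^\s*\+ title (\d+):", re.MULTILINE)


def title_numbers(scan_output):
    """Return the title numbers found in the scan output, in order."""
    return [int(n) for n in TITLE_LINE.findall(scan_output)]


def encode_commands(source, out_prefix, titles):
    """Build one HandBrakeCLI argument list per title.

    Output files are named <out_prefix>-<title>.mp4 (a hypothetical scheme).
    """
    return [
        ["HandBrakeCLI", "-i", source, "-o", f"{out_prefix}-{t}.mp4",
         "-e", "x264", "-q", "20", "-B", "160", "-t", str(t),
         "--loose-anamorphic", "--modulus", "2", "--decomb"]
        for t in titles
    ]
```

The scan text itself would come from something like `HandBrakeCLI -i s20160114.dvdmedia -t 0 --scan`, which should write its report to stderr. Each returned list can then be handed to `subprocess.run`.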

waldoj · 2016-02-03T01:31:20Z

This should work, but it does not:

HandBrakeCLI -i s20160114.dvdmedia -o s20160114/s20160114-cli.mp4 -e x264 -q 20 -B 160 -t 1 --loose-anamorphic --modulus 2 --decomb --subtitle 1

It's the --subtitle 1 that's killing it. It dies with this message:

[mp4 @ 0x10212f600] Application provided invalid, non monotonically increasing dts to muxer in stream 2: 39284561 >= 39284561
ERROR: avformatMux: track 2, av_interleaved_write_frame failed with error 'Invalid argument'
[20:20:20] reader: done. 9 scr changes
[20:20:20] work: average encoding speed for job is 233.997925 fps
Encoding: task 1 of 1, 32.52 % (263.71 fps, avg 234.00 fps, ETA 00h02m16s)[20:20:20] sync: got 15309 frames, 47082 expected
[20:20:20] render: lost time: 0 (0 frames)
[20:20:20] render: gained time: 0 (0 frames) (0 not accounted for)
[20:20:20] mpeg2video-decoder done: 15316 frames, 0 decoder errors, 0 drops
[20:20:20] ac3-decoder done: 0 frames, 0 decoder errors, 0 drops
[20:20:20] mux: track 0, 13083 frames, 55035781 bytes, 1008.59 kbps, fifo 4096
[20:20:20] mux: track 1, 20462 frames, 7167726 bytes, 131.36 kbps, fifo 4096
[20:20:20] mux: track 2, 3 frames, 251 bytes, 0.00 kbps, fifo 64
[20:20:20] libhb: work result = 4

Encode failed (error 4).

HandBrake has exited.

waldoj · 2016-02-03T01:36:40Z

Regarding extracting captions from ripped DVD files, this is likewise not working:

mencoder -o /dev/null dvd://1 -dvd-device s20160114.dvdmedia/ -oac copy -ovc copy -vobsubout s20160114

It generates .idx and .sub files, but the former just has a header and the latter is empty. The output is not encouraging:

There are 2 titles on this DVD.
There are 9 chapters in this DVD title.
There are 1 angles in this DVD title.
audio stream: 0 format: ac3 (stereo) language: unknown aid: 128.
number of audio channels on disk: 1.
number of subtitles on disk: 0
success: format: 2  data: 0x0 - 0x2e698000
MPEG-PS file format detected.
VIDEO:  MPEG2  720x480  (aspect 2)  29.970 fps  8000.0 kbps (1000.0 kbyte/s)
[V] filefmt:2  fourcc:0x10000002  size:720x480  fps:29.97  ftime:=0.0334

Note that number of subtitles on disk: 0. There are, of course, subtitles.

waldoj · 2016-02-03T03:56:35Z

Nothing I'm trying is working: FFmpeg, mencoder, Avidemux, VLC, and HandBrake. This is really frustrating.

waldoj · 2016-02-03T16:42:25Z

CCExtractor is the solution. It was wicked easy: ccextractor *.VOB. It spat out a VIDEO_TS.srt file. It'll be trivial to write a shell script to generate a SubRip file for every DVD.
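One way the batch step might look, sketched in Python. It assumes each ripped DVD is a `*.dvdmedia` directory with its VOB files somewhere inside, and that each SRT should be named after the DVD rather than `VIDEO_TS.srt`. The function names and the output naming are hypothetical. `-o` is CCExtractor's output-file option.

```python
from pathlib import Path


def ccextractor_command(dvd_dir):
    """Return (argv, srt_path) for one ripped DVD, or None if it has no VOBs."""
    dvd = Path(dvd_dir)
    # Match .VOB case-insensitively, in the same order a shell glob would give.
    vobs = sorted(p for p in dvd.rglob("*") if p.suffix.upper() == ".VOB")
    if not vobs:
        return None
    srt = dvd.with_suffix(".srt")  # s20160114.dvdmedia -> s20160114.srt
    return ["ccextractor", *map(str, vobs), "-o", str(srt)], srt


def commands_for_all(root, pattern="*.dvdmedia"):
    """Build a command for every ripped DVD directory under root."""
    found = (ccextractor_command(d) for d in sorted(Path(root).glob(pattern)))
    return [c for c in found if c is not None]
```

Running each `argv` through `subprocess.run(argv, check=True)` would then produce one SubRip file per DVD.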

waldoj · 2016-02-04T02:43:42Z

I batch-processed all 2015 and 2016 DVDs and uploaded the SRT files to the server.

waldoj · 2016-02-04T02:52:07Z

I test-uploaded an SRT to YouTube and...it's off by 10 seconds. The lag is a few seconds greater than it would actually be with a live transcript. So I need to figure out how to time-shift those. There are desktop tools that do that, but that's not going to work.
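For illustration, time-shifting an SRT comes down to rewriting the `HH:MM:SS,mmm` timestamps on each timing line. The sketch below is not the time-shifter used on the site. It assumes the captions trail the audio, so the offset is negative, and it clamps any cue that would land before zero. The function names are hypothetical.

```python
import re

# SRT timestamps look like 00:01:23,456 (hours:minutes:seconds,milliseconds).
TIMESTAMP = re.compile(r"(\d{2,}):(\d{2}):(\d{2}),(\d{3})")


def shift_timestamp(match, offset_ms):
    """Return one matched timestamp moved by offset_ms, never below zero."""
    h, m, s, ms = (int(g) for g in match.groups())
    total = ((h * 60 + m) * 60 + s) * 1000 + ms + offset_ms
    total = max(total, 0)  # clamp cues that would start before 0
    s, ms = divmod(total, 1000)
    m, s = divmod(s, 60)
    h, m = divmod(m, 60)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def shift_srt(text, offset_seconds):
    """Shift every cue in an SRT document; negative offsets move cues earlier."""
    offset_ms = round(offset_seconds * 1000)
    lines = []
    for line in text.splitlines():
        if "-->" in line:  # only touch timing lines, never caption text
            line = TIMESTAMP.sub(lambda mt: shift_timestamp(mt, offset_ms), line)
        lines.append(line)
    return "\n".join(lines) + "\n"
```

Calling `shift_srt(srt_text, -10)` moves every cue ten seconds earlier. Because it is a plain function, it can run on the server as part of the import step instead of in a desktop tool.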

waldoj · 2016-02-06T04:43:18Z

OK the transcript creator is up and running, with transcripts running live on the site.

waldoj · 2016-02-12T02:25:54Z

Wrote a transcript time-shifter and combined it with the duplication eliminator. The results of that can be seen here, and it works great.

Next up: load SRTs into the database, figure out how to include them with YouTube uploads, and merge them with existing MP4s.
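For the MP4 merge, one possible approach is to have FFmpeg mux the SRT in as a `mov_text` subtitle track while copying the existing audio and video untouched. This is a sketch under that assumption, not something settled in this thread. The function name and file names are hypothetical.

```python
def mux_command(mp4_in, srt_in, mp4_out, language="eng"):
    """Build an ffmpeg argument list that adds an SRT to an MP4 as a text track."""
    return [
        "ffmpeg", "-i", mp4_in, "-i", srt_in,
        "-c", "copy",        # copy the existing audio and video streams as-is
        "-c:s", "mov_text",  # convert the SRT to MP4's text subtitle format
        "-metadata:s:s:0", f"language={language}",  # tag the subtitle track
        mp4_out,
    ]
```

For example, `mux_command("s20160114.mp4", "s20160114.srt", "s20160114-cc.mp4")` returns a list suitable for `subprocess.run`. Nothing is re-encoded, so the merge should be quick even for long sessions.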

waldoj · 2016-02-12T03:28:24Z

Including SRTs with YouTube uploads is not possible with the program I'm using now.

waldoj · 2017-12-27T00:23:16Z

Moved to rs-video-processor.

waldoj added the enhancement label Feb 2, 2016

waldoj mentioned this issue Dec 27, 2017

Store caption data in MP4s openva/rs-video-processor#20

Open
