Extract caption data #75

Open
4 of 6 tasks
waldoj opened this issue Feb 2, 2016 · 11 comments

waldoj commented Feb 2, 2016

Apparently the General Assembly started closed captioning video last year. So, obviously, we need some way to both preserve and extract this data. Preserving that data for 2015 isn't gonna happen—that ship has sailed—but we ought to be able to extract it.

  • figure out how to extract captions from existing DVD files, without requiring reencoding as MP4s
  • upload the resulting SRT files to the server along with the videos
  • timeshift SRT files by ~10 seconds
  • add SRT files to the database when videos are imported
  • merge SRT files with MP4s
  • submit the SRT file to YouTube alongside the video

waldoj commented Feb 2, 2016

I've got a start at using the CLI. The only problem that I see at the moment is that it can only handle one title per command. So I think I'll need to write a shell script to determine the number of titles, and then iterate over them with HandBrakeCLI.
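A rough sketch of that loop, in Python rather than shell. It assumes that `HandBrakeCLI -t 0 --scan` lists each title on a line of the form `+ title N:`; that pattern should be checked against real scan output before relying on it. The function names and the output naming scheme are hypothetical, and the encoder flags are copied from the command posted elsewhere in this thread.

```python
import re
import subprocess

# Assumed shape of a title line in HandBrakeCLI's scan log, e.g. "+ title 2:".
TITLE_LINE = re.compile(r"^\s*\+ title (\d+):", re.MULTILINE)


def parse_titles(scan_log):
    """Return the sorted title numbers found in a HandBrakeCLI scan log."""
    return sorted({int(n) for n in TITLE_LINE.findall(scan_log)})


def scan_titles(source):
    """Scan every title on the source (-t 0 --scan) and list the title numbers."""
    result = subprocess.run(
        ["HandBrakeCLI", "-i", source, "-t", "0", "--scan"],
        capture_output=True, text=True,
    )
    # The log normally goes to stderr; read both streams to be safe.
    return parse_titles(result.stderr + result.stdout)


def encode_command(source, title, output_stem):
    """Build one HandBrakeCLI command for a single title."""
    return [
        "HandBrakeCLI", "-i", source, "-o", f"{output_stem}-{title}.mp4",
        "-e", "x264", "-q", "20", "-B", "160", "-t", str(title),
        "--loose-anamorphic", "--modulus", "2", "--decomb",
    ]


def encode_all_titles(source, output_stem):
    """Encode each title in turn, one HandBrakeCLI run per title."""
    for title in scan_titles(source):
        subprocess.run(encode_command(source, title, output_stem), check=True)
```

Calling `encode_all_titles("s20160114.dvdmedia", "s20160114/s20160114")` would then produce one MP4 per title.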


waldoj commented Feb 3, 2016

This should work, but it does not:

HandBrakeCLI -i s20160114.dvdmedia -o s20160114/s20160114-cli.mp4 -e x264 -q 20 -B 160 -t 1 --loose-anamorphic --modulus 2 --decomb --subtitle 1

It's the --subtitle 1 that's killing it. It dies with this message:

[mp4 @ 0x10212f600] Application provided invalid, non monotonically increasing dts to muxer in stream 2: 39284561 >= 39284561
ERROR: avformatMux: track 2, av_interleaved_write_frame failed with error 'Invalid argument'
[20:20:20] reader: done. 9 scr changes
[20:20:20] work: average encoding speed for job is 233.997925 fps
Encoding: task 1 of 1, 32.52 % (263.71 fps, avg 234.00 fps, ETA 00h02m16s)[20:20:20] sync: got 15309 frames, 47082 expected
[20:20:20] render: lost time: 0 (0 frames)
[20:20:20] render: gained time: 0 (0 frames) (0 not accounted for)
[20:20:20] mpeg2video-decoder done: 15316 frames, 0 decoder errors, 0 drops
[20:20:20] ac3-decoder done: 0 frames, 0 decoder errors, 0 drops
[20:20:20] mux: track 0, 13083 frames, 55035781 bytes, 1008.59 kbps, fifo 4096
[20:20:20] mux: track 1, 20462 frames, 7167726 bytes, 131.36 kbps, fifo 4096
[20:20:20] mux: track 2, 3 frames, 251 bytes, 0.00 kbps, fifo 64
[20:20:20] libhb: work result = 4

Encode failed (error 4).

HandBrake has exited.


waldoj commented Feb 3, 2016

Regarding extracting captions from ripped DVD files, this is likewise not working:

mencoder -o /dev/null dvd://1 -dvd-device s20160114.dvdmedia/ -oac copy -ovc copy -vobsubout s20160114

It generates .idx and .sub files, but the former just has a header and the latter is empty. The output is not encouraging:

There are 2 titles on this DVD.
There are 9 chapters in this DVD title.
There are 1 angles in this DVD title.
audio stream: 0 format: ac3 (stereo) language: unknown aid: 128.
number of audio channels on disk: 1.
number of subtitles on disk: 0
success: format: 2  data: 0x0 - 0x2e698000
MPEG-PS file format detected.
VIDEO:  MPEG2  720x480  (aspect 2)  29.970 fps  8000.0 kbps (1000.0 kbyte/s)
[V] filefmt:2  fourcc:0x10000002  size:720x480  fps:29.97  ftime:=0.0334

Note that number of subtitles on disk: 0. There are, of course, subtitles.


waldoj commented Feb 3, 2016

Nothing I'm trying is working: FFmpeg, mencoder, Avidemux, VLC, and HandBrake. This is really frustrating.


waldoj commented Feb 3, 2016

CCExtractor is the solution. It was wicked easy: ccextractor *.VOB. It spat out a VIDEO_TS.srt file. It'll be trivial to write a shell script to generate a SubRip file for every DVD.
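One way the per-DVD batch could look, sketched in Python rather than shell. It assumes each rip is a directory matching `*.dvdmedia` with its VOB files somewhere underneath, and it writes the SRT next to the directory; that layout, the output naming, and the function names are assumptions, not what was actually run. It relies on ccextractor accepting several input files in order and on its `-o` option for naming the output.

```python
import subprocess
from pathlib import Path


def ccextractor_command(dvd_dir, srt_path):
    """Build a ccextractor call covering every VOB under one ripped DVD."""
    vobs = sorted(str(p) for p in Path(dvd_dir).rglob("*.VOB"))
    if not vobs:
        return None  # nothing to extract from this directory
    # Multiple inputs are processed in order; -o names the output file.
    return ["ccextractor", *vobs, "-o", str(srt_path)]


def extract_all(root, pattern="*.dvdmedia"):
    """Run ccextractor once per DVD directory, writing <name>.srt beside it."""
    for dvd_dir in sorted(Path(root).glob(pattern)):
        cmd = ccextractor_command(dvd_dir, dvd_dir.with_suffix(".srt"))
        if cmd is not None:
            subprocess.run(cmd, check=True)
```

With this layout, `extract_all(".")` would turn `s20160114.dvdmedia` into `s20160114.srt`.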


waldoj commented Feb 4, 2016

I batch-processed all 2015 and 2016 DVDs and uploaded the SRT files to the server.


waldoj commented Feb 4, 2016

I test-uploaded an SRT to YouTube and...it's off by 10 seconds. The lag is a few seconds greater than it would be with a live transcript. So I need to figure out how to time-shift those. There are desktop tools that do that, but that's not going to work.
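A minimal sketch of a scriptable time-shift, assuming standard SRT timing lines (`HH:MM:SS,mmm --> HH:MM:SS,mmm`) and assuming the captions trail the audio, so the offset is negative. The function names are hypothetical, and times that would go below zero are clamped to zero.

```python
import re

# Matches an SRT timestamp such as 00:01:23,456.
TIMESTAMP = re.compile(r"(\d{2,}):(\d{2}):(\d{2}),(\d{3})")


def shift_timestamp(match, offset_ms):
    """Return one timestamp moved by offset_ms, never earlier than zero."""
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    total = ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis + offset_ms
    total = max(total, 0)
    seconds, millis = divmod(total, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def shift_srt(text, offset_seconds):
    """Shift every timing line in an SRT document by offset_seconds."""
    offset_ms = round(offset_seconds * 1000)
    out = []
    for line in text.splitlines(keepends=True):
        # Only timing lines contain "-->"; caption text is left untouched.
        if "-->" in line:
            line = TIMESTAMP.sub(lambda m: shift_timestamp(m, offset_ms), line)
        out.append(line)
    return "".join(out)
```

For a 10-second lag, `shift_srt(srt_text, -10)` moves every caption 10 seconds earlier.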


waldoj commented Feb 6, 2016

OK the transcript creator is up and running, with transcripts running live on the site.


waldoj commented Feb 12, 2016

Wrote a transcript time-shifter, combined it with the duplication eliminator. The results of that can be seen here—it works great.

Next up: load SRTs into the database, figure out how to include them with YouTube uploads, and merge them into existing MP4s.
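For the MP4 merge, one possible approach is to have FFmpeg add the SRT as a soft subtitle track without re-encoding. This is a sketch under that assumption, not something tested against these files; the function name and file paths are hypothetical. MP4 containers store text subtitles as `mov_text`, so the subtitle stream is converted while audio and video are copied.

```python
import subprocess


def mux_command(mp4_path, srt_path, output_path):
    """Build an ffmpeg call that adds an SRT as a soft subtitle track."""
    return [
        "ffmpeg", "-i", str(mp4_path), "-i", str(srt_path),
        "-c", "copy",        # copy audio and video without re-encoding
        "-c:s", "mov_text",  # convert the SRT to MP4's text subtitle format
        str(output_path),
    ]


def mux(mp4_path, srt_path, output_path):
    """Run the merge; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(mux_command(mp4_path, srt_path, output_path), check=True)
```

The output goes to a new file, since FFmpeg cannot write over the file it is reading.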


waldoj commented Feb 12, 2016

Including SRTs with YouTube uploads is not possible with the program I'm using now.


waldoj commented Dec 27, 2017
