srtgen

Generate subtitles for a video file

Using the paid Google Cloud Speech-To-Text API

This program requires a Google account and an API key. To get one, create a project on Google Cloud: https://console.cloud.google.com/projectcreate
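The key file can be passed with `--apikey` or through the `GOOGLE_APPLICATION_CREDENTIALS` environment variable (both appear in the usage text below). A minimal sketch of how that lookup could work; the function name `resolve_keyfile` and the precedence of the flag over the environment variable are assumptions, not taken from srtgen.py.

```python
import argparse
import os


def resolve_keyfile(argv, environ):
    """Return (keyfile_path, video_path) from CLI arguments and environment."""
    parser = argparse.ArgumentParser(prog="srtgen.py")
    parser.add_argument("--apikey", help="path to the key file")
    parser.add_argument("video", help="path to the input video")
    args = parser.parse_args(argv)

    # Assumption: an explicit --apikey wins over the environment variable.
    keyfile = args.apikey or environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not keyfile:
        parser.error("no key file: pass --apikey or set GOOGLE_APPLICATION_CREDENTIALS")
    return keyfile, args.video
```

In a script this would be called as `resolve_keyfile(sys.argv[1:], os.environ)`.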

usage

$ ./srtgen.py 
usage
  srtgen.py --apikey path/to/keyfile.json path/to/input-video.mp4

environment variables
  GOOGLE_APPLICATION_CREDENTIALS=path/to/keyfile.json srtgen.py path/to/input-video.mp4

keyfile
  This program requires a Google account and an API key
  https://console.cloud.google.com/projectcreate

subtitle is written to stdout and output/xxxxxx-input-video.mp4/output_file.srt
where xxxxxx is the sha1 hash of the input video file

temporary files are stored in output/xxxxxx-input-video.mp4/
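The output directory name combines the sha1 hash of the input file with its file name. A rough sketch of that naming scheme; the helper `output_dir_for` is hypothetical, and it assumes the full hex digest is used (the `xxxxxx` placeholder may stand for a shortened hash).

```python
import hashlib
import os


def output_dir_for(video_path, base_dir="output"):
    """Return base_dir/<sha1 of file contents>-<file name>."""
    sha1 = hashlib.sha1()
    with open(video_path, "rb") as f:
        # Hash in 1 MiB blocks so large videos are not loaded into memory at once.
        for block in iter(lambda: f.read(1024 * 1024), b""):
            sha1.update(block)
    # Assumption: the full digest is used as the prefix.
    name = f"{sha1.hexdigest()}-{os.path.basename(video_path)}"
    return os.path.join(base_dir, name)
```

Hashing the contents rather than the path means the same video always maps to the same directory, so temporary files from an earlier run can be found again.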

features

work around the size limits of the Google API
- no need for Google Cloud Storage (gs:// protocol)
- audio sent inline is limited to 60 seconds per request
- audio sent inline is limited to 10485760 bytes (10 MiB) per request
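One way to stay inside both limits is to cut the audio into chunks before sending it. A sketch of the chunk planning only, assuming fixed-length cuts and a constant bitrate; `plan_chunks` is a hypothetical helper, and srtgen.py may choose its cut points differently (for example at silence).

```python
MAX_SECONDS = 60       # duration limit quoted above
MAX_BYTES = 10485760   # file size limit quoted above


def plan_chunks(total_seconds, bytes_per_second):
    """Return (start, end) second offsets so every chunk fits both limits."""
    # Longest chunk the size limit allows at this bitrate.
    seconds_by_size = MAX_BYTES // bytes_per_second
    chunk_seconds = min(MAX_SECONDS, seconds_by_size)
    if chunk_seconds < 1:
        raise ValueError("bitrate too high for a one-second chunk")
    chunks = []
    start = 0
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)
        chunks.append((start, end))
        start = end
    return chunks
```

For 16 kHz, 16-bit mono audio (32000 bytes per second) the 60-second limit is the binding one: `plan_chunks(150, 32000)` gives `[(0, 60), (60, 120), (120, 150)]`. The offsets could then be handed to pydub or ffmpeg to do the actual cutting.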

dependencies

ffmpeg
python
- pydub
- google.cloud.speech
  - API key
  - pricing
    - speech recognition needs lots of space and time = there is no free lunch
    - https://cloud.google.com/speech-to-text/pricing#pricing_table
      - first hour is free
        
        TODO one hour per month or one hour per google account?
      - Speech Recognition without Data Logging: $0.006 / 15 seconds = $0.024 / 1 minute = about $1.50 / 1 hour
      - Speech Recognition with Data Logging: $0.004 / 15 seconds = $0.016 / 1 minute = about $1.00 / 1 hour
      - Data Logging = feedback of manually corrected text to improve quality of service
        
        TODO implement upload of corrected text
  - TODO Automatic punctuation
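The pricing arithmetic above can be checked with a few lines. This sketch uses the two prices from the list, ignores the free hour, and assumes billing rounds up to whole 15-second increments; `estimate_cost` is a hypothetical helper.

```python
import math

# Prices from the list above, in thousandths of a dollar per 15-second increment.
PRICE_WITHOUT_LOGGING = 6   # $0.006
PRICE_WITH_LOGGING = 4      # $0.004


def estimate_cost(seconds, data_logging=False):
    """Estimated price in dollars for `seconds` of audio, free tier ignored."""
    # Assumption: billing rounds up to whole 15-second increments.
    increments = math.ceil(seconds / 15)
    price = PRICE_WITH_LOGGING if data_logging else PRICE_WITHOUT_LOGGING
    return increments * price / 1000
```

`estimate_cost(3600)` returns `1.44` and `estimate_cost(3600, data_logging=True)` returns `0.96`, which the list rounds to about $1.50 and $1.00 per hour.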

related

https://github.com/BingLingGroup/autosub
- online speech recognition
  - Google
  - Xfyun
  - Baidu
https://github.com/abhirooptalasila/AutoSub
- offline speech recognition
  - Mozilla DeepSpeech
  - lower quality than google speech
  - limited by user hardware: space, time, cpu instruction set (binaries don't run on weak cpus)
https://github.com/topics/subtitles-generator
- https://github.com/nestyme/Subtitles-generator

todo

use speech_recognition module, so srtgen can use multiple backend services
- support offline speech recognition
  - mozilla deepspeech
  - vosk
  - Picovoice https://picovoice.ai/docs/picovoice/
- we need a service that returns timestamps for every word
  - google cloud speech: enable_word_time_offsets=True
  - alternative: synchronize words and audio waveform
    - https://github.com/otsaloma?tab=stars&q=subtitle
      - https://github.com/smacke/ffsubsync Automagically synchronize subtitles with video.
      - https://github.com/kaegi/alass "Automatic Language-Agnostic Subtitle Synchronization"
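Per-word timestamps matter because they are all that is needed to build subtitle cues, whichever backend produces them. A backend-independent sketch, assuming the words arrive as `(word, start_seconds, end_seconds)` tuples; that tuple shape, the helper names, and the fixed `max_words` grouping are assumptions for illustration.

```python
def srt_time(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3600000)
    m, ms = divmod(ms, 60000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def words_to_srt(words, max_words=7):
    """Build SRT text from (word, start_seconds, end_seconds) tuples."""
    cues = []
    for i in range(0, len(words), max_words):
        group = words[i:i + max_words]
        # A cue spans from the first word's start to the last word's end.
        start, end = group[0][1], group[-1][2]
        text = " ".join(w[0] for w in group)
        cues.append(f"{len(cues) + 1}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n")
    # Cues are separated by a blank line.
    return "\n".join(cues)
```

Any backend that can be mapped onto this tuple format could feed the same writer, which is the point of supporting multiple services.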
hybrid of offline and online speech recognition
- deepspeech for offline speech recognition
- google for online speech recognition
- can deepspeech return confidence values?
- run deepspeech with different models? (and manually select the best result?)
automatic postprocessing
- reduce manual work
- split long sentences
- merge short sentences
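A sketch of what the split and merge steps could look like on cues stored as `(start, end, text)` tuples. The thresholds (`max_chars`, `min_chars`, `max_gap`) are hypothetical defaults, and splitting time in proportion to character count assumes an even speech rate.

```python
def split_long(cue, max_chars=42):
    """Split a (start, end, text) cue near the middle until each part fits."""
    start, end, text = cue
    words = text.split()
    if len(text) <= max_chars or len(words) < 2:
        return [cue]
    half = len(words) // 2
    left, right = " ".join(words[:half]), " ".join(words[half:])
    # Assumption: speech rate is even, so time is shared by character count.
    mid = start + (end - start) * len(left) / (len(left) + len(right))
    return split_long((start, mid, left), max_chars) + split_long((mid, end, right), max_chars)


def merge_short(cues, min_chars=15, max_gap=0.5):
    """Merge a short cue into the previous one when the pause between them is small."""
    merged = []
    for start, end, text in cues:
        if merged and len(text) < min_chars and start - merged[-1][1] <= max_gap:
            prev_start, _, prev_text = merged.pop()
            merged.append((prev_start, end, prev_text + " " + text))
        else:
            merged.append((start, end, text))
    return merged
```

With per-word timestamps available, the split point could use the real word boundary instead of the proportional estimate.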
