Can't get speech regions #145

Open
gabriellluz opened this issue Oct 11, 2020 · 3 comments

@gabriellluz

Make sure you have read the readme, searched and read the issues related to yours. Otherwise it will be considered as a duplicate which will be closed immediately.

Describe the bug
I'm trying to transcribe a video with the audio pre-processing option enabled, and autosub exits with "Error: Can't get speech regions."

To Reproduce
Steps to reproduce the behavior:
I opened a terminal and ran autosub with the arguments -i "/media/teste.mp4" -ap y -S en-us

  • Command line arguments you are using. Using the following markdown code block syntax is recommended. Copy them into the place between ```.
-i "/media/teste.mp4" -ap y -S en-us
  • A complete copy of command line output of the autosub. You can use Ctrl-A and Ctrl-C to copy all of them.
Input args(without "autosub"): -i "/media/teste.mp4" -ap y -S en-us
/usr/bin/ffmpeg -hide_banner -i "/media/teste.mp4" -vn -af "asplit[a],aphasemeter=video=0,ametadata=select:key=lavfi.aphasemeter.phase:value=-0.005:function=less,pan=1c|c0=c0,aresample=async=1:first_pts=0,[a]amix" -ac 1 -f flac -loglevel error "/tmp/tmpanuinulr.flac"

Use ffprobe to check conversion result.
/usr/bin/ffprobe "/tmp/tmpanuinulr.flac" -show_format -pretty -loglevel quiet
[FORMAT]
filename=/tmp/tmpanuinulr.flac
nb_streams=1
nb_programs=0
format_name=flac
format_long_name=raw FLAC
start_time=0:00:00.000000
duration=0:00:00.116100
size=14.130859 Kibyte
bit_rate=997.071000 Kbit/s
probe_score=100
TAG:major_brand=isom
TAG:minor_version=512
TAG:compatible_brands=isomiso2avc1mp41
TAG:title=teste
TAG:artist=teste
TAG:date=2017
TAG:comment=teste
TAG:encoder=Lavf58.29.100
[/FORMAT]

/usr/bin/ffmpeg -hide_banner -i "/tmp/tmpanuinulr.flac" -af "lowpass=3000,highpass=200" -loglevel error "/tmp/tmpvr_t1lo_.flac"

Use ffprobe to check conversion result.
/usr/bin/ffprobe "/tmp/tmpvr_t1lo_.flac" -show_format -pretty -loglevel quiet
[FORMAT]
filename=/tmp/tmpvr_t1lo_.flac
nb_streams=1
nb_programs=0
format_name=flac
format_long_name=raw FLAC
start_time=0:00:00.000000
duration=0:00:00.116100
size=12.680664 Kibyte
bit_rate=894.745000 Kbit/s
probe_score=100
TAG:major_brand=isom
TAG:minor_version=512
TAG:compatible_brands=isomiso2avc1mp41
TAG:title=teste
TAG:artist=teste
TAG:date=2017
TAG:comment=teste
TAG:encoder=Lavf58.29.100
[/FORMAT]

/home/mestre/.pyenv/versions/3.8.5/bin/ffmpeg-normalize -v "/tmp/tmpvr_t1lo_.flac" -ar 44100 -ofmt flac -c:a flac -pr -p -o "/tmp/tmp19_y75h4.flac"
Stream 1/1: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 1006.46it/s]
Second Pass: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 915.39it/s]
File: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  4.73it/s]

Use ffprobe to check conversion result.
/usr/bin/ffprobe "/tmp/tmp19_y75h4.flac" -show_format -pretty -loglevel quiet
[FORMAT]
filename=/tmp/tmp19_y75h4.flac
nb_streams=1
nb_programs=0
format_name=flac
format_long_name=raw FLAC
start_time=0:00:00.000000
duration=0:00:00.116100
size=12.829102 Kibyte
bit_rate=905.219000 Kbit/s
probe_score=100
TAG:major_brand=isom
TAG:minor_version=512
TAG:compatible_brands=isomiso2avc1mp41
TAG:title=teste
TAG:artist=teste
TAG:date=2017
TAG:comment=teste
TAG:encoder=Lavf58.29.100
[/FORMAT]

Audio pre-processing complete.
Translation destination language not provided. Only performing speech recognition.
Override "-of"/"--output-files" due to your args too few.
Output source subtitles file only.

Convert source file to "/tmp/tmpd8ys3vqq.wav" to detect audio regions.
/usr/bin/ffmpeg -hide_banner -y -i "/tmp/tmp19_y75h4.flac" -vn -ac 1 -ar 48000 -loglevel error "/tmp/tmpd8ys3vqq.wav"

Use ffprobe to check conversion result.
/usr/bin/ffprobe "/tmp/tmpd8ys3vqq.wav" -show_format -pretty -loglevel quiet
[FORMAT]
filename=/tmp/tmpd8ys3vqq.wav
nb_streams=1
nb_programs=0
format_name=wav
format_long_name=WAV / WAVE (Waveform Audio)
start_time=N/A
duration=0:00:00.116104
size=11.082031 Kibyte
bit_rate=781.919000 Kbit/s
probe_score=99
TAG:artist=teste
TAG:comment=teste
TAG:date=2017
TAG:title=teste
TAG:encoder=Lavf58.29.100
[/FORMAT]

Conversion completed.
Use Auditok to detect speech regions.
Auditok detection completed.
"/tmp/tmpd8ys3vqq.wav" has been deleted.
Error: Can't get speech regions.
Press Enter to exit...

No custom config used.

Environment (please complete the following information):

  • OS: Ubuntu 20.04
  • Python Version: python 3.8.5
  • Autosub Version: latest dev autosub==0.5.7a0
@BingLingGroup
Owner

BingLingGroup commented Oct 11, 2020

Check the volume of your audio file to make sure it's mostly above -20 dB, or use the -k option to keep all the intermediate files and review them.
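Besides volume, the duration of each intermediate file can be read straight from the ffprobe output that autosub already prints. The log above reports `duration=0:00:00.116100` for every pre-processed file, which may be worth comparing against the length of the source video. The following is a minimal sketch, not part of autosub: the helper names and the 1-second threshold are assumptions chosen for illustration.

```python
import re


def parse_pretty_duration(ffprobe_output):
    """Return the duration in seconds found in `ffprobe -show_format -pretty`
    text, or None when no numeric duration line is present (e.g. "N/A")."""
    match = re.search(
        r"^duration=(\d+):(\d{2}):(\d{2}(?:\.\d+)?)\s*$",
        ffprobe_output,
        re.MULTILINE,
    )
    if match is None:
        return None
    hours, minutes, seconds = match.groups()
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def looks_truncated(ffprobe_output, min_seconds=1.0):
    """Flag a file whose duration is below min_seconds.
    The threshold is a hypothetical default, not an autosub setting."""
    duration = parse_pretty_duration(ffprobe_output)
    return duration is not None and duration < min_seconds


# Excerpt copied from the log in this issue.
sample = """[FORMAT]
filename=/tmp/tmpanuinulr.flac
format_name=flac
duration=0:00:00.116100
[/FORMAT]"""

if looks_truncated(sample):
    print("Intermediate file is very short:", parse_pretty_duration(sample), "s")
```

If the flagged duration is much shorter than the original video, the pre-processing filter chain, rather than the volume, would be the first thing to review in the files kept by -k.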

@gabriellluz
Author

gabriellluz commented Oct 11, 2020

But if I don't use the preprocessing option, it works. The failure only happens when I use preprocessing. The volume seems fine to me.

Installing auditok from its git repository partly solved the issue.

pip install git+https://github.com/amsehili/auditok
Now I get a different error when running the same command:

Conversion completed.

Use Auditok to detect speech regions.
Traceback (most recent call last):
  File "/home/mestre/.pyenv/versions/3.8.5/lib/python3.8/site-packages/auditok/util.py", line 1007, in __getattr__
    return getattr(self._audio_source, name)
  File "/home/mestre/.pyenv/versions/3.8.5/lib/python3.8/site-packages/auditok/util.py", line 856, in __getattr__
    return getattr(self._audio_source, name)
  File "/home/mestre/.pyenv/versions/3.8.5/lib/python3.8/site-packages/auditok/util.py", line 736, in __getattr__
    return getattr(self._audio_source, name)
AttributeError: 'BufferAudioSource' object has no attribute 'get_sample_width'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/mestre/.pyenv/versions/3.8.5/bin/autosub", line 33, in <module>
    sys.exit(load_entry_point('autosub==0.5.7a0', 'console_scripts', 'autosub')())
  File "/home/mestre/.pyenv/versions/3.8.5/lib/python3.8/site-packages/autosub/__init__.py", line 159, in main
    cmdline_utils.audio_or_video_prcs(args,
  File "/home/mestre/.pyenv/versions/3.8.5/lib/python3.8/site-packages/autosub/cmdline_utils.py", line 1357, in audio_or_video_prcs
    regions = auditok_utils.auditok_gen_speech_regions(
  File "/home/mestre/.pyenv/versions/3.8.5/lib/python3.8/site-packages/autosub/auditok_utils.py", line 31, in auditok_gen_speech_regions
    sample_width=asource.get_sample_width(),
  File "/home/mestre/.pyenv/versions/3.8.5/lib/python3.8/site-packages/auditok/util.py", line 1009, in __getattr__
    raise AttributeError(
AttributeError: 'AudioReader' has no attribute 'get_sample_width'
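
The traceback shows autosub calling `asource.get_sample_width()` on an object that no longer provides that method. A small shim that tolerates both API styles could look like the sketch below. It assumes the newer auditok exposes the same value as a plain `sample_width` attribute; that is an assumption to verify against the installed version. `read_audio_param` and the two stand-in classes are hypothetical names, not part of autosub or auditok.

```python
def read_audio_param(source, name):
    """Read an audio parameter from either API style.

    Tries the getter method first (e.g. get_sample_width()), then falls
    back to an attribute of the same name (e.g. sample_width). Raises
    AttributeError if neither exists.
    """
    getter = getattr(source, "get_" + name, None)
    if callable(getter):
        return getter()
    return getattr(source, name)


class OldStyleSource:
    """Stand-in for a source object that offers getter methods."""

    def get_sample_width(self):
        return 2


class NewStyleSource:
    """Stand-in for a source object that offers attributes (assumed)."""

    sample_width = 2


print(read_audio_param(OldStyleSource(), "sample_width"))
print(read_audio_param(NewStyleSource(), "sample_width"))
```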

@BingLingGroup
Owner

Alright. The new error is caused by what is described in #137 (comment). I will change https://github.com/BingLingGroup/autosub/blob/dev/setup.py to make sure users don't install an incompatible version of Auditok.
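
Until the dependency is pinned in setup.py, a user can check which Auditok version is installed before running autosub. This is only a sketch: the upper bound of 0.2 is an assumption about where the incompatible API begins, not something stated in this thread, and the helper names are hypothetical. `importlib.metadata` is available from Python 3.8.

```python
import re
from importlib import metadata


def parse_version(text):
    """Turn a version string such as "0.1.5" or "0.5.7a0" into a tuple of
    ints, keeping only the leading digits of the first three components."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = re.match(r"\d+", piece)
        parts.append(int(digits.group()) if digits else 0)
    return tuple(parts)


def is_supported(version_text, upper_bound=(0, 2)):
    """Return True when the version is below upper_bound.
    The default bound is an assumed cut-off, not a confirmed one."""
    return parse_version(version_text) < upper_bound


def installed_auditok_version():
    """Return the installed auditok version string, or None if absent."""
    try:
        return metadata.version("auditok")
    except metadata.PackageNotFoundError:
        return None


version = installed_auditok_version()
if version is None:
    print("auditok is not installed")
elif not is_supported(version):
    print("auditok", version, "may be incompatible with this autosub release")
```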

As for the preprocessing options, I took them from the script in issue agermanidis#40. I also describe the source and purpose of these commands here: https://github.com/BingLingGroup/autosub#input
