Error: Adding speaker on Win 11 #72

tin2tin · 2024-02-03T10:09:12Z

Trying to add speaker to the test file throws an error on Windows 11.

import numpy as np
from pydub import AudioSegment
from whisperspeech.pipeline import Pipeline

pipe = Pipeline(s2a_ref='collabora/whisperspeech:s2a-q4-small-en+pl.model')
# pipe = Pipeline(s2a_ref='collabora/whisperspeech:s2a-q4-tiny-en+pl.model')
# pipe = Pipeline(s2a_ref='collabora/whisperspeech:s2a-q4-base-en+pl.model')

audio_tensor = pipe.generate("""
 This is some sample text.  You would add text here that you want spoken and then only leave one of the above lines uncommented for the model you want to test.  Note that this script does not rely on the standard method within the whisperspeech pipeline.  Rather, it replaces a part of the functionality with reliance on pydub instead.  This approach "just worked."  Feel free to modify or distribute at your pleasure.
""", speaker='https://upload.wikimedia.org/wikipedia/commons/7/75/Winston_Churchill_-_Be_Ye_Men_of_Valour.ogg', lang='en', cps=14)

# generate uses CUDA if available; therefore, it's necessary to move to CPU before converting to NumPy array
audio_np = (audio_tensor.cpu().numpy() * 32767).astype(np.int16)

if len(audio_np.shape) == 1:
    audio_np = np.expand_dims(audio_np, axis=0)
else:
    audio_np = audio_np.T

print("Array shape:", audio_np.shape)
print("Array dtype:", audio_np.dtype)

try:
    audio_segment = AudioSegment(
        audio_np.tobytes(), 
        frame_rate=24000, 
        sample_width=2, 
        channels=1
    )
    audio_segment.export('output_audio.wav', format='wav')
    print("Audio file generated: output_audio.wav")
except Exception as e:
    print(f"Error writing audio file: {e}")
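A side note on the conversion step in the script above: multiplying by 32767 and casting to int16 can wrap around if any sample falls outside [-1.0, 1.0]. The sketch below is one possible way to clip first and write the WAV with only the standard library. The helper names `floats_to_pcm16` and `write_wav` are hypothetical, and the 24000 Hz rate is simply the value the script above already uses.

```python
import struct
import wave


def floats_to_pcm16(samples):
    """Clip to [-1.0, 1.0], scale to int16, and pack as little-endian bytes."""
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def write_wav(path, samples, sample_rate=24000):
    """Write mono 16-bit PCM using only the standard library."""
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(floats_to_pcm16(samples))
```

With the tensor from the script, something like `write_wav("output_audio.wav", audio_tensor.cpu().flatten().tolist())` should work, assuming the tensor holds a single channel of float samples.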

And this is the error:

Traceback (most recent call last):
  File "...untitled_37.blend\Text.001", line 9, in <module>
  File "...\python\Lib\site-packages\whisperspeech\pipeline.py", line 87, in generate
    return self.vocoder.decode(self.generate_atoks(text, speaker, lang=lang, cps=cps, step_callback=step_callback))
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "...\python\Lib\site-packages\whisperspeech\pipeline.py", line 80, in generate_atoks
    elif isinstance(speaker, (str, Path)): speaker = self.extract_spk_emb(speaker)
                                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "...\python\Lib\site-packages\whisperspeech\pipeline.py", line 73, in extract_spk_emb
    samples, sr = torchaudio.load(fname)
                  ^^^^^^^^^^^^^^^^^^^^^^
  File "...\python\Lib\site-packages\torchaudio\_backend\utils.py", line 205, in load
    return backend.load(uri, frame_offset, num_frames, normalize, channels_first, format, buffer_size)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "...\python\Lib\site-packages\torchaudio\_backend\soundfile.py", line 27, in load
    return soundfile_backend.load(uri, frame_offset, num_frames, normalize, channels_first, format)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "...\python\Lib\site-packages\torchaudio\_backend\soundfile_backend.py", line 221, in load
    with soundfile.SoundFile(filepath, "r") as file_:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "...\python\Lib\site-packages\soundfile.py", line 740, in __init__
    self._file = self._open(file, mode_int, closefd)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "...\python\Lib\site-packages\soundfile.py", line 1264, in _open
    _error_check(_snd.sf_error(file_ptr),
  File "...\python\Lib\site-packages\soundfile.py", line 1455, in _error_check
    raise RuntimeError(prefix + _ffi.string(err_str).decode('utf-8', 'replace'))
RuntimeError: Error opening 'https://upload.wikimedia.org/wikipedia/commons/7/75/Winston_Churchill_-_Be_Ye_Men_of_Valour.ogg': System error.

tin2tin · 2024-02-03T10:21:19Z

Is it the .ogg format it can't read?

BBC-Esq · 2024-02-03T11:49:24Z

Same type of error that I received on Windows 10, if I recall.

tin2tin · 2024-02-03T12:14:03Z

If I change it to a wav file, it works but that wall of messages is still there.

BBC-Esq · 2024-02-03T17:41:54Z

> If I change it to a wav file, it works but that wall of messages is still there.

Have you verified that the ultimate audio file does in fact sound like the speaker that you tried to specify?

tin2tin · 2024-02-03T20:09:09Z

Yes, it does work. Here is first Orson Welles and then Werner Herzog:

9739-10034.mp4

BBC-Esq · 2024-02-04T00:35:53Z

Cool, can you please post the script you used again? I still didn't get it working for some strange reason.

tin2tin · 2024-02-04T05:50:32Z

from whisperspeech.pipeline import Pipeline

pipe = Pipeline(s2a_ref='collabora/whisperspeech:s2a-q4-small-en+pl.model')
pipe.generate_to_file(filename, prompt, speaker="path to local wav file", lang='en', cps=14)

jpc · 2024-02-13T11:16:41Z

I have no opinion on any particular library for sound files but I think:

1. We want this to work out of the box on Windows and Linux/Mac
2. We probably want to support loading arbitrary mp3/ogg file for the reference speech
3. Loading directly from URLs is also nice for example code, maybe we can implement a simple wrapper ourselves on top of `requests` if `torchaudio` is not reliable on other platforms?
4. It would be nice to have one single dependency for loading and saving files for inference (training can use a lot more stuff since it does not put a burden on every user)

Do we all agree that this would be a good list of requirements?
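
For illustration, the URL wrapper idea could look roughly like the sketch below: download the reference clip to a temporary file and hand the local path to whatever loader is in use. This is not the WhisperSpeech implementation. The names `is_url`, `download_bytes` and `localize_speaker` are hypothetical, and the download uses the standard library's `urllib` rather than `requests` only to keep the sketch dependency-free.

```python
import tempfile
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def is_url(ref):
    """Return True when ref looks like an http(s) URL rather than a local path."""
    return urlparse(str(ref)).scheme in ("http", "https")


def download_bytes(url, timeout=30):
    # Hypothetical helper: plain stdlib download, no audio backend involved.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def localize_speaker(ref, fetch=download_bytes):
    """Return a local file path for ref, downloading it first if it is a URL."""
    if not is_url(ref):
        return Path(ref)
    # Keep the original extension (".ogg", ".wav", ...) so the loader can
    # still guess the format from the file name.
    suffix = Path(urlparse(str(ref)).path).suffix
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp.write(fetch(ref))
    return Path(tmp.name)
```

Checking for an explicit `http`/`https` scheme matters on Windows, where a drive letter such as `C:` would otherwise parse as a URL scheme. The temporary file is left on disk on purpose so it can be opened by name afterwards; the caller would be responsible for deleting it.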

BBC-Esq · 2024-02-13T13:55:41Z

> I have no opinion on any particular library for sound files but I think:
>
> 1. We want this to work out of the box on Windows and Linux/Mac
> 2. We probably want to support loading arbitrary mp3/ogg file for the reference speech
> 3. Loading directly from URLs is also nice for example code, maybe we can implement a simple wrapper ourselves on top of `requests` if `torchaudio` is not reliable on other platforms?
> 4. It would be nice to have one single dependency for loading and saving files for inference (training can use a lot more stuff since it does not put a burden on every user)
>
> Do we all agree that this would be a good list of requirements?

My only thoughts are regarding numbers three and four above regarding "torchaudio".

I was originally the one getting errors when using torchaudio on Windows, which I think are the impetus for your message about possibly switching out torchaudio. To bring you up to speed...eventually I WAS able to resolve the torchaudio-related error we discussed. This required using pip install sounddevice. It's my understanding that torchaudio uses either sounddevice or "iosox" (not sure how to spell) under the hood...and for some reason I had to pip install sounddevice.

This fully resolved the error regarding torchaudio. Thus, as long as it's clear that Windows users (and possibly users on other platforms, I don't know?) might have to specifically install sounddevice, we shouldn't need to abandon torchaudio...that is, unless other users encounter different problems.

Overall, I was NOT able to get a good tensor of a speaker's voice despite using very high quality audio, and I haven't retried since then. However, that's a separate issue and might involve my code...but as far as torchaudio and Windows compatibility go, I hope this clarifies things.

jpc · 2024-02-18T12:10:25Z

Hey, thanks for explaining this. So it would look like we should check if we can add soundfile to our dependencies to make sure it gets installed on Windows.
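
Until the dependency is declared, a quick way to see whether a given environment has the packages in question is to probe for them without importing them. The snippet is only a sketch, and `missing_packages` is a hypothetical helper name.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found in this environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def report(names=("torchaudio", "soundfile")):
    """Print a short hint about which of the given packages still need installing."""
    missing = missing_packages(names)
    if missing:
        print("Missing:", ", ".join(missing))
        print("Try: pip install", " ".join(missing))
    else:
        print("All found:", ", ".join(names))
```

Calling `report()` inside the Python that actually runs the script (for example Blender's bundled interpreter) shows whether `soundfile` is visible there, which may differ from the system Python.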

BBC-Esq · 2024-02-18T13:10:18Z

I would recommend adding it for Linux and macOS users as well. Apparently @signalprime encountered an error stating that the torchaudio backend wasn't available, which was solved by pip installing soundfile. I believe his pull request might already do this.

My comment above was half misinformed...I think I confused "soundfile" and "sounddevice" because the names are similar. Thus, to clarify my comment...I had to separately pip install soundfile, not sounddevice to get the script to work.

As long as soundfile works on Windows, Mac, and Linux, which it looks like it does, I'd recommend setting the default backend to soundfile instead of letting torchaudio automatically select it:

https://pytorch.org/audio/2.0.0/backend.html

My comments in @signalprime's pull request discuss this, but basically, soundfile is supposed to work on all three platforms whereas sox_io is only Linux/Mac. I researched the issue and this is a remnant of early PyTorch versions. sox_io is located on SourceForge somewhere I believe...and then soundfile came around and was a huge improvement...but for some reason sox_io is still an option. Soundfile is updated constantly and recently, whereas I can't find any information on sox_io...even on SourceForge. Furthermore, torchaudio's GitHub states confusingly that the "optional" dependencies are kaldi and soundfile...but not sox_io and soundfile like on their website's instructions...

https://github.com/pytorch/audio/blob/main/requirements.txt

Anyways, basically, to avoid my and @signalprime's errors, I'd recommend explicitly setting soundfile as the default torchaudio backend somewhere in the source code.
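
A rough sketch of what "explicitly preferring soundfile" might look like follows. It assumes a torchaudio version that provides `torchaudio.list_audio_backends()` and accepts a `backend=` keyword in `torchaudio.load` (2.1 or newer, as far as I know). The helpers `pick_backend` and `load_reference` are hypothetical names, not part of WhisperSpeech or torchaudio.

```python
def pick_backend(available, preferred=("soundfile", "ffmpeg", "sox")):
    """Return the first preferred backend that is installed, or raise."""
    for name in preferred:
        if name in available:
            return name
    raise RuntimeError(
        "No torchaudio backend found; try `pip install soundfile`."
    )


def load_reference(path):
    """Load an audio file, preferring the soundfile backend when present."""
    # Imported lazily so pick_backend stays usable without torchaudio.
    import torchaudio

    # Assumption: list_audio_backends() and the backend= keyword exist
    # in the installed torchaudio version.
    backend = pick_backend(torchaudio.list_audio_backends())
    return torchaudio.load(path, backend=backend)
```

Raising a clear error when no backend is installed would at least replace the long traceback with an actionable message.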

If you agree...here are the installation instructions:

https://python-soundfile.readthedocs.io/en/latest/

Notice that they're somewhat different for Linux...so you may need to add an additional dependency for Linux users named libsndfile:

BBC-Esq · 2024-02-19T02:02:30Z

Finally found the website for sox...apparently it hasn't been updated since 2015, so I'm not sure why in the world PyTorch would use it. I say set the default to SoundDevice... :-)

https://sourceforge.net/projects/sox/files/sox/

jpc · 2024-02-19T20:54:49Z

Hey, nice job finding all that info. I was not aware of the history behind the torchaudio backends!

I agree with your conclusion - I’ll test a fresh Linux install and if you can confirm that torchaudio with soundfile works well on Windows then we can use this exclusively.

I also implemented a workaround to support downloading the file from a URL, even without Sox.

BBC-Esq mentioned this issue Feb 3, 2024

need a simple python script to run WhisperSpeech locally to compare to bark #67

Closed
