Issue running the Tortoise TTS collab #2

Open
ghost opened this issue Aug 22, 2023 · 3 comments

Comments

@ghost

ghost commented Aug 22, 2023

I'm getting the following error while trying to run the generation:

IndexError Traceback (most recent call last)
in <cell line: 64>()
88 bytes_collected = 0
89 for voice_file in voice_files:
---> 90 voice_file = remove_silence(voice_file, window_size=2, threshold=0.1, save_as=dir_tmp_processed+path_leaf(voice_file))
91 file_duration = get_audio_duration(voice_file)
92 slice_file = dir_tmp_slices+path_leaf(voice_file)

2 frames
in clip_audio(audio_data, start, duration, sr)
94 xstart = librosa.time_to_samples(start, sr=sr)
95 xduration = librosa.time_to_samples(start+duration, sr=sr)
---> 96 audio_data = audio_data[:, xstart:xduration]
97 return audio_data
98

IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed

@olaviinha
Owner

Hmm, is your voice_audio mono rather than stereo?
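The traceback above is consistent with mono input: a mono clip is typically loaded as a 1-D array of shape `(samples,)`, so the two-index slice `audio_data[:, xstart:xduration]` fails. Below is a minimal sketch of a channel-agnostic variant of `clip_audio`. It is an illustration, not the notebook's actual fix; it assumes channels-first layout for stereo and swaps `librosa.time_to_samples` for a plain `int(seconds * sr)` conversion so that it depends only on NumPy.

```python
import numpy as np


def clip_audio(audio_data, start, duration, sr):
    """Return the samples between `start` and `start + duration` seconds.

    Hypothetical rewrite: works for mono arrays of shape (samples,) and
    for multi-channel arrays of shape (channels, samples).
    """
    audio_data = np.asarray(audio_data)
    # Assumption: plain seconds-to-samples conversion instead of librosa.
    xstart = int(start * sr)
    xend = int((start + duration) * sr)
    # Ellipsis slices the last (time) axis regardless of dimensionality.
    return audio_data[..., xstart:xend]


mono = np.zeros(3 * 22050)          # 3 s of mono audio
stereo = np.zeros((2, 3 * 22050))   # 3 s of stereo audio
mono_clip = clip_audio(mono, start=1, duration=1, sr=22050)
stereo_clip = clip_audio(stereo, start=1, duration=1, sr=22050)
```

Slicing with `...` avoids converting between mono and stereo, so the rest of the pipeline sees the same number of channels it was given.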

@ghost
Author

ghost commented Aug 28, 2023

OK, so I've resampled all my audio to 48 kHz, and now I'm getting a different error. Is there a maximum number of files you can use at a time?
(screenshot attached)

@olaviinha
Owner

olaviinha commented Aug 28, 2023

Notebook has been updated. Looks like I've failed to update it after some previous fixes. Please refresh and let me know if you are still experiencing issues.

Sample rate shouldn't matter, as the notebook will re-encode the audio to 22050 Hz in any case. Tortoise TTS outputs 24 kHz audio.

Also: if I'm reading that screenshot correctly, your audio is about 20 seconds long. As instructed in the notebook, about 1 minute of audio is required. Make sure you have 1 minute of audio.


If you want a higher sample rate, feel free to try the Sloppy Upsampler notebook. I have no idea if it makes speech better tho.
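The roughly one-minute requirement mentioned above can be checked before running the notebook. Here is a minimal sketch, assuming the voice files are uncompressed WAV files readable by Python's standard `wave` module. The helper names `total_wav_duration` and `has_enough_audio` and the 60-second default are illustrative and not part of the notebook.

```python
import wave


def total_wav_duration(paths):
    """Sum the duration, in seconds, of the given WAV files."""
    total = 0.0
    for path in paths:
        with wave.open(path, "rb") as wav:
            # Duration = number of frames / frames per second.
            total += wav.getnframes() / wav.getframerate()
    return total


def has_enough_audio(paths, minimum_seconds=60.0):
    """Return True if the files together reach `minimum_seconds`."""
    return total_wav_duration(paths) >= minimum_seconds
```

Other formats such as MP3 or FLAC would need a different reader; the check itself stays the same.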
