More TTS architectures #29
base: main
@@ -5,8 +5,8 @@ services:
build: .
command: >
bash -c "python setup.py develop && \
mkdir -p models/styletts2 && \
aws s3 sync s3://uberduck-models-us-west-2/prototype/styletts2 models/styletts2 && \
it looks like this branch is a bit out of date with main, can you run:
git checkout more-tts-archs
git pull --rebase origin main
<resolve any merge conflicts>
git push -f
aye aye captain
@@ -350,6 +366,7 @@ def _check_for_exceptions(response_task: Optional[asyncio.Task]) -> bool:
        print("response task was cancelled")
    except Exception as e:
        print("response task raised an exception:", e)
        print(traceback.format_exc(e))
nice
except for the bit where it raises an exception of its own somehow :p
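A likely cause of that extra exception: `traceback.format_exc` takes `limit` and `chain`, not an exception object, so `format_exc(e)` passes `e` as `limit`. The sketch below shows two ways to get the traceback text without that argument. The helper names `describe_exception` and `run_and_report` are hypothetical and not part of this PR.

```python
import traceback


def describe_exception(exc: BaseException) -> str:
    """Render an exception and its traceback as a single string."""
    # format_exception accepts the exception explicitly, so it also works
    # outside of an except block.
    return "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))


def run_and_report() -> tuple[str, str]:
    try:
        raise ValueError("boom")  # stand-in for a failing response task
    except Exception as e:
        # format_exc() reads the exception currently being handled,
        # so it is called with no arguments.
        implicit = traceback.format_exc()
        explicit = describe_exception(e)
        return implicit, explicit


implicit, explicit = run_and_report()
print(implicit)
```

Inside an `except` block the no-argument `traceback.format_exc()` is the smallest change; the explicit form is useful when the exception is handled later, for example after being read from a finished task.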
    speaker_id=0,
)
audio = b"".join(audio)
audio = torch.frombuffer(audio, dtype=torch.int16).float() / 32767  # TODO silly
should it be / 32768 (2^15), not 32767?
not a big difference though
also what's # TODO silly?
it's silly because i'm undoing the conversion piper does (which uses 32767 btw)
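To make the 32767 vs 32768 point concrete, here is a minimal standard-library sketch of the round trip. It assumes, per the comment above, that the encoder scales floats by 32767. The helper names `float_to_pcm16` and `pcm16_to_float` are hypothetical, and `array` stands in for `torch.frombuffer` so the example runs without torch.

```python
import array


def float_to_pcm16(samples, scale=32767):
    """Encode floats in [-1.0, 1.0] as native-endian int16 PCM bytes."""
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    return array.array("h", (int(round(s * scale)) for s in clipped)).tobytes()


def pcm16_to_float(pcm_bytes, scale=32767):
    """Decode int16 PCM bytes back to floats by dividing by `scale`."""
    return [v / scale for v in array.array("h", pcm_bytes)]


original = [0.0, 0.5, -0.5, 1.0, -1.0]
pcm = float_to_pcm16(original)

same_scale = pcm16_to_float(pcm, scale=32767)   # inverts the encoder's scaling
other_scale = pcm16_to_float(pcm, scale=32768)  # slightly quieter overall

print(same_scale[3], other_scale[3])
```

Dividing by the same constant the encoder multiplied by restores full scale exactly; dividing by 32768 leaves the signal scaled by 32767/32768, which is the "not a big difference" mentioned above.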
)
audio = b"".join(audio)
audio = torch.frombuffer(audio, dtype=torch.int16).float() / 32767  # TODO silly
audio = resample(audio, model.config.sample_rate, output_sample_rate)
Can we skip this step if the input and output sample rates are the same? (I think they usually should be, if they're both using 24000.)
torchaudio has an if-clause for that already https://pytorch.org/audio/stable/_modules/torchaudio/functional/functional.html#resample
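Since the library call already returns early on equal rates, an explicit guard at the call site is optional. If one were wanted anyway (for instance with a resampler that lacks that check), it could look like the sketch below. `resample_if_needed` and `fake_resample` are hypothetical names, and the fake resampler only counts calls so the example runs without torchaudio.

```python
def resample_if_needed(audio, orig_rate, new_rate, resample_fn):
    """Return `audio` untouched when the rates match, else call `resample_fn`."""
    if orig_rate == new_rate:
        return audio
    return resample_fn(audio, orig_rate, new_rate)


calls = []


def fake_resample(audio, orig_rate, new_rate):
    # Stand-in for a real resampler: records the call and returns a copy.
    calls.append((orig_rate, new_rate))
    return list(audio)


audio = [0.0, 0.1, 0.2]
unchanged = resample_if_needed(audio, 24000, 24000, fake_resample)
converted = resample_if_needed(audio, 22050, 24000, fake_resample)
print(len(calls))
```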
XTTS-v2 via Coqui (works, but unstable): removed due to being a pain and not worth the effort
streaming with StyleTTS: not natively supported, will be separate pull request for sentence-based streaming
streaming with VITS: technically supported but sentence-based, see above
streaming with XTTS (works, but not sure if it helps or if it's streaming)