More TTS architectures #29

Open: Sobsz wants to merge 7 commits into main

@Sobsz Sobsz commented Mar 1, 2024

  • VITS via Piper (works)
  • XTTS-v2 via Coqui (worked, but unstable): removed due to being a pain and not worth the effort

  • streaming with StyleTTS not natively supported, will be separate pull request for sentence-based streaming
  • streaming with VITS technically supported but sentence-based, see above
  • streaming with XTTS (works, but not sure if it helps or if it's streaming)
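
For readers unfamiliar with the term, sentence-based streaming can be sketched roughly as below. This is an illustration only, not code from this pull request: `stream_by_sentence` and the `synthesize` callback are hypothetical names, and the regex split is a naive rule that mishandles abbreviations such as "e.g.".

```python
import re
from typing import Callable, Iterator


def stream_by_sentence(text: str, synthesize: Callable[[str], bytes]) -> Iterator[bytes]:
    """Yield one audio chunk per sentence instead of one chunk for the whole text.

    `synthesize` is a stand-in for whatever TTS call turns a sentence into audio.
    """
    # Split after '.', '!' or '?' when followed by whitespace (naive on purpose).
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        if sentence:
            yield synthesize(sentence)


# Stand-in "TTS" that just encodes the text, to show the chunk boundaries.
chunks = list(stream_by_sentence("Hello there. How are you? Fine!", lambda s: s.encode()))
```

Because the function is a generator, a caller can start playing the first chunk while later sentences are still being synthesized.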

@Sobsz Sobsz marked this pull request as ready for review March 14, 2024 23:54
@@ -5,8 +5,8 @@ services:
build: .
command: >
bash -c "python setup.py develop && \
mkdir -p models/styletts2 && \
aws s3 sync s3://uberduck-models-us-west-2/prototype/styletts2 models/styletts2 && \
Contributor

it looks like this branch is a bit out of date with main, can you run:

git checkout more-tts-archs
git pull --rebase origin main
<resolve any merge conflicts>
git push -f

Contributor Author

aye aye captain

@@ -350,6 +366,7 @@ def _check_for_exceptions(response_task: Optional[asyncio.Task]) -> bool:
print("response task was cancelled")
except Exception as e:
print("response task raised an exception:", e)
print(traceback.format_exc(e))
Contributor

nice

Contributor Author

except for the bit where it raises an exception of its own somehow :p
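
A likely explanation, offered as an assumption rather than something confirmed in this thread: `traceback.format_exc()` takes `limit` as its first positional parameter, not an exception object, so passing `e` there can itself fail. A minimal sketch of two calls that do format the caught exception (the helper name `describe_exception` is hypothetical):

```python
import traceback


def describe_exception(e: BaseException) -> str:
    """Format an exception object together with its traceback (hypothetical helper)."""
    # Uses the exception object itself, so it also works outside an except block.
    return "".join(traceback.format_exception(type(e), e, e.__traceback__))


try:
    raise ValueError("boom")
except Exception as e:
    # Inside an except block, format_exc() needs no arguments:
    # it reads the exception currently being handled.
    from_context = traceback.format_exc()
    from_object = describe_exception(e)
```

Either form could replace the call in the hunk above; the argument-free `traceback.format_exc()` is the smaller change.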

speaker_id=0,
)
audio = b"".join(audio)
audio = torch.frombuffer(audio, dtype=torch.int16).float() / 32767 # TODO silly
Contributor

should it be / 32768 (2^15), not 32767?

not a big difference though

also, what's # TODO silly?

Contributor Author

it's silly because i'm undoing the conversion piper does (which uses 32767 btw)
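
To make the 32767 versus 32768 question concrete, here is a small sketch of what each divisor does to the extreme int16 values. It uses only the standard library so it runs without torch; `pcm16_to_float` is a hypothetical helper, not code from this pull request.

```python
import array
from typing import List


def pcm16_to_float(pcm: bytes, scale: float) -> List[float]:
    """Decode native-endian 16-bit PCM bytes and divide each sample by `scale`."""
    samples = array.array("h")  # "h" = signed 16-bit integer
    samples.frombytes(pcm)
    return [s / scale for s in samples]


extremes = array.array("h", [-32768, 0, 32767]).tobytes()
by_32767 = pcm16_to_float(extremes, 32767.0)  # maps 32767 to exactly 1.0
by_32768 = pcm16_to_float(extremes, 32768.0)  # maps -32768 to exactly -1.0
```

Dividing by 32768 keeps every possible sample inside [-1, 1), while dividing by 32767 lets -32768 fall slightly below -1. If the int16 data was produced by scaling floats in [-1, 1] by 32767, as the reply above says Piper does, then dividing by 32767 inverts that scaling and -32768 should not occur.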

)
audio = b"".join(audio)
audio = torch.frombuffer(audio, dtype=torch.int16).float() / 32767 # TODO silly
audio = resample(audio, model.config.sample_rate, output_sample_rate)
Contributor

Can we skip this step if the input and output sample rates are the same? (which I think it usually should be if they're both using 24000)
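
One way the suggested shortcut could look, as a sketch: `resample_if_needed` is a hypothetical wrapper, and `resample_fn` stands in for the `resample(audio, input_rate, output_rate)` call shown in the hunk, whose implementation is not visible here.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def resample_if_needed(
    audio: T,
    input_rate: int,
    output_rate: int,
    resample_fn: Callable[[T, int, int], T],
) -> T:
    """Return audio untouched when the rates match; otherwise delegate to resample_fn."""
    if input_rate == output_rate:
        return audio
    return resample_fn(audio, input_rate, output_rate)
```

With this in place the call site would become something like `audio = resample_if_needed(audio, model.config.sample_rate, output_sample_rate, resample)`.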

