Incorrect Number Pronunciation in Piper VITS with eSpeak-NG #747

Open
Muzaffar-x opened this issue Mar 12, 2025 · 0 comments

After training a model with Piper (VITS), some eSpeak-NG functionality does not appear to work correctly. Specifically, numbers 10 and above are pronounced as separate digits (e.g., "10" is read as "one zero" instead of "ten").

Interestingly, a similar VITS model from Coqui, trained on the same dataset, does not show this issue.

Expected Behavior:
Numbers should be pronounced as whole numbers (e.g., "10" as "ten") rather than as separate digits.

Steps to Reproduce:

  1. Train a model on Piper VITS using eSpeak-NG.
  2. Try synthesizing numbers 10 and above.
  3. Notice that instead of the expected pronunciation, numbers are read as separate digits.
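
To narrow this down, it may help to check what eSpeak-NG itself produces for "10", outside of Piper. The sketch below calls the `espeak-ng` command-line tool with `-q` (no audio), `--ipa` (print IPA phonemes), and `-v` (voice). The `en-us` voice and the helper names are assumptions for illustration, since the dataset language is not stated here.

```python
import shutil
import subprocess


def espeak_command(text, voice="en-us"):
    """Build an espeak-ng command that prints IPA phonemes without playing audio.

    The default voice "en-us" is an assumption; use the voice of the dataset.
    """
    return ["espeak-ng", "-q", "--ipa", "-v", voice, text]


def phonemize(text, voice="en-us"):
    """Return espeak-ng's IPA output for text, or None if espeak-ng is not installed."""
    if shutil.which("espeak-ng") is None:
        return None
    result = subprocess.run(
        espeak_command(text, voice), capture_output=True, text=True
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Compare the digit form with both spoken forms.
    for sample in ("10", "ten", "one zero"):
        print(repr(sample), "->", phonemize(sample))
```

If the phonemes for "10" match those for "ten", eSpeak-NG is expanding the number on its own, and the problem more likely lies in how the text reaches it during training or inference.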

Possible Causes:

  - Issues with text tokenization before being fed into the model.
  - Problems with eSpeak-NG integration in Piper VITS.
  - Missing number processing rules in the current configuration.
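
If the cause turns out to be missing number handling before phonemization, one possible workaround is to spell out numbers in the text before it is passed to the phonemizer. The following is a minimal sketch for English only, covering integers below one million. `spell_number` and `expand_numbers` are hypothetical helpers, not part of Piper or eSpeak-NG, and decimals, ordinals, and dates are not handled.

```python
import re

ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def spell_number(n):
    """Spell a non-negative integer below one million in English words."""
    if n < 0 or n >= 1_000_000:
        raise ValueError("only integers from 0 to 999999 are supported")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, rest = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[rest] if rest else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        return ONES[hundreds] + " hundred" + (" " + spell_number(rest) if rest else "")
    thousands, rest = divmod(n, 1000)
    return spell_number(thousands) + " thousand" + (" " + spell_number(rest) if rest else "")


def expand_numbers(text):
    """Replace each run of digits with its spelled-out form.

    Numbers too large for spell_number are left unchanged.
    """
    def replace(match):
        value = int(match.group())
        return spell_number(value) if value < 1_000_000 else match.group()

    return re.sub(r"\d+", replace, text)


if __name__ == "__main__":
    print(expand_numbers("I have 10 apples and 120 pears."))
```

For this to help, the same normalization would presumably need to be applied both to the training transcripts and to the text at synthesis time, so the model sees consistent input.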

Question:

How can this issue be resolved? Is there a way to explicitly configure eSpeak-NG in Piper VITS to ensure correct number pronunciation?
