Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updates MFA scripts and pipeline #5670

Open
wants to merge 38 commits into
base: master
Choose a base branch
from

Conversation

iamanigeeit
Copy link
Contributor

I have updated the following to make MFA work and clean up MFA scripts.

  1. phoneme_tokenizer.py: added MFA G2P for all MFA languages. Added Espeak G2P including word separator, as training MFA with custom dictionary requires word splitting.
  2. mfa_cleaners.py: text cleaner added for MFA English
  3. cleaner.py includes MFA English and edited to put all cleaning functions in TextCleaner.init() instead of call()
  4. mfa_format.py: cleaned up help and comments. Allows both json and TextGrid formats (Praat only reads .TextGrid). make_dictionary allows multiple pronunciations per word.
  5. install_mfa.sh is updated to the latest working version of MFA. Disabled pip installation as it is not recommended.
  6. mfa.sh: cleaned up help and comments. Updated for new MFA syntax as some args were wrong. Added options for textgrid format and single speaker.
  7. local/run_mfa.sh and run_mfa.sh are updated to reflect above changes
  8. copy_data_dir.sh now has durations file

The simplified process to train TTS with MFA is now:

# cd to egs2/*/tts1/
./local/run_mfa.sh [MFA_OPTIONS]
./run_mfa.sh [OPTIONS]

To train with MFA, the new process is
./local/run_mfa.sh [MFA_OPTIONS]
./run_mfa.sh [OPTIONS]

1) phoneme_tokenizer.py: added MFA G2P for all MFA language. Added Espeak G2P including word separator, as training MFA with custom dictionary requires word splitting.
2) mfa_cleaners.py: text cleaner added for MFA English
3) cleaner.py includes MFA English and edited to put all cleaning functions in TextCleaner.__init__() instead of __call__()
4) mfa_format.py: cleaned up help and comments. Allows both json and TextGrid formats for checking in Praat. make_dictionary allows multiple pronunciations per word.
5) install_mfa.sh is updated to the latest working version of MFA. Disabled pip installation as it is not recommended.
6) mfa.sh: cleaned up help and comments. Updated for new MFA syntax (some args were wrong). Added options for textgrid format and single speaker.
7) local/run_mfa.sh and run_mfa.sh are updated to reflect above changes
8) copy_data_dir.sh now has durations file
@sw005320
Copy link
Contributor

Thanks for your PR, @iamanigeeit!
Can you fix the CI error?
https://github.com/espnet/espnet/actions/runs/7986631083/job/21807429857?pr=5670

@sw005320
Copy link
Contributor

@Fhrozen, can you review this PR?

@sw005320 sw005320 added this to the v.202405 milestone Feb 21, 2024
@sw005320 sw005320 added the TTS Text-to-speech label Feb 21, 2024
@Fhrozen
Copy link
Member

Fhrozen commented Feb 21, 2024

@sw005320 Sure, I will be checking it

@iamanigeeit
Copy link
Contributor Author

iamanigeeit commented Feb 22, 2024

@sw005320 @Fhrozen

For test python, I can modify the files to fit the 80 char limit per line. I was using 120 chars as the limit because it looks like most ESPnet files break 80 chars anyway (should we increase it to 120?).

For check_kaldi_symlinks, does it mean that copy_data_dir.sh must be identical between egs2/TEMPLATE/asr1/utils and tools/kaldi/egs/wsj/s5 ? Should I send a PR for kaldi to make them match?

@sw005320
Copy link
Contributor

For check_kaldi_symlinks, does it mean that copy_data_dir.sh must be identical between egs2/TEMPLATE/asr1/utils and tools/kaldi/egs/wsj/s5 ? Should I send a PR for kaldi to make them match?

Yes, it should be identical, but if you want to customize it for your own purpose, maybe you can put it in other places or rename it.

@iamanigeeit
Copy link
Contributor Author

For check_kaldi_symlinks, does it mean that copy_data_dir.sh must be identical between egs2/TEMPLATE/asr1/utils and tools/kaldi/egs/wsj/s5 ? Should I send a PR for kaldi to make them match?

Yes, it should be identical, but if you want to customize it for your own purpose, maybe you can put it in other places or rename it.

Hmm, in that case I propose creating a new file copy_tts_data_dir.sh and modifying tts.sh to use it. Is that better?

Copy link

codecov bot commented Feb 22, 2024

Codecov Report

Attention: Patch coverage is 35.93750% with 82 lines in your changes are missing coverage. Please review.

Project coverage is 35.81%. Comparing base (fa822d5) to head (d5d9cf5).

Files Patch % Lines
espnet2/text/phoneme_tokenizer.py 13.33% 78 Missing ⚠️
espnet2/text/cleaner.py 71.42% 4 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #5670       +/-   ##
===========================================
- Coverage   72.21%   35.81%   -36.40%     
===========================================
  Files         760      759        -1     
  Lines       69840    69925       +85     
===========================================
- Hits        50435    25046    -25389     
- Misses      19405    44879    +25474     
Flag Coverage Δ
test_configuration_espnet2 ?
test_integration_espnet1 62.92% <ø> (ø)
test_integration_espnetez ?
test_python_espnet1 18.27% <26.56%> (+0.06%) ⬆️
test_python_espnet2 ?
test_python_espnetez 13.96% <16.40%> (+<0.01%) ⬆️
test_utils 20.91% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@iamanigeeit
Copy link
Contributor Author

It looks like the entire directory egs2/TEMPLATE/asr1/utils must be identical to tools/kaldi/egs/wsj/s5. In that case, i don't know where to add copy_tts_data_dir.sh. Can i just make the PR for Kaldi? It is a very simple change to add durations.

As for the other failing checks, i think it is not related to this PR.

@sw005320
Copy link
Contributor

It looks like the entire directory egs2/TEMPLATE/asr1/utils must be identical to tools/kaldi/egs/wsj/s5. In that case, i don't know where to add copy_tts_data_dir.sh. Can i just make the PR for Kaldi? It is a very simple change to add durations.

As for the other failing checks, i think it is not related to this PR.

You can just move it from egs2/TEMPLATE/asr1/utils/copy_tts_data_dir.sh to egs2/TEMPLATE/asr1/scripts/utils/copy_tts_data_dir.sh

Copy link
Member

@Fhrozen Fhrozen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, only couple of concerns, that may not affect the current code.

echo "Usage: $0 true/false"
exit 1;
fi
# This is the last stable version. MFA 3.0 depends on an unstable kaldi version that creates errors
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to confirm. Did you test the installation and execution with conda? Because, I remember it was generating an issue w.r.t the PostgreSQL or another kind of application. That is the reason for using pip installation.
But, if it is already fixed, the ok

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I started from a fresh install.

I spent hours making it work with pip and ended up with all the commented code.

@@ -0,0 +1,50 @@
#!/usr/bin/env bash
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary this file? Cannot be added as an example in the documentation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might know better where to put it. I wanted to combine it with run.sh, but parse_options.sh is very strict, and the MFA options are different from the tts.sh options. So i thought having a separate file is better because defaults are different from run.sh. If someone wants to use MFA, they can run it directly.

@Fhrozen
Copy link
Member

Fhrozen commented Feb 23, 2024

@iamanigeeit , If possible, add some test for the espnet2/text/mfa_cleaners.py and the new lines you are adding at espnet2/text/cleaner.py

@iamanigeeit
Copy link
Contributor Author

@Fhrozen I have added test_cleaners.py. However, i don't know enough Japanese, Korean or Vietnamese to write tests for those.

Is there a way to pause or stop the CI checks before everything is finalized? I feel like i am wasting ESPnet project funds on minor changes...

@Fhrozen
Copy link
Member

Fhrozen commented Feb 26, 2024

You can use [skip ci] at the beginning of the commit message to avoid ci test.
Also, you can run the test script on your local environment using ./ci/test_*.sh from the root of your repo.
If it is necessary, let me know to share a docker image file for testing it.

@sw005320
Copy link
Contributor

Should we still support Python 3.7? If so, i can amend the test script.

Yeah, but it does not go through the CI test anyway due to pytorch<1.7.0
So, as a conclusion, we can skip the test...
We can revisit it once we safely update whisper

@mergify mergify bot added the CI Travis, Circle CI, etc label Feb 29, 2024
@iamanigeeit
Copy link
Contributor Author

iamanigeeit commented Mar 3, 2024

Conda installation problems were caused by installing ffmpeg from conda-forge (see installers/install_ffmpeg_conda.sh for details). I have updated the Makefile and ffmpeg installation to solve it. The root cause is conda-forge/ocl-icd-feedstock#29 and conda-forge/libarchive-feedstock#69

@iamanigeeit
Copy link
Contributor Author

Hi @sw005320 @Fhrozen -- would it be ok to merge now? The only CI checks that fail are
test_utils/test_compute-cmvn-stats_py.bats
test_utils/test_copy-feats_py.bats

Both are not related to this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI Travis, Circle CI, etc ESPnet2 Installation TTS Text-to-speech
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants