unclear librispeech data prepare scripts for owsm_v1/s2t1 #5686
Comments
@pyf98, can you answer this?
Hi, thanks for the question! For LibriSpeech, I did not use the standard segmented version. Instead, I used the "original-mp3" version, which I believe is released along with the segmented one. You might need to check the original source of the LibriSpeech distribution. Here are some paragraphs of the
Thanks.
Several lines of code in espnet/egs2/owsm_v1/s2t1/local/prepare_librispeech.py are unclear.
Is there a more accurate script to prepare LibriSpeech for owsm_v1 training?
Preparing LibriSpeech failed with the following error:
FileNotFoundError: [Errno 2] No such file or directory: '/espnet/egs2/librispeech/asr1/downloads/mp3/1272/135031/1272-135031.sents.seg.txt'
For example, this is the part of the script that opens those files:
'''
for chapter in (data_dir / "mp3" / speaker).iterdir():
    if chapter.is_dir():
        utts = []
        audio = str((chapter / f"{chapter.name}.mp3").resolve())
        with open(
            chapter / f"{speaker}-{chapter.name}.sents.seg.txt", "r"
        ) as seg_f, open(
            chapter / f"{speaker}-{chapter.name}.sents.trans.txt", "r"
'''
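One way to see which files the script expects but cannot find is to walk the download directory before running the preparation step. The following is only a minimal sketch: the helper name `find_missing_chapter_files` is hypothetical and not part of ESPnet, and the expected layout (`mp3/<speaker>/<chapter>/` containing one `.mp3`, one `.sents.seg.txt`, and one `.sents.trans.txt`) is inferred solely from the snippet above.

```python
from pathlib import Path
from typing import List


def find_missing_chapter_files(data_dir: Path) -> List[Path]:
    """Hypothetical helper: list files the quoted snippet would try to open
    under data_dir/mp3/<speaker>/<chapter>/ that are not present on disk."""
    missing: List[Path] = []
    mp3_root = data_dir / "mp3"
    if not mp3_root.is_dir():
        # Without the mp3/ directory nothing else can be checked.
        return [mp3_root]
    for speaker_dir in sorted(mp3_root.iterdir()):
        if not speaker_dir.is_dir():
            continue
        speaker = speaker_dir.name
        for chapter in sorted(speaker_dir.iterdir()):
            if not chapter.is_dir():
                continue
            # File names mirror the f-strings in the quoted snippet.
            expected = [
                chapter / f"{chapter.name}.mp3",
                chapter / f"{speaker}-{chapter.name}.sents.seg.txt",
                chapter / f"{speaker}-{chapter.name}.sents.trans.txt",
            ]
            missing.extend(p for p in expected if not p.is_file())
    return missing


if __name__ == "__main__":
    import sys

    for path in find_missing_chapter_files(Path(sys.argv[1])):
        print(f"missing: {path}")
```

Running it with the downloads directory from the error message as the argument should report the `.sents.seg.txt` file if the directory holds the segmented release rather than the original-mp3 one.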