Open Seq2Seq Beam Search combined with Nemo Forced Aligner #5874

GreatHalf · 2023-01-27T13:41:33Z


Hello,

the generation of timestamps has been asked about multiple times and has so far been solved with external tools, see parlance or MFA.

Now that NeMo contains a Forced Aligner, I would like to combine Seq2Seq Beam Search and the forced aligner. I asked this before, and back then you had to fall back on offline_diar_with_asr_infer.py, avoiding Seq2Seq entirely.

One new solution would be to combine Seq2Seq with the Forced Aligner: run eval_beamsearch_ngram.py first, then use the resulting text as ground truth for the forced aligner's align.py. This computes the probs twice, which seems redundant to me. Computing the probs is by far the most time-intensive part of both tasks.

Looking at the new align.py I would like to do this:

1. Generate probs once, for example in this line
2. To be developed: Use the probs for a beam search as in eval_beamsearch_ngram.py, allowing you to use a kenlm_model_file, alpha and beta parameters. You can even use the same NeMo CTC model. Save the resulting beam search transcript.
3. Use the beam search transcript as you would with align_using_pred_text.

Is this possible? Or are the probs tensors vastly different, e.g. by the size of the token alphabet?
I see how the code would fit together, but I might be missing a theoretical problem.
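To make the idea concrete, here is a toy sketch in plain Python of "compute the probs once, use them twice". It is not NeMo code: the greedy CTC decoder is only a stand-in for the KenLM beam search, the Viterbi routine is a textbook CTC forced alignment, and the names `greedy_ctc_decode`, `viterbi_align` and the blank index 0 are assumptions made for illustration. The point is that both steps read the same (frames x vocabulary) matrix, so they only fit together if the vocabulary dimension and blank position agree.

```python
import math

BLANK = 0  # assumption for this toy only; check where your model puts the blank


def argmax(row):
    return max(range(len(row)), key=row.__getitem__)


def greedy_ctc_decode(log_probs, blank=BLANK):
    """Stand-in for beam search: best token per frame, collapse repeats, drop blanks."""
    tokens, prev = [], None
    for row in log_probs:
        tok = argmax(row)
        if tok != prev and tok != blank:
            tokens.append(tok)
        prev = tok
    return tokens


def viterbi_align(log_probs, tokens, blank=BLANK):
    """Viterbi forced alignment: best per-frame token ids that collapse to `tokens`."""
    ext = [blank]
    for tok in tokens:
        ext += [tok, blank]  # interleave blanks: b t1 b t2 b ...
    T, S = len(log_probs), len(ext)
    neg_inf = float("-inf")
    score = [[neg_inf] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    score[0][0] = log_probs[0][ext[0]]
    if S > 1:
        score[0][1] = log_probs[0][ext[1]]
    for t in range(1, T):
        for s in range(S):
            cands = [s]
            if s >= 1:
                cands.append(s - 1)
            # skipping a blank is allowed only between two different tokens
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(s - 2)
            best = max(cands, key=lambda k: score[t - 1][k])
            score[t][s] = score[t - 1][best] + log_probs[t][ext[s]]
            back[t][s] = best
    # a valid path ends in the last token or in the trailing blank
    s = S - 1
    if S > 1 and score[T - 1][S - 2] > score[T - 1][S - 1]:
        s = S - 2
    path = [s]
    for t in range(T - 1, 0, -1):
        s = back[t][s]
        path.append(s)
    path.reverse()
    return [ext[s] for s in path]


# Toy "probs": 8 frames, vocabulary of 3 (blank, token 1, token 2), computed once.
frames = [0, 1, 1, 0, 2, 2, 2, 0]
log_probs = [[math.log(0.8 if v == f else 0.1) for v in range(3)] for f in frames]

tokens = greedy_ctc_decode(log_probs)            # step 2: decode -> [1, 2]
frame_tokens = viterbi_align(log_probs, tokens)  # step 3: align on the same matrix
```

In this toy the alignment reproduces the per-frame argmax path, because the greedy transcript is by construction the collapse of that path. With a real beam search plus language model the transcript can differ from the greedy one, and the alignment then returns the best path for that transcript instead.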

Cheers

GreatHalf · 2023-01-27T16:03:09Z

Author

I solved it (albeit the other way round for now). You can reuse the data probs pickle from eval_beamsearch_ngram.py and skip the transcribe call in align.py entirely. Not much change is needed.

import pickle

import pandas
import torch
…
# pickle.load expects an open binary file object, not a path
with open("pickleForProbs.pickle", "rb") as f:
    hypotheses = pickle.load(f)
for hypothesis in hypotheses:
    # convert the per-utterance probs matrix into a torch tensor
    hypothesis = pandas.DataFrame(hypothesis)
    hypothesis = torch.from_numpy(hypothesis.to_numpy())
    log_probs_list_batch.append(hypothesis)
    T_list_batch.append(hypothesis.shape[0])  # number of frames
    pred_text_batch.append("irrelevant")  # you could supply text from the *.tsv here. I suggest setting align_using_pred_text=False and ignoring pred_text_batch
…

You might face the problem that by default the pickle file covers the whole manifest, while manifest_lines_batch is only one batch of that manifest, so the pickle entries have to be matched to the lines of the current batch.
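One way around this could be to slice the full list by the batch offset. The sketch below assumes the pickle entries are stored in manifest order; `probs_for_batch`, `batch_start` and `batch_size` are hypothetical names for illustration and not parameters of align.py.

```python
def probs_for_batch(all_probs, batch_start, batch_size):
    """Return the entries of the full probs list that belong to one manifest batch.

    Assumes all_probs[i] corresponds to manifest line i (hypothetical layout).
    """
    batch = all_probs[batch_start:batch_start + batch_size]
    if not batch:
        raise IndexError(f"no probs stored for batch starting at line {batch_start}")
    return batch


# Placeholder strings stand in for the per-utterance probs matrices.
all_probs = [f"probs_{i}" for i in range(5)]
batches = [probs_for_batch(all_probs, start, 2) for start in range(0, len(all_probs), 2)]
# batches == [["probs_0", "probs_1"], ["probs_2", "probs_3"], ["probs_4"]]
```

If the manifest is filtered or reordered between the two scripts, the offsets no longer line up, and matching on the audio file path would be the safer key.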

If there is interest I will add a pickle file parameter to align.py and contribute.

Cheers

2 replies

titu1994 Jan 27, 2023
Maintainer

@erastorgueva-nv to review

GreatHalf Apr 29, 2024
Author

PR is up #9056
