Problems in concatenate_dataset #129
In `concatenate_dataset()`:

distil-whisper/training/run_pseudo_labelling.py, lines 644 to 671 in 66ac8dd
From my understanding, the logic in the for loop is:

- check whether `audio_sample` exceeds 30s
- check whether `speaker` is different from the previous one (`prev_speaker`)
- if so, append the accumulated sample (`audio_sample`), excluding the current utterance

Since the concatenated sample does not contain the current utterance, we have:

- the speaker stored for it should be `previous_speaker` rather than `speaker`
- `condition_on_prev` signifies continuity at the start of the current utterance, so it should be shifted to the right by 1 (e.g. initialize as `condition_on_prev = [0]`)

Meanwhile, it seems that the very last accumulated sample in each batch never gets appended: when the for loop exits, there is still an `(audio_sample, text_sample)` pair of <= 30s that should have been appended but was not.

These may not seem significant, but when fine-tuning on a custom dataset with diverse speakers, where `condition_on_prev` is expected to be true a lot, they will produce wrong training signals.
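
To make the three points concrete, here is a minimal sketch of how the loop could behave once they are addressed. It is not the code from `run_pseudo_labelling.py`: the function name `concatenate_utterances`, its arguments, and the use of plain Python lists for audio are all hypothetical, and I am assuming that `condition_on_prev` should be 1 only when a sample was split off from the previous one because of length, with the same speaker.

```python
def concatenate_utterances(audio, text, speaker_id, max_input_length):
    """Hypothetical stand-in for the loop in concatenate_dataset().

    audio: list of per-utterance sample lists, text: list of str,
    speaker_id: list of speaker labels, max_input_length: limit in samples.
    """
    out = {"audio": [], "text": [], "speaker_id": [], "condition_on_prev": []}
    if not audio:
        return out

    def flush(sample_audio, sample_text, sample_speaker, condition):
        # Store one finished sample together with its own speaker and flag.
        out["audio"].append(sample_audio)
        out["text"].append(sample_text)
        out["speaker_id"].append(sample_speaker)
        out["condition_on_prev"].append(condition)

    audio_sample = list(audio[0])
    text_sample = text[0]
    # The first sample in a batch has no known predecessor (the "[0]" start).
    condition = 0

    for idx in range(1, len(audio)):
        prev_speaker, speaker = speaker_id[idx - 1], speaker_id[idx]
        same_speaker = speaker == prev_speaker
        fits = len(audio_sample) + len(audio[idx]) <= max_input_length

        if same_speaker and fits:
            audio_sample.extend(audio[idx])
            text_sample += " " + text[idx]
            continue

        # The accumulated sample excludes the current utterance, so it
        # belongs to prev_speaker, not to speaker.
        flush(audio_sample, text_sample, prev_speaker, condition)

        # Assumption: the sample that starts here continues the previous one
        # only if the split was caused by length, not by a speaker change.
        condition = 1 if same_speaker else 0
        audio_sample = list(audio[idx])
        text_sample = text[idx]

    # Flush the last accumulated sample, which the loop never reaches.
    flush(audio_sample, text_sample, speaker_id[-1], condition)
    return out


result = concatenate_utterances(
    audio=[[0.0] * 10, [0.0] * 15, [0.0] * 10, [0.0] * 5],
    text=["a", "b", "c", "d"],
    speaker_id=["s1", "s1", "s1", "s2"],
    max_input_length=30,
)
print(result["text"], result["speaker_id"], result["condition_on_prev"])
```

With this toy input, the third utterance is split off because of length (same speaker, so its flag is 1), the fourth starts a new speaker (flag 0), and the trailing sample is still emitted after the loop ends.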