Possible bug in ASRDecoderTimeStamps - math.ceil on fractional tokens_per_chunk leads to timestamps displacements on long files #11604

Open
bene-ges opened this issue Dec 15, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@bene-ges
Contributor

Describe the bug

I used the class ASRDecoderTimeStamps with a fastconformer model and observed incorrect timestamps on long (1 hour) audio. The timestamps near the end of the file were off by several seconds and pointed past the actual end of the audio.
I think the problem is in this expression when the result of the division is fractional:

tokens_per_chunk = math.ceil(self.chunk_len_in_sec / self.model_stride_in_secs)

For example, the default fastconformer values self.chunk_len_in_sec = 15 and self.model_stride_in_secs = 0.08 give a fractional 187.5, which is rounded up to 188. The rounding error seems to accumulate somehow on long audio.
When I set self.chunk_len_in_sec = 14, so that 14 / 0.08 = 175 (a whole number), all timestamps are exact.
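
A minimal sketch of the arithmetic, not taken from the NeMo source. It assumes that every chunk is counted as `tokens_per_chunk` tokens of `model_stride_in_secs` each, so the surplus from `math.ceil` adds up once per chunk. That accumulation model is an assumption, and `chunk_rounding_drift` is a hypothetical helper name. `Fraction` is used so the division is exact and not affected by floating-point representation of 0.08.

```python
import math
from fractions import Fraction


def chunk_rounding_drift(chunk_len_in_sec, model_stride_in_secs, audio_len_in_sec):
    """Return (tokens_per_chunk, drift per chunk in s, total drift in s).

    Hypothetical helper: assumes the rounding surplus of each chunk
    simply adds up over all chunks of the audio.
    """
    chunk = Fraction(chunk_len_in_sec)
    stride = Fraction(model_stride_in_secs)

    exact_tokens = chunk / stride               # e.g. 15 / 0.08 = 187.5
    tokens_per_chunk = math.ceil(exact_tokens)  # same rounding as in the issue

    # Extra time credited to each chunk by rounding up.
    drift_per_chunk = (tokens_per_chunk - exact_tokens) * stride

    # Number of chunks needed to cover the whole audio.
    num_chunks = math.ceil(Fraction(audio_len_in_sec) / chunk)

    return tokens_per_chunk, float(drift_per_chunk), float(drift_per_chunk * num_chunks)


# One hour of audio with the two chunk lengths discussed above.
print(chunk_rounding_drift("15", "0.08", 3600))  # (188, 0.04, 9.6)
print(chunk_rounding_drift("14", "0.08", 3600))  # (175, 0.0, 0.0)
```

Under this assumption, 15-second chunks would drift by 0.04 s each, about 9.6 s over one hour, which is in line with the several seconds of displacement observed, while 14-second chunks would not drift at all.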

@bene-ges added the bug label on Dec 15, 2024