Possible bug in ASRDecoderTimeStamps - math.ceil on fractional tokens_per_chunk leads to timestamps displacements on long files #11604

bene-ges · 2024-12-15T17:12:11Z

Describe the bug

I used the ASRDecoderTimeStamps class with a FastConformer model and observed incorrect timestamps on long (1 hour) audio. Timestamps near the end of the file were off by several seconds and exceeded the actual duration of the audio.
I think the problem is in this expression when the result of division is fractional:

NeMo/nemo/collections/asr/parts/utils/decoder_timestamps_utils.py

Line 633 in 186a05e

tokens_per_chunk = math.ceil(self.chunk_len_in_sec / self.model_stride_in_secs)

For example, the default FastConformer values self.chunk_len_in_sec = 15 and self.model_stride_in_secs = 0.08 give 15 / 0.08 = 187.5, which math.ceil rounds up to 188. The rounding error seems to accumulate from chunk to chunk on long audio.
When I set self.chunk_len_in_sec = 14, so that 14 / 0.08 = 175 (a whole number), all timestamps are exact.
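
To make the arithmetic concrete, here is a rough sketch of how large the drift could get. It is not NeMo code: the function name `chunk_rounding_drift` is hypothetical, and it assumes the worst case, where every chunk contributes its full rounding surplus to the timestamp offset. Whether the decoder really accumulates the error this way is not confirmed here. Fractions are used so that 15 / 0.08 is computed exactly rather than in floating point.

```python
import math
from fractions import Fraction


def chunk_rounding_drift(chunk_len_in_sec, model_stride_in_secs, audio_len_in_sec):
    """Estimate the worst-case timestamp drift caused by math.ceil.

    Hypothetical helper, not part of NeMo. Assumes each chunk adds its
    full rounding surplus to the offset (an upper bound, not a measurement).
    Returns (tokens_per_chunk, extra_tokens_per_chunk, drift_in_sec).
    """
    chunk = Fraction(str(chunk_len_in_sec))
    stride = Fraction(str(model_stride_in_secs))

    exact_tokens = chunk / stride               # e.g. 187.5 for 15 / 0.08
    tokens_per_chunk = math.ceil(exact_tokens)  # same rounding as the quoted line
    extra_tokens = tokens_per_chunk - exact_tokens

    # Number of chunks needed to cover the audio, ignoring any overlap.
    n_chunks = math.ceil(Fraction(str(audio_len_in_sec)) / chunk)

    drift_in_sec = extra_tokens * stride * n_chunks
    return tokens_per_chunk, float(extra_tokens), float(drift_in_sec)


if __name__ == "__main__":
    # Default FastConformer values from this report, 1 hour of audio.
    print(chunk_rounding_drift(15, 0.08, 3600))
    # Chunk length that is a whole multiple of the stride.
    print(chunk_rounding_drift(14, 0.08, 3600))
```

Under that assumption, a 15 s chunk gains half a token (0.04 s) per chunk, which over the 240 chunks of a 1 hour file adds up to several seconds, while a 14 s chunk has no surplus at all. This matches the order of magnitude of the offset described above.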

bene-ges added the bug label on Dec 15, 2024