Describe the bug

I used the class `ASRDecoderTimeStamps` with a FastConformer model and observed incorrect timestamps on long (1 hour) audio. The timestamps near the end of the file were offset by several seconds and exceeded the actual length of the file.

I think the problem is in this expression when the result of the division is fractional:

NeMo/nemo/collections/asr/parts/utils/decoder_timestamps_utils.py
Line 633 in 186a05e

For example, the default values for FastConformer, `self.chunk_len_in_sec = 15` and `self.model_stride_in_secs = 0.08`, lead to a fractional 187.5 being rounded to 188. It seems that the rounding error somehow accumulates on long audio.

When I set `self.chunk_len_in_sec = 14`, so that 14/0.08 = 175 (a whole number), all timestamps are exact.
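To make the arithmetic concrete, here is a minimal sketch of how such a rounding error could add up. It does not reproduce the NeMo code: the helper names `per_chunk_drift` and `accumulated_drift` are hypothetical, the use of `math.ceil` as the rounding step is an assumption, and the model of one fixed error per chunk is only a guess at how the drift accumulates.

```python
import math
from fractions import Fraction


def per_chunk_drift(chunk_len_in_sec, model_stride_in_secs):
    """Return (exact tokens per chunk, rounded tokens per chunk, drift in seconds).

    Hypothetical helper: Fractions are used so the division is exact and
    not affected by binary floating-point representation of 0.08.
    """
    chunk = Fraction(str(chunk_len_in_sec))
    stride = Fraction(str(model_stride_in_secs))
    tokens_exact = chunk / stride
    # Assumption: the chunk length in tokens is rounded up to an integer.
    tokens_rounded = math.ceil(tokens_exact)
    # Extra time attributed to each chunk because of the rounding.
    drift = (tokens_rounded - tokens_exact) * stride
    return tokens_exact, tokens_rounded, drift


def accumulated_drift(audio_len_in_sec, chunk_len_in_sec, model_stride_in_secs):
    """Estimate total drift, assuming the per-chunk error adds up linearly."""
    _, _, drift = per_chunk_drift(chunk_len_in_sec, model_stride_in_secs)
    n_chunks = math.ceil(Fraction(str(audio_len_in_sec)) / Fraction(str(chunk_len_in_sec)))
    return float(drift * n_chunks)


if __name__ == "__main__":
    exact, rounded, drift = per_chunk_drift(15, 0.08)
    print(float(exact), rounded, float(drift))  # 187.5 188 0.04
    print(accumulated_drift(3600, 15, 0.08))    # 9.6
    print(accumulated_drift(3600, 14, 0.08))    # 0.0
```

Under these assumptions, a 0.5-token error per 15-second chunk corresponds to 0.04 s, and 240 chunks in one hour would give roughly 9.6 s of drift, which is the same order of magnitude as the offset I observed. With a chunk length of 14 s the division is exact and the estimated drift is zero.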