Clip duration in processing and sample duration in training #1022
Unanswered
riffelmaria
asked this question in
Q&A
Replies: 1 comment
-
Hi @riffelmaria, the clip duration should always be the same for training and inference. Apologies if the documentation is confusing in this regard. The two need to match because the spectrogram in which the computer vision model tries to recognize a vocalization is created from an amount of audio equal to the clip duration. If a shorter clip duration is used during inference, the calls will be distorted in the horizontal dimension relative to the images that the model was trained to recognize. Please continue the thread if you have further questions or need clarification.
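To make the distortion concrete, here is a rough back-of-the-envelope sketch in plain Python. It does not use the opensoundscape API. It assumes that each clip's spectrogram is resized to a fixed image width before it reaches the model; the 224-pixel width and the 0.15 s call length are hypothetical values chosen for illustration, while the 1.2 s and 0.4 s durations come from the question.

```python
def call_width_px(call_duration_s, clip_duration_s, image_width_px=224):
    """Approximate width (in pixels) that a call occupies in the model's input image.

    Assumes the whole clip's spectrogram is resized to a fixed width, so the
    time axis is scaled by image_width_px / clip_duration_s.
    """
    if clip_duration_s <= 0:
        raise ValueError("clip_duration_s must be positive")
    if call_duration_s > clip_duration_s:
        raise ValueError("call is longer than the clip")
    return call_duration_s / clip_duration_s * image_width_px


call_s = 0.15  # hypothetical call length, for illustration only

train_px = call_width_px(call_s, clip_duration_s=1.2)  # duration used in training
infer_px = call_width_px(call_s, clip_duration_s=0.4)  # shorter duration at inference
stretch = infer_px / train_px  # horizontal stretch factor relative to training

print(f"training: {train_px:.0f} px, inference: {infer_px:.0f} px, stretch: {stretch:.1f}x")
```

Under these assumptions the same call appears about three times wider at inference than in the images the model was trained on, which is the horizontal distortion described above.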
-
Dear opensoundscape community,
I have a question about clip duration and sample duration during preprocessing and training in opensoundscape v0.10.0.
I trained a few ML models with the opensoundscape package and only afterwards realized that I may have made a mistake.
In an earlier version of the documentation, it was recommended to use 2 to 3 times the length of a call (in my case, calls of small terrestrial mammals) for the sample duration.
I may have got caught up in the choice of words. Anyway, I selected a clip duration of 0.4 s in preprocessing and a sample duration of 1.2 s in model training. In any case, the 0.4 s clips are aggregated to 1.2 s samples for training.
But how are the annotations processed?
Is this even an acceptable procedure?
Could this lead to problems during training?
Looking forward to your answers!