Replies: 1 comment
-
NN-based pitch estimators generally do not compute their losses in Hz. The loss is computed on a 2D probability map, where the target pitch is represented by Gaussian-blurred bins and the bins are equally spaced in the log-frequency domain. For more details, see the CREPE paper: https://arxiv.org/abs/1802.06182

In DiffSinger, the pitch predictor processes pitch in the MIDI domain. The acoustic model maps all f0 values to mel frequency (roughly a log scale) before passing them to the network, so there should be no problem with sensitivity.
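To make the "Gaussian-blurred bins" idea concrete, here is a rough sketch of how such a training target could be built for a single frame. The bin count, 20-cent spacing, 25-cent blur width and the lowest bin centre are assumptions picked to resemble the CREPE paper, and the function names are made up for this example; this is not the actual CREPE or DiffSinger code.

```python
import numpy as np

# Assumed values, chosen to resemble the CREPE paper (not copied from its code).
N_BINS = 360             # number of pitch bins
BIN_WIDTH_CENTS = 20.0   # bins are equally spaced in cents, i.e. in log-frequency
F_MIN_HZ = 32.70         # assumed centre of the lowest bin (roughly C1)
SIGMA_CENTS = 25.0       # standard deviation of the Gaussian blur

# Bin centres measured in cents above F_MIN_HZ.
BIN_CENTERS_CENTS = np.arange(N_BINS) * BIN_WIDTH_CENTS


def hz_to_cents(f_hz, ref_hz=F_MIN_HZ):
    """Distance from ref_hz in cents (1200 cents per octave)."""
    return 1200.0 * np.log2(np.asarray(f_hz, dtype=float) / ref_hz)


def gaussian_target(f_hz):
    """Soft target over the bins: a Gaussian bump centred on the true pitch."""
    c_true = hz_to_cents(f_hz)
    return np.exp(-((BIN_CENTERS_CENTS - c_true) ** 2) / (2.0 * SIGMA_CENTS ** 2))


def binary_cross_entropy(pred, target, eps=1e-7):
    """Bin-wise binary cross-entropy between predicted and target activations."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred)))
```

Because the bins are spaced in cents, moving the target up by an octave shifts the bump by the same number of bins wherever it starts, so the loss treats equal musical intervals equally regardless of register.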
-
When computing losses for the pitch estimator, I was wondering whether there would be a benefit to using the MIDI note number rather than Hz.
The deviation from a "wanted" note, measured in cents, is non-linear in Hz, so the same Hz error means something different depending on the note range.
As an example, if one singer is trying to hit C3 and misses by 10%, and another is attempting C6 and also misses by 10%, the pitch loss calculated in Hz would be higher for the C6 singer, even though both are off by the same musical interval. It seems sensible to compute the pitch losses on the MIDI number instead (librosa.hz_to_midi).
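A quick numeric check of the C3 vs C6 example, as a sketch only. To keep it dependency-light it re-implements the conversion with NumPy instead of calling librosa; the helper `errors_for_sharp_miss` is a name invented for this illustration.

```python
import numpy as np


def hz_to_midi(f_hz):
    """Hz to (fractional) MIDI number: 69 + 12 * log2(f / 440).

    librosa.hz_to_midi uses the same formula; it is re-implemented here
    only so the example runs without librosa installed.
    """
    return 69.0 + 12.0 * np.log2(np.asarray(f_hz, dtype=float) / 440.0)


# Equal-tempered frequencies of C3 (MIDI 48) and C6 (MIDI 84), A4 = 440 Hz.
C3_HZ = 440.0 * 2.0 ** ((48 - 69) / 12)
C6_HZ = 440.0 * 2.0 ** ((84 - 69) / 12)


def errors_for_sharp_miss(target_hz, ratio=1.10):
    """Error of a note sung `ratio` times too high, in Hz and in semitones."""
    sung_hz = target_hz * ratio
    err_hz = sung_hz - target_hz
    err_midi = float(hz_to_midi(sung_hz) - hz_to_midi(target_hz))
    return err_hz, err_midi


c3_err_hz, c3_err_midi = errors_for_sharp_miss(C3_HZ)
c6_err_hz, c6_err_midi = errors_for_sharp_miss(C6_HZ)

print(f"C3: {c3_err_hz:.2f} Hz off, {c3_err_midi:.2f} semitones off")
print(f"C6: {c6_err_hz:.2f} Hz off, {c6_err_midi:.2f} semitones off")
```

The Hz error for C6 is eight times the one for C3 (three octaves apart), so a squared-error loss in Hz would weight it 64 times more, while the error in MIDI numbers is identical for both singers.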
I was also wondering whether the model would benefit from estimating f0 in the more linear MIDI-number domain, since it would not have the complication of "inherently" learning the scaling between frequency and notes, e.g.
$\mathrm{freq} = 440 \cdot 2^{(n - 69)/12}$. Keeping all the note data in the MIDI-number domain would mean it could learn linear relationships between different pitches more easily.
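For reference, the formula above and its inverse can be sketched in a few lines of plain Python. The function and constant names are just for illustration.

```python
import math

A4_HZ = 440.0   # reference tuning
A4_MIDI = 69    # MIDI number of A4


def midi_to_hz(n):
    """freq = 440 * 2 ** ((n - 69) / 12)"""
    return A4_HZ * 2.0 ** ((n - A4_MIDI) / 12.0)


def hz_to_midi(f_hz):
    """Inverse mapping: n = 69 + 12 * log2(freq / 440)"""
    return A4_MIDI + 12.0 * math.log2(f_hz / A4_HZ)


# Transposing up a perfect fifth is "+7" in MIDI numbers at any pitch,
# but the size of the same step in Hz depends on where you start.
for n in (48, 84):  # C3 and C6
    step_hz = midi_to_hz(n + 7) - midi_to_hz(n)
    print(f"MIDI {n} -> {n + 7}: +7 semitones, {step_hz:.1f} Hz")
```

This is the "linear relationship" point: an interval is a constant offset in the MIDI domain and a constant ratio in Hz, so a model working in Hz has to learn the exponential scaling implicitly.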