Problematic spectrograms #6
Dear Deivison, could you possibly repeat this experiment with one of the publicly available datasets, to ensure we have the same audio? If you obtain the Harmonix dataset via the YouTube URLs, it's possible we end up with different files (at the very least because YouTube provides different formats to download). So please pick one of the datasets that has a source linked on the description at Zenodo. Then let us know the file you picked and again share if and how the spectrograms differ so we can look into it. It's possible we need to pin the torchaudio and/or ffmpeg version to ensure perfect reproduction.
Ok, to be sure we have the same file, I get the following checksum:
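The checksum command itself isn't shown above; one typical way to compute it (assuming an MD5 digest, which is my assumption, not something stated in the thread) is with Python's stdlib `hashlib`:

```python
import hashlib

def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file, read in chunks to stay memory-friendly."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

If both sides get the same digest for the same file, any remaining spectrogram mismatch must come from the processing pipeline rather than the audio itself.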
But let's walk this through with a public file in .wav format, to rule out some of the possible distractors. I'll pick one. First, loading the file:
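The loading step presumably goes through `torchaudio.load`; as a dependency-free sanity check you can first inspect the PCM .wav header with the stdlib `wave` module (a sketch only — `torchaudio.load` additionally decodes the samples to a float tensor):

```python
import wave

def wav_info(path: str) -> dict:
    """Read basic properties of a PCM .wav file with the stdlib wave module."""
    with wave.open(path, "rb") as w:
        return {
            "sample_rate": w.getframerate(),
            "channels": w.getnchannels(),
            "sample_width_bytes": w.getsampwidth(),
            "num_frames": w.getnframes(),
        }
```

Agreeing on sample rate, channel count, and frame count up front rules out container-level differences before comparing any spectrograms.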
Resampling:
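The repository resamples via torchaudio; an equivalent polyphase resampling step can be sketched with SciPy (the source/target rates below are illustrative assumptions, not values taken from the project):

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

def resample(x: np.ndarray, orig_sr: int, target_sr: int) -> np.ndarray:
    """Polyphase resampling; torchaudio.functional.resample is analogous."""
    g = gcd(orig_sr, target_sr)
    return resample_poly(x, target_sr // g, orig_sr // g)
```

Different resampling backends (SciPy, torchaudio, ffmpeg) use different filters, so this step alone can already introduce small numerical differences between otherwise identical pipelines.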
Quantizing to 16 bit and back (because the preprocessing script goes via 16-bit .wav files):
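The 16-bit round trip can be mimicked in a few lines of NumPy; note the scaling convention below (multiplying by 32767) is one plausible choice, and the preprocessing script's exact rounding may differ:

```python
import numpy as np

def quantize_roundtrip(x: np.ndarray) -> np.ndarray:
    """Quantize float audio in [-1, 1] to 16-bit PCM and back.

    Mimics writing a 16-bit .wav file and re-reading it; introduces
    at most 0.5 / 32767 of rounding error per sample.
    """
    q = np.clip(np.round(x * 32767.0), -32768, 32767).astype(np.int16)
    return q.astype(np.float32) / 32767.0
```

This bounds how much of the final spectrogram mismatch the 16-bit intermediate file can possibly account for.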
Spectrogram:
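The project presumably computes its spectrograms with torchaudio; to make the framing/window/FFT mechanics concrete, here is a plain magnitude STFT in NumPy (the `n_fft` and `hop` values are illustrative assumptions, not the project's actual parameters):

```python
import numpy as np

def magnitude_stft(x: np.ndarray, n_fft: int = 2048, hop: int = 512) -> np.ndarray:
    """Magnitude STFT with a Hann window; returns shape (n_frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))
```

Parameters like window type, centering/padding, and power-vs-magnitude scaling all change the output, which is why two "unmodified" pipelines can still disagree numerically.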
Comparing to the distributed data:
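A small helper for the comparison step — this is just one way to summarize the mismatch between two spectrogram arrays, not the project's own tooling:

```python
import numpy as np

def compare_specs(a: np.ndarray, b: np.ndarray) -> dict:
    """Summarize how closely two spectrogram arrays match."""
    diff = np.abs(a - b)
    return {
        "max_abs_diff": float(diff.max()),
        "mean_abs_diff": float(diff.mean()),
        "allclose": bool(np.allclose(a, b, rtol=1e-4, atol=1e-5)),
    }
```

Reporting max/mean absolute difference rather than a bare `allclose` boolean makes it much easier to judge whether a mismatch is numerical noise or a genuine pipeline difference.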
This is as close as I can get. There is still a small mismatch between the distributed files for guitarset (which were computed in June) and the ones I get with the current code and dependencies, which are:
But I strongly doubt that these differences have an effect for training or evaluation.
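Since pinning torchaudio/ffmpeg versions came up above, a quick way to report the installed package versions when comparing environments (without importing the packages themselves) is the stdlib `importlib.metadata`; ffmpeg, being a CLI tool, would be checked separately with `ffmpeg -version`:

```python
from importlib.metadata import PackageNotFoundError, version

def pkg_version(name: str):
    """Return the installed version of a Python package, or None if absent."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None

# e.g. pkg_version("torchaudio"), pkg_version("torch")
```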
After 30 epochs, you should see an F1 score above 0.90 on the validation set (if using all datasets). Can you try a training run with only the datasets from Zenodo first?
I am currently experiencing an issue during the preprocessing step that seems to be affecting the resulting metrics. Specifically, when I attempt to generate spectrograms from scratch (instead of using the spectrograms you provided), something seems to go awry, which negatively impacts the training process.
To elaborate, I conducted some tests where I downloaded audio from the Harmonix dataset (Hung et al. version), which provides specific URLs to ensure that the song versions are consistent. I then used your annotations, but the spectrograms I generated differed from those you provided. I believe this discrepancy should not be occurring. Moreover, this issue arises with every dataset I try to train the model on, suggesting that it might not be a data problem. To help, here are the spectrogram I generated using preprocess_audio.py and the spectrogram you provided, respectively.
Generated: [[0. 0. 0. ... 0. 0. 0. ]
[0. 0. 0. ... 0. 0. 0. ]
[0. 0. 0. ... 0. 0. 0. ]
...
[0.01976 0.02803 0.02625 ... 0.1076 0.07574 0.05112 ]
[0.005005 0.008064 0.01441 ... 0.05185 0.03864 0.02725 ]
[0.0004032 0.0003197 0.0002983 ... 0.004044 0.004147 0.004253 ]]
Provided: [[9.7046e-03 1.0124e-02 4.0359e-03 ... 8.3984e-02 1.0144e-01 6.7139e-02]
[1.8213e-01 3.6499e-01 5.6641e-01 ... 6.5735e-02 7.7759e-02 6.1279e-02]
[1.8584e+00 2.1426e+00 2.4414e+00 ... 1.3447e+00 8.3691e-01 4.7192e-01]
...
[6.7344e+00 6.2734e+00 6.8320e+00 ... 4.3516e+00 4.3516e+00 3.0762e+00]
[6.7617e+00 5.9805e+00 7.2148e+00 ... 4.4766e+00 3.9160e+00 2.4297e+00]
[6.1719e+00 6.1953e+00 6.8672e+00 ... 3.5137e+00 3.2461e+00 1.8340e+00]]
What I find puzzling is that I have not modified the preprocessing code at all. I am wondering whether different code was perhaps used to generate the spectrograms you provided?
Any guidance or insight you could provide on this matter would be greatly appreciated.
Thank you for your assistance.