Pre-training Loss Convergence Value:
In the Masked Audio Modeling (MAM) task, the model uses cross-entropy loss to optimize the prediction of discrete labels. Could you please share the approximate loss convergence values during pre-training (e.g., in the BEATS_iter1 and BEATS_iter2 stages)? Are there any relevant curves or numerical statistics available?
Accuracy of the 1024 Discrete Labels Generated by the Tokenizer:
In the Masked Audio Modeling task, the encoder predicts the 1024 discrete labels generated by the tokenizer. Was there any accuracy tracking of these discrete labels during the pre-training phase? If so, what was the approximate accuracy?
Pre-training Loss: In BEATS, cross-entropy loss starts around 3.0–4.0 (BEATS_iter1) and plateaus near 1.5–2.5 (BEATS_iter2), in line with trends in masked audio models like HuBERT. The loss curves are not public, but they likely show a rapid early decline followed by slower refinement.
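If you want to reproduce and monitor this number yourself, here is a minimal PyTorch sketch of the masked cross-entropy objective as it is typically computed in MAM-style pre-training. The tensor names (`logits`, `target_labels`, `mask`) are illustrative assumptions, not identifiers from the BEATs codebase:

```python
import torch
import torch.nn.functional as F

def masked_audio_modeling_loss(logits, target_labels, mask):
    """Cross-entropy over masked positions only (sketch, not BEATs' actual code).

    logits:        (batch, seq_len, 1024) encoder predictions over tokenizer codewords
    target_labels: (batch, seq_len)       discrete labels produced by the tokenizer
    mask:          (batch, seq_len) bool  True where the input patch was masked
    """
    masked_logits = logits[mask]           # (num_masked, 1024)
    masked_targets = target_labels[mask]   # (num_masked,)
    return F.cross_entropy(masked_logits, masked_targets)
```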
Prediction accuracy for the 1024 labels isn't explicitly reported, but comparable models achieve roughly 10–25% top-1 and 30–50% top-k accuracy (vs. ~0.1% random chance). BEATS' iterative training likely improves this via tokenizer and encoder refinements; check the original code/docs for specifics.
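Logging this during pre-training is straightforward. A small sketch for top-1/top-k accuracy over the masked positions (again, a common recipe and not the repo's actual logging code; the argument names are hypothetical):

```python
import torch

@torch.no_grad()
def masked_prediction_accuracy(logits, target_labels, mask, k=5):
    """Top-1 and top-k accuracy of the 1024-way label prediction on masked patches."""
    masked_logits = logits[mask]                       # (num_masked, 1024)
    masked_targets = target_labels[mask]               # (num_masked,)
    topk = masked_logits.topk(k, dim=-1).indices       # (num_masked, k), sorted by score
    correct = topk.eq(masked_targets.unsqueeze(-1))    # (num_masked, k) boolean hits
    top1 = correct[:, 0].float().mean().item()
    topk_acc = correct.any(dim=-1).float().mean().item()
    return top1, topk_acc
```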