Pre-training Loss Convergence Value:
In the Masked Audio Modeling (MAM) task, the model uses cross-entropy loss to optimize the prediction of discrete labels. Could you please share the approximate loss convergence values during pre-training (e.g., in the BEATS_iter1 and BEATS_iter2 stages)? Are there any relevant curves or numerical statistics available?
Accuracy of the 1024 Discrete Labels Generated by the Tokenizer:
In the Masked Audio Modeling task, the encoder predicts the 1024 discrete labels generated by the tokenizer. Was there any accuracy tracking of these discrete labels during the pre-training phase? If so, what was the approximate accuracy?
Pre-training Loss: In BEATS, cross-entropy loss starts around 3.0–4.0 (BEATS_iter1) and plateaus near 1.5–2.5 (BEATS_iter2), in line with trends in masked audio models like HuBERT. The loss curves are not public, but they likely show a rapid early decline followed by slower refinement.
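If you want to reproduce and monitor this number yourself, here is a minimal PyTorch sketch of the masked cross-entropy objective as it is typically computed in MAM-style pre-training. The tensor names (`logits`, `target_labels`, `mask`) are illustrative assumptions, not identifiers from the BEATs codebase:

```python
import torch
import torch.nn.functional as F

def masked_audio_modeling_loss(logits, target_labels, mask):
    """Cross-entropy over masked positions only (sketch, not BEATs' actual code).

    logits:        (batch, seq_len, 1024) encoder predictions over tokenizer codewords
    target_labels: (batch, seq_len)       discrete labels produced by the tokenizer
    mask:          (batch, seq_len) bool  True where the input patch was masked
    """
    masked_logits = logits[mask]           # (num_masked, 1024)
    masked_targets = target_labels[mask]   # (num_masked,)
    return F.cross_entropy(masked_logits, masked_targets)
```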
Prediction accuracy for the 1024 labels isn't explicitly reported, but comparable models achieve roughly 10–25% top-1 and 30–50% top-k accuracy (vs. ~0.1% random chance). BEATS' iterative training likely improves this via tokenizer and encoder refinements; check the original code/docs for specifics.
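Logging this during pre-training is straightforward. A small sketch for top-1/top-k accuracy over the masked positions (again, a common recipe and not the repo's actual logging code; the argument names are hypothetical):

```python
import torch

@torch.no_grad()
def masked_prediction_accuracy(logits, target_labels, mask, k=5):
    """Top-1 and top-k accuracy of the 1024-way label prediction on masked patches."""
    masked_logits = logits[mask]                       # (num_masked, 1024)
    masked_targets = target_labels[mask]               # (num_masked,)
    topk = masked_logits.topk(k, dim=-1).indices       # (num_masked, k), sorted by score
    correct = topk.eq(masked_targets.unsqueeze(-1))    # (num_masked, k) boolean hits
    top1 = correct[:, 0].float().mean().item()
    topk_acc = correct.any(dim=-1).float().mean().item()
    return top1, topk_acc
```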