
Some Questions about Pre-training #1681

Open
liruixinxinxin opened this issue Jan 15, 2025 · 1 comment

Comments

@liruixinxinxin

Pre-training Loss Convergence Values:

In the Masked Audio Modeling (MAM) task, the model uses cross-entropy loss to optimize the prediction of discrete labels. Could you please share the approximate loss convergence values during pre-training (e.g., in the BEATSiter1 and BEATSiter2 stages)? Are there any relevant curves or numerical statistics available?

Accuracy of the 1024 Discrete Labels Generated by the Tokenizer:

In the Masked Audio Modeling task, the encoder predicts the 1024 discrete labels generated by the tokenizer. Was there any accuracy tracking of these discrete labels during the pre-training phase? If so, what was the approximate accuracy?

@Bhazantri

Pre-training Loss: In BEATS, the cross-entropy loss starts around 3.0–4.0 (BEATSiter1) and plateaus near 1.5–2.5 (BEATSiter2), aligning with trends in masked audio models like HuBERT. Loss curves are not public, but they likely show a rapid early decline followed by slower refinement.
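To put those numbers in perspective, here is a minimal NumPy sketch of the masked cross-entropy objective used in masked audio modeling. The function name, shapes, and the uniform-logit example are illustrative assumptions, not the BEATs implementation; the useful anchor is that chance-level loss for a 1024-entry codebook is ln(1024) ≈ 6.93, so a starting loss of 3.0–4.0 already reflects learned structure.

```python
import numpy as np

def masked_cross_entropy(logits, labels, mask):
    """Cross-entropy averaged over masked frames only (hypothetical helper).

    logits: (T, V) unnormalized scores over V codebook entries
    labels: (T,) integer tokenizer labels
    mask:   (T,) bool, True where the frame was masked for prediction
    """
    # Numerically stable log-softmax over the vocabulary axis.
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    # Negative log-probability assigned to the true label of each frame.
    per_frame = -log_probs[np.arange(len(labels)), labels]
    # Average only over the masked positions, as in masked audio modeling.
    return per_frame[mask].mean()

# Uniform (all-zero) logits give the chance-level baseline ln(1024).
V = 1024
rng = np.random.default_rng(0)
logits = np.zeros((8, V))
labels = rng.integers(0, V, size=8)
mask = np.ones(8, dtype=bool)
print(round(masked_cross_entropy(logits, labels, mask), 2))  # 6.93
```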

Prediction accuracy for the 1024 labels isn't explicitly reported, but comparable models achieve 10–25% top-1 and 30–50% top-k accuracy (vs. ~0.1% random chance). BEATS' iterative training likely improves this via tokenizer and encoder refinements. Check the original code/docs for specifics.
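If you want to track this yourself during pre-training, a top-k accuracy metric over the 1024 discrete labels is a few lines. This is a hypothetical tracking utility, not code from the BEATs repo; the random-logits example shows why chance-level top-1 accuracy is about 1/1024 ≈ 0.1%, the baseline mentioned above.

```python
import numpy as np

def topk_accuracy(logits, labels, k):
    """Fraction of frames whose true label is among the k highest-scoring
    codebook entries (hypothetical helper for monitoring pre-training).

    logits: (N, V) scores over V discrete labels; labels: (N,) int targets.
    """
    # Indices of the k largest logits per frame.
    topk = np.argsort(logits, axis=1)[:, -k:]
    # A frame counts as a hit if its true label appears in its top-k set.
    hits = (topk == labels[:, None]).any(axis=1)
    return hits.mean()

# With random logits, top-1 accuracy sits near the 1/1024 chance baseline.
rng = np.random.default_rng(0)
V, N = 1024, 20000
logits = rng.normal(size=(N, V))
labels = rng.integers(0, V, size=N)
print(topk_accuracy(logits, labels, 1))   # ≈ 0.001
print(topk_accuracy(logits, labels, 5))   # ≈ 0.005
```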
