
Training time #353

Open
merlinarer opened this issue Oct 30, 2024 · 1 comment
Comments

@merlinarer

Hello, is there a reference training time? I am training an SAE on layer 20 of the qwen2-7B model on 8×A100 80GB GPUs (model on GPUs 0–5, SAE on GPU 6, act_store_device on GPU 7). With 0.12B tokens it takes about 20 hours. Is this normal?

@chanind
Collaborator

chanind commented Nov 1, 2024

That sounds slow to me for 120M tokens. How wide is the SAE? You can try setting autocast_lm=True and autocast=True in the runner config to speed things up. You can also try setting compile_llm=True and compile_sae=True, which might make things a bit faster as well.
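A minimal sketch of how those four flags might be collected before being passed into the runner config. The field names (autocast, autocast_lm, compile_llm, compile_sae) come from the comment above; whether they exist and their exact semantics depend on the installed sae_lens version, so treat this as illustrative rather than a verified API:

```python
# Hypothetical sketch: speed-related flags suggested in the comment above.
# The comments describe the *likely* effect of each flag; verify against
# your sae_lens version's LanguageModelSAERunnerConfig before relying on them.
speed_flags = dict(
    autocast=True,      # mixed-precision autocast for the SAE forward/backward
    autocast_lm=True,   # mixed-precision autocast for LM activation harvesting
    compile_llm=True,   # torch.compile the language model
    compile_sae=True,   # torch.compile the SAE
)

# These would then be merged into the runner config kwargs, e.g.:
# cfg = LanguageModelSAERunnerConfig(..., **speed_flags)
print(sorted(speed_flags))
```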
