Hello, is there a reference training time? I am training an SAE on layer 20 of Qwen2-7B on 8×A100 80G (model on GPUs 0-5, SAE on GPU 6, act_store_device on GPU 7) with 0.12B tokens, and it takes about 20 hours. Is this normal?
That sounds slow to me for 120M tokens. How wide is the SAE? You can try setting autocast_lm=True and autocast=True in the runner config to speed things up. You can also try setting compile_llm=True and compile_sae=True; that might make things a bit faster as well.
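For reference, a minimal sketch of where these flags go, assuming the SAELens `LanguageModelSAERunnerConfig` / `SAETrainingRunner` API; the four speedup flags come from the comment above, while the model name, hook name, and device assignments are illustrative placeholders matching the setup described in the question:

```python
from sae_lens import LanguageModelSAERunnerConfig, SAETrainingRunner

cfg = LanguageModelSAERunnerConfig(
    model_name="Qwen/Qwen2-7B",              # assumed model identifier
    hook_name="blocks.20.hook_resid_post",   # layer 20 residual stream (assumed hook)
    device="cuda:6",                         # SAE device, per the question's setup
    act_store_device="cuda:7",               # activation store device
    # Mixed-precision autocast for the LLM forward pass and the SAE training step:
    autocast_lm=True,
    autocast=True,
    # torch.compile the LLM and the SAE for a further speedup:
    compile_llm=True,
    compile_sae=True,
)

SAETrainingRunner(cfg).run()
```

Autocast alone often gives the largest win here, since the 7B model's forward passes dominate the wall-clock time when generating activations.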