Hello, is there a reference training time? I am training an SAE on layer 20 of Qwen2-7B on 8×A100 80G (model on GPUs 0-5, SAE on GPU 6, act_store_device on GPU 7) with 0.12B tokens, and it takes about 20 hours. Is this normal?
That sounds slow to me for 120M tokens. How wide is the SAE? You can try setting autocast_lm=True and autocast=True in the runner config to speed things up. You can also try setting compile_llm=True and compile_sae=True; that might make things a bit faster as well.
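For reference, a minimal sketch of where these flags go, assuming the SAELens `LanguageModelSAERunnerConfig` / `SAETrainingRunner` API; the four speedup flags come from the comment above, while the model name, hook name, and device assignments are illustrative placeholders matching the setup described in the question:

```python
from sae_lens import LanguageModelSAERunnerConfig, SAETrainingRunner

cfg = LanguageModelSAERunnerConfig(
    model_name="Qwen/Qwen2-7B",              # assumed model identifier
    hook_name="blocks.20.hook_resid_post",   # layer 20 residual stream (assumed hook)
    device="cuda:6",                         # SAE device, per the question's setup
    act_store_device="cuda:7",               # activation store device
    # Mixed-precision autocast for the LLM forward pass and the SAE training step:
    autocast_lm=True,
    autocast=True,
    # torch.compile the LLM and the SAE for a further speedup:
    compile_llm=True,
    compile_sae=True,
)

SAETrainingRunner(cfg).run()
```

Autocast alone often gives the largest win here, since the 7B model's forward passes dominate the wall-clock time when generating activations.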