Can RWKV beat Flash Attention? #235
I have been experimenting with RWKV v4 and v4neo, but somehow they use much more memory (about 2x) than my LM that uses Flash Attention. Not sure what I am doing wrong. Is this expected?
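For reference, a minimal way to compare peak memory for the two setups might look like the sketch below (assuming PyTorch with CUDA; `build_rwkv` and `build_flash_attn_lm` are hypothetical placeholders for however the two models are constructed):

```python
import torch

def peak_memory_mib(model, bsz=8, ctxlen=1024, vocab_size=50277):
    """Run one forward/backward pass and report peak allocated CUDA memory."""
    model = model.cuda()
    torch.cuda.reset_peak_memory_stats()
    idx = torch.randint(0, vocab_size, (bsz, ctxlen), device="cuda")
    logits = model(idx)           # assumes the model maps token ids -> logits
    loss = logits.float().mean()  # dummy scalar loss, just to trigger backprop
    loss.backward()
    torch.cuda.synchronize()
    return torch.cuda.max_memory_allocated() / 2**20

# Compare the two models under identical bsz/ctxlen:
# print(f"RWKV:  {peak_memory_mib(build_rwkv()):.0f} MiB")
# print(f"Flash: {peak_memory_mib(build_flash_attn_lm()):.0f} MiB")
```

Peak allocated memory depends heavily on bsz, ctxlen, and precision, so both models should be measured with the same settings.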
Comments

Try v5 first. What's your model size, bsz, ctxlen?