hydra-linear-attention

Implementation of Hydra attention, the method described in this paper: https://arxiv.org/pdf/2209.07484.pdf

  • The code is mostly taken from the appendix of the paper; it's pretty simple.
  • Basically, it's linear attention with the number of heads equal to the feature dimension. L2 normalisation is used as the kernel function rather than softmax, which is what allows the "head" count to scale up to the full feature dimension and makes it faster (see the sketch below).
  • I'm not sure it's accurate to describe this as similar to regular attention - I see it as closer to something like squeeze-and-excitation layers.
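
A minimal sketch of the idea in PyTorch, following the formulation in the paper's appendix (the function name and shapes here are illustrative and may not match this repo's exact code):

```python
import torch

def hydra_attention(q, k, v):
    """q, k, v: (batch, tokens, features), with one 'head' per feature."""
    # L2-normalise queries and keys along the feature dim (the kernel fn)
    q = q / q.norm(dim=-1, keepdim=True)
    k = k / k.norm(dim=-1, keepdim=True)
    # Aggregate keys and values over all tokens into one global feature vector
    kv = (k * v).sum(dim=-2, keepdim=True)  # (batch, 1, features)
    # Gate each query with the global summary
    return q * kv
```

Because keys and values are reduced to a single global feature vector, the cost is linear in the number of tokens, rather than quadratic as in standard softmax attention.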
