
feat(model): Add FFT, OmniNet, custom DataLoader, Windows-Support #22


Open · wants to merge 35 commits into master

Conversation

ClashLuke
Member

This PR also

  • adds a new custom sum-based attention
  • changes a bunch of parameter names
  • changes small.yaml to integrate omnidirectional attention
  • splits our linear attention module into one feed-forward (FF) module and one attention module
  • removes DeepSpeed's broken CPUAdam
  • enforces full attention while removing autoregressive attention

The idea of FFT-based attention comes from FNet, LMU, and On Learning the Transformer Kernel, but is implemented differently to increase the expressivity of our model.
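For reference, a minimal sketch of the FNet-style FFT mixing this builds on, assuming an input of shape (batch, sequence, features); the function and variable names are illustrative and not the modules added in this PR:

```python
# Illustrative sketch of FNet-style FFT token mixing (an assumption about the
# general approach, not this repository's actual attention module).
import torch


def fft_mix(hidden: torch.Tensor) -> torch.Tensor:
    """Mix tokens by applying a 2D FFT over the sequence and feature axes
    and keeping the real part, as in FNet.

    hidden: (batch, sequence, features)
    """
    return torch.fft.fft2(hidden, dim=(-2, -1)).real
```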
OmniNet attends to all previous hidden states instead of only the current hidden state, bridging the gap between linear attention and full attention.
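Roughly, the omnidirectional attention can be pictured as in the sketch below, where the last block attends over the hidden states of every earlier layer; the class and argument names are placeholders, not this repo's API:

```python
# Hedged sketch of OmniNet-style omnidirectional attention: the query comes
# from the last layer, while keys/values span the hidden states of all layers.
from typing import List

import torch
from torch import nn


class OmniAttention(nn.Module):
    def __init__(self, features: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(features, heads, batch_first=True)

    def forward(self, layer_outputs: List[torch.Tensor]) -> torch.Tensor:
        # layer_outputs: list of (batch, sequence, features), one per layer
        query = layer_outputs[-1]
        memory = torch.cat(layer_outputs, dim=1)  # tokens from every layer
        out, _ = self.attn(query, memory, memory)
        return out
```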
A custom data loader is required because PyTorch's DataLoader causes CPU OOMs, has a broken shuffling function, and needs more than 8 GiB of RAM to instantiate 12 empty classes. This was not the case in PyTorch 1.9, but it is in 1.10 on WSL.
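To illustrate the direction the custom loader takes, here is a hedged sketch of a memory-mapped batch generator that sidesteps PyTorch's DataLoader worker machinery entirely; the file format, dtype, and function name are assumptions, not the actual implementation:

```python
# Minimal sketch of a custom data loader: one memory-mapped token file,
# sliced into random batches without per-worker copies of the dataset.
import numpy as np
import torch


def batches(path: str, batch_size: int, seq_len: int, steps: int):
    data = np.memmap(path, dtype=np.uint8, mode="r")  # never fully loaded into RAM
    max_start = data.shape[0] - seq_len - 1
    for _ in range(steps):
        starts = np.random.randint(0, max_start, size=batch_size)
        batch = np.stack([data[s : s + seq_len] for s in starts])
        yield torch.from_numpy(batch.astype(np.int64))
```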
As WSL cannot deallocate GPU memory, we had to support Windows natively.
