Add Implementation of Native Sparse Attention #137
Open
yukavio wants to merge 26 commits into HazyResearch:main from
This PR adds an implementation of the Compressed Attention and Selected Attention components of Native Sparse Attention (NSA).
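
For reviewers less familiar with these two NSA branches, here is a minimal single-query, single-head reference sketch of what they compute. It is not the kernel in this PR. It assumes: mean pooling as a stand-in for NSA's learnable compression MLP, non-overlapping blocks (stride equal to block size), selection blocks the same size as compression blocks, and all keys visible (no causal mask). The helper names (`compress`, `attend`, `compressed_attention`, `selected_attention`) are hypothetical.

```python
import math
from typing import List, Tuple

Vec = List[float]


def softmax(xs: Vec) -> Vec:
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [x / s for x in e]


def attend(q: Vec, ks: List[Vec], vs: List[Vec]) -> Tuple[Vec, Vec]:
    # Scaled dot-product attention for one query; returns (output, probabilities).
    scale = 1.0 / math.sqrt(len(q))
    p = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in ks])
    out = [sum(pi * v[j] for pi, v in zip(p, vs)) for j in range(len(vs[0]))]
    return out, p


def compress(xs: List[Vec], block: int) -> List[Vec]:
    # HYPOTHETICAL stand-in for NSA's learnable compression MLP:
    # mean-pool non-overlapping blocks; a trailing partial block is dropped.
    n = len(xs) // block
    return [[sum(col) / block for col in zip(*xs[b * block:(b + 1) * block])]
            for b in range(n)]


def compressed_attention(q: Vec, ks: List[Vec], vs: List[Vec], block: int):
    # The query attends to one compressed key/value per block.
    if len(ks) < block:
        raise ValueError("need at least one full block of keys")
    return attend(q, compress(ks, block), compress(vs, block))


def selected_attention(q: Vec, ks: List[Vec], vs: List[Vec],
                       p_cmp: Vec, block: int, top_n: int):
    # Rank blocks by compressed-attention probability, keep the top_n,
    # then attend over the original (uncompressed) tokens of those blocks.
    ranked = sorted(range(len(p_cmp)), key=lambda b: -p_cmp[b])
    idx = sorted(ranked[:top_n])
    tok = [t for b in idx for t in range(b * block, (b + 1) * block)]
    out, _ = attend(q, [ks[t] for t in tok], [vs[t] for t in tok])
    return out, idx


# Usage: block 1's keys align with q, so selection with top_n=1 keeps block 1.
q = [10.0, 0.0]
ks = [[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
vs = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [0.0, 2.0]]
cmp_out, p_cmp = compressed_attention(q, ks, vs, block=2)
sel_out, idx = selected_attention(q, ks, vs, p_cmp, block=2, top_n=1)
print(idx)  # [1]
```

With `top_n` at least the number of blocks, selected attention reduces to full attention over the same keys, which gives a simple sanity check when comparing a kernel's output against a reference like this within a tolerance.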

The hyperparameters of the selected and compressed attention kernels are tuned for good performance on the H20. They should be re-tuned to get better performance on other devices.
This PR is not ready to merge yet. This week I will reorganize the code and add performance metrics for this PR.
The full implementation, which can be used to train Native Sparse Attention models, can be found at https://github.com/yukavio/nsa/tree/main/. That codebase is currently implemented in Triton, but we will soon switch it to the kernels introduced in this PR for better performance. This is my first contribution to the ThunderKittens community, and I welcome any suggestions for improvement.