📖 A curated list of Awesome LLM Inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.
Updated May 30, 2024
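Since PagedAttention recurs throughout this list, here is a minimal, hypothetical CUDA sketch of its core idea: a block table maps each token's logical position to a physical KV-cache block, so a sequence's cache need not be contiguous. All names and sizes below are illustrative assumptions, not vLLM's actual kernels.

```cuda
// Hypothetical sketch of PagedAttention-style KV lookup: logical token
// positions map through a block table to non-contiguous physical blocks.
#include <cstdio>
#include <cuda_runtime.h>

constexpr int BLOCK_TOKENS = 4;  // tokens stored per physical cache block (assumption)
constexpr int HEAD_DIM     = 8;  // per-head embedding size (assumption)

// Copy each logical token's key vector out of the paged cache.
__global__ void gather_keys(const float* k_cache, const int* block_table,
                            float* k_out, int seq_len) {
    int token = blockIdx.x;   // one CUDA block per logical token
    int d     = threadIdx.x;  // one thread per head-dim element
    if (token >= seq_len) return;
    int phys = block_table[token / BLOCK_TOKENS];  // logical -> physical block
    int slot = token % BLOCK_TOKENS;               // position inside the block
    k_out[token * HEAD_DIM + d] =
        k_cache[(phys * BLOCK_TOKENS + slot) * HEAD_DIM + d];
}

int main() {
    const int seq_len = 8, num_blocks = 3;
    float h_cache[num_blocks * BLOCK_TOKENS * HEAD_DIM];
    for (int i = 0; i < num_blocks * BLOCK_TOKENS * HEAD_DIM; ++i)
        h_cache[i] = (float)i;
    int h_table[2] = {2, 0};  // the sequence occupies physical blocks 2 then 0

    float *d_cache, *d_out; int *d_table;
    cudaMalloc(&d_cache, sizeof(h_cache));
    cudaMalloc(&d_out, seq_len * HEAD_DIM * sizeof(float));
    cudaMalloc(&d_table, sizeof(h_table));
    cudaMemcpy(d_cache, h_cache, sizeof(h_cache), cudaMemcpyHostToDevice);
    cudaMemcpy(d_table, h_table, sizeof(h_table), cudaMemcpyHostToDevice);

    gather_keys<<<seq_len, HEAD_DIM>>>(d_cache, d_table, d_out, seq_len);

    float h_out[seq_len * HEAD_DIM];
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    printf("token 0, dim 0 -> %.0f (read from physical block 2)\n", h_out[0]);
    cudaFree(d_cache); cudaFree(d_out); cudaFree(d_table);
    return 0;
}
```

Because blocks are fixed-size and indirectly addressed, cache memory can be allocated and freed per block rather than per maximum sequence length, which is what makes high-occupancy continuous batching practical.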
🎉 CUDA notes / hand-written CUDA kernels for LLMs / C++ notes, updated occasionally: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.
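As a taste of the warp reduce / block reduce patterns listed above, here is a self-contained CUDA sketch; it shows the generic shuffle-based reduction idiom, not the repo's exact code.

```cuda
// Generic warp-reduce / block-reduce sum, in the spirit of the kernel
// notes above (a textbook pattern, not the repository's exact code).
#include <cstdio>
#include <cuda_runtime.h>

// Sum across the 32 lanes of a warp using butterfly shuffles.
__inline__ __device__ float warp_reduce_sum(float val) {
    for (int offset = 16; offset > 0; offset >>= 1)
        val += __shfl_xor_sync(0xffffffff, val, offset);
    return val;
}

// Sum across a whole thread block: reduce each warp, stage the partial
// sums in shared memory, then reduce the partials in warp 0.
__global__ void block_reduce_sum(const float* in, float* out, int n) {
    __shared__ float partial[32];  // one slot per warp
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    float v = (tid < n) ? in[tid] : 0.0f;
    v = warp_reduce_sum(v);
    int lane = threadIdx.x % 32, warp = threadIdx.x / 32;
    if (lane == 0) partial[warp] = v;
    __syncthreads();
    if (warp == 0) {
        v = (lane < blockDim.x / 32) ? partial[lane] : 0.0f;
        v = warp_reduce_sum(v);
        if (lane == 0) atomicAdd(out, v);  // combine across blocks
    }
}

int main() {
    const int n = 1024;
    float h_in[n];
    for (int i = 0; i < n; ++i) h_in[i] = 1.0f;  // expected sum: 1024

    float *d_in, *d_out, h_out = 0.0f;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, sizeof(float));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_out, &h_out, sizeof(float), cudaMemcpyHostToDevice);

    block_reduce_sum<<<n / 256, 256>>>(d_in, d_out, n);

    cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    printf("sum = %.0f\n", h_out);  // prints: sum = 1024
    cudaFree(d_in); cudaFree(d_out);
    return 0;
}
```

The same two-level reduce is the building block behind the softmax, layernorm, and rmsnorm kernels mentioned above, where a row-wise sum or max must be computed before normalizing.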
Shush is an app that deploys a Whisper v3 model with FlashAttention-2 on Modal and queries it from a Next.js app
Uses the WhisperS2T and CTranslate2 libraries to batch-transcribe multiple files
Poplar implementation of FlashAttention for IPU
A lightweight, fast, parallel inference server for Llama