Efficient Triton Kernels for LLM Training
Updated Dec 23, 2024 - Python
Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.
A service for autodiscovery and configuration of applications running in containers
Playing with the Tigress software protection: breaking some of its protections and solving its reverse engineering challenges, with automatic deobfuscation using symbolic execution, taint analysis, and LLVM.
Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end accuracy across various models.
🚀🚀🚀 A collection of awesome public projects about Large Language Models (LLM), Visual Language Models (VLM), AI-Generated Content (AIGC), and related datasets and applications.
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
FlagGems is an operator library for large language models implemented in Triton Language.
Automatic ROPChain Generation
SymGDB - symbolic execution plugin for gdb
LLVM based static binary analysis framework
A performance library for machine learning applications.
🔥🔥🔥 A collection of awesome public CUDA, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, MLIR, and High-Performance Computing (HPC) projects.
ClearML - Model-Serving Orchestration and Repository Solution
(WIP) A simple, lightweight, fast-to-integrate, pipelined deployment framework for algorithm services that ensures reliability, high concurrency, and scalability.
NVIDIA-accelerated, deep-learned model support for image-space object detection