Stars
Ecosystem of libraries and tools for writing and executing fast GPU code fully in Rust.
A Datacenter Scale Distributed Inference Serving Framework
Cost-efficient and pluggable Infrastructure components for GenAI inference
Smart glasses OS, with dozens of built-in apps. Users get AI assistant, notifications, translation, screen mirror, captions, and more. Devs get to write 1 app that runs on any pair of smart glasses.
User-friendly AI Interface (Supports Ollama, OpenAI API, ...)
A language for constraint-guided and efficient LLM programming.
SGLang is a fast serving framework for large language models and vision language models.
An extremely fast Python package and project manager, written in Rust.
🚀 Efficient implementations of state-of-the-art linear attention models in Torch and Triton
aider is AI pair programming in your terminal
A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code.
This repository is a curated collection of resources, tutorials, and practical examples designed to guide you through the journey of mastering CUDA programming. Whether you're just starting or look…
Copy a bunch of files into your clipboard to provide context for LLMs
Social networking technology created by Bluesky
It is said that Ilya Sutskever gave John Carmack this reading list of ~30 research papers on deep learning.
[ICLR 2025] Pyramidal Flow Matching for Efficient Video Generative Modeling
DIAMOND (DIffusion As a Model Of eNvironment Dreams) is a reinforcement learning agent trained in a diffusion world model. NeurIPS 2024 Spotlight.
🗜️ Codebase-digest is your AI-friendly codebase packer and analyzer. Features 60+ coding prompts and generates structured overviews with metrics. Ideal for feeding projects to LLMs like GPT-4, Clau…
Whisper real-time streaming for long speech-to-text transcription and translation
Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with …
A comprehensive repository of reasoning tasks for LLMs (and beyond)
Machine Learning Engineering Open Book
Tile primitives for speedy kernels
Flash Attention in raw CUDA C beating PyTorch