Drop-in, local AI alternative to the OpenAI stack. Multi-engine (llama.cpp, TensorRT-LLM). Powers 👋 Jan
📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, continuous batching, FlashAttention, PagedAttention, etc.
A nearly-live implementation of OpenAI's Whisper.
A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers, and Sentence-Transformers, with full support for Optimum's hardware optimizations and quantization schemes.
An optimized speech-to-text pipeline for the Whisper model, supporting multiple inference engines.
OpenAI-compatible API for the TensorRT-LLM Triton backend.
Chat With RTX Python API
This repository is AI Bootcamp material consisting of a workflow for LLMs.
Nitro is a C++ inference server built on top of TensorRT-LLM with an OpenAI-compatible API. Run blazing-fast inference on NVIDIA GPUs. Used in Jan.
Whisper in TensorRT-LLM
Acceleration for large-model inference frameworks: make LLMs fly.
Add-in for the new Outlook that adds new LLM-powered features (composition, summarizing, Q&A). It uses a local LLM via NVIDIA TensorRT-LLM.
Getting started with TensorRT-LLM using BLOOM as a case study
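Several of the projects listed above (e.g. Nitro and the TensorRT-LLM Triton backend) expose an OpenAI-compatible HTTP API, so existing OpenAI client code can target a local server by swapping the base URL. A minimal sketch of what such a request looks like is below; the endpoint URL, port, and model name are assumptions for illustration and will vary per server.

```python
import json
import urllib.request

# Hypothetical local endpoint; adjust host/port for the server you run
# (for example, a local Nitro or TensorRT-LLM Triton deployment).
BASE_URL = "http://localhost:3928/v1/chat/completions"

def build_chat_request(prompt, model="local-model", temperature=0.7):
    """Build an OpenAI-style chat-completion payload (model name is a placeholder)."""
    return {
        "model": model,
        "temperature": temperature,
        "messages": [{"role": "user", "content": prompt}],
    }

def send_chat_request(payload, url=BASE_URL):
    """POST the payload to a locally running server; requires the server to be up."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    # Only builds and prints the payload; sending requires a live local server.
    payload = build_chat_request("Hello from a local LLM!")
    print(json.dumps(payload, indent=2))
```

Because the request shape matches the OpenAI chat-completions schema, the same payload works unchanged against any of the OpenAI-compatible servers in this list.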