Drop-in, local AI alternative to the OpenAI stack. Multi-engine (llama.cpp, TensorRT-LLM). Powers 👋 Jan
📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, continuous batching, FlashAttention, PagedAttention, etc.
A nearly-live implementation of OpenAI's Whisper.
A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers, and Sentence-Transformers, with full support for Optimum's hardware optimizations and quantization schemes.
An optimized speech-to-text pipeline for the Whisper model, supporting multiple inference engines.
OpenAI-compatible API for the TensorRT-LLM Triton backend.
Chat With RTX Python API
This repository is AI Bootcamp material consisting of a workflow for LLMs.
Nitro is a C++ inference server built on top of TensorRT-LLM with an OpenAI-compatible API. Run blazing-fast inference on NVIDIA GPUs. Used in Jan.
Whisper in TensorRT-LLM
Acceleration for large-model inference frameworks: make LLMs fly.
Add-in for the new Outlook that adds new LLM-powered features (composition, summarizing, Q&A). It uses a local LLM via NVIDIA TensorRT-LLM.
Getting started with TensorRT-LLM using BLOOM as a case study
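Several of the projects listed above (e.g. Nitro and the TensorRT-LLM Triton backend) expose an OpenAI-compatible HTTP API, so existing OpenAI client code can target a local server by swapping the base URL. A minimal sketch of what such a request looks like is below; the endpoint URL, port, and model name are assumptions for illustration and will vary per server.

```python
import json
import urllib.request

# Hypothetical local endpoint; adjust host/port for the server you run
# (for example, a local Nitro or TensorRT-LLM Triton deployment).
BASE_URL = "http://localhost:3928/v1/chat/completions"

def build_chat_request(prompt, model="local-model", temperature=0.7):
    """Build an OpenAI-style chat-completion payload (model name is a placeholder)."""
    return {
        "model": model,
        "temperature": temperature,
        "messages": [{"role": "user", "content": prompt}],
    }

def send_chat_request(payload, url=BASE_URL):
    """POST the payload to a locally running server; requires the server to be up."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    # Only builds and prints the payload; sending requires a live local server.
    payload = build_chat_request("Hello from a local LLM!")
    print(json.dumps(payload, indent=2))
```

Because the request shape matches the OpenAI chat-completions schema, the same payload works unchanged against any of the OpenAI-compatible servers in this list.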