vllm
Here are 52 public repositories matching this topic...
Scripts for fine-tuning Meta Llama 3 with composable FSDP & PEFT methods, covering single- and multi-node GPU setups. Supports default & custom datasets for applications such as summarization and Q&A, plus a number of candidate inference solutions such as HF TGI and vLLM for local or cloud deployment. Includes demo apps showcasing Meta Llama 3 for WhatsApp & Messenger.
Updated May 28, 2024 - Jupyter Notebook
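The PEFT side of such fine-tuning scripts can be sketched with the Hugging Face peft library; a minimal LoRA setup follows (the model id and hyperparameters are illustrative assumptions, not this repo's defaults):

    # Minimal LoRA fine-tuning setup with Hugging Face peft/transformers.
    # Model id and hyperparameters are illustrative assumptions only.
    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")  # assumed; gated model
    lora_config = LoraConfig(
        r=8,                                  # low-rank adapter dimension
        lora_alpha=32,                        # adapter scaling factor
        target_modules=["q_proj", "v_proj"],  # attention projections to adapt
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()  # only the adapter weights train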
Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language model, speech recognition model, or multimodal model, whether in the cloud, on-premises, or even on your laptop.
Updated May 28, 2024 - Python
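The single-line swap works because Xinference exposes an OpenAI-compatible endpoint, so only the client's base URL changes; a minimal sketch with the openai Python client (host, port, and model name are assumptions):

    # Swap api.openai.com for a local OpenAI-compatible server by changing
    # the base URL. Host, port, and model name below are assumptions.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:9997/v1", api_key="not-needed")
    response = client.chat.completions.create(
        model="my-local-model",  # whichever model the server has loaded
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response.choices[0].message.content)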
🔒 Enterprise-grade API gateway that helps you monitor and impose cost or rate limits per API key. Get fine-grained access control and monitoring per user, application, or environment. Supports OpenAI, Azure OpenAI, Anthropic, vLLM, and open-source LLMs.
Updated May 27, 2024 - Go
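From the application's point of view, such a gateway is a drop-in proxy: the client sends requests to the gateway's URL with a gateway-issued key, and limits are enforced before the call is forwarded upstream. A generic sketch (the proxy URL and key are placeholders, not this project's actual routes):

    # Route OpenAI-style traffic through a rate/cost-limiting gateway.
    # The gateway URL and key below are placeholders for illustration.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://gateway.example.com/v1",  # assumed proxy route
        api_key="gateway-issued-key",               # limits tracked per key
    )
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "ping"}],
    )
    print(resp.choices[0].message.content)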
Standardized spec and vendor-specific transforms for ChatML
Updated May 27, 2024 - Python
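ChatML itself is a small wire format: each message is wrapped in role-tagged delimiter tokens. A minimal rendering sketch using the commonly published delimiters:

    # Render a message list into ChatML using the commonly published
    # <|im_start|>/<|im_end|> delimiter convention.
    def to_chatml(messages):
        parts = [
            f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
            for m in messages
        ]
        parts.append("<|im_start|>assistant\n")  # cue the model's reply
        return "\n".join(parts)

    print(to_chatml([
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is vLLM?"},
    ]))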
📖 A curated list of awesome LLM inference papers with code, covering TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, continuous batching, FlashAttention, PagedAttention, etc.
Updated May 27, 2024
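Several techniques on that list (PagedAttention, continuous batching) are what vLLM applies internally; a minimal offline-inference sketch with vLLM's Python API (the small model id is chosen purely for illustration):

    # Offline batch inference with vLLM; PagedAttention and continuous
    # batching happen internally. The model id is illustrative.
    from vllm import LLM, SamplingParams

    llm = LLM(model="facebook/opt-125m")
    params = SamplingParams(temperature=0.8, max_tokens=64)
    outputs = llm.generate(["The capital of France is"], params)
    for out in outputs:
        print(out.outputs[0].text)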
Set up and run a local LLM and chatbot using consumer-grade hardware.
Updated May 25, 2024 - JavaScript
An easy-to-use, scalable, and high-performance RLHF framework (supports 70B+ full tuning, LoRA, Mixtral, and KTO).
Updated May 25, 2024 - Python
Context layer on top of your unstructured universe
Updated May 23, 2024 - JavaScript
The RunPod worker template for serving our large language model endpoints. Powered by vLLM.
Updated May 19, 2024 - Python
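Once deployed as a serverless endpoint, a worker like this is typically invoked through RunPod's job API; a hedged sketch (the endpoint id, URL pattern, and input schema are assumptions based on RunPod's general serverless convention, not this template's documented contract):

    # Call a RunPod serverless endpoint wrapping vLLM. Endpoint id, URL
    # pattern, and input schema are assumptions for illustration.
    import os
    import requests

    endpoint_id = "your-endpoint-id"  # hypothetical
    resp = requests.post(
        f"https://api.runpod.ai/v2/{endpoint_id}/runsync",
        headers={"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"},
        json={"input": {"prompt": "Explain PagedAttention in one sentence."}},
        timeout=120,
    )
    print(resp.json())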
Carbon Limiting Auto Tuning for Kubernetes
Updated May 18, 2024 - Python
A production-ready REST API for vLLM.
Updated May 17, 2024 - Python
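REST front ends over vLLM generally follow the OpenAI-compatible pattern that vLLM's own bundled server uses; a sketch of querying such a server (host, port, and model name are assumptions about the deployment, not this project's fixed defaults):

    # Query an OpenAI-compatible vLLM server over HTTP. Host, port, and
    # model name are assumptions; adjust to the actual deployment.
    import requests

    resp = requests.post(
        "http://localhost:8000/v1/completions",
        json={"model": "facebook/opt-125m", "prompt": "vLLM is", "max_tokens": 32},
        timeout=60,
    )
    print(resp.json()["choices"][0]["text"])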
llm-inference is a platform for publishing and managing LLM inference, providing a wide range of out-of-the-box features for model deployment, such as a UI, a RESTful API, auto-scaling, computing-resource management, monitoring, and more.
Updated May 17, 2024 - Python
A large-scale simulation framework for LLM inference
Updated May 15, 2024 - Python
Evaluate open-source language models on agent use, formatted output, instruction following, long text, multilingual, coding, and custom-task capabilities.
Updated May 10, 2024 - Python
Acceleration for large-model inference frameworks: make your LLM fly.
Updated May 10, 2024 - Python
Reasoning in Large Language Models: Papers and Resources, including Chain-of-Thought, Instruction-Tuning, and Multimodality.
Updated May 8, 2024
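Chain-of-thought prompting, the first technique in that collection, simply elicits intermediate reasoning steps before the final answer; a tiny prompt sketch (the wording is illustrative):

    # A minimal chain-of-thought prompt: one worked example with visible
    # reasoning, then the real question. Wording is illustrative.
    prompt = (
        "Q: A shop sells pens at $2 each. How much do 5 pens cost?\n"
        "A: Each pen costs $2, so 5 pens cost 5 * 2 = $10. The answer is 10.\n\n"
        "Q: A train travels 60 km per hour. How far does it go in 3 hours?\n"
        "A:"  # the model is expected to show its steps before answering
    )
    print(prompt)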