vllm
Here are 52 public repositories matching this topic...
Scripts for fine-tuning Meta Llama 3 with composable FSDP & PEFT methods, covering single- and multi-node GPU setups. Supports default & custom datasets for applications such as summarization and Q&A, plus a number of candidate inference solutions such as HF TGI and vLLM for local or cloud deployment. Includes demo apps showcasing Meta Llama 3 for WhatsApp & Messenger.
Updated May 28, 2024 - Jupyter Notebook
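The PEFT side of such fine-tuning scripts can be sketched with the Hugging Face peft library; a minimal LoRA setup follows (the model id and hyperparameters are illustrative assumptions, not this repo's defaults):

    # Minimal LoRA fine-tuning setup with Hugging Face peft/transformers.
    # Model id and hyperparameters are illustrative assumptions only.
    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")  # assumed; gated model
    lora_config = LoraConfig(
        r=8,                                  # low-rank adapter dimension
        lora_alpha=32,                        # adapter scaling factor
        target_modules=["q_proj", "v_proj"],  # attention projections to adapt
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()  # only the adapter weights train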
Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language model, speech recognition model, or multimodal model, whether in the cloud, on-premises, or even on your laptop.
Updated May 28, 2024 - Python
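The single-line swap works because Xinference exposes an OpenAI-compatible endpoint, so only the client's base URL changes; a minimal sketch with the openai Python client (host, port, and model name are assumptions):

    # Swap api.openai.com for a local OpenAI-compatible server by changing
    # the base URL. Host, port, and model name below are assumptions.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:9997/v1", api_key="not-needed")
    response = client.chat.completions.create(
        model="my-local-model",  # whichever model the server has loaded
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response.choices[0].message.content)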
🔒 Enterprise-grade API gateway that helps you monitor and impose cost or rate limits per API key. Get fine-grained access control and monitoring per user, application, or environment. Supports OpenAI, Azure OpenAI, Anthropic, vLLM, and open-source LLMs.
Updated May 27, 2024 - Go
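From the application's point of view, such a gateway is a drop-in proxy: the client sends requests to the gateway's URL with a gateway-issued key, and limits are enforced before the call is forwarded upstream. A generic sketch (the proxy URL and key are placeholders, not this project's actual routes):

    # Route OpenAI-style traffic through a rate/cost-limiting gateway.
    # The gateway URL and key below are placeholders for illustration.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://gateway.example.com/v1",  # assumed proxy route
        api_key="gateway-issued-key",               # limits tracked per key
    )
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "ping"}],
    )
    print(resp.choices[0].message.content)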
Standardized spec and vendor-specific transforms for ChatML
Updated May 27, 2024 - Python
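ChatML itself is a small wire format: each message is wrapped in role-tagged delimiter tokens. A minimal rendering sketch using the commonly published delimiters:

    # Render a message list into ChatML using the commonly published
    # <|im_start|>/<|im_end|> delimiter convention.
    def to_chatml(messages):
        parts = [
            f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
            for m in messages
        ]
        parts.append("<|im_start|>assistant\n")  # cue the model's reply
        return "\n".join(parts)

    print(to_chatml([
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is vLLM?"},
    ]))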
📖 A curated list of awesome LLM inference papers with code, covering TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, continuous batching, FlashAttention, PagedAttention, etc.
Updated May 27, 2024
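Several techniques on that list (PagedAttention, continuous batching) are what vLLM applies internally; a minimal offline-inference sketch with vLLM's Python API (the small model id is chosen purely for illustration):

    # Offline batch inference with vLLM; PagedAttention and continuous
    # batching happen internally. The model id is illustrative.
    from vllm import LLM, SamplingParams

    llm = LLM(model="facebook/opt-125m")
    params = SamplingParams(temperature=0.8, max_tokens=64)
    outputs = llm.generate(["The capital of France is"], params)
    for out in outputs:
        print(out.outputs[0].text)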
Set up and run a local LLM and chatbot using consumer-grade hardware.
Updated May 25, 2024 - JavaScript
An easy-to-use, scalable, and high-performance RLHF framework (supports 70B+ full tuning, LoRA, Mixtral, and KTO).
Updated May 25, 2024 - Python
Context layer on top of your unstructured universe
Updated May 23, 2024 - JavaScript
The RunPod worker template for serving our large language model endpoints. Powered by vLLM.
Updated May 19, 2024 - Python
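Once deployed as a serverless endpoint, a worker like this is typically invoked through RunPod's job API; a hedged sketch (the endpoint id, URL pattern, and input schema are assumptions based on RunPod's general serverless convention, not this template's documented contract):

    # Call a RunPod serverless endpoint wrapping vLLM. Endpoint id, URL
    # pattern, and input schema are assumptions for illustration.
    import os
    import requests

    endpoint_id = "your-endpoint-id"  # hypothetical
    resp = requests.post(
        f"https://api.runpod.ai/v2/{endpoint_id}/runsync",
        headers={"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"},
        json={"input": {"prompt": "Explain PagedAttention in one sentence."}},
        timeout=120,
    )
    print(resp.json())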
Carbon Limiting Auto Tuning for Kubernetes
Updated May 18, 2024 - Python
A production-ready REST API for vLLM.
Updated May 17, 2024 - Python
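REST front ends over vLLM generally follow the OpenAI-compatible pattern that vLLM's own bundled server uses; a sketch of querying such a server (host, port, and model name are assumptions about the deployment, not this project's fixed defaults):

    # Query an OpenAI-compatible vLLM server over HTTP. Host, port, and
    # model name are assumptions; adjust to the actual deployment.
    import requests

    resp = requests.post(
        "http://localhost:8000/v1/completions",
        json={"model": "facebook/opt-125m", "prompt": "vLLM is", "max_tokens": 32},
        timeout=60,
    )
    print(resp.json()["choices"][0]["text"])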
llm-inference is a platform for publishing and managing LLM inference, providing a wide range of out-of-the-box features for model deployment, such as a UI, a RESTful API, auto-scaling, computing-resource management, monitoring, and more.
Updated May 17, 2024 - Python
A large-scale simulation framework for LLM inference
Updated May 15, 2024 - Python
Evaluate open-source language models on agent use, formatted output, instruction following, long text, multilingual, coding, and custom-task capabilities.
Updated May 10, 2024 - Python
Acceleration for large-model inference frameworks: make your LLM fly.
Updated May 10, 2024 - Python
Reasoning in Large Language Models: Papers and Resources, including Chain-of-Thought, Instruction-Tuning, and Multimodality.
Updated May 8, 2024
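Chain-of-thought prompting, the first technique in that collection, simply elicits intermediate reasoning steps before the final answer; a tiny prompt sketch (the wording is illustrative):

    # A minimal chain-of-thought prompt: one worked example with visible
    # reasoning, then the real question. Wording is illustrative.
    prompt = (
        "Q: A shop sells pens at $2 each. How much do 5 pens cost?\n"
        "A: Each pen costs $2, so 5 pens cost 5 * 2 = $10. The answer is 10.\n\n"
        "Q: A train travels 60 km per hour. How far does it go in 3 hours?\n"
        "A:"  # the model is expected to show its steps before answering
    )
    print(prompt)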