model-serving
Here are 129 public repositories matching this topic...
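To ground the listings below: "model serving" means wrapping a trained model behind a network endpoint that accepts inference requests and returns predictions. The frameworks on this page industrialize that loop (batching, scaling, multi-model routing). A minimal stdlib-only sketch of the core request-predict-respond cycle, with a trivial stand-in function in place of a real model (all names here are illustrative, not from any listed project):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(features):
    # Stand-in "model": averages the inputs. A real serving stack would
    # load trained weights and run framework inference here.
    return {"score": sum(features) / max(len(features), 1)}

class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON request body and run it through the model.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(predict(payload.get("features", []))).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    import threading, urllib.request
    # Serve on an ephemeral port, send one request, then shut down.
    server = HTTPServer(("127.0.0.1", 0), InferenceHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    req = urllib.request.Request(
        f"http://127.0.0.1:{server.server_address[1]}",
        data=json.dumps({"features": [1, 2, 3]}).encode(),
        headers={"Content-Type": "application/json"},
    )
    print(urllib.request.urlopen(req).read().decode())  # {"score": 2.0}
    server.shutdown()
```

Production systems layer request batching, GPU scheduling, autoscaling, and model versioning on top of this basic shape.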
A high-throughput and memory-efficient inference and serving engine for LLMs
Updated May 31, 2024 - Python
⛅ Versatile Data Pipeline (VDP) console website
Updated May 31, 2024 - TypeScript
The easiest way to serve AI/ML models in production - Build Model Inference Service, LLM APIs, Multi-model Inference Graph/Pipelines, LLM/RAG apps, and more!
Updated May 31, 2024 - Python
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
Updated May 31, 2024 - Python
🏡 Instill AI organisation profile and default configuration
Updated May 31, 2024
The simplest way to serve AI/ML models in production
Updated May 30, 2024 - Python
Standardized Serverless ML Inference Platform on Kubernetes
Updated May 30, 2024 - Python
MLRun is an open source MLOps platform for quickly building and managing continuous ML applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications.
Updated May 31, 2024 - Python
PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference
Updated May 31, 2024 - Python
Tools for easing the handoff between AI/ML and App/SRE teams.
Updated May 30, 2024 - Go
FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI job on any GPU cloud or on-premise cluster. Built on this library, TensorOpera AI (https://TensorOpera.ai) is your generative AI platform at scale.
Updated May 31, 2024 - Python
OneDiffusion: Run any Stable Diffusion models and fine-tuned weights with ease
Updated May 30, 2024 - Python
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
Updated May 29, 2024 - Python
JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs welcome).
Updated May 30, 2024 - Python
A scalable inference server for models optimized with OpenVINO™
Updated May 30, 2024 - C++
Okik is a command-line interface (CLI) tool for LLM, RAG and model serving.
Updated May 30, 2024 - Python
Hopsworks - Data-Intensive AI platform with a Feature Store
Updated May 30, 2024 - Java
AICI: Prompts as (Wasm) Programs
Updated May 28, 2024 - Rust
🏕️ Reproducible development environment
Updated May 27, 2024 - Go