Description
This task is part of the OSPP program; see https://summer-ospp.ac.cn/org/prodetail/257c80106?list=org&navpage=org for details. Remove the OSPP tag, as no one applied for this task.
(1) Background: llmaz is a lightweight, Kubernetes-based inference platform focused on efficient deployment and inference of large language models (https://github.com/InftyAI/llmaz). Dragonfly is an open-source P2P file distribution and image acceleration system for cloud-native environments that improves model and image distribution efficiency. llmaz has integrated Manta as a lightweight model caching system, but support for image and model distribution still needs optimization.
(2) Existing Work: llmaz supports multiple model providers (e.g., HuggingFace) and inference backends (e.g., vLLM), with Manta providing model caching and distribution. Manta leverages P2P technology for model sharding and preheating, but it covers only models, not container images, and it is still being refactored.
What would you like to be added:
(4) Desired Improvements: Integrate Dragonfly to improve the efficiency of llmaz's image and model distribution, with unified P2P caching and acceleration. Following Manta's lightweight design, the Dragonfly integration should keep resource usage low while improving distribution speed and stability (see the preheat sketch after item (5)).
(5) Ultimate Goal: Implement efficient image and model distribution for llmaz using Dragonfly, enhance P2P caching and acceleration, and build a lightweight, general-purpose solution, modeled on Manta's approach, that improves deployment efficiency and reduces resource costs.
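To make item (4) concrete, below is a minimal sketch of how llmaz could ask Dragonfly to preheat both a container image and a model file before an inference workload is rolled out. This is an illustration only: the manager address, the `/oapi/v1/jobs` path, the bearer-token auth, and the payload fields are assumptions loosely based on Dragonfly's preheat job API and must be checked against the Dragonfly documentation; the image and model URLs are placeholders.

```go
// Hypothetical sketch: submit Dragonfly "preheat" jobs for an image and a
// model file so peers cache them before llmaz schedules inference Pods.
// The endpoint path, auth scheme, and payload shape are assumptions and
// must be verified against the Dragonfly manager's open API docs.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

type preheatArgs struct {
	Type string `json:"type"` // "image" or "file" (assumed values)
	URL  string `json:"url"`
}

type preheatJob struct {
	Type string      `json:"type"` // "preheat"
	Args preheatArgs `json:"args"`
}

func submitPreheat(managerURL, token string, job preheatJob) error {
	body, err := json.Marshal(job)
	if err != nil {
		return err
	}
	req, err := http.NewRequest(http.MethodPost, managerURL+"/oapi/v1/jobs", bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer "+token) // assumed auth scheme
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("preheat request failed: %s", resp.Status)
	}
	return nil
}

func main() {
	manager := "http://dragonfly-manager.dragonfly-system.svc:8080" // hypothetical in-cluster address
	token := "example-token"                                        // placeholder

	jobs := []preheatJob{
		// Warm the inference runtime image on Dragonfly peers.
		{Type: "preheat", Args: preheatArgs{Type: "image", URL: "https://registry-1.docker.io/v2/vllm/vllm-openai/manifests/latest"}},
		// Warm a model artifact (e.g., a HuggingFace file) as a plain file.
		{Type: "preheat", Args: preheatArgs{Type: "file", URL: "https://huggingface.co/Qwen/Qwen2-0.5B-Instruct/resolve/main/model.safetensors"}},
	}
	for _, j := range jobs {
		if err := submitPreheat(manager, token, j); err != nil {
			fmt.Println("preheat error:", err)
		}
	}
}
```

In practice such calls would live in an llmaz controller rather than a standalone binary; the point is only that image and model preheating can go through the same Dragonfly job API.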
Why is this needed:
llmaz currently lacks efficient container image distribution support, and model distribution relies on Manta, which is still incomplete and does not handle images. Dragonfly's P2P distribution capabilities are not yet integrated, so image and model loading is slow, which hurts deployment efficiency.
Completion requirements:
This enhancement requires the following artifacts:
- Design doc
- API change
- Docs update
The artifacts should be linked in subsequent comments.
In addition, this task should:
- Integrate Dragonfly into llmaz for P2P distribution of container images and models.
- Develop a lightweight Dragonfly configuration for efficient caching and acceleration.
- Provide a unified interface for image and model distribution management in llmaz (see the interface sketch after this list).
- Optimize llmaz deployment speed using Dragonfly and generate performance reports.
- Write Dragonfly integration documentation and produce deployment test reports.
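As a starting point for the "unified interface" deliverable above, the following Go sketch shows one possible shape for an abstraction that treats container images and model artifacts uniformly, with Dragonfly as one possible backend. Every name here (`Distributor`, `Artifact`, `ArtifactKind`) is hypothetical and meant for the design doc discussion, not an existing llmaz or Manta API.

```go
// Hypothetical sketch of a unified distribution interface for llmaz.
// None of these types exist in llmaz today; they only illustrate how a
// Dragonfly-backed implementation and the current Manta-style model cache
// could sit behind one abstraction.
package distribution

import "context"

// ArtifactKind distinguishes what is being distributed.
type ArtifactKind string

const (
	KindImage ArtifactKind = "Image" // container image for an inference backend
	KindModel ArtifactKind = "Model" // model weights or tokenizer files
)

// Artifact describes a single object to cache and distribute.
type Artifact struct {
	Kind ArtifactKind
	// URI is an image reference (e.g. "docker.io/vllm/vllm-openai:latest")
	// or a model location (e.g. a HuggingFace repo or an object-store URL).
	URI string
}

// Distributor is the unified entry point llmaz components would call.
type Distributor interface {
	// Preheat asks the backend (e.g. Dragonfly) to pre-distribute the
	// artifact to peers before the Pods that need it are scheduled.
	Preheat(ctx context.Context, a Artifact) error
	// Status reports whether the artifact is already cached close to the
	// nodes that will consume it.
	Status(ctx context.Context, a Artifact) (cached bool, err error)
}
```

A Dragonfly-backed implementation could reuse the preheat call sketched earlier, while a Manta-style implementation would only cover the Model kind; comparing the two is exactly what the design doc should settle.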
@carlory will be the mentor of this task.