---
title: DeepSeek LLM with RayServe and vLLM
sidebar_position: 1
---
import CollapsibleContent from '../../../../src/components/CollapsibleContent';

:::warning
Deployment of ML models on EKS requires access to GPUs or Neuron instances. If your deployment isn't working, it’s often due to missing access to these resources. Also, some deployment patterns rely on Karpenter autoscaling and static node groups; if nodes aren't initializing, check the logs for Karpenter or the node groups to resolve the issue.
:::

# Deploying DeepSeek LLM with RayServe and vLLM

This guide walks you through deploying the DeepSeek-R1-Distill-Llama-8B model using RayServe and vLLM on Amazon EKS.

<CollapsibleContent header={<h2><span>Deploying the Solution</span></h2>}>

We use Terraform Infrastructure as Code (IaC) templates to deploy an Amazon EKS cluster, and Karpenter dynamically scales GPU nodes when the model is deployed using RayServe YAML configurations.

To get started with deploying deepseek-ai/DeepSeek-R1-Distill-Llama-8B on Amazon EKS, this guide covers the necessary prerequisites and walks you through the deployment process step by step. The process includes setting up the infrastructure, deploying the Ray cluster, and creating the client Python application that sends HTTP requests to the RayServe endpoint for inferencing.


:::danger

Important: Deploying on `g5.8xlarge` instances can be expensive. Carefully monitor and manage your usage to avoid unexpected costs. Consider setting budget alerts and usage limits to keep track of your expenditures.

:::

### Prerequisites
Before we begin, ensure you have all the necessary prerequisites in place to make the deployment process smooth. Make sure you have installed the following tools on your machine (a quick way to verify them is shown after the list):

:::info

To simplify the demo process, we assume the use of an IAM role with administrative privileges due to the complexity of creating minimal IAM roles for each blueprint that may create various AWS services. However, for production deployments, it is strongly advised to create an IAM role with only the necessary permissions. Employing tools such as [IAM Access Analyzer](https://aws.amazon.com/iam/access-analyzer/) can assist in ensuring a least-privilege approach.

:::

1. [aws cli](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html)
2. [kubectl](https://kubernetes.io/docs/tasks/tools/)
3. [terraform](https://learn.hashicorp.com/tutorials/terraform/install-cli)
4. [envsubst](https://pypi.org/project/envsubst/)

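One way to confirm the tools are available on your PATH before proceeding (exact versions will vary):

```bash
# Confirm the required CLIs are installed and reachable
aws --version
kubectl version --client
terraform -version
command -v envsubst
```
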
### Deploy

Clone the repository

```bash
git clone https://github.com/awslabs/data-on-eks.git
```

**Important Note:**

**Step 1**: Ensure that you update the region in the `variables.tf` file before deploying the blueprint.
Additionally, confirm that your local region setting matches the specified region to prevent any discrepancies.

For example, set `AWS_DEFAULT_REGION` to the desired region:

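```bash
export AWS_DEFAULT_REGION="<REGION>"
```
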
**Step 2**: Run the installation script.

```bash
cd data-on-eks/ai-ml/jark-stack/terraform && chmod +x install.sh
```

```bash
./install.sh
```

### Verify the resources

Once the installation finishes, verify the Amazon EKS cluster.

The following command creates or updates the local kubeconfig file used to authenticate with the EKS cluster.

```bash
aws eks --region us-west-2 update-kubeconfig --name jark-stack
```

```bash
kubectl get nodes
```

```text
NAME STATUS ROLES AGE VERSION
ip-100-64-118-130.us-west-2.compute.internal Ready <none> 3h9m v1.30.0-eks-036c24b
ip-100-64-127-174.us-west-2.compute.internal Ready <none> 9h v1.30.0-eks-036c24b
ip-100-64-132-168.us-west-2.compute.internal Ready <none> 9h v1.30.0-eks-036c24b
```

Verify the Karpenter autoscaler NodePools

```bash
kubectl get nodepools
```

```text
NAME NODECLASS
g5-gpu-karpenter g5-gpu-karpenter
x86-cpu-karpenter x86-cpu-karpenter
```

Verify the NVIDIA device plugin

```bash
kubectl get pods -n nvidia-device-plugin
```
```text
NAME READY STATUS RESTARTS AGE
nvidia-device-plugin-gpu-feature-discovery-b4clk 1/1 Running 0 3h13m
nvidia-device-plugin-node-feature-discovery-master-568b49722ldt 1/1 Running 0 9h
nvidia-device-plugin-node-feature-discovery-worker-clk9b 1/1 Running 0 3h13m
nvidia-device-plugin-node-feature-discovery-worker-cwg28 1/1 Running 0 9h
nvidia-device-plugin-node-feature-discovery-worker-ng52l 1/1 Running 0 9h
nvidia-device-plugin-p56jj 1/1 Running 0 3h13m
```

Verify the KubeRay operator, which is used to create Ray clusters

```bash
kubectl get pods -n kuberay-operator
```

```text
NAME READY STATUS RESTARTS AGE
kuberay-operator-7894df98dc-447pm 1/1 Running 0 9h
```

</CollapsibleContent>


## Step-by-Step Deployment

### 1. Create ECR Repository

First, create an ECR repository to store your custom container image:

```bash
aws ecr create-repository \
  --repository-name vllm-rayserve \
  --image-scanning-configuration scanOnPush=true \
  --region <your-region>
```

### 2. Go to the Directory

```bash
cd data-on-eks/gen-ai/inference/vllm-rayserve-gpu
```

### 3. Update the Dockerfile

Edit the Dockerfile and update the following lines:

```dockerfile
# Update base image
FROM rayproject/ray:2.41.0-py310-cu118 AS base

# Update library versions
RUN pip install vllm==0.7.0 huggingface_hub==0.27.1
```

### 4. Modify vllm_serve.py

Edit vllm_serve.py and remove the route prefix:

```python
# Before:
# @serve.deployment(num_replicas=1, route_prefix="/vllm")

# After:
@serve.deployment(num_replicas=1)
```

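The reason for this change: in recent Ray Serve releases, `route_prefix` is configured on the application rather than on the deployment decorator (when deploying through a RayService manifest, the equivalent knob is the application-level `route_prefix` in the serve config). As a rough sketch only, assuming the script itself starts the application and using a hypothetical class name (the actual structure of vllm_serve.py may differ):

```python
from ray import serve
from starlette.requests import Request


@serve.deployment(num_replicas=1)
class VLLMDeployment:  # hypothetical name; the real class in vllm_serve.py may differ
    async def __call__(self, request: Request) -> dict:
        # Placeholder handler; the real deployment wraps the vLLM engine.
        return {"status": "ok"}


# The route prefix is supplied where the application is run, not on the decorator:
app = VLLMDeployment.bind()
serve.run(app, route_prefix="/vllm")
```
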
### 5. Build and Push Container Image

```bash
# Get ECR login credentials
aws ecr get-login-password --region <your-region> | docker login --username AWS --password-stdin <your-account-id>.dkr.ecr.<your-region>.amazonaws.com

# Build the image
docker build -t vllm-rayserve .

# Tag the image
docker tag vllm-rayserve:latest <your-account-id>.dkr.ecr.<your-region>.amazonaws.com/vllm-rayserve:latest

# Push to ECR
docker push <your-account-id>.dkr.ecr.<your-region>.amazonaws.com/vllm-rayserve:latest
```

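If you are building on a machine whose CPU architecture differs from the cluster's x86_64 nodes (for example, an Apple Silicon laptop), you may need to target the amd64 platform explicitly; a possible variant of the build command:

```bash
# Build for x86_64 even when the build host is arm64
docker build --platform linux/amd64 -t vllm-rayserve .
```
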
### 6. Update ray-service-vllm.yaml

Edit ray-service-vllm.yaml with the following changes:

```yaml
# Update model configuration
spec:
  rayStartParams:
  env:
    - name: MODEL_ID
      value: "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"
    - name: MAX_MODEL_LEN
      value: "8192"

# Update container image in both head and worker sections
containers:
  - image: <your-account-id>.dkr.ecr.<your-region>.amazonaws.com/vllm-rayserve:latest
```

### 7. Deploy the Updated Configuration

```bash
kubectl apply -f ray-service-vllm.yaml
```

### 8. Verify Deployment

```bash
# Check pods status
kubectl get pods -n rayserve-vllm

# Check services
kubectl get svc -n rayserve-vllm
```

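Because the workload is created by the KubeRay operator as a RayService resource, you can also inspect that resource directly; a sketch, assuming the service lives in the `rayserve-vllm` namespace as above:

```bash
# Check the RayService custom resource and its Serve application status
kubectl get rayservice -n rayserve-vllm
kubectl describe rayservice -n rayserve-vllm
```
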
## Testing the DeepSeek Model

The testing process remains similar to the original deployment, but now uses the DeepSeek model:

```bash
# Port forward the service
kubectl -n rayserve-vllm port-forward svc/vllm-serve-svc 8000:8000

# Run the test client
python3 client.py
```

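If you want to poke at the endpoint without the client script, you can send a request directly with curl once the port-forward is running. The path and JSON fields below are placeholders; the actual request schema is whatever the handler in vllm_serve.py expects, so adjust accordingly:

```bash
# Hypothetical request shape; match the fields to the handler in vllm_serve.py
curl -s http://localhost:8000/ \
  -H "Content-Type: application/json" \
  -d '{"prompt": "What is the capital of France?", "max_tokens": 128}'
```
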
:::note
The DeepSeek-R1-Distill-Llama-8B model may have different performance characteristics and memory requirements compared to Mistral. Ensure your cluster has adequate resources.
:::

## Resource Requirements

- Minimum GPU: NVIDIA GPU with at least 16GB VRAM
- Recommended instance type: g5.2xlarge or better
- Minimum memory: 32GB RAM

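As a rough sanity check on these numbers: an 8B-parameter model stored in 16-bit precision needs about 8B × 2 bytes ≈ 16 GB for the weights alone, before KV cache and runtime overhead, which is why a 16 GB card is the bare minimum and the 24 GB NVIDIA A10G found in g5 instances is a more comfortable fit.
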
## Monitoring and Observability

The monitoring setup remains the same as the original deployment, using Prometheus and Grafana. The metrics will now reflect the DeepSeek model's performance.

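For quick, ad-hoc checks outside Prometheus and Grafana, two commands are often handy (pod names below are placeholders that depend on what the RayService manifest creates; the Ray dashboard listens on port 8265 of the head node by default):

```bash
# Ray dashboard on the head pod (then open http://localhost:8265)
kubectl -n rayserve-vllm port-forward <head-pod-name> 8265:8265

# GPU utilization and memory on a worker pod
kubectl -n rayserve-vllm exec -it <worker-pod-name> -- nvidia-smi
```
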
## Cleanup

To remove the deployment:

```bash
# Delete the Ray service
kubectl delete -f ray-service-vllm.yaml

# Delete the ECR repository if no longer needed
aws ecr delete-repository \
  --repository-name vllm-rayserve \
  --force \
  --region <your-region>
```

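If you also want to tear down the EKS infrastructure provisioned by the blueprint, that is handled separately from the Ray service. Assuming the Terraform state from the install step is still present and a cleanup script ships alongside install.sh (as data-on-eks blueprints typically provide), something like the following would remove it:

```bash
# Tear down the Terraform-managed infrastructure (assumes cleanup.sh exists;
# otherwise run `terraform destroy` in the same directory)
cd data-on-eks/ai-ml/jark-stack/terraform && ./cleanup.sh
```
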
:::warning
Make sure to monitor GPU utilization and memory usage when first deploying the DeepSeek model, as it may have different resource requirements than Mistral.
:::

This adaptation maintains the core functionality while updating the necessary components for the DeepSeek model. The main differences are in the model configuration and resource requirements, while the deployment structure remains largely the same.
