Commit 08725d7

Modify user-stories, add device-plugins and DRA comparison, update Alternatives
Signed-off-by: Alay Patel <[email protected]>
1 parent 1c24759 commit 08725d7

1 file changed: +173 -13 lines changed


design-proposals/dra.md

Lines changed: 173 additions & 13 deletions
@@ -35,9 +35,10 @@ control of their devices using Virtual Machines and Containers.

 - As a user, I would like to use my GPU dra driver with KubeVirt
 - As a user, I would like to use KubeVirt's default driver
+- As a user, in heterogeneous clusters, i.e. clusters made of nodes with different hardware managed through DRA drivers,
+  I should be able to easily identify what hardware was allocated to the VMI
 - As a developer, I would like APIs to be extensible so I can develop drivers/webhooks/automation for custom use-cases
-- As a device-plugin author, I would like to have an easy way to support KubeVirt
-- As a device-plugin author, I would like to have a common mechanism for exposing devices for containers and VMs
+- As a device-plugin author, I would like to have a well-documented, intuitive way to support devices in KubeVirt

 ## Use Cases

@@ -233,16 +234,12 @@ status:
     gpuStatuses:
     - deviceResourceClaimStatus:
         deviceAttributes:
-          driverVersion:
-            version: 1.0.0
-          index:
-            int: 0
-          model:
-            string: LATEST-GPU-MODEL
-          uuid:
-            string: gpu-8e942949-f10b-d871-09b0-ee0657e28f90
-          pciAddress:
-            string: 0000:01:00.0
+          pciAddress:
+            string: 0000:65:00.0
+          productName:
+            string: RTX 4080
+          type:
+            string: gpu
         deviceName: gpu-0
         resourceClaimName: virt-launcher-vmi-fedora-9bjwb-gpu-resource-claim-m4k28
       name: pgpu
@@ -340,6 +337,117 @@ spec:
     resourceClaimTemplateName: test-pci-claim-template
 ```

+#### Comparing DRA APIs with Device Plugins
+
+In the case of device plugins, a pre-defined status resource, usually identified by a device model (e.g.
+`nvidia.com/GP102GL_Tesla_P40`), is configured. Users consume this device via the following spec:
+```yaml
+apiVersion: kubevirt.io/v1alpha3
+kind: VirtualMachineInstance
+metadata:
+  labels:
+    special: vmi-gpu
+  name: vmi-gpu
+spec:
+  domain:
+    devices:
+      gpus:
+      - deviceName: nvidia.com/GP102GL_Tesla_P40
+        name: pgpu
+```
+
+In the case of DRA there is a level of indirection: the information about which device is allocated to the VMI
+could be lost in the resource claim object. For example, consider a ResourceClaimTemplate and a DeviceClass:
+
+```yaml
+apiVersion: resource.k8s.io/v1alpha3
+kind: ResourceClaimTemplate
+metadata:
+  name: single-gpu
+  namespace: gpu-test1
+spec:
+  spec:
+    devices:
+      requests:
+      - allocationMode: ExactCount
+        count: 1
+        deviceClassName: vfiopci.nvidia.com
+        name: gpu
+---
+apiVersion: resource.k8s.io/v1alpha3
+kind: DeviceClass
+metadata:
+  name: vfiopci.example.com
+spec:
+  config:
+  - opaque:
+      driver: gpu.nvidia.com
+      parameters:
+        apiVersion: gpu.nvidia.com/v1alpha1
+        driverConfig:
+          driver: vfio-pci
+        kind: GpuConfig
+  selectors:
+  - cel:
+      expression: device.driver == 'gpu.nvidia.com' && device.attributes['gpu.nvidia.com'].type == 'gpu'
+```
+
+Suppose the above driver is deployed in a cluster with three nodes and two different GPU models, say `RTX 4080` and `RTX 3080`.
+
+The user consumes the GPU using the following spec:
+```yaml
+apiVersion: kubevirt.io/v1
+kind: VirtualMachineInstance
+metadata:
+  name: vmi-fedora
+  namespace: gpu-test1
+spec:
+  resourceClaims:
+  - name: gpu-resource-claim
+    resourceClaimTemplateName: single-gpu
+  domain:
+    gpus:
+    - claim:
+        name: gpu-resource-claim
+        request: gpu
+      name: example-pgpu
+```
+
+The user then waits for the device to be allocated. The device made available to the VMI is reported in the
+status:
+
+```yaml
+apiVersion: kubevirt.io/v1
+kind: VirtualMachineInstance
+metadata:
+  name: vmi-fedora
+  namespace: gpu-test1
+spec:
+  resourceClaims:
+  - name: gpu-resource-claim
+    resourceClaimTemplateName: single-gpu
+  domain:
+    gpus:
+    - claim:
+        name: gpu-resource-claim
+        request: gpu
+      name: example-pgpu
+status:
+  deviceStatus:
+    gpuStatuses:
+    - deviceResourceClaimStatus:
+        deviceAttributes:
+          pciAddress:
+            string: 0000:01:00.0
+          productName:
+            string: RTX 4080
+          type:
+            string: gpu
+        deviceName: gpu-0
+        resourceClaimName: virt-launcher-vmi-fedora-hhzgn-gpu-resource-claim-c26kh
+      name: example-pgpu
+```
+
 ### DRA API for reading device related information

 The examples below shows the APIs used to generate the vmi.status.deviceStatuses section:
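
As an illustration of the user story behind the comparison hunk above (identifying which hardware was allocated to the VMI), the following is a minimal Go sketch that reads the `productName` attribute out of a VMI status. The struct types and the `allocatedProducts` helper are hypothetical and mirror only the fields shown in the example status; they are not the KubeVirt API types. The sample input is simply the JSON form of that example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical types covering only the status fields shown in the example;
// these are assumptions for illustration, not the real KubeVirt types.
type deviceAttribute struct {
	String *string `json:"string,omitempty"`
}

type deviceResourceClaimStatus struct {
	DeviceAttributes  map[string]deviceAttribute `json:"deviceAttributes"`
	DeviceName        string                     `json:"deviceName"`
	ResourceClaimName string                     `json:"resourceClaimName"`
}

type gpuStatus struct {
	Name                      string                     `json:"name"`
	DeviceResourceClaimStatus *deviceResourceClaimStatus `json:"deviceResourceClaimStatus,omitempty"`
}

type vmiStatus struct {
	DeviceStatus struct {
		GPUStatuses []gpuStatus `json:"gpuStatuses"`
	} `json:"deviceStatus"`
}

// allocatedProducts maps each GPU name from the VMI spec to the productName
// attribute reported for the device allocated to it.
func allocatedProducts(statusJSON []byte) (map[string]string, error) {
	var st vmiStatus
	if err := json.Unmarshal(statusJSON, &st); err != nil {
		return nil, err
	}
	out := make(map[string]string)
	for _, g := range st.DeviceStatus.GPUStatuses {
		if g.DeviceResourceClaimStatus == nil {
			continue // nothing reported for this GPU yet
		}
		attr, ok := g.DeviceResourceClaimStatus.DeviceAttributes["productName"]
		if ok && attr.String != nil {
			out[g.Name] = *attr.String
		}
	}
	return out, nil
}

// sampleStatus is the JSON form of the status block from the example.
const sampleStatus = `{
  "deviceStatus": {
    "gpuStatuses": [
      {
        "deviceResourceClaimStatus": {
          "deviceAttributes": {
            "pciAddress": {"string": "0000:01:00.0"},
            "productName": {"string": "RTX 4080"},
            "type": {"string": "gpu"}
          },
          "deviceName": "gpu-0",
          "resourceClaimName": "virt-launcher-vmi-fedora-hhzgn-gpu-resource-claim-c26kh"
        },
        "name": "example-pgpu"
      }
    ]
  }
}`

func main() {
	products, err := allocatedProducts([]byte(sampleStatus))
	if err != nil {
		panic(err)
	}
	fmt.Println(products["example-pgpu"]) // prints "RTX 4080"
}
```

With device plugins the same information is visible directly in `deviceName` in the spec; with DRA it has to be read back from the status, which is what the sketch does.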
@@ -521,7 +629,6 @@ type ResourceClaimSource struct {
 	// be set.
 	ResourceClaimTemplateName string `json:"resourceClaimTemplateName"`
 }
-
 ```

 This design misses the use-case where more than one DRA device is specified in the claim template, as each
@@ -530,6 +637,57 @@ device will have its own template in the API.
 This design also assumes that the deviceName will be provided in the ClaimParameters, which requires the DRA drivers
 to have a ClaimParameters.spec.deviceName in their spec.

+
+## Alternative 2
+
+Ask the DRA plugin authors to inject an environment variable into the CDI spec.
+
+In order to uniquely identify the device required by the VMI spec, the following environment variable will have to be constructed:
+
+```
+PCI_RESOURCE_<RESOURCE-CLAIM-NAME>_<REQUEST-NAME>="0000:01:00.0"
+```
+
+Where RESOURCE-CLAIM-NAME is the name of the ResourceClaim k8s object, created either from a ResourceClaimTemplate or
+directly by the user. REQUEST-NAME is the name of the request available in `vmi.spec.domain.devices.gpu/hostdevices.claims[*].request`.
+
+In the case of MDEV devices it will be:
+
+```
+MDEV_PCI_RESOURCE_<RESOURCE-CLAIM-NAME>_<REQUEST-NAME>="uuid"
+```
+
+For this approach the following static fields are required in the VMI:
+```go
+type VirtualMachineInstanceStatus struct {
+	// ..
+	// ..
+	// ResourceClaimStatuses reflects the state of devices resourceClaims defined in virt-launcher pod.
+	// This is an optional field available only when DRA feature gate is enabled
+	// +optional
+	ResourceClaimStatuses []PodResourceClaimStatus `json:"resourceClaimStatuses,omitempty"`
+}
+
+type PodResourceClaimStatus struct {
+	// Name uniquely identifies this resource claim inside the pod.
+	// This must match the name of an entry in pod.spec.resourceClaims,
+	// which implies that the string must be a DNS_LABEL.
+	Name string `json:"name" protobuf:"bytes,1,name=name"`
+
+	// ResourceClaimName is the name of the ResourceClaim that was
+	// generated for the Pod in the namespace of the Pod. If this is
+	// unset, then generating a ResourceClaim was not necessary. The
+	// pod.spec.resourceClaims entry can be ignored in this case.
+	//
+	// +optional
+	ResourceClaimName *string `json:"resourceClaimName,omitempty" protobuf:"bytes,2,opt,name=resourceClaimName"`
+}
+```
+
+virt-launcher will use `vmi.status.resourceClaimStatuses[*].resourceClaimName` and `vmi.spec.domain.devices.gpu/hostdevices.claims[*].request`
+to look up the environment variable `PCI_RESOURCE_<RESOURCE-CLAIM-NAME>_<REQUEST-NAME>` or
+`MDEV_PCI_RESOURCE_<RESOURCE-CLAIM-NAME>_<REQUEST-NAME>` and generate the correct domxml.
+
 # References

 - Structured parameters
@@ -538,3 +696,5 @@ to have a ClaimParameters.spec.deviceName in their spec.
   https://github.com/kubernetes/enhancements/issues/4381
 - DRA
   https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/
+- NVIDIA DRA driver
+  https://github.com/NVIDIA/k8s-dra-driver
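
The environment-variable lookup described under Alternative 2 in this commit could look roughly like the Go sketch below. It is only an illustration under stated assumptions: `envVarName` and `lookupDevice` are hypothetical helpers, and the claim and request names are substituted verbatim because the proposal does not say whether they are normalized (for example, how `-` characters are handled).

```go
package main

import (
	"fmt"
	"strings"
)

// envVarName builds the variable name from Alternative 2.
// Assumption: claim and request names are used verbatim; the proposal
// does not specify any normalization of the names.
func envVarName(mdev bool, claimName, requestName string) string {
	prefix := "PCI_RESOURCE_"
	if mdev {
		prefix = "MDEV_PCI_RESOURCE_"
	}
	return prefix + claimName + "_" + requestName
}

// lookupDevice scans entries in "KEY=VALUE" form (the format returned by
// os.Environ) and returns the value for the matching claim/request pair.
func lookupDevice(environ []string, mdev bool, claimName, requestName string) (string, bool) {
	want := envVarName(mdev, claimName, requestName)
	for _, kv := range environ {
		key, value, ok := strings.Cut(kv, "=")
		if ok && key == want {
			return value, true
		}
	}
	return "", false
}

func main() {
	// Example environment as a DRA driver might inject it through CDI.
	environ := []string{
		"PCI_RESOURCE_virt-launcher-vmi-fedora-hhzgn-gpu-resource-claim-c26kh_gpu=0000:01:00.0",
	}
	addr, ok := lookupDevice(environ, false,
		"virt-launcher-vmi-fedora-hhzgn-gpu-resource-claim-c26kh", "gpu")
	fmt.Println(addr, ok) // prints "0000:01:00.0 true"
}
```

Passing the environment as a slice rather than calling `os.LookupEnv` directly keeps the helper easy to exercise with fixed inputs; a real implementation would take the claim name from the VMI status and the request name from the VMI spec, as the proposal describes.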
