@@ -35,9 +35,10 @@ control of their devices using Virtual Machines and Containers.
 
 - As a user, I would like to use my GPU DRA driver with KubeVirt
 - As a user, I would like to use KubeVirt's default driver
+- As a user, in heterogeneous clusters, i.e. clusters made of nodes with different hardware managed through DRA drivers,
+  I should be able to easily identify what hardware was allocated to the VMI
 - As a developer, I would like APIs to be extensible so I can develop drivers/webhooks/automation for custom use-cases
-- As a device-plugin author, I would like to have an easy way to support KubeVirt
-- As a device-plugin author, I would like to have a common mechanism for exposing devices for containers and VMs
+- As a device-plugin author, I would like to have a well-documented, intuitive way to support devices in KubeVirt
 
 ## Use Cases
 
@@ -233,16 +234,12 @@ status:
     gpuStatuses:
     - deviceResourceClaimStatus:
         deviceAttributes:
-          driverVersion:
-            version: 1.0.0
-          index:
-            int: 0
-          model:
-            string: LATEST-GPU-MODEL
-          uuid:
-            string: gpu-8e942949-f10b-d871-09b0-ee0657e28f90
-          pciAddress:
-            string: 0000:01:00.0
+          pciAddress:
+            string: 0000:65:00.0
+          productName:
+            string: RTX 4080
+          type:
+            string: gpu
         deviceName: gpu-0
         resourceClaimName: virt-launcher-vmi-fedora-9bjwb-gpu-resource-claim-m4k28
       name: pgpu
@@ -340,6 +337,117 @@ spec:
     resourceClaimTemplateName: test-pci-claim-template
 ```
 
+#### Comparing DRA APIs with Device Plugins
+
+In the case of device plugins, a pre-defined resource name, usually derived from the device model, e.g.
+`nvidia.com/GP102GL_Tesla_P40`, is advertised in the node status. Users consume this device via the following spec:
+```yaml
+apiVersion: kubevirt.io/v1alpha3
+kind: VirtualMachineInstance
+metadata:
+  labels:
+    special: vmi-gpu
+  name: vmi-gpu
+spec:
+  domain:
+    devices:
+      gpus:
+      - deviceName: nvidia.com/GP102GL_Tesla_P40
+        name: pgpu
+```
358+
+In the case of DRA there is a level of indirection: the information about which device is allocated to the VMI
+could be lost in the resource claim object. For example, consider a ResourceClaimTemplate:
+
+```yaml
+apiVersion: resource.k8s.io/v1alpha3
+kind: ResourceClaimTemplate
+metadata:
+  name: single-gpu
+  namespace: gpu-test1
+spec:
+  spec:
+    devices:
+      requests:
+      - allocationMode: ExactCount
+        count: 1
+        deviceClassName: vfiopci.nvidia.com
+        name: gpu
+---
+apiVersion: resource.k8s.io/v1alpha3
+kind: DeviceClass
+metadata:
+  name: vfiopci.nvidia.com
+spec:
+  config:
+  - opaque:
+      driver: gpu.nvidia.com
+      parameters:
+        apiVersion: gpu.nvidia.com/v1alpha1
+        driverConfig:
+          driver: vfio-pci
+        kind: GpuConfig
+  selectors:
+  - cel:
+      expression: device.driver == 'gpu.nvidia.com' && device.attributes['gpu.nvidia.com'].type == 'gpu'
+```
394+
+Suppose the above driver is deployed in a cluster of three nodes carrying two different GPU models, say
+`RTX 4080` and `RTX 3080`. The user consumes a GPU using the following spec:
+
+```yaml
+apiVersion: kubevirt.io/v1
+kind: VirtualMachineInstance
+metadata:
+  name: vmi-fedora
+  namespace: gpu-test1
+spec:
+  resourceClaims:
+  - name: gpu-resource-claim
+    resourceClaimTemplateName: single-gpu
+  domain:
+    devices:
+      gpus:
+      - claim:
+          name: gpu-resource-claim
+          request: gpu
+        name: example-pgpu
+```
415+
+The user then waits for the device to be allocated. The device made available to the VMI is reported in the
+status:
+
+```yaml
+apiVersion: kubevirt.io/v1
+kind: VirtualMachineInstance
+metadata:
+  name: vmi-fedora
+  namespace: gpu-test1
+spec:
+  resourceClaims:
+  - name: gpu-resource-claim
+    resourceClaimTemplateName: single-gpu
+  domain:
+    devices:
+      gpus:
+      - claim:
+          name: gpu-resource-claim
+          request: gpu
+        name: example-pgpu
+status:
+  deviceStatus:
+    gpuStatuses:
+    - deviceResourceClaimStatus:
+        deviceAttributes:
+          pciAddress:
+            string: 0000:01:00.0
+          productName:
+            string: RTX 4080
+          type:
+            string: gpu
+        deviceName: gpu-0
+        resourceClaimName: virt-launcher-vmi-fedora-hhzgn-gpu-resource-claim-c26kh
+      name: example-pgpu
+```
+
 ### DRA API for reading device related information
 
 The examples below show the APIs used to generate the vmi.status.deviceStatuses section:
@@ -521,7 +629,6 @@ type ResourceClaimSource struct {
 	// be set.
 	ResourceClaimTemplateName string `json:"resourceClaimTemplateName"`
 }
-
 ```
 
 This design misses the use-case where more than one DRA device is specified in the claim template, as each
@@ -530,6 +637,57 @@ device will have its own template in the API.
 This design also assumes that the deviceName will be provided in the ClaimParameters, which requires the DRA drivers
 to have a ClaimParameters.spec.deviceName in their spec.
 
+
+## Alternative 2
+
+Ask the DRA plugin authors to inject environment variables into the CDI spec.
+
+In order to uniquely identify the device required by the VMI spec, the following environment variable has to be constructed:
+
+```
+PCI_RESOURCE_<RESOURCE-CLAIM-NAME>_<REQUEST-NAME>="0000:01:00.0"
+```
+
+Here RESOURCE-CLAIM-NAME is the name of the ResourceClaim k8s object created either from a ResourceClaimTemplate or
+directly by the user, and REQUEST-NAME is the name of the request available in `vmi.spec.domain.devices.gpu/hostdevices.claims[*].request`.
+
+In the case of MDEV devices it will be:
+
+```
+MDEV_PCI_RESOURCE_<RESOURCE-CLAIM-NAME>_<REQUEST-NAME>="uuid"
+```
+
+For this approach the following static fields are required in the VMI:
+
+```go
+type VirtualMachineInstanceStatus struct {
+	...
+
+	// ResourceClaimStatuses reflects the state of the device resourceClaims defined in the virt-launcher pod.
+	// This is an optional field available only when the DRA feature gate is enabled.
+	// +optional
+	ResourceClaimStatuses []PodResourceClaimStatus `json:"resourceClaimStatuses,omitempty"`
+}
+
+type PodResourceClaimStatus struct {
+	// Name uniquely identifies this resource claim inside the pod.
+	// This must match the name of an entry in pod.spec.resourceClaims,
+	// which implies that the string must be a DNS_LABEL.
+	Name string `json:"name" protobuf:"bytes,1,name=name"`
+
+	// ResourceClaimName is the name of the ResourceClaim that was
+	// generated for the Pod in the namespace of the Pod. If this is
+	// unset, then generating a ResourceClaim was not necessary. The
+	// pod.spec.resourceClaims entry can be ignored in this case.
+	//
+	// +optional
+	ResourceClaimName *string `json:"resourceClaimName,omitempty" protobuf:"bytes,2,opt,name=resourceClaimName"`
+}
+```
+virt-launcher will use `vmi.status.resourceClaimStatuses[*].resourceClaimName` and `vmi.spec.domain.devices.gpu/hostdevices.claims[*].request`
+to look up the environment variable `PCI_RESOURCE_<RESOURCE-CLAIM-NAME>_<REQUEST-NAME>` or
+`MDEV_PCI_RESOURCE_<RESOURCE-CLAIM-NAME>_<REQUEST-NAME>` and generate the correct domain XML.
+
 # References
 
 - Structured parameters
@@ -538,3 +696,5 @@ to have a ClaimParameters.spec.deviceName in their spec.
   https://github.com/kubernetes/enhancements/issues/4381
 - DRA
   https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/
+- NVIDIA DRA driver
+  https://github.com/NVIDIA/k8s-dra-driver