@@ -4,7 +4,7 @@ and removing - network interfaces from running Virtual Machines, without
44requiring a restart.
55
66## Motivation
7- Hot-plug / hot-unplug (add / remove) nics to running VMs is an industry
7+ Hot-plug / hot-unplug (add / remove) NICs to running VMs is an industry
88standard available in multiple platforms, allowing the dynamic attachment of L2
99networks. This is useful when the workload (VM) cannot tolerate a restart when
1010attaching / removing networks, or for scenarios where, for instance, the
@@ -56,7 +56,7 @@ administrator to provision it for them.
5656## Multus
5757Multus - a CNI plugin - only handles the ADD / DEL verbs, and is triggered
5858by kubelet only when the pod's sandbox is created - or removed. Given its
59- simplicity, it assumes no networks exist whenever it is executed, and procceeds
59+ simplicity, it assumes no networks exist whenever it is executed, and proceeds
6060to call the ADD / DEL for **all** networks listed in its
6161`k8s.v1.cni.cncf.io/networks` annotation.
6262
@@ -65,7 +65,7 @@ must be refactored to enable it to be triggered not only when the pod's sandbox
6565is created, but also on-demand - i.e. whenever the pod's
6666`k8s.v1.cni.cncf.io/networks` annotation is updated.
6767
68- To do that, a controller residing on a long lived process must be introduced.
68+ To do that, a controller residing on a long-lived process must be introduced.
6969An important detail to take into account is this controller will end up being a
7070CNI client; as a result, it needs to instruct CNI with parameters such as
7171container id, and container netns path (which are CNI inputs). For this, this
@@ -101,13 +101,13 @@ A "thin CNI plugin" runs as a one-shot process, typically as a binary on disk
101101executed on a Kubernetes host machine.
102102
103103A "thick CNI Plugin", on the other hand, is a CNI component composed of two (or
104- more) parts, usually composed of "shim", and a long lived process (daemon)
104+ more) parts, usually a "shim" and a long-lived process (daemon)
105105resident in memory. The "shim" is a lightweight "thin CNI plugin" component that
106106simply passes CNI parameters (such as JSON configuration, and environment
107107variables) to the daemon component, which then processes the CNI request.
108108
109- To transform multus into a thick plugin, it is needed to instantiate a long
110- lived process - which will be the multus pod entrypoint - listening to a unix
109+ To transform multus into a thick plugin, we must instantiate a long-lived
110+ process - which will be the multus pod entrypoint - listening on a unix
111111domain socket - this socket must be available both in the multus pod and the
112112host's mount namespaces; as such, a bind mount to host this socket must be
113113provided for the multus pod.
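
The shim's role described above can be sketched in Go as follows. This is only an illustration under stated assumptions, not how Multus implements it: the `ShimRequest` type, the `NewShimRequest` and `Send` helpers, and the plain-JSON wire format are all hypothetical. The `CNI_*` environment variables (for example `CNI_COMMAND`, `CNI_CONTAINERID`, `CNI_NETNS`) are the ones defined by the CNI specification.

```go
package main

import (
	"encoding/json"
	"net"
	"strings"
)

// ShimRequest is a hypothetical envelope for what the shim forwards to the daemon.
type ShimRequest struct {
	// Env holds the CNI_* environment variables set by the container runtime.
	Env map[string]string `json:"env"`
	// Config is the CNI JSON configuration the runtime passed on stdin.
	Config json.RawMessage `json:"config"`
}

// NewShimRequest keeps only the CNI_* variables from environ ("KEY=VALUE" entries)
// and attaches the CNI configuration unchanged.
func NewShimRequest(environ []string, stdin []byte) ShimRequest {
	env := map[string]string{}
	for _, kv := range environ {
		key, value, found := strings.Cut(kv, "=")
		if found && strings.HasPrefix(key, "CNI_") {
			env[key] = value
		}
	}
	return ShimRequest{Env: env, Config: json.RawMessage(stdin)}
}

// Send writes the request as JSON to the daemon's unix domain socket.
// The socket path is whatever the bind mount exposes on the host (assumed here).
func Send(socketPath string, req ShimRequest) error {
	conn, err := net.Dial("unix", socketPath)
	if err != nil {
		return err
	}
	defer conn.Close()
	return json.NewEncoder(conn).Encode(req)
}
```

The shim stays thin because it does no network configuration itself; it only repackages its inputs and hands them to the long-lived process.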
@@ -143,6 +143,74 @@ The proposed API changes for VM objects can be seen in
143143[the VM API examples section](#vms), while the proposed API changes for the VMI
144144object can be seen in [the VMI API examples section](#vmis).
145145
146+ ### Pod interface naming
147+ The VM's pod interface names are ordinal-based (`net1`, `net2`, …, `netX`),
148+ derived from their order in the VMI spec.
149+ They are requested from Multus by specifying them in the virt-launcher pod's
150+ `k8s.v1.cni.cncf.io/networks` annotation (which is created by virt-controller).
151+
152+ Given a VM with three secondary interfaces:
153+ ```yaml
154+ spec:
155+   networks:
156+   - name: blue-network
157+     multus: ...
158+   - name: red-network
159+     multus: ...
160+   - name: green-network
161+     multus: ...
162+ ```
163+
164+ The pod's Multus networks annotation will look like so:
165+ ```json
166+ "k8s.v1.cni.cncf.io/networks": [
167+ {"interface": "net1", ...},
168+ {"interface": "net2", ...},
169+ {"interface": "net3", ...}
170+ ]
171+ ```
172+
173+ In the scenario where "red-network" interface is unplugged, the annotation will change as follows:
174+ ```json
175+ "k8s.v1.cni.cncf.io/networks": [
176+ {"interface": "net1", ...},
177+ {"interface": "net3", ...}
178+ ]
179+ ```
180+ It is now impossible to map every VMI spec interface to its pod interface: by spec order, `green-network` would be `net2`, yet its pod interface is still named `net3`.
181+
182+ Thus, the name of the ` virt-launcher ` pod network interfaces must be generated
183+ without relying on the interface order in the spec, allowing the unplug feature.
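
The mismatch can be reproduced with a short Go sketch of the ordinal scheme. The `OrdinalPodInterfaceNames` helper is hypothetical and only mirrors the `net1`, `net2`, … derivation described in this section.

```go
package main

import "fmt"

// OrdinalPodInterfaceNames mirrors the ordinal scheme: the i-th secondary
// network in the VMI spec (0-based) maps to the pod interface "net<i+1>".
func OrdinalPodInterfaceNames(networks []string) map[string]string {
	names := make(map[string]string, len(networks))
	for i, network := range networks {
		names[network] = fmt.Sprintf("net%d", i+1)
	}
	return names
}
```

With the three networks above, `green-network` maps to `net3`. Once `red-network` is removed from the spec, the same function maps `green-network` to `net2`, while the interface that actually exists in the pod is still `net3`.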
184+
185+ The pod interface name will be derived from the ` kubevirt-spec-iface-name ` ;
186+ we'll simply compute a hash of the interface name
187+ (which is guaranteed to be unique within each VMI), and ensure all generated
188+ names for the pod's networking infrastructure are accepted by the kernel.
189+ Refer to the following list for examples of names on pod networking infra:
190+ - VM interface name: ` iface1 `
191+ - pod interface name: ` pod7e0055a6880 `
192+ - in-pod bridge name: ` k6t-7e0055a6880 `
193+ - dummy pod nic name: ` 7e0055a6880-nic `
194+
195+ > ** Note:** The kernel limitation for max interface name length is 15 characters.
196+
197+ > ** Note:**
198+ > Since the bridge and dummy interfaces are used internally, they can be changed in the future to use a 3-character prefix and align with the new formatting:
199+ > - in-pod bridge name: ` bri7e0055a6880 `
200+ > - dummy pod nic name: ` dum7e0055a6880 `
201+
202+
203+ The proposed algorithm is SHA256; here's a minimal implementation:
204+ ```golang
205+ func PodInterfaceName(vmiSpecInterfaceName string) string {
206+ 	// allows the dummy pod suffix (`-nic`) to fit the kernel limitation of 15 chars.
207+ 	const MaxIfaceNameLen = 11
208+ 	hash := sha256.New()
209+ 	_, _ = io.WriteString(hash, vmiSpecInterfaceName)
210+ 	return fmt.Sprintf("%x", hash.Sum(nil))[:MaxIfaceNameLen]
211+ }
212+ ```
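
To show how the three example names relate to the hash, here is a self-contained sketch. It assumes the `pod` prefix, `k6t-` prefix and `-nic` suffix from the list above are simply concatenated with the 11-character hash; the `PodNetNames` type and the `NamesFor` helper are hypothetical and not part of the proposal.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

const (
	maxKernelIfaceNameLen = 15 // kernel limit quoted in the note above
	hashLen               = 11 // leaves room for a 4-character prefix or suffix
)

// PodNetNames groups the names used for one VMI interface inside the pod.
type PodNetNames struct {
	PodInterface string
	Bridge       string
	DummyNIC     string
}

// hashedName returns the first hashLen hex characters of the SHA256 digest.
func hashedName(vmiSpecInterfaceName string) string {
	sum := sha256.Sum256([]byte(vmiSpecInterfaceName))
	return hex.EncodeToString(sum[:])[:hashLen]
}

// NamesFor derives all pod networking names and verifies they fit the kernel limit.
func NamesFor(vmiSpecInterfaceName string) (PodNetNames, error) {
	h := hashedName(vmiSpecInterfaceName)
	names := PodNetNames{
		PodInterface: "pod" + h,
		Bridge:       "k6t-" + h,
		DummyNIC:     h + "-nic",
	}
	for _, n := range []string{names.PodInterface, names.Bridge, names.DummyNIC} {
		if len(n) > maxKernelIfaceNameLen {
			return PodNetNames{}, fmt.Errorf("name %q exceeds %d characters", n, maxKernelIfaceNameLen)
		}
	}
	return names, nil
}
```

Under these assumptions every generated name stays within the limit: `pod` + 11 = 14 characters, `k6t-` + 11 = 15, and 11 + `-nic` = 15.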
213+
146214## VMI flows
147215
148216### virtctl
@@ -162,28 +230,27 @@ The `virt-api` subresource handlers will then proceed to patch the VMI spec
162230`spec.domain.devices.interfaces`, and `spec.networks`.
163231
164232### virt-controller
165- A VMI update will be trigered in virt-controller, during which we must patch
233+ A VMI update will be triggered in virt-controller, during which we must patch
166234the `k8s.v1.cni.cncf.io/networks` annotation on the pod holding the VM, which
167235in turn causes multus to hotplug an interface into the pod.
168236
169- The request to plug this newly created pod interface into the VM will then be
170- forwarded to the correct ` virt-handler ` .
237+ The request to plug/unplug will then be forwarded to the correct `virt-handler`.
171238
172239### virt-handler
173240Finally, `KubeVirt`'s agent in the node will create - and configure - any
174241required networking infrastructure, and then tap into the correct
175242`virt-launcher`'s namespaces to execute the commands required to hot plug / hot
176243unplug the network interfaces.
177244
178- ** NOTE:** The feature is protected by the ` HotplugInterfaces ` feature gate.
245+ **NOTE:** The feature is protected by the `HotplugNICs` feature gate.
179246
180247## VM flows
181248The flows to patch up the VMI object are a subset of the steps required to
182249hot-plug an interface into a VM. This means that some extra initial steps are
183250required to update the corresponding VMI networks and interfaces specs, but
184251afterwards, the flows are common.
185252
186- As with VMIs, it starts with issueing a ` virtctl ` command.
253+ As with VMIs, it starts with issuing a `virtctl` command.
187254
188255### virtctl
189256To hot-plug a new NIC into a running VMI, the user would execute the following
@@ -197,28 +264,6 @@ $ virtctl addinterface <vmi-name> \
197264
198265For hot-unplugging, use the `removeinterface` command instead.
199266
200- ** NOTE** : the pod interface name will be derived from the
201- ` kubevirt-spec-iface-name ` ; we'll simply compute an Hash of the interface name
202- (which is guaranteed to be unique within each VMI), and ensure all generated
203- names for the pod's networking infrastructure are accepted by the kernel. Refer
204- to the following list for examples of names on pod networking infra:
205-
206- - VM interface name: iface1
207- - pod interface name: net7e0055a6
208- - in-pod bridge name: k6t-net7e0055a6
209- - dummy pod nic name: net7e0055a6-nic
210-
211- The proposed algorithm is SHA256; here's a minimal implementation:
212- ``` golang
213- func PodInterfaceName (vmiSpecInterfaceName string ) string {
214- const MaxIfaceNameLen = 11 // allows the dummy pod sufix (`-nic`) to fit the
215- // kernel limitation of 15 chars.
216- hash := sha256.New ()
217- _, _ = io.WriteString (hash, vmiSpecInterfaceName)
218- return fmt.Sprintf (" net%x " , hash.Sum (nil ))[:MaxIfaceNameLen]
219- }
220- ```
221-
222267### virt-api
223268The `virt-api` subresource handlers will then proceed to patch the VM status
224269with a `VirtualMachineInterfaceRequest`.
@@ -240,29 +285,29 @@ type VirtualMachineStatus struct {
240285}
241286
242287type VirtualMachineInterfaceRequest struct {
243- // AddInterfaceOptions when set indicates an interface should be added.
244- AddInterfaceOptions *AddInterfaceOptions ` json:"addInterfaceOptions,omitempty" optional:"true"`
245-
246- // RemoveInterfaceOptions when set indicates an interface should be removed.
247- RemoveInterfaceOptions *RemoveInterfaceOptions ` json:"removeInterfaceOptions,omitempty" optional:"true"`
288+ 	// AddInterfaceOptions when set indicates a network interface should be added.
289+ 	// The details within this field specify how to add the interface.
290+ 	AddInterfaceOptions *AddInterfaceOptions `json:"addInterfaceOptions,omitempty" optional:"true"`
291+ 	// RemoveInterfaceOptions when set indicates a network interface should be removed.
292+ 	// The details within this field specify how to remove the interface.
293+ 	RemoveInterfaceOptions *RemoveInterfaceOptions `json:"removeInterfaceOptions,omitempty" optional:"true"`
248294}
249295
296+
250297// AddInterfaceOptions is provided when dynamically hot plugging a network interface
251298type AddInterfaceOptions struct {
252- // NetworkName indicates the name of the multus network - i.e. the network-attachment-definition name
253- NetworkName string ` json:"networkName"`
254-
255- // InterfaceName indicates the name of the network / interface in the KubeVirt VMI spec
256- InterfaceName string ` json:"interfaceName"`
299+ 	// NetworkAttachmentDefinitionName references a NetworkAttachmentDefinition CRD object. Format:
300+ 	// <networkAttachmentDefinitionName>, <namespace>/<networkAttachmentDefinitionName>. If namespace is not
301+ 	// specified, VMI namespace is assumed.
302+ 	NetworkAttachmentDefinitionName string `json:"networkAttachmentDefinitionName"`
303+ 	// Name indicates the logical name of the interface.
304+ 	Name string `json:"name"`
257305}
258306
259307// RemoveInterfaceOptions is provided when dynamically hot unplugging a network interface
260308type RemoveInterfaceOptions struct {
261- // NetworkName indicates the name of the multus network - i.e. the network-attachment-definition name
262- NetworkName string ` json:"networkName"`
263-
264- // InterfaceName indicates the name of the network / interface in the KubeVirt VMI spec
265- InterfaceName string ` json:"interfaceName"`
309+ 	// Name indicates the logical name of the interface.
310+ 	Name string `json:"name"`
266311}
267312```
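
The `NetworkAttachmentDefinitionName` format documented in the field comment can be resolved with a small helper. The sketch below is hypothetical (`SplitNADName` is not part of the proposed API); it only encodes the rule stated above, namely that a reference without a namespace falls back to the VMI namespace.

```go
package main

import (
	"fmt"
	"strings"
)

// SplitNADName resolves a reference of the form "<name>" or "<namespace>/<name>".
// When no namespace is given, the VMI namespace is assumed.
func SplitNADName(ref, vmiNamespace string) (namespace, name string, err error) {
	parts := strings.Split(ref, "/")
	switch {
	case len(parts) == 1 && parts[0] != "":
		return vmiNamespace, parts[0], nil
	case len(parts) == 2 && parts[0] != "" && parts[1] != "":
		return parts[0], parts[1], nil
	default:
		return "", "", fmt.Errorf("invalid NetworkAttachmentDefinition reference %q", ref)
	}
}
```

For example, `SplitNADName("blue-net-br", "default")` yields the `default` namespace, while `SplitNADName("infra/blue-net-br", "default")` yields `infra`.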
268313
@@ -282,7 +327,7 @@ type VirtualMachineInstanceNetworkInterface struct {
282327```
283328
284329The proposed `VirtualMachineInstanceNetworkInterface` status change is required
285- to block the the ` virt-handler ` component until it realizes the multus dynamic
330+ to block the `virt-handler` component until it realizes the multus dynamic
286331networks controller has already finished configuring the pod interface
287332accordingly - there would otherwise be a race between the CNI plugin and
288333`virt-handler` (virt-handler could see the pod interface created but **missing**
@@ -377,6 +422,119 @@ spec:
377422The aforementioned update will trigger multus to start the CNI ADD flow for the
378423network named `macvlan-conf-2`.
379424
425+ ### Unplug for pods
426+ Following [hotplug for pods example](#hotplug-for-pods), to unplug an interface, update the pod to:
427+ ```yaml
428+ apiVersion: v1
429+ kind: Pod
430+ metadata:
431+   name: pod-case-03
432+   annotations:
433+     k8s.v1.cni.cncf.io/networks: macvlan-conf-2
434+ spec:
435+   containers:
436+   - name: pod-case-03
437+     image: docker.io/centos/tools:latest
438+     command:
439+     - /sbin/init
440+ ```
441+
442+ The aforementioned update will trigger multus to start the CNI DEL flow for the
443+ network named `macvlan-conf-1`.
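
How a controller could decide which flows to start from an annotation update can be sketched as a set difference. This is an illustration only, not the Multus implementation: `DiffNetworks` is a hypothetical helper, it handles only the comma-separated short form of the annotation, and it does not model the same network being attached more than once.

```go
package main

import "strings"

// parseNetworks splits the comma-separated short form of the annotation.
func parseNetworks(annotation string) []string {
	var networks []string
	for _, item := range strings.Split(annotation, ",") {
		if name := strings.TrimSpace(item); name != "" {
			networks = append(networks, name)
		}
	}
	return networks
}

// DiffNetworks returns the networks that need a CNI ADD and those that need
// a CNI DEL when the annotation changes from old to updated.
func DiffNetworks(old, updated string) (toAdd, toDel []string) {
	oldSet := map[string]bool{}
	for _, n := range parseNetworks(old) {
		oldSet[n] = true
	}
	newSet := map[string]bool{}
	for _, n := range parseNetworks(updated) {
		newSet[n] = true
		if !oldSet[n] {
			toAdd = append(toAdd, n)
		}
	}
	for _, n := range parseNetworks(old) {
		if !newSet[n] {
			toDel = append(toDel, n)
		}
	}
	return toAdd, toDel
}
```

Assuming the previous annotation value was `macvlan-conf-1,macvlan-conf-2`, the update to `macvlan-conf-2` leaves nothing to add and marks `macvlan-conf-1` for the DEL flow.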
444+
445+ ## Backward Compatibility
446+ ### Legacy VM's virt-launcher pod interface naming
447+ *Legacy VM: a VM that was already running before the KubeVirt version that introduces
448+ the pod interface naming change.
449+ Such VMs run in an old virt-launcher pod.*
450+
451+ Changing the virt-launcher pod interface naming scheme breaks backward compatibility:
452+ legacy VMs won't be able to migrate. See the diagram below:
453+ 
454+
455+ 1. The VM originally runs on top of a virt-launcher pod from version v0.59.0.
456+ 2. KubeVirt is upgraded to v0.60.0.
457+ 3. The VM is migrated.
458+ 4. The migration target pod is created from the new image (v0.60.0) with interface names in the form of the new naming scheme - `7e0055a6880`.
459+    But the interface name in the migration domain XML is in the form of the old naming scheme - `tap1`.
460+
461+ The proposed solution is to have virt-controller create the migration target pod with a
462+ `k8s.v1.cni.cncf.io/networks` annotation that uses the same pod interface names as the migration
463+ source pod's annotation.
464+
465+ The migration target pod interface names will then match the names in the
466+ incoming migration domain XML and the migration process will start.
467+
468+ If the user migrates the VM again, the migration target pod's
469+ `k8s.v1.cni.cncf.io/networks` annotation will again have the same interface
470+ names as the migration source annotation.
471+
472+ ### Unplug interface of a legacy VM
473+ #### Story 1
474+ A running legacy VM with secondary networks has the following networks in its spec:
475+ ```yaml
476+ spec:
477+   networks:
478+   - name: blue-network
479+     multus:
480+       networkName: blue-net-br
481+   - name: red-network
482+     multus:
483+       networkName: red-net-br
484+ ```
485+ > **Note**: `blue-net-br` and `red-net-br` are `NetworkAttachmentDefinition` names.
486+
487+ The VM pod network-status annotation will look as follows:
488+ ```json
489+ "k8s.v1.cni.cncf.io/networks-status": [
490+   { "interface": "net1", "name": "blue-net-br", ...},
491+   { "interface": "net2", "name": "red-net-br", ...}
492+ ]
493+ ```
494+
495+ Unplugging `net1` or `net2` should be blocked because it makes mapping between
496+ the VMI network names and the pod interface names impossible.
497+
498+ #### Story 2
499+ A running legacy VM was migrated following a KubeVirt upgrade, and has new interfaces that were hot-plugged into it.
500+ The VM pod will have some interfaces named using the old naming scheme, and some named using the new naming scheme.
501+
502+ The VMI networks spec will look like so:
503+ ```yaml
504+ spec:
505+   networks:
506+   - name: blue-network
507+     multus:
508+       networkName: blue-net-br
509+   - name: red-network
510+     multus:
511+       networkName: red-net-br
512+   - name: green-network
513+     multus:
514+       networkName: green-net-br
515+   - name: yellow-network
516+     multus:
517+       networkName: yellow-net-br
518+ ```
519+ The VM pod network-status annotation will look as follows:
520+ ```json
521+ "k8s.v1.cni.cncf.io/networks-status": [
522+   { "interface": "net1", "name": "blue-net-br", ...},
523+   { "interface": "net2", "name": "red-net-br", ...},
524+   { "interface": "netXYZ123", "name": "green-net-br", ...},
525+   { "interface": "netABC456", "name": "yellow-net-br", ...}
526+ ]
527+ ```
528+ Similar to [story 1](#story-1), unplugging `net1` or `net2` will make it impossible to map
529+ between the VMI networks and the pod interface names.
530+
531+ The proposed solution is to block unplug for VM interfaces whose pod network
532+ interface name is in the form of the ordinal naming scheme (i.e. `net1`, `net2`, ...).
533+
534+ 1. virt-controller shall check the VM's pod `k8s.v1.cni.cncf.io/network-status`
535+ annotation for interfaces named using the ordinal naming scheme.
536+ 2. If any are found, reject the request and raise a warning event.
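
The two steps above could be sketched as follows. This is a hypothetical illustration, not the virt-controller implementation: `IsOrdinalPodInterfaceName` and `ValidateUnplug` are made-up names, and the pod interface names are assumed to have already been extracted from the network-status annotation. The sketch also assumes that a hashed name never consists of `net` followed only by digits, which holds for the `pod`-prefixed names proposed here.

```go
package main

import (
	"fmt"
	"regexp"
)

// ordinalName matches the legacy ordinal scheme: net1, net2, ...
var ordinalName = regexp.MustCompile(`^net\d+$`)

// IsOrdinalPodInterfaceName reports whether name follows the ordinal scheme.
func IsOrdinalPodInterfaceName(name string) bool {
	return ordinalName.MatchString(name)
}

// ValidateUnplug rejects the request when any pod interface uses the ordinal
// naming scheme; the caller would turn the error into a warning event.
func ValidateUnplug(podInterfaceNames []string) error {
	for _, name := range podInterfaceNames {
		if IsOrdinalPodInterfaceName(name) {
			return fmt.Errorf("pod interface %q uses the ordinal naming scheme; unplug is not supported", name)
		}
	}
	return nil
}
```

With the story 2 annotation, `ValidateUnplug` returns an error because of `net1` and `net2`, whereas a pod whose interfaces are all named like `pod7e0055a6880` passes the check.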
537+
380538## Functional Testing Approach
381539Functional testing will use the network sig KubeVirt lanes -
382540`k8s-<x.y>-sig-network`. These lanes must be used since this feature is network
@@ -387,11 +545,19 @@ performed:
387545* plug a new NIC into a running VM
388546* unplug a NIC from a running VM (can be performed in the previous test
389547 teardown)
390- * migrate a VM having an hot-plugged interface
548+ * migrate a VM having a hot-plugged interface
391549
392- All these tests have as pre-requirements that the `HotplugInterfaces ` feature
550+ All these tests require that the `HotplugNICs` feature
393551gate is enabled, **and** that a secondary network is provisioned.
394552
553+ The pod network interface naming tests shall cover:
554+ * VMs that were running before the KubeVirt version that introduces the naming change
555+ can be migrated following a KubeVirt upgrade (when workload-strategy is set to Migrate), and after
556+ a KubeVirt upgrade following a user request.
557+
558+ They should cover the virt-launcher pod interface name change and, in general,
559+ backward compatibility for changes related to the networking code.
560+
395561### Multus functional tests
396562In multus, new functional tests must be added that cover the following
397563scenarios:
@@ -407,10 +573,20 @@ scenarios:
4075733. Add a controller monitoring pod attachment updates
4085744. **C** Consume this dynamic networks functionality via CNAO
4095755. **K** Add the hot-plug functionality to KubeVirt for L2 and L3 networks
410- (with IPAM enabled on the pod interface)
411- 6. **K** Add the hot-unplug functionality to KubeVirt for L2 and L3 networks
576+ (with IPAM enabled on the pod interface)
577+ 6. **K** Change the virt-launcher pod network interface naming scheme
578+ 7. **K** Add a remove-interface command to `virtctl` and the corresponding endpoints to `virt-api`, **support VMI objects only**
579+ 8. **K** Detach the requested interface from the guest through the Libvirt API, **support VMI objects only**.
580+ 9. **K** Extend the `InterfaceRequests` API to support remove-interface requests.
581+ 10. **K** Extend the remove-interface `virtctl` command and `virt-api` endpoints to support `VirtualMachine`.
582+ 11. **K** Implement `virt-controller` pod annotation patching for unplug requests.
583+ 12. **K** Clean up the unplugged interface's bridge and tap device from virt-launcher pods.
584+ 13. **K** Shut down the unplugged interface's IPAM DHCP server instance.
412585
413586**Notes:**
414587* the action items listed above have either `M`, `K`, or `C` to
415588indicate in which project they should be implemented.
416589* the MVP version would be composed of steps 1 through 4, inclusive.
590+ * the MVP for unplug functionality would be composed of steps 6 through 8, inclusive.
591+ * Until steps 12 and 13 are implemented, the unplugged interface's bridge, tap device
592+ and DHCP server will remain in the launcher pod until the VM is migrated.