Commit a961ef1

hotplug-nics: Add pod interface naming change content (#217)
* hotplug-nics: Add pod network name-scheme change content
* hotplug-nics: Update API objects
* hotplug-nics: Update feature-gate name
* hotplug-nics: Fix typos

Signed-off-by: Or Mergi <ormergi@redhat.com>
1 parent f52f91a commit a961ef1

2 files changed: +229 -53 lines changed
design-proposals/nic-hotplug/nic-hotplug.md

Lines changed: 229 additions & 53 deletions
@@ -4,7 +4,7 @@ and removing - network interfaces from running Virtual Machines, without
44
requiring a restart.
55

66
## Motivation
7-
Hot-plug / hot-unplug (add / remove) nics to running VMs is an industry
7+
Hot-plug / hot-unplug (add / remove) NICs to running VMs is an industry
88
standard available in multiple platforms, allowing the dynamic attachment of L2
99
networks. This is useful when the workload (VM) cannot tolerate a restart when
1010
attaching / removing networks, or for scenarios where, for instance, the
@@ -56,7 +56,7 @@ administrator to provision it for them.
5656
## Multus
5757
Multus - a CNI plugin - only handles the ADD / REMOVE verb, and is triggered
5858
by kubelet only when the pod's sandbox is created - or removed. Given its
59-
simplicity, it assumes no networks exist whenever it is executed, and procceeds
59+
simplicity, it assumes no networks exist whenever it is executed, and proceeds
6060
to call the ADD / DEL for **all** networks listed in its
6161
`k8s.v1.cni.cncf.io/networks` annotation.
6262

@@ -65,7 +65,7 @@ must be refactored to enable it to be triggered not only when the pod's sandbox
6565
is created, but also on-demand - i.e. whenever the pod's
6666
`k8s.v1.cni.cncf.io/networks` are updated.
6767

68-
To do that, a controller residing on a long lived process must be introduced.
68+
To do that, a controller residing on a long-lived process must be introduced.
6969
An important detail to take into account is this controller will end up being a
7070
CNI client; as a result, it needs to instruct CNI with parameters such as
7171
container id, and container netns path (which are CNI inputs). For this, this
@@ -101,13 +101,13 @@ A "thin CNI plugin" runs as a one-shot process, typically as a binary on disk
101101
executed on a Kubernetes host machine.
102102

103103
A "thick CNI Plugin", on the other hand, is a CNI component composed of two (or
104-
more) parts, usually composed of "shim", and a long lived process (daemon)
104+
more) parts, usually composed of "shim", and a long-lived process (daemon)
105105
resident in memory. The "shim" is a lightweight "thin CNI plugin" component that
106106
simply passes CNI parameters (such as JSON configuration, and environment
107107
variables) to the daemon component, which then processes the CNI request.
108108

109-
To transform multus into a thick plugin, it is needed to instantiate a long
110-
lived process - which will be the multus pod entrypoint - listening to a unix
109+
To transform multus into a thick plugin, it is needed to instantiate a long-lived
110+
process - which will be the multus pod entrypoint - listening to a unix
111111
domain socket - this socket must be available both in the multus pod and the
112112
host's mount namespaces; as such, a bind mount to host this socket must be
113113
provided for the multus pod.
@@ -143,6 +143,74 @@ The proposed API changes for VM objects can be seen in
143143
[the VM API examples section](#vms), while the proposed API changes for the VMI
144144
object can be seen in [the VMI API examples section](#vmis).
145145

146+
### Pod interface naming
147+
The VMs’ pod interfaces names are ordinal based (`net1`, `net2`, …, `netX`),
148+
derived from their order in the VMI spec.
149+
They are requested from Multus by specifying them in the virt-launcher pod
150+
`k8s.v1.cni.cncf.io/networks` annotation (which is created by virt-controller).
151+
152+
Given a VM with three secondary interfaces:
153+
```yaml
154+
spec:
155+
networks:
156+
- name: blue-network
157+
multus: ...
158+
- name: red-network
159+
multus: ...
160+
- name: green-network
161+
multus: ...
162+
```
163+
164+
The pod's Multus networks annotation will look like so:
165+
```json
166+
"k8s.v1.cni.cncf.io/networks": [
167+
{"interface": "net1", ...},
168+
{"interface": "net2", ...},
169+
{"interface": "net3", ...}
170+
]
171+
```
172+
173+
In the scenario where "red-network" interface is unplugged, the annotation will change as follows:
174+
```json
175+
"k8s.v1.cni.cncf.io/networks": [
176+
{"interface": "net1", ...},
177+
{"interface": "net3", ...}
178+
]
179+
```
180+
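Now it is impossible to associate the VMI spec interfaces with the pod interfaces: recomputing the names from the spec order maps "green-network" to `net2`, while its pod interface is still named `net3`.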
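
To make the mismatch concrete, here is a minimal Go sketch of the ordinal scheme. The helper `ordinalPodInterfaceNames` is hypothetical and is not KubeVirt code; it only assumes, as described above, that the i-th secondary network in the spec maps to `net<i>`.

```go
package main

import "fmt"

// ordinalPodInterfaceNames mirrors the ordinal scheme: the i-th secondary
// network in the spec (1-based) maps to "net<i>".
// Hypothetical helper, for illustration only.
func ordinalPodInterfaceNames(networks []string) map[string]string {
	names := make(map[string]string, len(networks))
	for i, network := range networks {
		names[network] = fmt.Sprintf("net%d", i+1)
	}
	return names
}

func main() {
	before := ordinalPodInterfaceNames([]string{"blue-network", "red-network", "green-network"})
	after := ordinalPodInterfaceNames([]string{"blue-network", "green-network"})

	fmt.Println(before["green-network"]) // net3: the name the pod interface was created with
	fmt.Println(after["green-network"])  // net2: the name recomputed once red-network is removed
}
```

The pod interface itself is not renamed when the spec changes, so a name derived from the spec order no longer identifies it.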
181+
182+
Thus, the name of the `virt-launcher` pod network interfaces must be generated
183+
without relying on the interface order in the spec, allowing the unplug feature.
184+
185+
The pod interface name will be derived from the `kubevirt-spec-iface-name`;
186+
we'll simply compute a Hash of the interface name
187+
(which is guaranteed to be unique within each VMI), and ensure all generated
188+
names for the pod's networking infrastructure are accepted by the kernel.
189+
Refer to the following list for examples of names on pod networking infra:
190+
- VM interface name: `iface1`
191+
- pod interface name: `pod7e0055a6880`
192+
- in-pod bridge name: `k6t-7e0055a6880`
193+
- dummy pod nic name: `7e0055a6880-nic`
194+
195+
>**Note:** The kernel limitation for max interface name length is 15 characters.
196+
197+
>**Note:**
198+
> Since the bridge and dummy interfaces are used internally, they can be changed in the future to use a 3-character prefix and align with the new formatting:
199+
> - in-pod bridge name: `bri7e0055a6880`
200+
> - dummy pod nic name: `dum7e0055a6880`
201+
202+
203+
The proposed algorithm is SHA256; here's a minimal implementation:
204+
```golang
205+
func PodInterfaceName(vmiSpecInterfaceName string) string {
206+
// allows the dummy pod suffix (`-nic`) to fit the kernel limitation of 15 chars.
207+
const MaxIfaceNameLen = 11
208+
hash := sha256.New()
209+
_, _ = io.WriteString(hash, vmiSpecInterfaceName)
210+
return fmt.Sprintf("%x", hash.Sum(nil))[:MaxIfaceNameLen]
211+
}
212+
```
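
The sketch below wraps the hashing logic above in a runnable program and derives the three names from the example list. The `pod`, `k6t-` and `-nic` affixes are taken from that list; the helper names (`hashedName`, `podIfaceName`, `bridgeName`, `dummyNicName`) and the constant `kernelMaxIfaceNameLen` are hypothetical, and the sketch assumes the affixes are added by the caller around the 11-character hash.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"io"
)

const (
	// hashLen leaves room for a 4-character affix ("k6t-" or "-nic").
	hashLen = 11
	// kernelMaxIfaceNameLen is the kernel limit quoted in the note above.
	kernelMaxIfaceNameLen = 15
)

// hashedName returns the first hashLen hex characters of the SHA256 digest
// of the VMI spec interface name (same logic as PodInterfaceName above).
func hashedName(vmiSpecInterfaceName string) string {
	hash := sha256.New()
	_, _ = io.WriteString(hash, vmiSpecInterfaceName)
	return fmt.Sprintf("%x", hash.Sum(nil))[:hashLen]
}

// The affixes below follow the example list; the function names are assumptions.
func podIfaceName(name string) string { return "pod" + hashedName(name) }
func bridgeName(name string) string   { return "k6t-" + hashedName(name) }
func dummyNicName(name string) string { return hashedName(name) + "-nic" }

func main() {
	generated := []string{podIfaceName("iface1"), bridgeName("iface1"), dummyNicName("iface1")}
	for _, name := range generated {
		// Every generated name must fit within the kernel limit.
		fmt.Printf("%-16s len=%d fits=%t\n", name, len(name), len(name) <= kernelMaxIfaceNameLen)
	}
}
```

Because the name depends only on the VMI spec interface name, it stays stable when other interfaces are plugged or unplugged.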
213+
146214
## VMI flows
147215

148216
### virtctl
@@ -162,28 +230,27 @@ The `virt-api` subresource handlers will then proceed to patch the VMI spec
162230
`spec.domain.devices.interfaces`, and `spec.networks`.
163231

164232
### virt-controller
165-
A VMI update will be trigered in virt-controller, during which we must patch
233+
A VMI update will be triggered in virt-controller, during which we must patch
166234
the `k8s.v1.cni.cncf.io/networks` annotation on the pod holding the VM, which
167235
in turn causes multus to hotplug an interface into the pod.
168236

169-
The request to plug this newly created pod interface into the VM will then be
170-
forwarded to the correct `virt-handler`.
237+
The request to plug/unplug will then be forwarded to the correct `virt-handler`.
171238

172239
### virt-handler
173240
Finally, `KubeVirt`s agent in the node will create - and configure - any
174241
required networking infrastructure, and finally tap into the correct
175242
`virt-launcher`s namespaces to execute the commands required to hot plug / hot
176243
unplug the network interfaces.
177244

178-
**NOTE:** The feature is protected by the `HotplugInterfaces` feature gate.
245+
**NOTE:** The feature is protected by the `HotplugNICs` feature gate.
179246

180247
## VM flows
181248
The flows to patch up the VMI object are a subset of the steps required to
182249
hot-plug an interface into a VM. This means that some extra initial steps are
183250
required to update the corresponding VMI networks and interfaces specs, but
184251
afterwards, the flows are common.
185252

186-
As with VMIs, it starts with issueing a `virtctl` command.
253+
As with VMIs, it starts with issuing a `virtctl` command.
187254

188255
### virtctl
189256
To hot-plug a new NIC into a running VMI, the user would execute the following
@@ -197,28 +264,6 @@ $ virtctl addinterface <vmi-name> \
197264

198265
For hot-unplugging, use the `removeinterface` command instead.
199266

200-
**NOTE**: the pod interface name will be derived from the
201-
`kubevirt-spec-iface-name`; we'll simply compute an Hash of the interface name
202-
(which is guaranteed to be unique within each VMI), and ensure all generated
203-
names for the pod's networking infrastructure are accepted by the kernel. Refer
204-
to the following list for examples of names on pod networking infra:
205-
206-
- VM interface name: iface1
207-
- pod interface name: net7e0055a6
208-
- in-pod bridge name: k6t-net7e0055a6
209-
- dummy pod nic name: net7e0055a6-nic
210-
211-
The proposed algorithm is SHA256; here's a minimal implementation:
212-
```golang
213-
func PodInterfaceName(vmiSpecInterfaceName string) string {
214-
const MaxIfaceNameLen = 11 // allows the dummy pod sufix (`-nic`) to fit the
215-
// kernel limitation of 15 chars.
216-
hash := sha256.New()
217-
_, _ = io.WriteString(hash, vmiSpecInterfaceName)
218-
return fmt.Sprintf("net%x", hash.Sum(nil))[:MaxIfaceNameLen]
219-
}
220-
```
221-
222267
### virt-api
223268
The `virt-api` subresource handlers will then proceed to patch the VM status
224269
with a `VirtualMachineInterfaceRequest`.
@@ -240,29 +285,29 @@ type VirtualMachineStatus struct {
240285
}
241286

242287
type VirtualMachineInterfaceRequest struct {
243-
// AddInterfaceOptions when set indicates an interface should be added.
244-
AddInterfaceOptions *AddInterfaceOptions `json:"addInterfaceOptions,omitempty" optional:"true"`
245-
246-
// RemoveInterfaceOptions when set indicates an interface should be removed.
247-
RemoveInterfaceOptions *RemoveInterfaceOptions `json:"removeInterfaceOptions,omitempty" optional:"true"`
288+
// AddInterfaceOptions when set indicates a network interface should be added.
289+
// The details within this field specify how to add the interface
290+
AddInterfaceOptions *AddInterfaceOptions `json:"addInterfaceOptions,omitempty" optional:"true"`
291+
// RemoveInterfaceOptions when set indicates a network interface should be removed.
292+
// The details within this field specify how to remove the interface
293+
RemoveInterfaceOptions *RemoveInterfaceOptions `json:"removeInterfaceOptions,omitempty" optional:"true"`
248294
}
249295

296+
250297
// AddInterfaceOptions is provided when dynamically hot plugging a network interface
251298
type AddInterfaceOptions struct {
252-
// NetworkName indicates the name of the multus network - i.e. the network-attachment-definition name
253-
NetworkName string `json:"networkName"`
254-
255-
// InterfaceName indicates the name of the network / interface in the KubeVirt VMI spec
256-
InterfaceName string `json:"interfaceName"`
299+
// NetworkAttachmentDefinitionName references a NetworkAttachmentDefinition CRD object. Format:
300+
// <networkAttachmentDefinitionName>, <namespace>/<networkAttachmentDefinitionName>. If namespace is not
301+
// specified, VMI namespace is assumed.
302+
NetworkAttachmentDefinitionName string `json:"networkAttachmentDefinitionName"`
303+
// Name indicates the logical name of the interface.
304+
Name string `json:"name"`
257305
}
258306

259307
// RemoveInterfaceOptions is provided when dynamically hot unplugging a network interface
260308
type RemoveInterfaceOptions struct {
261-
// NetworkName indicates the name of the multus network - i.e. the network-attachment-definition name
262-
NetworkName string `json:"networkName"`
263-
264-
// InterfaceName indicates the name of the network / interface in the KubeVirt VMI spec
265-
InterfaceName string `json:"interfaceName"`
309+
// Name indicates the logical name of the interface.
310+
Name string `json:"name"`
266311
}
267312
```
268313

@@ -282,7 +327,7 @@ type VirtualMachineInstanceNetworkInterface struct {
282327
```
283328

284329
The proposed `VirtualMachineInstanceNetworkInterface` status change is required
285-
to block the the `virt-handler` component until it realizes the multus dynamic
330+
to block the `virt-handler` component until it realizes the multus dynamic
286331
networks controller has already finished configuring the pod interface
287332
accordingly - there would otherwise be a race between the CNI plugin and
288333
`virt-handler` (virt-handler could see the pod interface created but **missing**
@@ -377,6 +422,119 @@ spec:
377422
The aforementioned update will trigger multus to start the CNI ADD flow for the
378423
network named `macvlan-conf-2`.
379424

425+
### Unplug for pods
426+
Following the [hotplug for pods example](#hotplug-for-pods), to unplug an interface, update the pod to:
427+
```yaml
428+
apiVersion: v1
429+
kind: Pod
430+
metadata:
431+
name: pod-case-03
432+
annotations:
433+
k8s.v1.cni.cncf.io/networks: macvlan-conf-2
434+
spec:
435+
containers:
436+
- name: pod-case-03
437+
image: docker.io/centos/tools:latest
438+
command:
439+
- /sbin/init
440+
```
441+
442+
The aforementioned update will trigger multus to start the CNI DEL flow for the
443+
network named `macvlan-conf-1`.
444+
445+
## Backward Compatibility
446+
### Legacy VM's virt-launcher pods interface naming
447+
*Legacy VM - a VM that has been running since before the KubeVirt version that introduces
448+
the pod interface naming change.
449+
These VMs run in an old virt-launcher pod*
450+
451+
Changing the virt-launcher pod interface name scheme breaks backward compatibility
452+
in a way that legacy VMs won't be able to migrate. See the diagram below:
453+
![](upgrades-and-pod-iface-nameing-issue.png)
454+
455+
1. The VM originally runs on top of virt-launcher pod from version v0.59.0
456+
2. KubeVirt is upgraded to v0.60.0.
457+
3. The VM is migrated.
458+
4. The migration target pod is created from the new image (v0.60.0) with interface names in the form of the new naming scheme - `7e0055a6880`.
459+
But the interface name in the migration domain XML is in form of the old name scheme - `tap1`.
460+
461+
The proposed solution is to have virt-controller create the migration target pod with a
462+
`k8s.v1.cni.cncf.io/networks` annotation with same pod interface names as the migration
463+
source pod annotation.
464+
465+
The migration target pod interfaces names will then match the names in the
466+
incoming migration domain XML and the migration process will start.
467+
468+
In case the user migrates the VM again, same as before, the migration target
469+
pod's `k8s.v1.cni.cncf.io/networks` annotation value will have the same interface
470+
names as in the migration source annotation.
471+
472+
### Unplug interface of a legacy VM
473+
#### Story 1
474+
Running legacy VM with secondary networks, with the following networks in the spec:
475+
```yaml
476+
spec:
477+
networks:
478+
- name: blue-network
479+
multus:
480+
networkName: blue-net-br
481+
- name: red-network
482+
multus:
483+
networkName: red-net-br
484+
```
485+
>**Note**: `blue-net-br` and `red-net-br` are the `NetworkAttachmentDefinition` names.
486+
487+
The VM pod network-status annotation will look as follows:
488+
```json
489+
"k8s.v1.cni.cncf.io/networks-status": [
490+
{ "interface": "net1", "name": "blue-net-br", ...},
491+
{ "interface": "net2", "name": "red-net-br", ...},
492+
]
493+
```
494+
495+
Unplugging `net1` or `net2` should be blocked because it makes mapping between
496+
the VMI networks names and the pod interfaces names impossible.
497+
498+
#### Story 2
499+
A running legacy VM that was migrated following a KubeVirt upgrade and has new interfaces that were hot-plugged into it.
500+
The VM pod will have interfaces with names in the form of the old name scheme, and some in the form of the new name scheme.
501+
502+
The VMI networks spec will be like so:
503+
```yaml
504+
spec:
505+
networks:
506+
- name: blue-network
507+
multus:
508+
networkName: blue-net-br
509+
- name: red-network
510+
multus:
511+
networkName: red-net-br
512+
- name: green-network
513+
multus:
514+
networkName: green-net-br
515+
- name: yellow-network
516+
multus:
517+
networkName: yellow-net-br
518+
```
519+
The VM pod network-status annotation will look as follows:
520+
```json
521+
"k8s.v1.cni.cncf.io/networks-status": [
522+
{ "interface": "net1", "name": "blue-net-br", ...},
523+
{ "interface": "net2", "name": "red-net-br", ...},
524+
{ "interface": "netXYZ123", "name": "green-net-br", ...},
525+
{ "interface": "netABC456", "name": "yellow-net-br", ...},
526+
]
527+
```
528+
Similar to the [story 1](#story-1), unplugging `net1` or `net2` will make it impossible to map
529+
between the VMI networks and the pod interfaces names.
530+
531+
The proposed solution is to block unplug for VMs' interfaces whose pod network
532+
interface name follows the ordinal naming scheme (i.e. `net1`, `net2`, ...).
533+
534+
1. virt-controller shall check VM's pod `k8s.v1.cni.cncf.io/network-status`
535+
annotation, for interfaces named using the ordinal naming scheme.
536+
2. If any are found, reject the request and raise a warning event.
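
The check in the two steps above could be sketched as follows. This is a minimal illustration, not virt-controller code: the helper `ordinalInterfaces` is hypothetical, and it assumes the ordinal scheme is exactly `net` followed by decimal digits, so that names such as `netXYZ123` from the Story 2 example are not treated as ordinal.

```go
package main

import (
	"fmt"
	"regexp"
)

// ordinalNamePattern matches names such as net1, net2, ... (assumption:
// the ordinal scheme is exactly "net" followed by decimal digits).
var ordinalNamePattern = regexp.MustCompile(`^net[0-9]+$`)

// ordinalInterfaces returns the pod interface names that follow the
// ordinal scheme. Hypothetical helper, not part of KubeVirt.
func ordinalInterfaces(podInterfaceNames []string) []string {
	var found []string
	for _, name := range podInterfaceNames {
		if ordinalNamePattern.MatchString(name) {
			found = append(found, name)
		}
	}
	return found
}

func main() {
	// Interface names taken from the Story 2 network-status example.
	names := []string{"net1", "net2", "netXYZ123", "netABC456"}
	if legacy := ordinalInterfaces(names); len(legacy) > 0 {
		// In the proposal this is where the request is rejected and a
		// warning event is raised; here we only print the decision.
		fmt.Printf("rejecting unplug request, ordinal-named interfaces found: %v\n", legacy)
	}
}
```

In a real controller the interface names would be parsed from the pod's network-status annotation rather than passed in as a literal slice.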
537+
380538
## Functional Testing Approach
381539
Functional testing will use the network sig KubeVirt lanes -
382540
`k8s-<x.y>-sig-network`. These lanes must be used since this feature is network
@@ -387,11 +545,19 @@ performed:
387545
* plug a new NIC into a running VM
388546
* unplug a NIC from a running VM (can be performed in the previous test
389547
teardown)
390-
* migrate a VM having an hot-plugged interface
548+
* migrate a VM having a hot-plugged interface
391549

392-
All these tests have as pre-requirements that the `HotplugInterfaces` feature
550+
All these tests have as pre-requirements that the `HotplugNICs` feature
393551
gate is enabled, **and** a secondary network provisioned.
394552

553+
The pod network interfaces naming tests shall cover:
554+
* VMs that were running prior to the KubeVirt version that introduces the naming change
555+
can be migrated following KubeVirt upgrade (when workload-strategy is set to Migrate), and after
556+
Kubevirt upgrade following user request.
557+
558+
It should cover changes around virt-launcher pod interface name change and in general cover
559+
backward compatibility for changes related to the networking code.
560+
395561
### Multus functional tests
396562
In multus, new functional tests must be added that cover the following
397563
scenarios:
@@ -407,10 +573,20 @@ scenarios:
407573
3. Add a controller monitoring pod attachment updates
408574
4. **C** Consume this dynamic networks functionality via CNAO
409575
5. **K** Add the hot-plug functionality to KubeVirt for L2 and L3 networks
410-
(with IPAM enabled on the pod interface)
411-
6. **K** Add the hot-unplug functionality to KubeVirt for L2 and L3 networks
576+
(with IPAM enabled on the pod interface)
577+
6. **K** Change virt-launcher pod network interfaces name scheme
578+
7. **K** Add a remove-interface command at `virtctl` and corresponding endpoints at `virt-api`, **support VMI objects only**
579+
8. **K** Detach the requested interface from the guest through Libvirt API, **support VMI objects only**.
580+
9. **K** Extend the `InterfaceRequests` API to support remove-interface requests.
581+
10. **K** Extend the remove-interface `virtctl`'s command and `virt-api`'s endpoints to support `VirtualMachine`.
582+
11. **K** Implement `virt-controller` pod annotation patching for unplug requests.
583+
12. **K** Cleanup the unplugged interface's bridge and tap-device from virt-launcher pods.
584+
13. **K** Shut down the unplugged interface IPAM DHCP server instance.
412585

413586
**Notes:**
414587
* the action items listed above have either `M`, `K`, or `C` to
415588
indicate in which project should it be implemented.
416589
* the MVP version would be composed of steps 1 through 4, inclusive.
590+
* the MVP for unplug functionality would be composed of steps 6 through 8, inclusive.
591+
* Until steps 12 and 13 are implemented, the unplugged interface's bridge, tap device
592+
and DHCP server will remain in the launcher pod, until the VM is migrated.