Commit 7b2c0f0

design-proposal: Feature configurables

This design document describes how features that need a mechanism to change their state (e.g., enabled/disabled) should be implemented in KubeVirt.

Signed-off-by: Javier Cano Cano

# Overview

With the introduction of the
[KubeVirt Feature Lifecycle](https://github.com/kubevirt/community/blob/main/design-proposals/feature-lifecycle.md)
policy, features reaching General Availability (GA) need to drop their feature gates. This also applies to
configurable features that we may want to disable.
## Motivation

Users may want certain features to be configurable, for example to make the best use of the available resources, or
for compliance reasons: features may expose sensitive information from the host to the virtual machine instances (VMIs),
or add extra containers to the launcher pod that the user does not need.

The downward metrics feature is a good example of why some clusters may want to have a feature enabled or disabled.
The downward metrics feature exposes to the guest some metrics about the host node where the VMI is running. This
information may be considered sensitive. If there is no mechanism to disable the feature, any VMI could request the
metrics and inspect information that, in some cases, the admin would like to hide, creating a potential security issue
that violates the "need-to-know" principle.

The behavior of other features can be changed by editing configurables; for example, the maximum number of CPU sockets
allowed for each VMI can be configured.

Before the introduction of the
[KubeVirt Feature Lifecycle](https://github.com/kubevirt/community/blob/main/design-proposals/feature-lifecycle.md)
policy, many feature gates remained after the feature's graduation to GA with the sole purpose of acting as a switch for
the feature. Generally speaking, this is a bad practice, because feature gates should be reserved for controlling a
feature until it reaches maturity, i.e., GA. Therefore, when a developer wants to provide the ability to tune or change a
feature, configurables exposed in the KubeVirt CR should be provided instead. This should be accomplished while achieving
[eventual consistency](https://en.wikipedia.org/wiki/Eventual_consistency). This forces us to avoid checking the feature
configuration in webhooks and to move those checks closer to the responsible code. Moreover, it has to be decided how the
system should behave if a VMI requests a feature in a configuration different from the one expressed in the KubeVirt CR,
or what should happen if the configuration of a feature in use is changed (see
[Interactions with the VMIs requests](#interactions-with-the-vmis-requests)).
## Goals

- Get a clear understanding of the feature configurations.
- Establish how the feature configurables should work.
- Describe how the system should react in the following scenarios when the VMI exposes an API field to configure the
  feature:
  - A feature in KubeVirt is set to state A and a VMI requests the feature in state B.
  - A feature in KubeVirt is set to state A, there are running VMIs using the feature in state A, and the feature is
    changed in KubeVirt to state B.
  - A feature in KubeVirt is set to state A, and pending VMIs want to use it.
  - A feature in KubeVirt is set to state A, and running VMIs using the feature in state B want to live migrate.
- Graduate features by dropping their gates and (optionally) adding spec options for them.
## Non Goals

- Describe how features protected with feature gates should work.
- Change how feature gates are managed. Feature gating and configuration are two completely distinct issues.

## Definition of Users

Development contributors.

Cluster administrators.

## User Stories

* As a cluster administrator, I want to be able to change the cluster-wide configuration of a feature by editing configurables.

* As a VMI owner, I want to use a given feature.

* As a VMI owner / cluster admin, I want to understand the current configuration of the various features.

## Repos

KubeVirt/KubeVirt
# Design

Ideally, a graduated feature would just work out of the box, with no further complexity for the cluster admin.
Features that must be configured must add new fields to the KubeVirt CR under `spec`:

```yaml
apiVersion: kubevirt.io/v1
kind: KubeVirt
[...]
spec:
  certificateRotateStrategy: {}
  feature-A: {}
  feature-C:
    configA: integer
    configB: string
  [...]
```
The VMI object may or may not include a configuration field inside the relevant spec.

> **NOTE:** The inclusion of these new KubeVirt API fields should be carefully considered and justified. Feature
> configurables should be avoided as much as possible.

Current feature gates will require an evaluation to determine whether they should be dropped or graduated to a
configurable. This is the current list of GA'd features present in KubeVirt/KubeVirt which are still using feature gates
and are exposed as
[configurables in HCO](https://github.com/kubevirt/hyperconverged-cluster-operator/blob/main/controllers/operands/kubevirt.go#L166-L174):

- DownwardMetrics
- Root (not sure about this one)
- DisableMDEVConfiguration
- PersistentReservation
- AutoResourceLimitsGate
- AlignCPUs

This is the current list of GA'd features present in KubeVirt/KubeVirt which are still using feature gates and are
[always enabled by HCO](https://github.com/kubevirt/hyperconverged-cluster-operator/blob/main/controllers/operands/kubevirt.go#L125-L142):

- CPUManager
- Snapshot
- HotplugVolumes
- GPU
- HostDevices
- NUMA
- VMExport
- DisableCustomSELinuxPolicy
- KubevirtSeccompProfile
- HotplugNICs
- VMPersistentState
- NetworkBindingPlugins
- VMLiveUpdateFeatures

Please note that only feature gates included in KubeVirt/KubeVirt are listed here.

Section [Interactions with the VMIs requests](#interactions-with-the-vmis-requests) details how the system should
react in scenarios where the VMI feature configuration differs from the one configured in the KubeVirt CR. Also,
Section [Update/Rollback Compatibility](#updaterollback-compatibility) explains how feature gates should be graduated
to configurables.
## Interactions with the VMIs requests

When both the VMI and the KubeVirt CR expose a configuration field for the feature, the system may encounter some
inconsistent states, which should be handled in the following way:

- If the feature is set to state A in the KubeVirt CR and the VMI requests the feature in state B, the VMI must
  stay in the `Pending` state. The VMI status should be updated with a status message highlighting the reason(s) for
  the `Pending` state. Moreover, an event could be triggered. For instance, in the following KubeVirt CR, `feature-B`
  is not enabled:

  ```yaml
  apiVersion: kubevirt.io/v1
  kind: KubeVirt
  [...]
  spec:
    certificateRotateStrategy: {}
    feature-A: {}
  ```

  but a given VMI requests it:

  ```yaml
  apiVersion: kubevirt.io/v1
  kind: VirtualMachineInstance
  metadata:
    name: vmi-feature-b
  spec:
    domain:
      feature-B: {}
    [...]
  ```

  Therefore, the VMI phase should stay `Pending` until `feature-B` is enabled in the KubeVirt CR:

  ```bash
  $ kubectl get vmis
  NAME            AGE   PHASE     IP    NODENAME   READY
  vmi-feature-b   2s    Pending                    False
  ```

  Moreover, the VMI status should reflect the specific feature configuration that is preventing the VMI from starting:

  ```bash
  $ kubectl get vmis vmi-feature-b -o yaml
  [...]
  status:
    conditions:
    - lastProbeTime: "2024-08-28T10:16:57Z"
      lastTransitionTime: "2024-08-28T10:16:57Z"
      message: 'virtual machine is requesting the disabled feature: feature-B'
      reason: FeatureNotEnabled
      status: "False"
      type: Synchronized
  ```

  and a warning event is triggered:

  ```event
  LAST SEEN   TYPE      REASON              OBJECT                                 MESSAGE
  [...]
  2s          Warning   FeatureNotEnabled   virtualmachineinstance/vmi-feature-b   feature-B feature not enabled
  ```

- Feature configuration checks that could prevent a VMI from starting should only be performed during the VMI
  reconciliation process, and not at runtime if the changes cannot be applied without restarting the VMI. While this
  approach ensures that the system does not actively block, stop, or kill running VMIs due to configuration changes in
  the KubeVirt CR, VMIs may still experience issues or termination if critical features become unavailable or
  incompatible.
- The system should not block live migration unless the requested feature is not supported on the destination host.
  However, as stated before, if the changes can be applied without restarting the VMI, they can be applied at runtime.
- Updates to the KubeVirt CR that change a feature configuration should not be rejected.
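The rules above can be sketched as two pure functions that a reconciler might call. This is a simplified, hypothetical illustration: `Condition`, `ClusterFeatures`, `checkVMIStart`, and `allowMigration` are not KubeVirt APIs; the condition type, reason, and message strings are copied from the example status above; and features are reduced to named on/off switches.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Condition is a simplified, hypothetical version of a VMI status condition.
type Condition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

// ClusterFeatures maps feature names to whether the KubeVirt CR enables them.
type ClusterFeatures map[string]bool

// checkVMIStart runs during VMI reconciliation, not in a webhook. It returns nil when
// the VMI may proceed, or a condition explaining why it must stay Pending.
// VMIs past Pending are never re-checked, so CR changes do not stop running VMIs.
func checkVMIStart(cluster ClusterFeatures, phase string, requested []string) *Condition {
	if phase != "Pending" {
		return nil
	}
	var missing []string
	for _, f := range requested {
		if !cluster[f] {
			missing = append(missing, f)
		}
	}
	if len(missing) == 0 {
		return nil
	}
	sort.Strings(missing) // deterministic message when several features are missing
	return &Condition{
		Type:    "Synchronized",
		Status:  "False",
		Reason:  "FeatureNotEnabled",
		Message: "virtual machine is requesting the disabled feature: " + strings.Join(missing, ", "),
	}
}

// allowMigration blocks only when the destination host cannot support a requested
// feature; the current KubeVirt CR configuration is deliberately not consulted.
func allowMigration(requested []string, destinationSupports map[string]bool) bool {
	for _, f := range requested {
		if !destinationSupports[f] {
			return false
		}
	}
	return true
}

func main() {
	cluster := ClusterFeatures{"feature-A": true}
	if c := checkVMIStart(cluster, "Pending", []string{"feature-B"}); c != nil {
		fmt.Printf("stays Pending: %s: %s\n", c.Reason, c.Message)
	}
	fmt.Println("migration allowed:", allowMigration([]string{"feature-B"}, map[string]bool{"feature-B": true}))
}
```

Because the start check only applies to `Pending` VMIs and migration only consults the destination host, a later change to the KubeVirt CR never stops or evicts a running VMI in this sketch.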
## Scalability

The feature configurables should not meaningfully affect the cluster's resource usage.

## Update/Rollback Compatibility

The feature configurables should not affect forward or backward compatibility once the feature reaches GA. After a
feature has spent 3 releases in Beta, all of its feature gates must be dropped, so features that need a configurable
should define it ahead of time.

## Functional Testing Approach

The unit and functional testing frameworks should cover the relevant scenarios for each feature.

# Implementation Phases

The feature configuration checks should be placed in the VMI reconciliation loop. This keeps the feature configuration
evaluation close to the VMI scheduling process, and allows KubeVirt to reconcile itself if it is temporarily out of
sync.

Regarding existing features that transition from using feature gates as an enable/disable switch to configurable
fields: this change is acceptable, but it should be marked as a breaking change and documented. Moreover, all feature
gates should be evaluated to determine whether they need to be dropped and transitioned to configurables.
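To make the eventual-consistency argument for placing the checks in the VMI reconciliation loop concrete, here is a minimal, single-threaded sketch. It assumes a hypothetical `Controller` with an in-memory queue instead of the informers and work queues a real KubeVirt controller uses: a VMI blocked by the configuration simply stays `Pending`, and an update to the KubeVirt CR re-enqueues pending VMIs so they converge without being rejected or recreated.

```go
package main

import "fmt"

// VMI is a hypothetical, minimal stand-in for a VirtualMachineInstance.
type VMI struct {
	Requested []string
	Phase     string // "Pending" until the feature checks pass, then "Scheduling"
	Message   string
}

// Controller is a hypothetical, single-threaded reconciler with an in-memory queue.
type Controller struct {
	cluster map[string]bool // features enabled in the KubeVirt CR
	vmis    map[string]*VMI
	queue   []string
}

func (c *Controller) enqueue(name string) { c.queue = append(c.queue, name) }

// reconcile evaluates one VMI. A VMI blocked by the configuration is not rejected;
// it stays Pending with a message and is retried when the configuration changes.
func (c *Controller) reconcile(name string) {
	vmi, ok := c.vmis[name]
	if !ok || vmi.Phase != "Pending" {
		return // deleted, or already past the start checks
	}
	for _, f := range vmi.Requested {
		if !c.cluster[f] {
			vmi.Message = "virtual machine is requesting the disabled feature: " + f
			return
		}
	}
	vmi.Message = ""
	vmi.Phase = "Scheduling"
}

// onKubeVirtUpdate stores the new configuration and re-enqueues every Pending VMI,
// so the system converges to the new state instead of rejecting the update.
func (c *Controller) onKubeVirtUpdate(cluster map[string]bool) {
	c.cluster = cluster
	for name, vmi := range c.vmis {
		if vmi.Phase == "Pending" {
			c.enqueue(name)
		}
	}
}

// drain processes queued keys until the queue is empty.
func (c *Controller) drain() {
	for len(c.queue) > 0 {
		name := c.queue[0]
		c.queue = c.queue[1:]
		c.reconcile(name)
	}
}

func main() {
	c := &Controller{
		cluster: map[string]bool{"feature-A": true},
		vmis: map[string]*VMI{
			"vmi-feature-b": {Requested: []string{"feature-B"}, Phase: "Pending"},
		},
	}
	c.enqueue("vmi-feature-b")
	c.drain()
	vmi := c.vmis["vmi-feature-b"]
	fmt.Println(vmi.Phase, "-", vmi.Message)

	// The admin enables feature-B; the same VMI converges without being recreated.
	c.onKubeVirtUpdate(map[string]bool{"feature-A": true, "feature-B": true})
	c.drain()
	fmt.Println(vmi.Phase)
}
```

Disabling a feature later only affects VMIs that are still `Pending`; anything already past the start checks is left alone.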
## About implementing the checking logic in the VM controller

KubeVirt should not allow starting a VM if it requests a feature that is not available in the cluster.
The VM controller must report the reasons in the `status` field of the VM.

Optionally, another check could be added to the VM controller to let the user know if a VM has requested a feature
configuration that differs from what is specified in the KubeVirt CR. This check would be performed when the
user creates the VM, and it should update the `status` field of the VM.
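A hypothetical sketch of the VM controller behavior described above. `VM`, `syncVM`, and `missingFeatures` are illustrative names, the status is reduced to a list of strings, and creating the VMI is represented by a boolean flag. The condition is recorded whether or not the VM is meant to be running, which also covers the optional informational check, while VMI creation is withheld until every requested feature is enabled.

```go
package main

import (
	"fmt"
	"strings"
)

// VM is a hypothetical, minimal stand-in for a VirtualMachine.
type VM struct {
	Running    bool     // the user wants the VM to run
	Requested  []string // features requested by the VMI template
	Conditions []string // simplified status: one human-readable entry per problem
	VMICreated bool     // stand-in for "a VMI object exists for this VM"
}

// missingFeatures returns the requested features not enabled in the KubeVirt CR.
func missingFeatures(cluster map[string]bool, requested []string) []string {
	var out []string
	for _, f := range requested {
		if !cluster[f] {
			out = append(out, f)
		}
	}
	return out
}

// syncVM reports unavailable features in the VM status on every sync, whether or not
// the VM should be running, and withholds VMI creation until they are enabled.
// An already existing VMI is left untouched.
func syncVM(vm *VM, cluster map[string]bool) {
	vm.Conditions = nil
	if missing := missingFeatures(cluster, vm.Requested); len(missing) > 0 {
		vm.Conditions = append(vm.Conditions,
			"FeatureNotEnabled: requested features not enabled in the KubeVirt CR: "+strings.Join(missing, ", "))
		return
	}
	if vm.Running && !vm.VMICreated {
		vm.VMICreated = true // a real controller would create the VMI object here
	}
}

func main() {
	vm := &VM{Running: true, Requested: []string{"feature-B"}}
	syncVM(vm, map[string]bool{"feature-A": true})
	fmt.Println("VMI created:", vm.VMICreated, "conditions:", vm.Conditions)

	syncVM(vm, map[string]bool{"feature-A": true, "feature-B": true})
	fmt.Println("VMI created:", vm.VMICreated, "conditions:", vm.Conditions)
}
```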
