Commit a825f81

design-proposal: Feature configurables
This design document describes how features that need a mechanism to change their state (e.g., enabled/disabled) should be implemented in KubeVirt. Signed-off-by: Javier Cano Cano <[email protected]>
1 parent db2ea07 commit a825f81

1 file changed: 233 additions, 0 deletions

# Overview

With the introduction of the [KubeVirt Feature Lifecycle](https://github.com/kubevirt/community/blob/main/design-proposals/feature-lifecycle.md) policy, features reaching General Availability (GA) need to drop their use of feature gates. This also applies to configurable features that we may want to disable.

## Motivation

Users may want certain features to be in a given state, for example to make the best use of the available resources or for compliance reasons. Some features expose sensitive information from the host to the virtual machine instances (VMIs), or add containers to the launcher pod that the user does not need.

The downward metrics feature is a good example of why some clusters may want a feature enabled and others disabled. It exposes metrics about the host node where the VMI is running to the guest, and this information may be considered sensitive. If there is no mechanism to disable the feature, any VMI could request the metrics and inspect information that, in some cases, the admin would like to hide, creating a potential security issue.

The behavior of other features can be changed by editing configurables. For example, the maximum number of CPU sockets allowed for each VMI can be configured.

Before the introduction of the [KubeVirt Feature Lifecycle](https://github.com/kubevirt/community/blob/main/design-proposals/feature-lifecycle.md) policy, many feature gates remained after a feature's graduation to GA with the sole purpose of acting as a switch for the feature. Generally speaking, this is a bad practice, because feature gates should be reserved for controlling a feature until it reaches maturity, i.e., GA. Therefore, when a developer wants to provide the ability to tune or change the state of a feature, configurables exposed in the KubeVirt CR should be provided. This should be accomplished while achieving [eventual consistency](https://en.wikipedia.org/wiki/Eventual_consistency), which means avoiding feature state checks in webhooks and moving the feature state control closer to the responsible code. Moreover, it has to be decided how the system should behave if a VMI requires a feature in a state different from the one configured in the KubeVirt CR, or what should happen if the configuration of a feature in use is changed (see [Interactions with the VMIs requests](#interactions-with-the-vmis-requests)).

## Goals

- Get a clear understanding of the configuration status of features.
- Establish how the feature configurables should work.
- Describe how the system should react in these scenarios when the VMI exposes an API field to configure the feature status:
  - A feature in KubeVirt is set to state A and a VMI requests the feature to be in state B.
  - A feature in KubeVirt is set to state A, there are running VMIs using the feature in state A, and the feature is changed in KubeVirt to state B.
  - A feature in KubeVirt is set to state A, and pending VMIs want to use it.
  - A feature in KubeVirt is set to state A, and running VMIs using the feature in state B want to live migrate.
- Graduate feature status swapping from feature gates to configurables.

## Non Goals

- Describe how features protected with feature gates should work.
- Change how feature gates are managed. Feature gating and configuration are two completely distinct issues.

## Definition of Users

Development contributors.

Cluster administrators.

## User Stories

* As a cluster administrator, I want to be able to change the cluster-wide state of a feature by editing configurables.

* As a VMI owner, I want to use a given feature.

* As a VMI owner / cluster admin, I want to understand the current configuration of the various features.

## Repos

KubeVirt/KubeVirt

# Design

The feature status swapping must be done by adding new fields to the KubeVirt CR under `spec`:

```yaml
apiVersion: kubevirt.io/v1
kind: KubeVirt
[...]
spec:
  certificateRotateStrategy: {}
  feature-A: {}
  feature-C:
    configA: integer
    configB: string
[...]
```

Note that if a feature's spec field is not present, the feature is assumed to be completely disabled. For instance, in the KubeVirt CR manifest provided above, `feature-B` is not enabled.

The VMI object may or may not include a configuration field inside the relevant spec.

> **NOTE:** The inclusion of these new KubeVirt API fields should be carefully considered and justified. The feature
> configurables should be avoided as much as possible.

Moreover, the KubeVirt CR `status` field should clearly indicate the current state of each feature, providing a comprehensive overview of the operational status of these features:

```yaml
apiVersion: kubevirt.io/v1
kind: KubeVirt
[...]
status:
  featureStatus:
    featureA:
      status: Enabled
    featureB:
      status: Disabled
    featureC:
      status: 0.5GiB
[...]
```

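As a rough sketch of how such a `featureStatus` block could be derived from `spec`, the following Python snippet reports `Enabled` or `Disabled` for a list of known features. The function name `build_feature_status` and the list of known features are hypothetical, and value-style statuses such as `0.5GiB` are not modeled because they would need feature-specific logic.

```python
from typing import Any, Dict, Iterable, Mapping


def build_feature_status(
    kubevirt_spec: Mapping[str, Any], known_features: Iterable[str]
) -> Dict[str, Dict[str, str]]:
    """Hypothetical helper: report every known feature, present in spec or not.

    Listing disabled features explicitly is what gives the reader of `status`
    a complete overview instead of having to infer absence from `spec`.
    """
    return {
        name: {"status": "Enabled" if name in kubevirt_spec else "Disabled"}
        for name in known_features
    }


feature_status = build_feature_status(
    {"certificateRotateStrategy": {}, "feature-A": {}},
    ["feature-A", "feature-B"],
)
```

Here `feature_status` reports `feature-A` as `Enabled` and `feature-B` as `Disabled`, mirroring the shape of the YAML above.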
Current feature gates will require an evaluation to determine if they need to be dropped or graduated to a configurable. This is the current list of GA'd features present in KubeVirt/KubeVirt which are still using feature gates and are shown as [configurables in HCO](https://github.com/kubevirt/hyperconverged-cluster-operator/blob/main/controllers/operands/kubevirt.go#L166-L174):

- DownwardMetrics
- Root (not sure about this one)
- DisableMDEVConfiguration
- PersistentReservation
- AutoResourceLimitsGate
- AlignCPUs

This is the current list of GA'd features present in KubeVirt/KubeVirt which are still using feature gates and are [always enabled by HCO](https://github.com/kubevirt/hyperconverged-cluster-operator/blob/main/controllers/operands/kubevirt.go#L125-L142):

- CPUManager
- Snapshot
- HotplugVolumes
- GPU
- HostDevices
- NUMA
- VMExport
- DisableCustomSELinuxPolicy
- KubevirtSeccompProfile
- HotplugNICs
- VMPersistentState
- NetworkBindingPlugins
- VMLiveUpdateFeatures

Please note that only feature gates included in KubeVirt/KubeVirt are listed here.

Section [Interactions with the VMIs requests](#interactions-with-the-vmis-requests) details how the system should react when the feature state requested by a VMI differs from the state configured in the KubeVirt CR. Section [Update/Rollback Compatibility](#updaterollback-compatibility) explains how feature gates should be graduated to configurables.

## Interactions with the VMIs requests

If the VMI exposes a configuration field to request the feature, as does the KubeVirt CR, the system may encounter inconsistent states, which should be handled as follows:

- If the feature is set to state A in the KubeVirt CR and the VMI is requesting the feature in state B, the VMI must stay in the Pending phase. The VMI status should be updated with a status message highlighting the reason(s) for the Pending phase. For instance, in the following KubeVirt CR, `feature-B` is not enabled:

  ```yaml
  apiVersion: kubevirt.io/v1
  kind: KubeVirt
  [...]
  spec:
    certificateRotateStrategy: {}
    feature-A: {}
  ```

  but a given VMI is requesting it:

  ```yaml
  apiVersion: kubevirt.io/v1
  kind: VirtualMachineInstance
  metadata:
    name: vmi-feature-b
  spec:
    domain:
      feature-B: {}
  [...]
  ```

  Therefore, the VMI PHASE should stay in `Pending` until `feature-B` is enabled in the KubeVirt CR:

  ```bash
  $ kubectl get vmis
  NAME            AGE   PHASE     IP    NODENAME   READY
  vmi-feature-b   2s    Pending                    False
  ```

  Moreover, the VMI status should reflect the specific feature configuration that is preventing the VMI from starting:

  ```bash
  $ kubectl get vmis vmi-feature-b -o yaml
  [...]
  status:
    conditions:
    - lastProbeTime: "2024-08-28T10:16:57Z"
      lastTransitionTime: "2024-08-28T10:16:57Z"
      message: 'virtual machine is requesting the disabled feature: feature-B'
      reason: FeatureNotEnabled
      status: "False"
      type: Synchronized
  ```

- If a change cannot be applied without restarting the VMI, feature status checks should only be performed during the VMI reconciliation process, and not at runtime. Therefore, feature status changes in the KubeVirt CR should not affect running VMIs. Moreover, a running VMI should still be able to live migrate, preserving its original feature state. If a change can be applied without restarting the VMI, it can be applied at runtime.
- Optionally, the KubeVirt CR change request could be rejected if running VMIs are using the feature in a given state. However, by default the request should be accepted.

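The first two rules above can be sketched as a single reconciliation step. This is a simplified Python model, not KubeVirt's controller code: the `VMI` dataclass, the `reconcile` function, and the move to the `Scheduling` phase on success are assumptions made for illustration. Changes that can be applied at runtime and the optional rejection of CR updates are not modeled.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Mapping


@dataclass
class VMI:
    """Hypothetical, heavily reduced stand-in for a VirtualMachineInstance."""
    name: str
    requested_features: List[str]
    phase: str = "Pending"
    conditions: List[Dict[str, str]] = field(default_factory=list)


def reconcile(vmi: VMI, kubevirt_spec: Mapping[str, Any]) -> VMI:
    """Apply the feature status check to one VMI."""
    # Running VMIs keep their original feature state, so a later change in
    # the KubeVirt CR does not touch them.
    if vmi.phase == "Running":
        return vmi

    # A feature is considered disabled when its field is absent from spec.
    disabled = [f for f in vmi.requested_features if f not in kubevirt_spec]
    if disabled:
        # Keep the VMI Pending and explain why in its conditions.
        vmi.phase = "Pending"
        vmi.conditions = [{
            "type": "Synchronized",
            "status": "False",
            "reason": "FeatureNotEnabled",
            "message": "virtual machine is requesting the disabled feature: "
                       + ", ".join(disabled),
        }]
        return vmi

    # Every requested feature is enabled: clear the condition and proceed.
    vmi.conditions = []
    vmi.phase = "Scheduling"
    return vmi


cr_spec = {"certificateRotateStrategy": {}, "feature-A": {}}
pending_vmi = reconcile(VMI("vmi-feature-b", ["feature-B"]), cr_spec)
running_vmi = reconcile(VMI("vmi-running", ["feature-B"], phase="Running"), cr_spec)
```

Because the check runs on every reconciliation rather than in a webhook, calling `reconcile` again after `feature-B` is added to the CR lets the pending VMI proceed without the user recreating it.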
## Scalability

The feature state swapping should not meaningfully affect cluster resource usage.

## Update/Rollback Compatibility

The feature state swapping should not affect forward or backward compatibility once the feature is GA. After a given feature has spent 3 releases in Beta, all of its feature gates must be dropped. Features that need a configurable should define it ahead of time.

## Functional Testing Approach

The unit and functional testing frameworks should cover the relevant scenarios for each feature.

# Implementation Phases

The feature status check should be placed in the VMI reconciliation loop. In this way, the feature status evaluation is close to the VMI scheduling process, and KubeVirt can reconcile itself if it is temporarily out of sync.

For existing features that move from using feature gates to set the feature status to using configurable fields, the change is acceptable, but it should be marked as a breaking change and documented. Moreover, all feature gates should be evaluated to determine whether they need to be dropped and transitioned to configurables.

## About implementing the checking logic in the VM controller

A check could be added to the VM controller to let the user know if a VM has requested a feature in a state that differs from the one specified in the KubeVirt CR. The VM controller would then update the VM status with a status message highlighting the misconfiguration.
