-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CRD Validation for Etcd resource using CEL expressions #995
base: master
Are you sure you want to change the base?
Conversation
@Shreyas-s14 Thank you for your contribution. |
Thank you @Shreyas-s14 for your contribution. Before I can start building your PR, a member of the organization must set the required label(s) {'reviewed/ok-to-test'}. Once started, you can check the build status in the PR checks section below. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, @Shreyas-s14, for adding the validations for the etcd
CR. I've provided a few initial review comments based on my first glance. I will share more detailed feedback after a thorough review.
api/v1alpha1/etcd.go
Outdated
ReelectionPeriod *metav1.Duration `json:"reelectionPeriod,omitempty"` | ||
// EtcdConnectionTimeout defines the timeout duration for etcd client connection during leader election. | ||
// +optional | ||
// +kubebuilder:validation:Type=string | ||
// +kubebuilder:validationXValidation:message="Invalid duration given for etcd.spec.backup.leaderElection.etcdCOnnectionTimeout",rule="self.matches('^([0-9]+([.][0-9]+)?h)?([0-9]+([.][0-9]+)?m)?([0-9]+([.][0-9]+)?s)?([0-9]+([.][0-9]+)?d)?$')" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// +kubebuilder:validationXValidation:message="Invalid duration given for etcd.spec.backup.leaderElection.etcdCOnnectionTimeout",rule="self.matches('^([0-9]+([.][0-9]+)?h)?([0-9]+([.][0-9]+)?m)?([0-9]+([.][0-9]+)?s)?([0-9]+([.][0-9]+)?d)?$')" | |
// +kubebuilder:validationXValidation:message="Invalid duration given for etcd.spec.backup.leaderElection.etcdConnectionTimeout",rule="self.matches('^([0-9]+([.][0-9]+)?h)?([0-9]+([.][0-9]+)?m)?([0-9]+([.][0-9]+)?s)?([0-9]+([.][0-9]+)?d)?$')" |
|
||
### Field validations | ||
- The fields which expect only a particular set of values are checked by using the kubebuilder marker: `+kubebuilder:validation:Enum=<value1>;<value2>` | ||
* The `etcd.spec.etcd.metrics` can only be set as either `basic` or `extesnive`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* The `etcd.spec.etcd.metrics` can only be set as either `basic` or `extesnive`. | |
* The `etcd.spec.etcd.metrics` can only be set as either `basic` or `extensive`. |
g.Expect(cl.Create(ctx, etcd)).To(Succeed()) | ||
|
||
etcd.Spec.Replicas = int32(test.updatedReplicas) | ||
time.Sleep(30) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do not use time.sleep
, For more detailed guidance, please refer to the
etcd-druid/docs/development/testing.md
Line 25 in 6b3b94e
* Do not use `time.Sleep` and friends as it renders the tests flaky. |
/test pull-etcd-druid-e2e-kind |
* Added `itTestEnv.StartManager()` to start the manager in `setupTestEnvironment()`. * Upgraded teh env-test k8s version to `1.30`. * Changed the timeout to `20s` in `assertReconcileSuspensionEventRecorded` to prevent the test timing out. * Removed an unnecessary `time.Sleep` in `TestValidateSpecReplicas`. * Uncommented previously commented options which caused regressions.
2) added version compatibility docs 3) reduced code reuse in specUpdate_test.go
1) increased value of timeout passed in test/it/helper.go createAndAssertEtcdAndAllManagedResources 2) increased the timeout in test/e2e/suite_test.go to 20 minutes each
Reverted the timeout value in test/it/helper.go which was increased to handle failure of prow jobs
…sion-compatibility-matrix.md
c6a0f93
to
a0c45a7
Compare
@@ -0,0 +1,37 @@ | |||
# CRD Validations for etcd | |||
|
|||
* The validations for the fields within the `etcd` resource are done via [kubebuilder markers for CRD validation](https://book.kubebuilder.io/reference/markers/crd-validation). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some where in this doc you must mention that this is meant for k8s clusters with version >= 1.30.
What should consumers who are using k8s versions older than 1.30 do? This needs to be clearly mentioned. I think we will in general need to enhance the CRD apis to have a k8s version otherwise with this change etcd-druid can no longer work with k8s clusters with version < 1.30 which is going to be an issue.
So if we wish to have etcd-druid be consumed outside gardener then it is essential that we have some sort of validation for all the CRs. Unfortunately we do not have that today as well for non-gardener use cases. What should be our approach for non-gardener users using k8s version < 1.30? - do we say your CRs will never get validated at the time of submission to KAPI and you will only come to know of any problem during runtime or during provisioning an etcd cluster?
@@ -162,12 +162,13 @@ spec: | |||
deltaSnapshotPeriod: | |||
description: DeltaSnapshotPeriod defines the period after which | |||
delta snapshots will be taken | |||
pattern: ^([0-9]+([.][0-9]+)?h)?([0-9]+([.][0-9]+)?m)?([0-9]+([.][0-9]+)?s)?([0-9]+([.][0-9]+)?d)?$ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you not introduced CEL expressions for EtcdCopyBackupsTask
CRD? There should be uniformity in the way all CRs are validated, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@unmarshall I would prefer not increasing the scope of this PR now by introducing validations for EtcdCopyBackupsTask
resources. We never ran validations for those, so we can continue live without validations for another release or two.
To clarify, gardener runs GRM webhook that uses the validation functions provided in api/validation/etcdcopybackupstask.go, but for open-source users of etcd-druid, currently there's no validation run for either Etcd
or EtcdCopyBackupsTask
resources, sicne druid doesn't run a validating webhook of its own against these resources. So this PR is simply a bonus for open-source users of etcd-druid. And as for gardener, if we choose to retain the the GRM webhook validation for a couple more druid releases, then we are completely safe. And even if we remove GRM webhook validation for k8s versions v1.30 and above, since we know the EtcdCopyBackupsTask
resource is well-formed in gardener, it should be ok to live without validating these resources for another release or two, and only for k8s clusters running v1.30 or above.
I would rather want focus on #995 (comment) since that is more pressing right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Shreyas-s14 can you confirm that you have tested with g/g locally setup that already has a GRM validating webhook enabled? Test it with the following combination:
- CRD with CEL expressions & GRM webhook
- CRD without CEL expressions & GRM webhook
I would prefer not increasing the scope of this PR now by introducing validations for EtcdCopyBackupsTask resources.
Sure. Please create an issue to add validations for EtcdCopyBackupsTask
so that this is not forgotten. This should be done before we remove the validating webhooks from GRM.
1) Renamed "test/it/crd-validation" to "test/crdvalidation" 2) Removed Camel casing from the file names in test/crdvalidation/etcd 3) Changed version-compatibility matrix to 1.29 since CEL expressions are valid from this version onwards 4) Removed config/crd/bases/crd-druid.gardener.cloud_etcds.yaml
1) make prepare-helm-charts now takes an additional parameter to deploy the crd either with or without the CEL expressions based on the k8s version. 2) modified api/hack/generate.sh to generate 2 versions of the etcd crd. One with and one without CEL expressions 3) added deploy.sh which is called when make deploy is called. Additional functionality: get k8s version and deploy accordingly.
/retest |
…lidation for requests field in etcd 1) Changed the Makefile target for dpeloy to install the dependecies if not present 2) Removed existing validations for fields of type resources where the request field was mandatory. 3) reverted the cron expressions in examples/etcd/druid_v1alpha1_etcd_localstack.yaml 4) exported the kubernetes version variable. 5) removed duplicate variable in hack/tools.mk 6) Added path to etcd crd without CEL as a return type in assets.go if the k8s version is < 1.29. Added the function to get k8s version as well 7) Changes to test/it/controller/etcd/reconciler_test.go and test/it/crdvalidation/etcd/validation_test.go based on changes to test/it/assets/assets.go 8) Skip test cases based on k8s versions
os.Exit(1) | ||
} | ||
|
||
k8sVersionAbove129, err = crds.IsK8sVersionAbove129(k8sVersion) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't you re-use crds.GetAll
function?
…Versions + added comment(description) to exported function
…ed comment addressing duplication of code in test/it/assets/assets.go
@Shreyas-s14: The following test failed, say
Full PR test history. Your PR dashboard. Command help for this repository. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
How to categorize this PR?
/area quality
/area control-plane
/area open-source
/kind enhancement
What this PR does / why we need it:
kubebuilder markers
, using a combination of CEL Expressions via thex-validation
and thevalidation
markers.api/core/crds
is changed to now have API methods which takes in a k8s version to provide the appropriate CRD.make deploy-*
andmake prepare-helm-chart
targets to work with CRDs with and without CEL expressions.Which issue(s) this PR fixes:
Fixes #989
Special notes for your reviewer:
Release note: