-
Notifications
You must be signed in to change notification settings - Fork 2.8k
Follow-up: Refine DRA test job filters to use an allow-list approach #35225
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Follow-up: Refine DRA test job filters to use an allow-list approach #35225
Conversation
This pull request is a follow-up to kubernetes#35195 and addresses feedback from @pohly. The previous PR used a "deny-list" to exclude `ResourceHealthStatus` tests from DRA jobs. This change implements a more robust "allow-list" approach by using the `!Alpha` filter for node jobs, ensuring that only tests for on-by-default features are run. This prevents jobs from breaking when new, unrelated Alpha features are added in the future, improving the long-term maintainability of the test configuration.
Skipping CI for Draft Pull Request. |
Keywords which can automatically close issues and at(@) or hashtag(#) mentions are not allowed in commit messages. The list of commits with invalid commit messages:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Jpsassine, SergeyKanzhelev The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@@ -100,7 +100,7 @@ presubmits: | |||
- --repo-root=. | |||
- --gcp-zone=us-central1-b | |||
- --parallelism=1 | |||
- '--label-filter=DRA && Feature: isSubsetOf { OffByDefault, DynamicResourceAllocation } && !FeatureGate:ResourceHealthStatus && !Flaky {%- if not ci %} && !Slow {%- endif %}' | |||
- '--label-filter=DRA && Feature: isSubsetOf { OffByDefault, DynamicResourceAllocation } && !Alpha && !Flaky {%- if not ci %} && !Slow {%- endif %}' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The OffByDefault
became wrong here because the feature gates did not enable all off-by-default features at all, only DynamicResourceAllocation
. This is the root cause of the problem.
Please remove it, then you don't need the !Alpha
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also note that we normally put such changes into canary jobs first, try that, then roll out to the actual jobs. Helps prevent mistakes...
As it's already merged into the real jobs let's fix those directly again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. Fixing now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation above, it helps clarify the root cause and how the OffByDefault
label is misleading.
In the e2e tests I added for ResourceHealthStatus here they have both the [Feature:DynamicResourceAllocation]
and [Alpha]
labels.
My question is: if we only remove OffByDefault
from the filter, the test would still be selected because it matches isSubsetOf { DynamicResourceAllocation }
, right? If so, it would likely continue to fail since the ResourceHealthStatus
feature gate isn't enabled in this job.
It seems like the most direct way to ensure these jobs only run stable, on-by-default tests is to add the !Alpha
filter to the node jobs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't see the OffByDefault
, it's not getting copied into the test name. But it is set as a Ginkgo label, which is what matters for --label-filter
.
@BenTheElder: we may have to extend --list-tests
because this doesn't show it either - I wasn't sure what the visual markup for labels should be:
go test -v ./test/e2e_node -args --list-tests
ginkgo --dry-run
shows them in color.
@Jpsassine: see https://docs.google.com/document/d/1YkU7EusfOZO9jgJbHvCwMCfMZ5CxYMxZHAblpDdKbUQ/edit?tab=t.0#heading=h.kvj5zur6ebww for more information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I mixed up ResourceHealthStatus
(alpha) with KubeletPodResourcesDynamicResources
(beta).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can test a filter like: _output/bin/e2e.test --ginkgo.dry-run --ginkgo.v --ginkgo.silence-skips --ginkgo.label-filter='Feature: isSubSetOf LoadBalancer’
(from the doc linked above)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be like go test -v ./test/e2e_node -args --ginkgo.v --ginkgo.silence-skips --ginkgo.label-filter='Feature: isSubsetOf { OffByDefault, DynamicResourceAllocation } && !Alpha && !Flaky’
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A dry run test with the current label filter shows the proper DRA tests are grabbed and only the 1 e2e ResourceHealthStatus:Disabled
test is covered.
go test -v ./test/e2e_node -args --ginkgo.dry-run --ginkgo.v --ginkgo.silence-skips --ginkgo.label-filter='Feature: isSubsetOf { OffByDefault, DynamicResourceAllocation } && !Alpha && !Flaky'
It seems like the current approach and label filter in this diff is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OffByDefault
should not be in isSubsetOf
because the job does not change feature gates and thus can only run tests for features which are on by default. Adding
!Alpha` hides that.
As mentioned in #35195 (comment), we need two sets of DRA e2e_node jobs, just as we do now for e2e:
- Only on-by-default features. That can be merge-blocking.
- All features, i.e. including alpha features. Not merge blocking.
IMHO we don't need to duplicate all e2e_node jobs like that because for alpha coverage with one runtime should be enough. I don't think we have ever caught any bug that occurred only with one runtime, so running alpha tests with all of them wouldn't be very good "bang for the buck".
Which runtime should be the "main" one?
cc @bart0sh
/lgtm cancel |
@Jpsassine please fix the commit messge and comment if this is needed soon or the next release |
The commit message is appropriate and is not immediately necessary but would be good to have for the next release as it implements a long term solution for removing the |
/hold This is covered by #35425 together with several other changes. |
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
This pull request is a follow-up to #35195 and addresses feedback from @pohly.
The previous PR used a "deny-list" to exclude
ResourceHealthStatus
tests from DRA jobs. This change implements a more robust "allow-list" approach by using the!Alpha
filter for node jobs, ensuring that only tests for on-by-default features are run.This prevents jobs from breaking when new, unrelated Alpha features are added in the future, improving the long-term maintainability of the test configuration.