Skip to content

Conversation

Jpsassine
Copy link
Contributor

This pull request is a follow-up to #35195 and addresses feedback from @pohly.

The previous PR used a "deny-list" to exclude ResourceHealthStatus tests from DRA jobs. This change implements a more robust "allow-list" approach by using the !Alpha filter for node jobs, ensuring that only tests for on-by-default features are run.

This prevents jobs from breaking when new, unrelated Alpha features are added in the future, improving the long-term maintainability of the test configuration.

This pull request is a follow-up to kubernetes#35195 and addresses feedback from @pohly.

The previous PR used a "deny-list" to exclude `ResourceHealthStatus` tests from DRA jobs. This change implements a more robust "allow-list" approach by using the `!Alpha` filter for node jobs, ensuring that only tests for on-by-default features are run.

This prevents jobs from breaking when new, unrelated Alpha features are added in the future, improving the long-term maintainability of the test configuration.
@k8s-ci-robot
Copy link
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. labels Jul 25, 2025
@k8s-ci-robot
Copy link
Contributor

Keywords which can automatically close issues and at(@) or hashtag(#) mentions are not allowed in commit messages.

The list of commits with invalid commit messages:

  • afd74ab Follow-up: Refine DRA test job filters to use an allow-list approach

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jul 25, 2025
@k8s-ci-robot k8s-ci-robot requested review from dims and sjenning July 25, 2025 18:20
@k8s-ci-robot k8s-ci-robot added area/config Issues or PRs related to code in /config area/jobs sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Jul 25, 2025
@Jpsassine
Copy link
Contributor Author

cc. @pohly @SergeyKanzhelev

@Jpsassine Jpsassine marked this pull request as ready for review July 25, 2025 18:53
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 25, 2025
Copy link
Member

@SergeyKanzhelev SergeyKanzhelev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 25, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Jpsassine, SergeyKanzhelev

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 25, 2025
@@ -100,7 +100,7 @@ presubmits:
- --repo-root=.
- --gcp-zone=us-central1-b
- --parallelism=1
- '--label-filter=DRA && Feature: isSubsetOf { OffByDefault, DynamicResourceAllocation } && !FeatureGate:ResourceHealthStatus && !Flaky {%- if not ci %} && !Slow {%- endif %}'
- '--label-filter=DRA && Feature: isSubsetOf { OffByDefault, DynamicResourceAllocation } && !Alpha && !Flaky {%- if not ci %} && !Slow {%- endif %}'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The OffByDefault became wrong here because the feature gates did not enable all off-by-default features at all, only DynamicResourceAllocation. This is the root cause of the problem.

Please remove it, then you don't need the !Alpha.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also note that we normally put such changes into canary jobs first, try that, then roll out to the actual jobs. Helps prevent mistakes...

As it's already merged into the real jobs let's fix those directly again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. Fixing now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation above, it helps clarify the root cause and how the OffByDefault label is misleading.

In the e2e tests I added for ResourceHealthStatus here they have both the [Feature:DynamicResourceAllocation] and [Alpha] labels.

My question is: if we only remove OffByDefault from the filter, the test would still be selected because it matches isSubsetOf { DynamicResourceAllocation }, right? If so, it would likely continue to fail since the ResourceHealthStatus feature gate isn't enabled in this job.

It seems like the most direct way to ensure these jobs only run stable, on-by-default tests is to add the !Alpha filter to the node jobs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't see the OffByDefault, it's not getting copied into the test name. But it is set as a Ginkgo label, which is what matters for --label-filter.

@BenTheElder: we may have to extend --list-tests because this doesn't show it either - I wasn't sure what the visual markup for labels should be:

go test -v ./test/e2e_node -args --list-tests

ginkgo --dry-run shows them in color.

@Jpsassine: see https://docs.google.com/document/d/1YkU7EusfOZO9jgJbHvCwMCfMZ5CxYMxZHAblpDdKbUQ/edit?tab=t.0#heading=h.kvj5zur6ebww for more information.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I mixed up ResourceHealthStatus (alpha) with KubeletPodResourcesDynamicResources (beta).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can test a filter like: _output/bin/e2e.test --ginkgo.dry-run --ginkgo.v --ginkgo.silence-skips --ginkgo.label-filter='Feature: isSubSetOf LoadBalancer’ (from the doc linked above)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be like go test -v ./test/e2e_node -args --ginkgo.v --ginkgo.silence-skips --ginkgo.label-filter='Feature: isSubsetOf { OffByDefault, DynamicResourceAllocation } && !Alpha && !Flaky’

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A dry run test with the current label filter shows the proper DRA tests are grabbed and only the 1 e2e ResourceHealthStatus:Disabled test is covered.

go test -v ./test/e2e_node -args --ginkgo.dry-run --ginkgo.v --ginkgo.silence-skips --ginkgo.label-filter='Feature: isSubsetOf { OffByDefault, DynamicResourceAllocation } && !Alpha && !Flaky'

It seems like the current approach and label filter in this diff is correct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OffByDefault should not be in isSubsetOf because the job does not change feature gates and thus can only run tests for features which are on by default. Adding !Alpha` hides that.

As mentioned in #35195 (comment), we need two sets of DRA e2e_node jobs, just as we do now for e2e:

  • Only on-by-default features. That can be merge-blocking.
  • All features, i.e. including alpha features. Not merge blocking.

IMHO we don't need to duplicate all e2e_node jobs like that because for alpha coverage with one runtime should be enough. I don't think we have ever caught any bug that occurred only with one runtime, so running alpha tests with all of them wouldn't be very good "bang for the buck".

Which runtime should be the "main" one?

cc @bart0sh

@pohly
Copy link
Contributor

pohly commented Jul 25, 2025

/lgtm cancel

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 25, 2025
@SergeyKanzhelev SergeyKanzhelev moved this from Triage to PRs - Needs Approver in SIG Node CI/Test Board Jul 30, 2025
@SergeyKanzhelev
Copy link
Member

@Jpsassine please fix the commit messge and comment if this is needed soon or the next release

@Jpsassine
Copy link
Contributor Author

@Jpsassine please fix the commit messge and comment if this is needed soon or the next release

The commit message is appropriate and is not immediately necessary but would be good to have for the next release as it implements a long term solution for removing the ResourceHealthStatus tests.

@pohly
Copy link
Contributor

pohly commented Aug 29, 2025

/hold

This is covered by #35425 together with several other changes.

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Aug 29, 2025
@k8s-ci-robot
Copy link
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/config Issues or PRs related to code in /config area/jobs cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
Status: PRs - Needs Approver
Development

Successfully merging this pull request may close these issues.

5 participants