Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CNF-11815: e2e: Added node inspector for inspecting nodes configuration #1008

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

rbaturov
Copy link
Contributor

@rbaturov rbaturov commented Apr 1, 2024

On hypershift there is no MCO, hence there are no machine-config-daemon pods.
A different resolution is needed for accessing the underlying node for inspecting configurations.
This commit introduces a node inspector implemented as a daemonset.
Upon execution of test suites, a pod with elevated privileges and host filesystem mounted will be deployed on every node.
Also I have added Z-deconfig suite ('Z' prefix, will guarantee that it will be the last suite run) that will be used for cleanup.
This API will be used for both hypershift and non-hypershift systems.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 1, 2024
@openshift-ci openshift-ci bot requested review from kpouget and Tal-or April 1, 2024 07:44
Copy link
Contributor

openshift-ci bot commented Apr 1, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: rbaturov
Once this PR has been reviewed and has the lgtm label, please assign kpouget for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Contributor

@ffromani ffromani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, possible improvements inside

@rbaturov rbaturov changed the title WIP: Added daemonset for inspecting nodes configuration CNF-11815: Added daemonset for inspecting nodes configuration Apr 2, 2024
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Apr 2, 2024

@rbaturov: This pull request references CNF-11815 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.

In response to this:

On hypershift there is no MCO, hence there are no machine-config-daemon pods. A different resolution is needed for accessing the underlying node for inspecting configurations. This commit adds a daemonset that will start upon test suites - on every node we will deploy a pod with escalated privileges and host fs mounted. This API will be used for both hypershift and non-hypershift systems.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 2, 2024
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 2, 2024
@rbaturov
Copy link
Contributor Author

rbaturov commented Apr 2, 2024

/hold
Depends on #1004

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 2, 2024
@rbaturov rbaturov force-pushed the add-daemonset-for-tests branch 2 times, most recently from d61ea00 to 294c4af Compare April 2, 2024 13:17
@rbaturov rbaturov changed the title CNF-11815: Added daemonset for inspecting nodes configuration CNF-11815: e2e: Added daemonset for inspecting nodes configuration Apr 2, 2024
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 14, 2024
}

// ExecCommandOnMachineConfigDaemon returns the output of the command execution on the machine-config-daemon pod that runs on the specified node
func ExecCommandOnMachineConfigDaemon(ctx context.Context, node *corev1.Node, command []string) ([]byte, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both ExecCommandOnTestingDaemonPod and ExecCommandOnNode have the same purpose.
Unless there is a very good reason to have both functions, I would merge them into one named:
ExecCommandOnNode which declare explicitly where the command is about to be running.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Tal-or
ExecCommandOnTestingDaemonPod is being used as an underlying function for ExecCommandOnNode.
The difference between the two is that ExecCommandOnNode returns a string (after some formatting) rather than a bytes slice (ExecCommandOnTestingDaemonPod).
As throughout the repo, developers have chosen to use both of them, I think we should keep both of these functions and just rename them appropriately. (Another option is to unify them to one function that returns []bytes and change all calls to target this func, following a call if needed to a util function that will do the string format that is left). But again the latter option would require some work for refactoring.
WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then let's keep them and just rename ExecCommandOnTestingDaemonPod.
We don't want to steer too much from the scope of this work

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Tal-or I'll need to rename ExecCommandOnNode to ExecCommandOnNodeToString and ExecCommandOnTestingDaemonPod which returns a []byte to ExecCommandOnNode.
This however, may cause a slight confusion for developers that got used that ExecCommandOnNode returns a string. but I think this is not a big deal.
Another option could be to change ExecCommandOnTestingDaemonPod to ExecCommandOnNodeToByteSlice. I personally think we should go with the first option, even if it might cause a bit of confusion.

Copy link
Contributor

@Tal-or Tal-or Apr 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since the name of the package is nodes I would rename ExecCommandOnTestingDaemonPod to ExecCommand and call it a day.
From the caller site it looks like:
nodes.ExecCommand() The package name provides a clear indication about where this command is about to be running.

Copy link
Contributor Author

@rbaturov rbaturov Apr 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But what will be the name of the second func?
Anyways I was thinking that we should move these functions to the node inspector package, as they rely on it running.
So it's going to be more of nodeInspector.ExecCommand...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The node inspector is an implementation detail. you can call nodeInspector.ExecCommand from nodes.ExecCommand

@rbaturov rbaturov changed the title CNF-11815: e2e: Added daemonset for inspecting nodes configuration WIP: CNF-11815: e2e: Added daemonset for inspecting nodes configuration Apr 15, 2024
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 15, 2024
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 15, 2024
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Apr 15, 2024

@rbaturov: This pull request references CNF-11815 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.

In response to this:

On hypershift there is no MCO, hence there are no machine-config-daemon pods.
A different resolution is needed for accessing the underlying node for inspecting configurations.
This commit introduces a node inspector implemented as a daemonset.
Upon execution of test suites, a pod with elevated privileges and host filesystem mounted will be deployed on every node.
Also I have added Z-deconfig suite ('Z' prefix, will guarantee that it will be the last suite run) that will be used for cleanup.
This API will be used for both hypershift and non-hypershift systems.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@rbaturov rbaturov changed the title WIP: CNF-11815: e2e: Added daemonset for inspecting nodes configuration WIP: CNF-11815: e2e: Added node inspector for inspecting nodes configuration Apr 15, 2024
@rbaturov
Copy link
Contributor Author

/retest-required

@rbaturov rbaturov changed the title WIP: CNF-11815: e2e: Added node inspector for inspecting nodes configuration CNF-11815: e2e: Added node inspector for inspecting nodes configuration Apr 16, 2024
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 16, 2024
@rbaturov
Copy link
Contributor Author

@Tal-or ready for another review iteration

Signed-off-by: Ronny Baturov <[email protected]>
On hypershift there is no MCO, hence there are no machine-config-daemon pods.
A different resolution is needed for accessing the underlying node for inspecting configurations.
This commit introduces a node inspector implemented as a daemonset.
Upon execution of test suites, a pod with elevated privileges and host filesystem mounted will be deployed on every node.
Also I have added Z-deconfig suite ('Z' prefix, will guarantee that it will be the last suite run) that will be used for cleanup.
This API will be used for both hypershift and non-hypershift systems.

Signed-off-by: Ronny Baturov <[email protected]>
@rbaturov
Copy link
Contributor Author

/unhold

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 23, 2024
Updated these functions to use the node inspector rather than the MCD.
Furthermore, I have renamed these functions to better reflect their purposes and usage.

Signed-off-by: Ronny Baturov <[email protected]>
@rbaturov
Copy link
Contributor Author

/retest-required

Copy link
Contributor

openshift-ci bot commented May 27, 2024

@rbaturov: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-hypershift-pao 14f169e link true /test e2e-hypershift-pao
ci/prow/e2e-aws-ovn 14f169e link true /test e2e-aws-ovn
ci/prow/e2e-hypershift 14f169e link true /test e2e-hypershift

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants