Skip to content

OCPBUGS-28647: tuned: operand: add support for deferred updates #1019

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 19 commits into from
Jul 22, 2024

Conversation

ffromani
Copy link
Contributor

@ffromani ffromani commented Apr 4, 2024

Users are able to make tuning changes through NTO/TuneD by updating the CRs on a running system.
When the tuning change is done, the NTO causes the tuned process to restarts, whose unwanted side effect is the tuned process rolling back ALL the tuning and then reapplying it.
In some key use cases, this disrupts the workload behavior.

We add an annotation that cause the update not to be applied immediately, but to be deferred till the next node restart. The NTO operator will continue handle the tuned objects as it does today.
This includes unpacking all profiles on the node.
The deferred annotation would be deduplicated from the tuned object to all the contained tunedprofiles. If the recommended profile computed for a node is originated from a tuned object which has the annotation, then the deferred annotation will be "sticky", and the resulting update will be deferred.

However, if there is a deferred updated pending but the tuned objects are edited to remove the annotation, then the update will become immediate (like they were before this feature and like they are by default without any annotation), and they be applied overwriting the pending deferred update, which will then be lost.
This applies in the case of the recommended profile will be computed from a non-deferred update.

The deferred updates can stack, meaning a deferred update can be applied on top of a previous deferred update; the latter will just overwrite the former as it never was received. An immediate update can overwrite a deferred update, and clear it.

Like the immediate update, the NTO operand will take care of applying the deferred updates, as such:

The NTO operand will gain another chance to process tuned objects at startup, right before the main loops start.

If a tuned object is detected and it has NOT the deferred annotation:

  • the regular reconcile code will work as today
  • the startup reconcile code will not engage

If a tuned object is detected and it DOES HAVE the deferred annotation:

  • the regular reconcile code will set a filesystem flag, will move the status DEGRADED and will ignore the object.
  • at the first followup operand restart, the startup reconcile code will notice the filesystem flag, apply the rendered tuned objects (granted it DOES have the deferred flag, otherwise it will log and skip), clear the filesystem flag. The DEGRADED state will be cleared.

IOW, the startup reconcile code will engage IFF the rendered tuned has the deferred annotation and the filesystem flag does exist. Likewise, the regular reconcile loop will only set the filesystem flag if detects a rendered tuned with the deferred annotation.

The default is obviously kept to immediate updates, for backward compatibility.

@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Apr 4, 2024
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 4, 2024
@openshift-ci-robot
Copy link
Contributor

@ffromani: This pull request references Jira Issue OCPBUGS-28647, which is invalid:

  • expected the bug to target the "4.16.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

Users are able to make tuning changes through NTO/TuneD by updating the CRs on a running system.
When the tuning change is done, the NTO sends a SIGHUP to the tuned process, which results in the tuned process rolling back ALL the tuning and then reapplying it.
In some key use cases, this disrupts the workload behavior.

We add an annotation that cause the update not to be applied immediately, but to be deferred till the next tuned restart. The NTO operator will continue merging the tuned objects into the special rendered object as it does today.
The deferred annotation will be "sticky". If even just one of the current tuned object (e.g. just the perfprofile one) have the annotation, the rendered tuned will.

However, if there is a deferred updated pending but the tuned objects are edited in such a way the final rendered update will be immediate (= no deferred annotation), then the immediate update will be applied overwriting the pending deferred update, which will then be lost.

The deferred updates can stack, meaning a deferred update can be applied on top of a previous deferred update; the latter will just overwrite the former as it never was received. An immediate update can overwrite a deferred update, and clear it.

Like the immediate update, the NTO operand will take care of applying the deferred updates, as such:

The NTO operand will gain another chance to process tuned objects at startup, right before the main loops start.

If a tuned object is detected and it has NOT the deferred annotation:

  • the regular reconcile code will work as today
  • the startup reconcile code will not engage

If a tuned object is detected and it DOES HAVE the deferred annotation:

  • the regular reconcile code will set a filesystem flag, will move the status DEGRADED and will ignore the object.
  • at the first followup operand restart, the startup reconcile code will notice the filesystem flag, apply the rendered tuned objects (granted it DOES have the deferred flag, otherwise it will log and skip), clear the filesystem flag. The DEGRADED state will be cleared.

IOW, the startup reconcile code will engage IFF the rendered tuned has the deferred annotation and the filesystem flag does exist. Likewise, the regular reconcile loop will only set the filesystem flag if detects a rendered tuned with the deferred annotation.

The default is obviously kept to immediate updates, for backward compatibility.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@ffromani
Copy link
Contributor Author

ffromani commented Apr 4, 2024

/jira refresh

@openshift-ci openshift-ci bot requested review from MarSik and yanirq April 4, 2024 10:15
@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Apr 4, 2024
@openshift-ci-robot
Copy link
Contributor

@ffromani: This pull request references Jira Issue OCPBUGS-28647, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.16.0) matches configured target version for branch (4.16.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @gsr-shanks

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested a review from gsr-shanks April 4, 2024 10:15
Copy link
Contributor

openshift-ci bot commented Apr 4, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ffromani

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ffromani ffromani changed the title WIP: OCPBUGS-28647: tuned: operand: add support for deferred updates WIP: POC: OCPBUGS-28647: tuned: operand: add support for deferred updates Apr 4, 2024
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 4, 2024
@ffromani ffromani force-pushed the tuned-deferred-updates branch from ab08d68 to 7a63c46 Compare April 4, 2024 12:15
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 20, 2024
@ffromani ffromani force-pushed the tuned-deferred-updates branch from 7a63c46 to 30cdab0 Compare April 30, 2024 11:58
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 30, 2024
@ffromani ffromani force-pushed the tuned-deferred-updates branch from 30cdab0 to 5376eac Compare April 30, 2024 14:08
@ffromani ffromani force-pushed the tuned-deferred-updates branch 4 times, most recently from f0b7ece to 2314db4 Compare May 8, 2024 17:19
@ffromani
Copy link
Contributor Author

ffromani commented May 9, 2024

/retest

@ffromani
Copy link
Contributor Author

ffromani commented May 9, 2024

 ERRO[2024-05-09T07:59:44Z] 
  * could not run steps: step e2e-gcp-pao failed: "e2e-gcp-pao" pre steps failed: "e2e-gcp-pao" pod "e2e-gcp-pao-ipi-install-install" failed: could not watch pod: the pod ci-op-t8wxg3mq/e2e-gcp-pao-ipi-install-install failed after 1h0m10s (failed containers: test): ContainerFailed one or more containers exited

stil looks as infra issue to me.

@ffromani ffromani force-pushed the tuned-deferred-updates branch 2 times, most recently from 04635db to f0873a9 Compare May 10, 2024 09:29
@ffromani ffromani changed the title WIP: POC: OCPBUGS-28647: tuned: operand: add support for deferred updates OCPBUGS-28647: tuned: operand: add support for deferred updates May 10, 2024
@ffromani
Copy link
Contributor Author

the PR is now reviewable

@jmencak
Copy link
Contributor

jmencak commented Jul 18, 2024

@yanirq in Martin's presence, I think a high-level assessment if this does what we set out to do would be useful. What I have in mind how is this going to be used. Are we going to rely on users to set the deferred annotation in patch profiles? Or do we expect PPC to set this? If the latter, i believe a follow-up PR(s) will be necessary.

@ffromani
Copy link
Contributor Author

@yanirq in Martin's presence, I think a high-level assessment if this does what we set out to do would be useful. What I have in mind how is this going to be used. Are we going to rely on users to set the deferred annotation in patch profiles? Or do we expect PPC to set this? If the latter, i believe a follow-up PR(s) will be necessary.

yes, this is the infra work. We will need followup PR(s) for the performanceprofile to consume this work, because performanceprofile is a higher level construct building on NTO.

@ffromani
Copy link
Contributor Author

ffromani commented Jul 18, 2024

regarding the behavior of the change, we have quite some decent e2e test coverage which ensure and document the observable behavior. More reviews are welcome (and have always be welcome in the past months)

@ffromani
Copy link
Contributor Author

I guess we are waiting for the acknowledge-critical-fixes-only label requirement to be lifted, aren't we?

@jmencak
Copy link
Contributor

jmencak commented Jul 22, 2024

I guess we are waiting for the acknowledge-critical-fixes-only label requirement to be lifted, aren't we?

Sometimes these go away automatically, sometimes "slash test all" helps. Tried slash test all about 3 hours ago in one of my old PRs.

@ffromani
Copy link
Contributor Author

/test all

@ffromani
Copy link
Contributor Author

it seems the label requirement wasn't lifted yet :\

@jmencak
Copy link
Contributor

jmencak commented Jul 22, 2024

it seems the label requirement wasn't lifted yet :\

/label acknowledge-critical-fixes-only

@openshift-ci openshift-ci bot added the acknowledge-critical-fixes-only Indicates if the issuer of the label is OK with the policy. label Jul 22, 2024
@Tal-or
Copy link
Contributor

Tal-or commented Jul 22, 2024

@ffromani If we're on a hurry most of my comments can be postpone for future improvement.
besides that LGTM!

@ffromani
Copy link
Contributor Author

/hold cancel

got another review

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 22, 2024
Copy link
Contributor

openshift-ci bot commented Jul 22, 2024

@ffromani: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot bot merged commit f00595e into openshift:master Jul 22, 2024
16 checks passed
@openshift-ci-robot
Copy link
Contributor

@ffromani: Jira Issue OCPBUGS-28647: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-28647 has been moved to the MODIFIED state.

In response to this:

Users are able to make tuning changes through NTO/TuneD by updating the CRs on a running system.
When the tuning change is done, the NTO causes the tuned process to restarts, whose unwanted side effect is the tuned process rolling back ALL the tuning and then reapplying it.
In some key use cases, this disrupts the workload behavior.

We add an annotation that cause the update not to be applied immediately, but to be deferred till the next node restart. The NTO operator will continue handle the tuned objects as it does today.
This includes unpacking all profiles on the node.
The deferred annotation would be deduplicated from the tuned object to all the contained tunedprofiles. If the recommended profile computed for a node is originated from a tuned object which has the annotation, then the deferred annotation will be "sticky", and the resulting update will be deferred.

However, if there is a deferred updated pending but the tuned objects are edited to remove the annotation, then the update will become immediate (like they were before this feature and like they are by default without any annotation), and they be applied overwriting the pending deferred update, which will then be lost.
This applies in the case of the recommended profile will be computed from a non-deferred update.

The deferred updates can stack, meaning a deferred update can be applied on top of a previous deferred update; the latter will just overwrite the former as it never was received. An immediate update can overwrite a deferred update, and clear it.

Like the immediate update, the NTO operand will take care of applying the deferred updates, as such:

The NTO operand will gain another chance to process tuned objects at startup, right before the main loops start.

If a tuned object is detected and it has NOT the deferred annotation:

  • the regular reconcile code will work as today
  • the startup reconcile code will not engage

If a tuned object is detected and it DOES HAVE the deferred annotation:

  • the regular reconcile code will set a filesystem flag, will move the status DEGRADED and will ignore the object.
  • at the first followup operand restart, the startup reconcile code will notice the filesystem flag, apply the rendered tuned objects (granted it DOES have the deferred flag, otherwise it will log and skip), clear the filesystem flag. The DEGRADED state will be cleared.

IOW, the startup reconcile code will engage IFF the rendered tuned has the deferred annotation and the filesystem flag does exist. Likewise, the regular reconcile loop will only set the filesystem flag if detects a rendered tuned with the deferred annotation.

The default is obviously kept to immediate updates, for backward compatibility.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@ffromani ffromani deleted the tuned-deferred-updates branch July 22, 2024 14:20
ffromani added a commit to ffromani/cluster-node-tuning-operator that referenced this pull request Jul 22, 2024
ffromani added a commit to ffromani/cluster-node-tuning-operator that referenced this pull request Jul 22, 2024
ffromani added a commit to ffromani/cluster-node-tuning-operator that referenced this pull request Jul 23, 2024
Simplify the flow aligning code to the left.
No intended changes in behavior.

xref: openshift#1019 (comment)

Signed-off-by: Francesco Romani <[email protected]>
ffromani added a commit to ffromani/cluster-node-tuning-operator that referenced this pull request Jul 23, 2024
ffromani added a commit to ffromani/cluster-node-tuning-operator that referenced this pull request Jul 23, 2024
Simplify the flow aligning code to the left.
No intended changes in behavior.

xref: openshift#1019 (comment)

Signed-off-by: Francesco Romani <[email protected]>
@ffromani
Copy link
Contributor Author

followup: #1118

ffromani added a commit to ffromani/cluster-node-tuning-operator that referenced this pull request Jul 23, 2024
Fix the runtime directory to be forward compatible
xref: openshift#1019 (comment)

Signed-off-by: Francesco Romani <[email protected]>
openshift-merge-bot bot pushed a commit that referenced this pull request Jul 23, 2024
* tuned: controller: pack profilesExtract return values in struct

xref: #1019 (comment)

Signed-off-by: Francesco Romani <[email protected]>

* tuned: controller: align to left

Simplify the flow aligning code to the left.
No intended changes in behavior.

xref: #1019 (comment)

Signed-off-by: Francesco Romani <[email protected]>

* tuned: controller: narrow down reapplySysctl

Simplify the code with no intended changes in behavior.

Signed-off-by: Francesco Romani <[email protected]>

* e2e: deferred: add ginkgo labels

and update the tags to match. This way we can easily
skip the flaky tests

The naming schema is made consistent with performanceprofile
labels: test/e2e/performanceprofile/functests/utils/label/label.go

note the recommended kube model is to use `CapitalizedNouns`,
while we do `lowercase-dash-separated`.

Signed-off-by: Francesco Romani <[email protected]>

* e2e: skip flaky tests in nightly runs

nightlies call `make test-e2e` so skip the
flaky tests here. Presubmit lanes (= running pre-merge)
should still run all tests

Signed-off-by: Francesco Romani <[email protected]>

* tuned: controller: fix run directory

Fix the runtime directory to be forward compatible
xref: #1019 (comment)

Signed-off-by: Francesco Romani <[email protected]>

---------

Signed-off-by: Francesco Romani <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
acknowledge-critical-fixes-only Indicates if the issuer of the label is OK with the policy. approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants