Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cluster version controller code restructering #112

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

nb-ohad
Copy link
Contributor

@nb-ohad nb-ohad commented Mar 14, 2024

  • Seperate the different reconcile concerns into functions
  • Preload all input resources
  • Optimize memory managemnt for console resources
  • Add reconsole tracking log messages
  • Fix some issues with object metadata overwrites

Copy link

openshift-ci bot commented Mar 14, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: nb-ohad

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@nb-ohad nb-ohad force-pushed the reconcile-funcs branch 4 times, most recently from 1aee789 to a8972ec Compare March 14, 2024 22:56
@leelavg
Copy link
Contributor

leelavg commented Mar 15, 2024

@nb-ohad

  1. Refactor CSI package code to reduce memory, GC overhead, and "order of operations" considerations #107 introduced a bug wherein ceph deploy & ds .spec.template.metadata.labels are missed during refactor which kinda makes current main undeployable
2024-03-15T03:27:04Z    ERROR   Reconciler error        {"controller": "clusterversion", "controllerGroup": "config.openshift.io", "controllerKind": "ClusterVersion", "ClusterVersion": {"name":"version"}, "namespace": "", "name": "version", "reconcileID": "4c22d638-60c8-40c0-9855-2ae425f8cb09", "error": "Deployment.apps \"csi-cephfsplugin-provisioner\" is invalid: spec.template.metadata.labels: Invalid value: map[string]string(nil): `selector` does not match template `labels`"}
  1. w/ current PR I see codepath in point 1 isn't hit (or at-least I don't see any logs) at all which is more puzzling and a new error in the log appeared
2024-03-15T03:33:03Z    INFO    Starting reconciliation {"controller": "clusterversion", "controllerGroup": "config.openshift.io", "controllerKind": "ClusterVersion", "ClusterVersion": {"name":"version"}, "namespace": "", "name": "version", "reconcileID": "7cc80ef1-57e4-4a1d-b7e2-75aead0b1854", "ClusterVersion": {"name":"version"}}
2024-03-15T03:33:03Z    INFO    Reconciling NginxConfigMap      {"controller": "clusterversion", "controllerGroup": "config.openshift.io", "controllerKind": "ClusterVersion", "ClusterVersion": {"name":"version"}, "namespace": "", "name": "version", "reconcileID": "7cc80ef1-57e4-4a1d-b7e2-75aead0b1854", "ClusterVersion": {"name":"version"}}
2024-03-15T03:33:03Z    ERROR   failed to create nginx config map       {"controller": "clusterversion", "controllerGroup": "config.openshift.io", "controllerKind": "ClusterVersion", "ClusterVersion": {"name":"version"}, "namespace": "", "name": "version", "reconcileID": "7cc80ef1-57e4-4a1d-b7e2-75aead0b1854", "ClusterVersion": {"name":"version"}, "error": "Object openshift-storage-client/ocs-client-operator-console-nginx-conf is already owned by another Deployment controller ocs-client-operator-console"}
github.com/red-hat-storage/ocs-client-operator/controllers.(*ClusterVersionReconciler).reconcileNginxConfigMap
        /workspace/controllers/clusterversion_controller.go:231
github.com/red-hat-storage/ocs-client-operator/controllers.(*ClusterVersionReconciler).reconcilPhases
        /workspace/controllers/clusterversion_controller.go:165
github.com/red-hat-storage/ocs-client-operator/controllers.(*ClusterVersionReconciler).Reconcile
        /workspace/controllers/clusterversion_controller.go:153
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
        /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:119
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:316
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
        /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227
2024-03-15T03:33:03Z    INFO    Reconciliation completed successfully   {"controller": "clusterversion", "controllerGroup": "config.openshift.io", "controllerKind": "ClusterVersion", "ClusterVersion": {"name":"version"}, "namespace": "", "name": "version", "reconcileID": "7cc80ef1-57e4-4a1d-b7e2-75aead0b1854", "ClusterVersion": {"name":"version"}}
2024-03-15T03:33:03Z    INFO    Reconciling StorageClassClaim.  {"controller": "storageclassclaim", "controllerGroup": "ocs.openshift.io", "controllerKind": "StorageClassClaim", "StorageClassClaim": {"name":"ocs-storagecluster-cephfs"}, "namespace": "", "name": "ocs-storagecluster-cephfs", "reconcileID": "bb5c50e1-fe1b-444d-a21b-5c819d968ad6", "StorageClassClaim": {"name":"ocs-storagecluster-cephfs"}}

I think we really need a deployment test suite, can we make an effort to have client operator be able to run on plain k8s which can help a bit better?

Regarding point 1, I'll be raising a new PR to fix that specific issue.

Copy link
Contributor

@leelavg leelavg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks ordered overall. thanks.

controllers/clusterversion_controller.go Outdated Show resolved Hide resolved
controllers/clusterversion_controller.go Outdated Show resolved Hide resolved
controllers/clusterversion_controller.go Outdated Show resolved Hide resolved
nginxConfigMap.Namespace = c.OperatorNamespace

err := c.createOrUpdate(&nginxConfigMap, func() error {
if err := c.own(&nginxConfigMap); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in existing code, console deployment owns the configmap and that seems to be correct and so setting another controller ref here always returns an error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Console deployment should not own the config map because the deployment controller is not the one that reconciles it. But I guess you mean that the execution fails because there is an existing controller ref.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mean that the execution fails because there is an existing controller ref.

  • yes.

controllers/clusterversion_controller.go Outdated Show resolved Hide resolved
- Seperate the different reconcile concerns into functions
- Preload all input resources
- Optimize memory managemnt for console resources
- Add reconsole tracking log messages
- Fix some issues with object metadata overwrites

Signed-off-by: nb-ohad <[email protected]>
@openshift-merge-robot
Copy link

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@leelavg
Copy link
Contributor

leelavg commented Apr 8, 2024

@nb-ohad will you be refreshing this PR?

@nb-ohad
Copy link
Contributor Author

nb-ohad commented Apr 8, 2024

@nb-ohad will you be refreshing this PR?

@leelavg I am not sure I will do that as the changes to the code base are considerable. I might decide on an alternative of recoding these changes from the ground up on top of the current main. Either way, this work will get into the repo at one point or another.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants