Run manager with 2 replicas #180

Open · wants to merge 1 commit into main

@@ -325,7 +325,7 @@ spec:
self-node-remediation-operator: ""
name: self-node-remediation-controller-manager
spec:
- replicas: 1
+ replicas: 2
selector:
matchLabels:
control-plane: controller-manager
@@ -337,6 +337,16 @@ spec:
control-plane: controller-manager
self-node-remediation-operator: ""
spec:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
Member

Would this fail operator installation in case we only have one node available during installation?

Also, what happens on updates, when there aren't enough nodes to start the pods of the new version?

(the latter can be solved with matchLabelKeys and pod-template-hash, but that's alpha in k8s 1.29 only. topologySpreadConstraints, also with matchLabelKeys and pod-template-hash, would also work AFAIU, but there matchLabelKeys is also too new, beta since k8s 1.27 / OCP 4.14).

Maybe preferredDuringSchedulingIgnoredDuringExecution is a better choice? It would potentially deploy both pods on the same node, but that's better than nothing, isn't it? However, I'm not sure if that would still make a difference to the default scheduling behaviour, which already respects replicasets... 🤔
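A minimal sketch of the soft variant, just to show the shape (the weight and the label used here are illustrative, nothing agreed on in this PR):

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100                  # illustrative; any weight from 1 to 100 is valid
      podAffinityTerm:
        labelSelector:
          matchLabels:
            control-plane: controller-manager
        topologyKey: kubernetes.io/hostname

With the soft rule the scheduler still tries to spread the two pods across nodes, but falls back to co-locating them instead of leaving one of them Pending.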

Member

looks like the update concern can also be solved with this:

strategy:
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
    type: RollingUpdate

it will force k8s to first delete one of the pods before starting a new one
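For placement, a sketch of where that block would go in the manager Deployment, assuming the existing config/manager/manager.yaml layout:

spec:
  replicas: 2
  strategy:                    # sibling of replicas and selector in the Deployment spec
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0              # never start a new pod before an old one is removed
      maxUnavailable: 1        # so the rollout never needs an extra node
  selector:
    matchLabels:
      control-plane: controller-manager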

Contributor Author

Would this fail operator installation in case we only have one node available during installation?

I will test this case. In my opinion, it should fail the operator installation.
There are the following cases:

  1. We have only one node available, like SNO (Single Node OpenShift)
    => No reason to deploy SNR/FAR/NHC
  2. We plan to deploy a cluster with multiple nodes, but only one node is available during installation due to node failures
    => We don't want the cluster deployment to succeed without noticing the node failures. It should stop the operator installation.

looks like the update concern can also be solved with this:

strategy:
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
    type: RollingUpdate

it will force k8s to first delete one of the pods before starting a new one

I will fix it with the above. Also, since the Workload Availability project has the Node Maintenance Operator for maintenance, we also need to define a PodDisruptionBudget like https://github.com/openshift/cluster-monitoring-operator/blob/master/assets/thanos-querier/pod-disruption-budget.yaml.
It will prevent all of the managers from being stopped accidentally.
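A minimal sketch of such a PDB, using the SNR-specific pod label from the manifest above (the name and threshold are placeholders, nothing is decided yet):

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: self-node-remediation-controller-manager   # placeholder name
spec:
  minAvailable: 1              # keep at least one manager running during voluntary disruptions such as node drain
  selector:
    matchLabels:
      self-node-remediation-operator: ""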

Member

We have only one node available like SNO(Single Node OpenShift)
=> No reason to deploy SNR/FAR/NHC

agree

We plan to deploy a cluster with multiple nodes, but only one node is available during installation due to node failures
=> We don't want the cluster deployment to succeed without noticing the node failures. It should stop the operator installation.

hm.... yes, actually it does not make sense to succeed; it wouldn't help fix the cluster, because the agents would not be deployed on the failed nodes.

we also need to define PodDisruptionBudget

Let's discuss this in an issue please. As with this affinity, we need to be aware of side effects. Let's not make our operators more important than they are, compared to other deployments.

- labelSelector:
matchExpressions:
- key: control-plane
Member

control-plane: controller-manager is a default label added by operator-sdk when scaffolding a new operator, so there is a very big risk that other operators' pods also have this label, which can lead to undesired behavior. Please use the self-node-remediation-operator: "" label, which is unique to SNR and also doesn't conflict with the daemonset pods.
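For illustration, the affinity term with that label could look roughly like this (a sketch, not the exact change requested):

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          self-node-remediation-operator: ""   # SNR-specific label instead of the operator-sdk default
      topologyKey: kubernetes.io/hostname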

Contributor Author

I will use the self-node-remediation-operator: "" label. Thank you for the suggestion.

operator: In
values:
- controller-manager
topologyKey: kubernetes.io/hostname
containers:
- args:
- --health-probe-bind-address=:8081
config/manager/manager.yaml (11 additions, 1 deletion)
@@ -16,12 +16,22 @@ spec:
selector:
matchLabels:
control-plane: controller-manager
- replicas: 1
+ replicas: 2
template:
metadata:
labels:
control-plane: controller-manager
spec:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: control-plane
operator: In
values:
- controller-manager
topologyKey: kubernetes.io/hostname
securityContext:
runAsNonRoot: false
priorityClassName: system-cluster-critical