
Remove duplicate placements for rook-ceph daemons due to specifying with all & specific key #2973

Open

wants to merge 2 commits into base: main
Conversation

malayparida2000
Contributor

Rook creates the rook-ceph daemons' placement by merging the "all" and specific key placements. The OCS operator specifies the default OCS tolerations and node affinity in both the "all" key and the specific key, leading to duplicated placements. To avoid this, skip specifying the default OCS tolerations and node affinity with the specific key.
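
For illustration, here is a minimal, hypothetical sketch (it mirrors the default OCS storage toleration, but is not taken from the actual CRs) of how specifying the same toleration under both the "all" key and a specific key such as "mgr" produces a duplicate once Rook merges the two placements:

placement:
  all:
    tolerations:
    - effect: NoSchedule
      key: node.ocs.openshift.io/storage
      operator: Equal
      value: "true"
  mgr:
    tolerations:
    - effect: NoSchedule
      key: node.ocs.openshift.io/storage
      operator: Equal
      value: "true"

After the merge, the mgr pod spec would carry the storage toleration twice; with this change the specific keys no longer repeat the defaults that are already set under "all".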

Signed-off-by: Malay Kumar Parida <[email protected]>
@malayparida2000
Contributor Author

@travisn @parth-gr After looking at the possible solutions and how this works, I still think the fix has to come on the Rook side. The ocs-operator supports specifying placement on the StorageCluster CR at spec.placement, so even if we resolve the duplicate specification by the ocs-operator, a customer can still end up specifying the same tolerations once at the "all" key and again with a specific key like "mgr", and we would end up with duplicates again. Since Rook is the last place that builds the pod spec, that is the ideal place to remove any duplicates.

@malayparida2000
Contributor Author

Test results:
osd

tolerations:
  - effect: NoSchedule
    key: node.ocs.openshift.io/storage
    operator: Equal
    value: "true"
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 5
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: cluster.ocs.openshift.io/openshift-storage
            operator: Exists
          - key: topology.kubernetes.io/zone
            operator: In
            values:
            - us-west-2b

osd-prepare

tolerations:
  - effect: NoSchedule
    key: node.ocs.openshift.io/storage
    operator: Equal
    value: "true"
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: cluster.ocs.openshift.io/openshift-storage
            operator: Exists

@malayparida2000
Contributor Author

mgr

tolerations:
  - effect: NoSchedule
    key: node.ocs.openshift.io/storage
    operator: Equal
    value: "true"
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 5
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: cluster.ocs.openshift.io/openshift-storage
            operator: Exists
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: rook-ceph-mgr
            rook_cluster: openshift-storage
        topologyKey: kubernetes.io/hostname

mon

tolerations:
  - effect: NoSchedule
    key: node.ocs.openshift.io/storage
    operator: Equal
    value: "true"
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 5
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: cluster.ocs.openshift.io/openshift-storage
            operator: Exists
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - rook-ceph-mon
        topologyKey: topology.kubernetes.io/zone
      - labelSelector:
          matchLabels:
            app: rook-ceph-mon
        topologyKey: kubernetes.io/hostname

@malayparida2000
Contributor Author

mds

tolerations:
  - effect: NoSchedule
    key: node.ocs.openshift.io/storage
    operator: Equal
    value: "true"
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 5
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: cluster.ocs.openshift.io/openshift-storage
            operator: Exists
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: rook_file_system
              operator: In
              values:
              - test-storagecluster-cephfilesystem
          topologyKey: topology.kubernetes.io/zone
        weight: 100

nfs

tolerations:
  - effect: NoSchedule
    key: node.ocs.openshift.io/storage
    operator: Equal
    value: "true"
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 5
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: cluster.ocs.openshift.io/openshift-storage
            operator: Exists
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - rook-ceph-nfs
        topologyKey: kubernetes.io/hostname

@malayparida2000
Contributor Author

exporter

tolerations:
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 5
  - effect: NoSchedule
    key: node.ocs.openshift.io/storage
    operator: Equal
    value: "true"

crashcollector

tolerations:
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 5
  - effect: NoSchedule
    key: node.ocs.openshift.io/storage
    operator: Equal
    value: "true"

@malayparida2000
Contributor Author

@travisn any idea why the exporter & crashcollector do not have any node affinities on them, despite node affinities being specified with the 'all' key?

@parth-gr
Member

@travisn @parth-gr After looking at the possible solutions and how this works, I still think the fix has to come on the Rook side. The ocs-operator supports specifying placement on the StorageCluster CR at spec.placement, so even if we resolve the duplicate specification by the ocs-operator, a customer can still end up specifying the same tolerations once at the "all" key and again with a specific key like "mgr", and we would end up with duplicates again. Since Rook is the last place that builds the pod spec, that is the ideal place to remove any duplicates.

Yes, I also saw that we already have a mechanism to remove duplicate env variables: https://github.com/rook/rook/blob/c33f6bda7bbe5af1ccb56d2573d86fba25244dde/pkg/operator/k8sutil/pod.go#L377

@travisn
Contributor

travisn commented Jan 23, 2025

@travisn any idea why the exporter & crashcollector do not have any node affinities on them, despite node affinities being specified with the 'all' key?

@malayparida2000 The crash collector and exporter have node selectors to assign them to specific nodes, based on which nodes have other ceph daemons running. Rook basically calculates their node placement instead of relying on the node affinity settings. But those daemons should inherit the tolerations.
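
For context, a rough sketch of what this means for a crash collector pod, assuming Rook pins it with a hostname-based node selector (the node name here is hypothetical): the tolerations are inherited from the "all" placement, while no node affinity is rendered:

nodeSelector:
  kubernetes.io/hostname: worker-0
tolerations:
- effect: NoSchedule
  key: node.ocs.openshift.io/storage
  operator: Equal
  value: "true"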

@travisn
Contributor

travisn commented Jan 23, 2025

@travisn @parth-gr After looking at the possible solutions and how this works, I still think the fix has to come on the Rook side. The ocs-operator supports specifying placement on the StorageCluster CR at spec.placement, so even if we resolve the duplicate specification by the ocs-operator, a customer can still end up specifying the same tolerations once at the "all" key and again with a specific key like "mgr", and we would end up with duplicates again. Since Rook is the last place that builds the pod spec, that is the ideal place to remove any duplicates.

If the OCS operator owns fixing its own duplicates, then in the case where the customer adds a placement that is a duplicate, if the alerts bother them, they should know to remove the duplicates that they caused, right? In any case, everything still works; it's just a recommendation to avoid the duplication, and the impact is very low. Merging complex types automatically could be prone to errors, so we want to avoid that complexity.

@malayparida2000
Contributor Author

malayparida2000 commented Jan 24, 2025

With this PR, as you can see from the test results I have pasted, there are no longer duplicates in the daemons' placement due to ocs-operator.

Contributor

openshift-ci bot commented Jan 24, 2025

@parth-gr: changing LGTM is restricted to collaborators

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@parth-gr
Member

parth-gr commented Jan 24, 2025

@travisn @malayparida2000
I tried appending different duplicate values to the osd prepare pod and osd pod placement, and also duplicate values in the "all" placement,

But the duplicates are only kept by the osd prepare pod.

PS: I got the reason: the osd, mons, etc. are deployments, so if we apply duplicates in a deployment it will create a pod with only the unique ones, but the prepare pod is a job which directly has the pod spec, so that's why it still kept the duplicates.

Member

@iamniting iamniting left a comment

looks good to me,
/hold for @travisn review

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 27, 2025
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 27, 2025
Contributor

openshift-ci bot commented Jan 27, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: iamniting, malayparida2000, parth-gr

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 27, 2025
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm Indicates that a PR is ready to be merged.