
Remove duplicate placements for rook-ceph daemons due to specifying with all & specific key #2973

Open

wants to merge 2 commits into base: main
Conversation

malayparida2000
Contributor

Rook creates the rook-ceph daemons' placement by merging the "all" and specific key placements. The OCS operator specifies the default OCS tolerations and node affinity in both the "all" key and the specific key, leading to duplicated placements. To avoid this, skip specifying the default OCS tolerations and node affinity with the specific key.
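
For illustration, here is a minimal, hypothetical sketch (it mirrors the default OCS storage toleration, but is not taken from the actual CRs) of how specifying the same toleration under both the "all" key and a specific key such as "mgr" produces a duplicate once Rook merges the two placements:

placement:
  all:
    tolerations:
    - effect: NoSchedule
      key: node.ocs.openshift.io/storage
      operator: Equal
      value: "true"
  mgr:
    tolerations:
    - effect: NoSchedule
      key: node.ocs.openshift.io/storage
      operator: Equal
      value: "true"

After the merge, the mgr pod spec would carry the storage toleration twice; with this change the specific keys no longer repeat the defaults that are already set under "all".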

Signed-off-by: Malay Kumar Parida <[email protected]>
@malayparida2000
Contributor Author

@travisn @parth-gr After looking at the possible solutions and how this works, I still think the fix has to come on the Rook side. The ocs-operator supports specifying placement on the StorageCluster CR at spec.placement, so even if we resolve the duplicate specification by the ocs-operator, a customer can still end up specifying the same tolerations once at the "all" key and again with a specific key like "mgr", and we would end up with duplicates again. Since Rook is the last place that builds the pod spec, that is the ideal place to remove any duplicates.

@malayparida2000
Contributor Author

Test results:
osd

tolerations:
  - effect: NoSchedule
    key: node.ocs.openshift.io/storage
    operator: Equal
    value: "true"
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 5
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: cluster.ocs.openshift.io/openshift-storage
            operator: Exists
          - key: topology.kubernetes.io/zone
            operator: In
            values:
            - us-west-2b

osd-prepare

tolerations:
  - effect: NoSchedule
    key: node.ocs.openshift.io/storage
    operator: Equal
    value: "true"
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: cluster.ocs.openshift.io/openshift-storage
            operator: Exists

@malayparida2000
Contributor Author

mgr

tolerations:
  - effect: NoSchedule
    key: node.ocs.openshift.io/storage
    operator: Equal
    value: "true"
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 5
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: cluster.ocs.openshift.io/openshift-storage
            operator: Exists
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: rook-ceph-mgr
            rook_cluster: openshift-storage
        topologyKey: kubernetes.io/hostname

mon

tolerations:
  - effect: NoSchedule
    key: node.ocs.openshift.io/storage
    operator: Equal
    value: "true"
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 5
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: cluster.ocs.openshift.io/openshift-storage
            operator: Exists
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - rook-ceph-mon
        topologyKey: topology.kubernetes.io/zone
      - labelSelector:
          matchLabels:
            app: rook-ceph-mon
        topologyKey: kubernetes.io/hostname

@malayparida2000
Contributor Author

mds

tolerations:
  - effect: NoSchedule
    key: node.ocs.openshift.io/storage
    operator: Equal
    value: "true"
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 5
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: cluster.ocs.openshift.io/openshift-storage
            operator: Exists
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: rook_file_system
              operator: In
              values:
              - test-storagecluster-cephfilesystem
          topologyKey: topology.kubernetes.io/zone
        weight: 100

nfs

tolerations:
  - effect: NoSchedule
    key: node.ocs.openshift.io/storage
    operator: Equal
    value: "true"
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 5
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: cluster.ocs.openshift.io/openshift-storage
            operator: Exists
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - rook-ceph-nfs
        topologyKey: kubernetes.io/hostname

@malayparida2000
Contributor Author

exporter

tolerations:
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 5
  - effect: NoSchedule
    key: node.ocs.openshift.io/storage
    operator: Equal
    value: "true"

crashcollector

tolerations:
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 5
  - effect: NoSchedule
    key: node.ocs.openshift.io/storage
    operator: Equal
    value: "true"

@malayparida2000
Contributor Author

@travisn any idea why the exporter & crashcollector do not have any node affinities on them, despite node affinities being specified with the 'all' key?

@parth-gr
Member

@travisn @parth-gr After looking at the possible solutions and how this works, I still think the fix has to come on the Rook side. The ocs-operator supports specifying placement on the StorageCluster CR at spec.placement, so even if we resolve the duplicate specification by the ocs-operator, a customer can still end up specifying the same tolerations once at the "all" key and again with a specific key like "mgr", and we would end up with duplicates again. Since Rook is the last place that builds the pod spec, that is the ideal place to remove any duplicates.

Yes, I also saw that we already have a mechanism to remove duplicate env variables: https://github.com/rook/rook/blob/c33f6bda7bbe5af1ccb56d2573d86fba25244dde/pkg/operator/k8sutil/pod.go#L377

@travisn
Contributor

travisn commented Jan 23, 2025

@travisn any idea why the exporter & crashcollector do not have any node affinities on them, despite node affinities being specified with the 'all' key?

@malayparida2000 The crash collector and exporter have node selectors to assign them to specific nodes, based on which nodes have other ceph daemons running. Rook basically calculates their node placement instead of relying on the node affinity settings. But those daemons should inherit the tolerations.
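
For context, a rough sketch of what this means for a crash collector pod, assuming Rook pins it with a hostname-based node selector (the node name here is hypothetical): the tolerations are inherited from the "all" placement, while no node affinity is rendered:

nodeSelector:
  kubernetes.io/hostname: worker-0
tolerations:
- effect: NoSchedule
  key: node.ocs.openshift.io/storage
  operator: Equal
  value: "true"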

@travisn
Contributor

travisn commented Jan 23, 2025

@travisn @parth-gr After looking at the possible solutions and how this works, I still think the fix has to come on the Rook side. The ocs-operator supports specifying placement on the StorageCluster CR at spec.placement, so even if we resolve the duplicate specification by the ocs-operator, a customer can still end up specifying the same tolerations once at the "all" key and again with a specific key like "mgr", and we would end up with duplicates again. Since Rook is the last place that builds the pod spec, that is the ideal place to remove any duplicates.

If the OCS operator owns fixing its own duplicates, then in the case where the customer adds a placement that is a duplicate, if the alerts bother them, they should know to remove the duplicates that they caused, right? In any case, everything still works; it's just a recommendation to avoid the duplication, and the impact is very low. Merging complex types automatically could be prone to errors, so we want to avoid that complexity.

@malayparida2000
Contributor Author

malayparida2000 commented Jan 24, 2025

With this PR, as you can see from the test results I have pasted, there are no longer duplicates in the daemons' placement due to ocs-operator.

Contributor

openshift-ci bot commented Jan 24, 2025

@parth-gr: changing LGTM is restricted to collaborators

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@parth-gr
Member

parth-gr commented Jan 24, 2025

@travisn @malayparida2000
I tried appending different duplicate values to the osd prepare pod and osd pod placement, and also duplicate values in the "all" placement,

But the duplicates are only kept by the osd prepare pod.

PS: I got the reason: the osd, mons, etc. are deployments, so if we apply duplicates in a deployment it will create a pod with only the unique ones, but the prepare pod is a job which directly has the pod spec, so that's why it still kept the duplicates.

Member

@iamniting iamniting left a comment

looks good to me,
/hold for @travisn review

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 27, 2025
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 27, 2025
Contributor

openshift-ci bot commented Jan 27, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: iamniting, malayparida2000, parth-gr

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 27, 2025
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm Indicates that a PR is ready to be merged.