
[cinder-csi-plugin] Multi region/clouds support for controllerServer #2551

Merged
merged 7 commits into kubernetes:master on Jul 29, 2024

Conversation

MatthieuFin
Contributor

@MatthieuFin MatthieuFin commented Feb 14, 2024

Affected binaries:

  • cinder-csi-plugin

What this PR does / why we need it:
Enable multi-region / multi-cloud support in the cinder-csi-plugin controllerServer.

Which issue this PR fixes (if applicable):
fixes #2035

Special notes for reviewers:
I have a Kubernetes cluster stretched across multiple OpenStack clusters (3 OVH and 1 on-premise, linked by a private dark fiber).
That is why my approach is more "multi-cloud" than just "multi-region".

I don't touch the nodeServer behavior (I simply deploy a DaemonSet with a nodeSelector on a label that selects nodes based on their hypervisor, and mount a dedicated secret with the associated OpenStack credentials).

The point is the controllerServer, which handles all kube requests to manage PVCs: it has to be able to CRUD volumes on every managed OpenStack cluster.

I propose to use the gcfg subsection feature to handle multiple Global "sections" in the config file. This keeps backward compatibility with the existing config file syntax.

I chose to use StorageClass parameters to deal with cloud selection (I add an optional field cloud which contains the name of a Global config subsection). This way, when a CreateVolumeRequest comes in, the controllerServer can identify which OSInstance to use by matching the SC field parameters.cloud against its Global config subsections.
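As a rough illustration of this proposal (the "onprem" subsection name, endpoints, credentials and StorageClass name below are made up; the subsection syntax is the standard gcfg form), the config file could carry one default Global section plus named subsections:

[Global]
auth-url="https://auth.cloud.ovh.net/v3"
application-credential-id="..."
application-credential-secret="..."

[Global "onprem"]
auth-url="https://keystone.onprem.example:5000/v3"
application-credential-id="..."
application-credential-secret="..."

and a StorageClass would pick the subsection through the proposed cloud parameter:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cinder-onprem
provisioner: cinder.csi.openstack.org
parameters:
  cloud: onprem

Later in this thread the cloud selection moves from the StorageClass parameter to a secret key.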

Issue
This PR is currently an MVP, which allows creating volumes and attaching them to the correct VM / OpenStack cluster, but I have one remaining issue to troubleshoot: not all CSI spec messages contain volume_context or parameters; in particular, DeleteVolumeRequest only contains the volume ID.

I have a few ideas to troubleshoot this issue:

  • query each managed OpenStack cloud to find the one that has a volume with the searched ID (that's linear complexity)
  • implement a Kubernetes client and ask the Kubernetes API to find it (bad idea, I just realized that volume_id is the OpenStack ID, not the PV name)
  • maybe use the secrets field, as seems to be suggested in the external-provisioner discussion

Release note:

feat: add multi-region/multi-cloud support.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Feb 14, 2024
@k8s-ci-robot
Contributor

Welcome @MatthieuFin!

It looks like this is your first PR to kubernetes/cloud-provider-openstack 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/cloud-provider-openstack has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Feb 14, 2024
@k8s-ci-robot
Contributor

Hi @MatthieuFin. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Feb 14, 2024
@MatthieuFin
Contributor Author

As mentioned here, and after some local tests, this seems to be the proper way to implement it.

I propose to avoid using the StorageClass parameter cloud and the volume context to propagate the cloud config name between gRPC calls, and to instead use a secret with a key cloud containing the name of the Global config subsection, which lets the controller retrieve the proper credentials from its configuration.
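A rough sketch of what this could look like (the resource names and the "onprem" subsection name are illustrative; the secret is handed to the CSI provisioning calls through the standard external-provisioner secret parameters on the StorageClass):

apiVersion: v1
kind: Secret
metadata:
  name: cinder-csi-cloud-onprem
  namespace: kube-system
stringData:
  cloud: onprem   # name of the [Global "onprem"] subsection in cloud.conf
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cinder-onprem
provisioner: cinder.csi.openstack.org
parameters:
  csi.storage.k8s.io/provisioner-secret-name: cinder-csi-cloud-onprem
  csi.storage.k8s.io/provisioner-secret-namespace: kube-system

With the provisioner secret set, both CreateVolume and DeleteVolume receive the secret, so the controller can resolve the right cloud even for requests that otherwise only carry the volume ID.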

@dulek
Contributor

dulek commented Feb 16, 2024

/ok-to-test

Sorry, I won't have time to take a closer look before my vacation; I'll be back in 2 weeks. @kayrus might be the proper person to check this PR out.

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Feb 16, 2024
@dulek
Contributor

dulek commented Feb 16, 2024

Looks like the tests have to be fixed here.

@MatthieuFin
Contributor Author

/retest

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 16, 2024
@MatthieuFin
Contributor Author

/retest

@MatthieuFin MatthieuFin marked this pull request as ready for review February 19, 2024 20:07
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 19, 2024
@MatthieuFin
Contributor Author

Hi,
After a few days of use on my cluster, I encountered an issue regarding allowedTopologies set on the StorageClass (first of all, don't forget to enable the Topology feature gate by adding --feature-gates Topology=true to the csi-provisioner container), which pushes the allowedTopologies constraints onto the PV .spec.nodeAffinity and keeps the affinity when a pod is rescheduled.

My issue concerns topology keys: currently the only key available is topology.cinder.csi.openstack.org/zone, and of course my provider OVH uses the same zone name (nova) on each of their OpenStack clusters, so I need to add the possibility to manage another key to differentiate them.

@MatthieuFin MatthieuFin marked this pull request as draft February 26, 2024 09:52
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 26, 2024
@MatthieuFin MatthieuFin marked this pull request as ready for review February 26, 2024 14:38
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 26, 2024
@sebastienmusso

Hi, After a few days of use on my cluster, I encountered an issue regarding allowedTopologies set on the StorageClass (first of all, don't forget to enable the Topology feature gate by adding --feature-gates Topology=true to the csi-provisioner container), which pushes the allowedTopologies constraints onto the PV .spec.nodeAffinity and keeps the affinity when a pod is rescheduled.

My issue concerns topology keys: currently the only key available is topology.cinder.csi.openstack.org/zone, and of course my provider OVH uses the same zone name (nova) on each of their OpenStack clusters, so I need to add the possibility to manage another key to differentiate them.

We have exactly the same problem (several OpenStack clouds with the same availability zone name).

I'm interested in this feature, let me know if you need some test feedback.

Regards

@MatthieuFin
Contributor Author

Hi, After a few days of use on my cluster, I encountered an issue regarding allowedTopologies set on the StorageClass (first of all, don't forget to enable the Topology feature gate by adding --feature-gates Topology=true to the csi-provisioner container), which pushes the allowedTopologies constraints onto the PV .spec.nodeAffinity and keeps the affinity when a pod is rescheduled.
My issue concerns topology keys: currently the only key available is topology.cinder.csi.openstack.org/zone, and of course my provider OVH uses the same zone name (nova) on each of their OpenStack clusters, so I need to add the possibility to manage another key to differentiate them.

We have exactly the same problem (several OpenStack clouds with the same availability zone name).

I'm interested in this feature, let me know if you need some test feedback.

Regards

Hi!

You could use my last commit to build the image; if you don't feel like building it yourself, you can use mine, exposed here.
Personally, I set the label topology.kubernetes.io/region on my nodes, which contains my region name, so I have a DaemonSet with a nodeSelector on this label to deploy my nodeplugin (nodeServers), and I add the argument --additionnal-topology topology.kubernetes.io/region=GRA9 (for example) to the container; see the sketch below.
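For reference, a trimmed sketch of such a per-region nodeplugin DaemonSet (names, labels, the region value, image tag and the credentials secret are examples, not taken from this PR; the real manifest also needs the usual CSI sidecars and socket volumes):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: csi-cinder-nodeplugin-gra9
spec:
  selector:
    matchLabels:
      app: csi-cinder-nodeplugin-gra9
  template:
    metadata:
      labels:
        app: csi-cinder-nodeplugin-gra9
    spec:
      nodeSelector:
        topology.kubernetes.io/region: GRA9   # only nodes of this region run this instance
      containers:
        - name: cinder-csi-plugin
          image: registry.k8s.io/provider-os/cinder-csi-plugin:latest   # or the fork image mentioned above
          args:
            - --additionnal-topology
            - topology.kubernetes.io/region=GRA9
            # plus the usual nodeplugin args, and a mount of the
            # region-specific secret holding the OpenStack credentials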

This setup allows creating a storageClass with the following allowedTopologies:

allowedTopologies:
- matchLabelExpressions:
  - key: topology.cinder.csi.openstack.org/zone
    values:
    - nova
  - key: topology.kubernetes.io/region
    values:
    - GRA9

Moreover, you have to enable the option ignore-volume-az in the controllerServer configuration:

[BlockStorage]
bs-version=v3
ignore-volume-az=True

[Global]
auth-url="https://auth.cloud.ovh.net/v3"
....

This way, if you print your CSINode object you should see:

    topologyKeys:
    - topology.cinder.csi.openstack.org/zone
    - topology.kubernetes.io/region

which lets you use your storage class correctly, based on a parameter other than the availability zone name.

@MatthieuFin
Contributor Author

Thanks to @zetaab on Slack #provider-openstack, we identified an issue with ListVolumes requests when --max-entries is set on the external-attacher and is lower than the number of OpenStack volumes. I will try to troubleshoot that.

I just added some unit tests on the ListVolumes function, with and without --max-entries, in a multi-cloud configuration, to make sure this function works as expected.

Moreover, I tested it on my cluster and all volumes are correctly listed with the pagination feature.

The implementation sounds good to me now!

@dulek, @kayrus, @zetaab, @jichenjc, @mdbooth, could we please have some feedback?

@Hybrid512

Looks like @MatthieuFin fixed most of the issues; what are the reasons this code can't be merged?

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 2, 2024
@jichenjc
Contributor

jichenjc commented Jul 3, 2024

Looks like there is a merge conflict; the rest is OK to me now.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 3, 2024
MatthieuFin and others added 7 commits July 3, 2024 09:12
…ubernetes#2035)

* feat(cinder-csi-plugin-controller): add support for the --cloud-name arg, which permits managing multiple config Global sections

* feat(cinder-csi-plugin-controller): add support for the key `cloud` from the secret specified in the storageClass to reference a specific config Global section

* feat(cinder-csi-plugin-node): add support for the --cloud-name arg, which permits specifying one of the config Global sections

Signed-off-by: MatthieuFin <[email protected]>
…ubernetes#2035)

* feat(cinder-csi-plugin-node): additional topology keys (`--additionnal-topology`) are announced on NodeGetInfo to create a proper CSINode object
    and can be used in the storage class allowedTopologies field; in combination with csi-provisioner `--feature-gates Topology=true`,
    the created PV will have nodeAffinity set with the topologies present in the storageClass allowedTopologies and in the `--additionnal-topology` argument.

Signed-off-by: MatthieuFin <[email protected]>
Add documentation and examples about multi-cloud support configuration

Co-authored-by: sebastienmusso <[email protected]>
Co-authored-by: FlorentLaloup <[email protected]>
Signed-off-by: MatthieuFin <[email protected]>
…ies` arg

Add unit test cases on the ListVolumes function in a multi-cloud configuration
with and without the `--max-entries` arg

Signed-off-by: MatthieuFin <[email protected]>
@k8s-ci-robot k8s-ci-robot removed lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Jul 3, 2024
@MatthieuFin
Contributor Author

Looks like there is a merge conflict; the rest is OK to me now.

/lgtm

Rebase done

@jichenjc
Contributor

jichenjc commented Jul 3, 2024

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 3, 2024
@MatthieuFin
Contributor Author

Hello there!
Can we hope that this PR will be merged?
Personally, I have needed this feature in my production environment for a couple of months; from my point of view it works pretty well, and maintaining a fork with rebases to do and conflicts to resolve is quite painful.

@jichenjc
Contributor

I can approve, but I still need folks to take a look,
so if there is no objection within 3-5 days I will go ahead and merge this.

@MatthieuFin
Contributor Author

No problem!
I just wanted an update on the state of this PR. Thanks for it.
Take your time, I prefer a proper review.

@devfaz

devfaz commented Jul 29, 2024

Hi, is there anything left to do to get this merged?

@jichenjc
Contributor

/approve

Enough time has passed for people to comment if they had any objections.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jichenjc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 29, 2024
@k8s-ci-robot k8s-ci-robot merged commit 515b4c9 into kubernetes:master Jul 29, 2024
6 checks passed