
Upgrade is not updating KubeadmConfigSpec.ClusterConfiguration.KubernetesVersion #11344

Open
gandhisagar opened this issue Oct 28, 2024 · 11 comments
Labels
kind/support Categorizes issue or PR as a support question. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. priority/backlog Higher priority than priority/awaiting-more-evidence.

Comments

@gandhisagar

gandhisagar commented Oct 28, 2024

What steps did you take and what happened?

We are upgrading a Kubernetes cluster deployed using Cluster API (CAPV, on vSphere infrastructure).

As part of the upgrade, we are applying the following changes:

Applying clusterctl upgrade plan
Changing pre- and post-kubeadm commands
Changing spec.version (e.g. from 1.29.3 to 1.30.4)
The cluster upgrades successfully and we can see all nodes at 1.30.4, but KubeadmConfigSpec.ClusterConfiguration.KubernetesVersion is not updated automatically.
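For reference, the version bump itself is applied roughly like the sketch below (object names are from our environment; the same change can equally be made via kubectl edit or our automation):

    kubectl patch kubeadmcontrolplane ssp-cluster -n ssp-cluster --type merge \
      -p '{"spec":{"version":"v1.30.4"}}'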

KubeadmControlPlane instance:

spec:
    kubeadmConfigSpec:
      clusterConfiguration:
        apiServer:
          extraArgs:
            cloud-provider: external
        controllerManager:
          extraArgs:
            cloud-provider: external
        dns: {}
        etcd:
          local:
            extraArgs:
              election-timeout: "2500"
              heartbeat-interval: "500"
        **kubernetesVersion: v1.29.3**
        networking: {}
        scheduler: {}
      files:
      - content: |
          apiVersion: v1
          kind: Pod
          metadata:
            creationTimestamp: null
            name: kube-vip
            namespace: kube-system
          spec:
            containers:
              - args:
                  - manager
                env:
                  - name: vip_arp
                    value: "true"
                  - name: port
                    value: "6443"
                  - name: vip_interface
                    value: ""
                  - name: vip_cidr
                    value: "32"
                  - name: cp_enable
                    value: "true"
                  - name: cp_namespace
                    value: kube-system
                  - name: vip_ddns
                    value: "false"
                  - name: svc_enable
                    value: "false"
                  - name: svc_leasename
                    value: plndr-svcs-lock
                  - name: svc_election
                    value: "true"
                  - name: vip_leaderelection
                    value: "true"
                  - name: vip_leasename
                    value: plndr-cp-lock
                  - name: vip_leaseduration
                    value: "15"
                  - name: vip_renewdeadline
                    value: "10"
                  - name: vip_retryperiod
                    value: "2"
                  - name: address
                    value: 192.168.1.3
                  - name: prometheus_server
                    value: :2112
                image: sspi-test.broadcom.com/registry/kube-vip/kube-vip:v0.6.4
                imagePullPolicy: IfNotPresent
                name: kube-vip
                resources: {}
                securityContext:
                  capabilities:
                    add:
                      - NET_ADMIN
                      - NET_RAW
                volumeMounts:
                  - mountPath: /etc/kubernetes/admin.conf
                    name: kubeconfig
                  - mountPath: /etc/hosts
                    name: etchosts
            hostNetwork: true
            volumes:
              - hostPath:
                  path: /etc/kubernetes/admin.conf
                name: kubeconfig
              - hostPath:
                  path: /etc/kube-vip.hosts
                  type: File
                name: etchosts
          status: {}
        owner: root:root
        path: /etc/kubernetes/manifests/kube-vip.yaml
        permissions: "0644"
      - content: 127.0.0.1 localhost kubernetes
        owner: root:root
        path: /etc/kube-vip.hosts
        permissions: "0644"
      - content: |
         <removed>
        owner: root:root
        path: /etc/pre-kubeadm-commands/50-kube-vip-prepare.sh
        permissions: "0700"
      format: cloud-config
      initConfiguration:
        localAPIEndpoint: {}
        nodeRegistration:
          criSocket: /var/run/crio/crio.sock
          imagePullPolicy: IfNotPresent
          kubeletExtraArgs:
            cloud-provider: external
          name: '{{ local_hostname }}'
      joinConfiguration:
        discovery: {}
        nodeRegistration:
          criSocket: /var/run/crio/crio.sock
          imagePullPolicy: IfNotPresent
          kubeletExtraArgs:
            cloud-provider: external
          name: '{{ local_hostname }}'
      postKubeadmCommands:
      - removed
      preKubeadmCommands:
      - removed
      users:
      - name: capv
        sshAuthorizedKeys:
        -  removed
    machineTemplate:
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: VSphereMachineTemplate
        name: ssp-cluster
        namespace: ssp-cluster
      metadata: {}
    replicas: 1
    rolloutStrategy:
      rollingUpdate:
        maxSurge: 1
      type: RollingUpdate
    **version: v1.30.4**

Machine object (spec.version) is also 1.30.4

spec:
  bootstrap:
    configRef:
      apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
      kind: KubeadmConfig
      name: ssp-cluster-rbjxg
      namespace: ssp-cluster
      uid: 2f9e1f34-c625-4b3d-a12d-1f2aa44ac084
    dataSecretName: ssp-cluster-rbjxg
  clusterName: ssp-cluster
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: VSphereMachine
    name: ssp-cluster-rbjxg
    namespace: ssp-cluster
    uid: fdf5bbf9-4fe9-4d58-9bad-d5efbb61326f
  nodeDeletionTimeout: 10s
  providerID: vsphere://42263451-3edc-5138-04d7-a7ea59b9946d
  version: v1.30.4

We are following this : https://cluster-api.sigs.k8s.io/tasks/upgrading-clusters#how-to-upgrade-the-kubernetes-control-plane-version

When we tried to update it manually, the request failed because the field is forbidden to modify.

Any suggestions, or is there a specific upgrade step we are missing?

So far we have tried:

  1. Manual update (sketched below): FAILED with error: spec.kubeadmConfigSpec.clusterConfiguration.kubernetesVersion: Forbidden: cannot be modified
  2. Force reconcile: added the annotation cluster.x-k8s.io/force-reconcile: "true" to the KubeadmControlPlane; no luck
  3. Restarted all pods on the management cluster
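For attempt 1, the change we tried is roughly the following sketch (the exact command may differ slightly from what we ran):

    kubectl patch kubeadmcontrolplane ssp-cluster -n ssp-cluster --type merge \
      -p '{"spec":{"kubeadmConfigSpec":{"clusterConfiguration":{"kubernetesVersion":"v1.30.4"}}}}'
    # rejected by the KCP webhook with:
    # spec.kubeadmConfigSpec.clusterConfiguration.kubernetesVersion: Forbidden: cannot be modified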

What did you expect to happen?

We were expecting that, if KubeadmConfigSpec.ClusterConfiguration.KubernetesVersion is not modifiable, it would automatically be updated to 1.30.4 after the upgrade.

Cluster API version

clusterctl version:
clusterctl version: &version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.3", GitCommit:"965ffa1d94230b8127245df750a99f09eab9dd97", GitTreeState:"clean", BuildDate:"2024-03-12T17:15:08Z", GoVersion:"go1.21.8", Compiler:"gc", Platform:"linux/amd64"}

bootstrap-kubeadm: v1.7.1
cert-manager: v1.14.2
cluster-api: v1.7.1
control-plane-kubeadm: v1.7.1
infrastructure-vsphere: v1.10.0
ipam-incluster: v0.1.0

Kubernetes version

1.29.3 -> 1.30.4 Upgrade

Anything else you would like to add?

root@sspi-test:/image/VMware-SSP-Installer-5.0.0.0.0.80589143/phoenix# kubectl get kubeadmcontrolplane ssp-cluster -n ssp-cluster
NAME          CLUSTER       INITIALIZED   API SERVER AVAILABLE   REPLICAS   READY   UPDATED   UNAVAILABLE   AGE   VERSION
ssp-cluster   ssp-cluster   true          true                   1          1       1         0             63m   v1.30.4

root@sspi-test:/image/VMware-SSP-Installer-5.0.0.0.0.80589143/phoenix# kubectl get cluster -A
NAMESPACE     NAME          CLUSTERCLASS   PHASE         AGE   VERSION
ssp-cluster   ssp-cluster                  Provisioned   63m

Label(s) to be applied

/kind bug
One or more /area label. See https://github.com/kubernetes-sigs/cluster-api/labels?q=area for the list of labels.

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. needs-priority Indicates an issue lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 28, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If CAPI contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@gandhisagar
Author

Or will this field never get updated during an upgrade, since kubeadm init happened during deployment and the field is only used while initializing the cluster?

@neolit123
Member

neolit123 commented Oct 29, 2024

CAPI does have this:

log.V(3).Info("Altering ClusterConfiguration.KubernetesVersion", "kubernetesVersion", config.Spec.ClusterConfiguration.KubernetesVersion)

but it's only updated if the value is "".

Or will this field never get updated during an upgrade, since kubeadm init happened during deployment and the field is only used while initializing the cluster?

Searching the code base, I'd say it's not updated continuously.
What is your use case for tracking the kubeadm config version?

@sbueringer
Member

This question was also posted in at least two Slack channels. @gandhisagar can you please de-duplicate? It's not very efficient for folks trying to help.

@gandhisagar
Author

@neolit123 We are automating the upgrade using CAPI/CAPV in our enterprise product. During the upgrade we change spec.version as described in the documentation, but KubeadmConfigSpec.ClusterConfiguration.KubernetesVersion keeps the value used during installation. I am trying to understand the implication: if it keeps the old value, does that have any effect in production?

@gandhisagar
Author

@sbueringer Sure. The response there has been quiet, so I will watch it for a day and then delete it from the Slack channel.

@fabriziopandini
Member

Echoing what I answered in the Slack channel (and please stop duplicating requests; it doesn't help you solve your problem and it makes everyone else's life more complicated):

You should not set the KubeadmConfigSpec.ClusterConfiguration.KubernetesVersion field; if you leave it empty, CABPK will use the top-level version and upgrades will just work.
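For illustration only, the intended shape is roughly the following fragment (not a full manifest; values taken from this issue):

    spec:
      version: v1.30.4            # the only place the Kubernetes version needs to be set
      kubeadmConfigSpec:
        clusterConfiguration:
          # kubernetesVersion intentionally omitted; CABPK fills it in from spec.version
          apiServer:
            extraArgs:
              cloud-provider: external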

Note: we should probably also remove the field from the API, but refactoring this API is currently blocked by a few other ongoing discussions.

@fabriziopandini fabriziopandini added kind/support Categorizes issue or PR as a support question. priority/backlog Higher priority than priority/awaiting-more-evidence. labels Oct 30, 2024
@k8s-ci-robot k8s-ci-robot removed the needs-priority Indicates an issue lacks a `priority/foo` label and requires one. label Oct 30, 2024
@fabriziopandini fabriziopandini removed the kind/bug Categorizes issue or PR as related to a bug. label Oct 30, 2024
@gandhisagar
Author

@fabriziopandini This is an upgrade, not a deployment. The field was populated during deployment but is not changed during the upgrade. By upgrade I mean we are moving the template from 1.29 to 1.30; it is an in-place upgrade, not blue-green. How can we make it blank during the upgrade?

As you may have noticed, I already deleted the message.

@sbueringer
Member

I think the only way to unset this field on a KCP object that already has it is to disable the KCP validation webhook, unset the field and then enable the webhook again.
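Very roughly, and untested, that would look something like the sketch below. The webhook configuration name is an assumption based on a default clusterctl install, so check kubectl get validatingwebhookconfigurations first, and strip server-set metadata (uid, resourceVersion, creationTimestamp) from the backup before re-applying:

    # back up and temporarily remove the KCP validating webhook configuration (name assumed)
    kubectl get validatingwebhookconfiguration capi-kubeadm-control-plane-validating-webhook-configuration \
      -o yaml > kcp-webhook-backup.yaml
    kubectl delete validatingwebhookconfiguration capi-kubeadm-control-plane-validating-webhook-configuration

    # unset the field on the KCP object (names from this issue)
    kubectl patch kubeadmcontrolplane ssp-cluster -n ssp-cluster --type=json \
      -p '[{"op":"remove","path":"/spec/kubeadmConfigSpec/clusterConfiguration/kubernetesVersion"}]'

    # restore the webhook configuration from the cleaned-up backup
    kubectl apply -f kcp-webhook-backup.yaml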

@gandhisagar
Author

gandhisagar commented Oct 30, 2024

@sbueringer Is there a procedure I can follow, or do you recommend not doing that in production? We are fine keeping the old value; I am just trying to see whether this field being stale has any impact in production. Appreciate the help.

@sbueringer
Member

sbueringer commented Oct 31, 2024

Not sure what the impact is. As far as I can tell this kubernetesVersion gets passed from the KCP object to the KubeadmConfigs, from there onto Machines, and is then used by kubeadm (but maybe I'm misreading our code).

I would probably try to verify which version effectively ends up in the config file used by kubeadm when creating new Nodes with kubeadm join.
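For example (a rough sketch; names assumed from a standard kubeadm/CABPK setup), you could compare what kubeadm stored in the workload cluster with what CABPK rendered into the bootstrap data:

    # ClusterConfiguration as stored by kubeadm in the workload cluster
    kubectl --kubeconfig <workload-kubeconfig> -n kube-system get configmap kubeadm-config \
      -o jsonpath='{.data.ClusterConfiguration}' | grep kubernetesVersion

    # bootstrap data generated by CABPK for a control plane Machine
    # (secret name matches spec.bootstrap.dataSecretName, e.g. ssp-cluster-rbjxg in this issue)
    kubectl -n ssp-cluster get secret ssp-cluster-rbjxg -o jsonpath='{.data.value}' \
      | base64 -d | grep -i kubernetesversion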

If the result is that kubeadm join gets a config file with the wrong Kubernetes version, the question is what impact that has on kubeadm behavior (I have no idea).

If overall the result of that investigation is that it's problematic, the only way to do this right now is: "I think the only way to unset this field on a KCP object that already has it is to disable the KCP validation webhook, unset the field and then enable the webhook again.". We don't have any further documentation.

What we maybe could consider is allowing folks to unset the kubernetesVersion field within ClusterConfiguration (but this requires a code change). I assume today the validating webhook on KCP blocks unsetting the version? (based on what you wrote above)
