
Not able to join new server nodes to RKE2 HA cluster via AWS load balancer #7640

Open
syedsalman3753 opened this issue Jan 27, 2025 · 2 comments

@syedsalman3753

Environmental Info:
RKE2 Version: v1.28.9+rke2r1

Node(s) CPU architecture, OS, and Version:

  • OS:
       PRETTY_NAME="Ubuntu 24.04.1 LTS"
       NAME="Ubuntu"
       VERSION_ID="24.04"
       VERSION="24.04.1 LTS (Noble Numbat)"
       VERSION_CODENAME=noble
       ID=ubuntu
       ID_LIKE=debian
       HOME_URL="https://www.ubuntu.com/"
       SUPPORT_URL="https://help.ubuntu.com/"
       BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
       PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
       UBUNTU_CODENAME=noble
       LOGO=ubuntu-logo
    
  • CPU: 8
  • MEMORY: 32GB
  • Node: Linux ip-172-31-12-161 6.8.0-1021-aws #23-Ubuntu SMP Mon Dec 9 23:59:34 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
  • RKE2_VERSION: v1.28.9+rke2r1

Cluster Configuration:
Created 3 nodes (8 vCPU, 32 GB RAM each) behind an AWS Network Load Balancer.
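
For reference, the load balancer is used as the fixed registration address for the server nodes. A minimal sketch of the listener/target-group layout (AWS CLI; the ARNs, VPC and instance IDs below are placeholders, not values from this cluster), assuming TCP 9345 (RKE2 supervisor) and TCP 6443 (kube-apiserver) are forwarded to every server node:

    # Placeholder ARNs/IDs for illustration only.
    aws elbv2 create-target-group --name rke2-supervisor --protocol TCP --port 9345 \
        --vpc-id <vpc-id> --target-type instance
    aws elbv2 register-targets --target-group-arn <supervisor-tg-arn> --targets Id=<instance-id>
    aws elbv2 create-listener --load-balancer-arn <nlb-arn> --protocol TCP --port 9345 \
        --default-actions Type=forward,TargetGroupArn=<supervisor-tg-arn>
    # Repeat the same pattern for TCP 6443 (kube-apiserver).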

Describe the bug:
Tried to create an RKE2 HA Kubernetes cluster with an AWS Network Load Balancer acting as the fixed registration address.
I was able to set up the primary rke2-server node, but when adding another server pointed at the load balancer domain, I get the error below.

INFO[0005] Handling backend connection request [control-subsequent_plane-1] 
INFO[0005] Remotedialer connected to proxy               url="wss://127.0.0.1:9345/v1-rke2/connect"
INFO[0006] Adding member control-subsequent_plane-1-190984c8=https://172.31.15.20:2380 to etcd cluster [control-plane-1-ffe640b6=https://172.31.12.161:2380] 
INFO[0006] Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:9345/v1-rke2/readyz: 500 Internal Server Error 
INFO[0006] Starting etcd to join cluster with members [control-plane-1-ffe640b6=https://172.31.12.161:2380 control-subsequent_plane-1-190984c8=https://172.31.15.20:2380] 
INFO[0011] Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:9345/v1-rke2/readyz: 500 Internal Server Error
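
As a quick sanity check (a sketch) of the registration address itself, the supervisor port can be probed through the load balancer from the joining node. The readyz URL below is the same one the log is polling, just via the LB instead of 127.0.0.1; any HTTP response at all (even 401/500) means the supervisor is reachable:

    nc -zv rke2-testing-lb-9a80d15c1f848017.elb.ap-south-1.amazonaws.com 9345
    nc -zv rke2-testing-lb-9a80d15c1f848017.elb.ap-south-1.amazonaws.com 6443
    curl -kv https://rke2-testing-lb-9a80d15c1f848017.elb.ap-south-1.amazonaws.com:9345/v1-rke2/readyz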

Steps To Reproduce:

  • Installed RKE2: v1.28.9+rke2r1 (a sketch of the install commands follows this list)
  • Primary server config.yaml
      token: 89Ce43E4d793FB2c67B7F4A5f7Ca8dc8
      node-name: control-plane-1
      node-ip: 172.31.12.161
      node-label:
        - rke2-upgrade=true
      tls-san:
        - rke2-testing-lb-9a80d15c1f848017.elb.ap-south-1.amazonaws.com
      cni:
        - canal
      disable:
        - rke2-ingress-nginx
      kubelet-arg:
        - --allowed-unsafe-sysctls=net.ipv4.conf.all.src_valid_mark,net.ipv4.ip_forward
        - eviction-hard=memory.available<500Mi,nodefs.available<10%,nodefs.inodesFree<5%
        - eviction-soft=memory.available<1Gi,nodefs.available<15%,nodefs.inodesFree<10%
        - eviction-soft-grace-period=memory.available=1m,nodefs.available=1m,nodefs.inodesFree=1m
        - eviction-max-pod-grace-period=120
        - eviction-minimum-reclaim=memory.available=0Mi,nodefs.available=500Mi,imagefs.available=2Gi
      
      etcd-snapshot-retention: 15
      etcd-snapshot-schedule-cron: "0 0 * * *"
      etcd-snapshot-dir: "/mnt/rancher-k8s-data/server-1/etcd/snapshots"
      
      kube-apiserver-arg:
       - "--audit-log-path=/mnt/rancher-k8s-data/server-1/audit/audit.log"
       - "--audit-policy-file=/etc/rancher/rke2/audit-policy.yaml"
       - "--audit-log-maxage=30"
       - "--audit-log-maxbackup=30"
       - "--audit-log-maxsize=1024"
      write-kubeconfig-mode: "0644"
    
  • Secondary server config.yaml
      token: 89Ce43E4d793FB2c67B7F4A5f7Ca8dc8
      server: https://rke2-testing-lb-9a80d15c1f848017.elb.ap-south-1.amazonaws.com:9345
      node-name: control-subsequent_plane-1
      node-ip: 172.31.15.20
      node-label:
        - rke2-upgrade=true
      tls-san:
        - rke2-testing-lb-9a80d15c1f848017.elb.ap-south-1.amazonaws.com
      cni:
        - canal
      disable:
        - rke2-ingress-nginx
      kubelet-arg:
        - --allowed-unsafe-sysctls=net.ipv4.conf.all.src_valid_mark,net.ipv4.ip_forward
        - eviction-hard=memory.available<500Mi,nodefs.available<10%,nodefs.inodesFree<5%
        - eviction-soft=memory.available<1Gi,nodefs.available<15%,nodefs.inodesFree<10%
        - eviction-soft-grace-period=memory.available=1m,nodefs.available=1m,nodefs.inodesFree=1m
        - eviction-max-pod-grace-period=120
        - eviction-minimum-reclaim=memory.available=0Mi,nodefs.available=500Mi,imagefs.available=2Gi
      
      etcd-snapshot-retention: 15
      etcd-snapshot-schedule-cron: "0 0 * * *"
      etcd-snapshot-dir: "/mnt/rancher-k8s-data/server-2/etcd/snapshots"
      
      kube-apiserver-arg:
       - "--audit-log-path=/mnt/rancher-k8s-data/server-2/audit/audit.log"
       - "--audit-policy-file=/etc/rancher/rke2/audit-policy.yaml"
       - "--audit-log-maxage=30"
       - "--audit-log-maxbackup=30"
       - "--audit-log-maxsize=1024"
      
      write-kubeconfig-mode: "0644"
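
For completeness, a minimal sketch of the install flow used on each server node, assuming the standard get.rke2.io script and the default RKE2 paths:

    # Install the pinned release on the node.
    curl -sfL https://get.rke2.io | INSTALL_RKE2_VERSION="v1.28.9+rke2r1" sh -
    # Place the per-node config shown above before starting the service.
    mkdir -p /etc/rancher/rke2
    vi /etc/rancher/rke2/config.yaml
    systemctl enable --now rke2-server.service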
    
    

Expected behavior:

  • The secondary server should be able to join the cluster via the load balancer domain

Actual behavior:

  • The secondary server is not able to join the cluster via the load balancer domain

Additional context / logs:
rke2-subsequent-server.log

@syedsalman3753 (Author)

Debug-mode logs from the subsequent server:

rke2-subsequent-server-debug-mode.log

@brandond (Member)

Check the logs under /var/log/pods.

Are you confident that you've opened all the correct ports between the nodes, and that these two nodes can reach each other directly, not just through the load balancer?
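
For reference, a sketch of a direct node-to-node reachability check on the standard RKE2 ports (run from each server against the other server's private IP; note that nc's UDP probe is not fully reliable, so a failing 8472/udp check is only a hint):

    OTHER=172.31.15.20                 # other server's node-ip; swap when running from the second node
    for p in 9345 6443 2379 2380 10250; do
        nc -zv -w 3 "$OTHER" "$p"      # TCP: supervisor, apiserver, etcd client/peer, kubelet
    done
    nc -zuv -w 3 "$OTHER" 8472         # UDP: Canal (Flannel) VXLAN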
