
bug: kubelet server certificates do not include keepalived VIP #158

Open
moray95 opened this issue Aug 28, 2023 · 11 comments
Labels
bug Something isn't working

Comments

@moray95

moray95 commented Aug 28, 2023

Summary

When using an HA setup with Keepalived, the server certificate provisioned for the kubelet does not include the Keepalived VIP. This causes TLS verification failures for operations such as viewing logs or port forwarding on pods running on the current leader.

Issue Type

Bug Report

Ansible Version

ansible [core 2.14.6]
  config file = /Users/moray/.ansible.cfg
  configured module search path = ['/Users/moray/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /opt/homebrew/Cellar/ansible/7.6.0/libexec/lib/python3.11/site-packages/ansible
  ansible collection location = /Users/moray/.ansible/collections:/usr/share/ansible/collections
  executable location = /opt/homebrew/bin/ansible
  python version = 3.11.4 (main, Jul 25 2023, 17:36:13) [Clang 14.0.3 (clang-1403.0.22.14.1)] (/opt/homebrew/Cellar/ansible/7.6.0/libexec/bin/python3.11)
  jinja version = 3.1.2
  libyaml = True

Steps to Reproduce

  1. Install RKE2 (sample playbook below)
- hosts: rke
  become: true
  roles:
    - role: lablabs.rke2
  vars:
    rke2_ha_mode: true
    rke2_ha_mode_keepalived: true
    rke2_version: v1.26.7+rke2r1
    rke2_install_bash_url: https://get.rke2.io
    rke2_api_ip: 10.64.0.9
    rke2_disable:
      - rke2-ingress-nginx
    rke2_cni: canal
    rke2_cluster_group_name: rke
    rke2_servers_group_name: rke_master
    # Ansible group including worker nodes
    rke2_agents_group_name: rke_worker
    rke2_server_options:
      - "disable-cloud-controller: true"
  2. Try viewing logs of any pod on the current Keepalived leader, for example:
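
(The pod name below is just the kube-proxy pod from the error output further down; any pod scheduled on the leader node should do.)

kubectl logs -n kube-system kube-proxy-master-0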

Expected Results

The TLS certificate generated for the kubelet includes the Keepalived VIP (10.64.0.9 in the example above), so issuing kubectl logs and kubectl port-forward commands against pods on the current leader works without problems.

Actual Results

The TLS certificate for Kubelet does not include the Keepalived VIP (10.64.0.9 in the example above). Issuing kubectl logs or kubectl port-forward commands on pods on the current leader results in the following error:

Error from server: Get "https://10.64.0.9:10250/containerLogs/kube-system/kube-proxy-master-0/kube-proxy": tls: failed to verify certificate: x509: certificate is valid for 127.0.0.1, 10.64.0.10, not 10.64.0.9
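
For reference, the SANs on the kubelet serving certificate can be checked directly on the leader node; the path below is the usual RKE2 agent location, so adjust it if your install differs:

openssl x509 -noout -text -in /var/lib/rancher/rke2/agent/serving-kubelet.crt | grep -A1 'Subject Alternative Name'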

Additional information:

  • The API server serving certificate does include the VIP.
  • The leader's internal IP address always shows up as the VIP.
  • I have tried setting the RKE2 options node-ip and advertise-address to the non-virtual IP, but to no avail.
moray95 added the bug label Aug 28, 2023
@kubilaykaptanoglu

kubilaykaptanoglu commented Aug 29, 2023

Hi moray95,

Can you set this variable and try again?

rke2_additional_sans:
  - "10.64.0.9"

@moray95

moray95 commented Aug 29, 2023

@kubilaykaptanoglu I have tried this, but sadly it does not work. 10.64.0.9 is already present in tls-san even without setting this, because rke2_api_ip is automatically added here.
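
For what it's worth, the VIP really is listed as a SAN in the generated server config on the control-plane nodes. A quick check (the exact file contents depend on the role version, but it should look roughly like the commented output):

grep -A 3 'tls-san' /etc/rancher/rke2/config.yaml
# tls-san:
#   - 10.64.0.9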

@kubilaykaptanoglu

I couldn't reproduce this problem, even when I tried with your exact configuration.

@moray95

moray95 commented Aug 31, 2023

Out of curiosity, does Kubernetes show the VIP as the node internal address?

@kubilaykaptanoglu

The other nodes use the VIP as the server address.
[screenshot]
This screenshot is from the master-01 server.
[screenshot]
I turned off master-01 and the VIP moved to master-02.
[screenshot]

In summary, we use the VIP for load balancing between the master nodes, if I understood your question correctly. The output of 'kubectl get node -o wide' does not show the VIP as the node internal address.
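
A quick way to compare would be something like the following (purely illustrative; the jsonpath expression just pulls out each node's InternalIP):

kubectl get nodes -o wide
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.addresses[?(@.type=="InternalIP")].address}{"\n"}{end}'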

I'm actually going to fork this project and do this load balancing with HAProxy.

@moray95

moray95 commented Aug 31, 2023

Thanks for the info.

I guess this is the problem:

'kubectl get node -o wide' output doesn't show the VIP as the node internal address.

On my setup, I see the VIP as the node internal address. When the VIP isn't used as the node address, the certificates shouldn't cause a problem, as the non-virtual IP is properly added to the certificate.

@kubilaykaptanoglu

Can you share your hosts (inventory) file with me?

@moray95

moray95 commented Aug 31, 2023

Sure, here it is:

rke_master:
  vars:
    rke2_type: server
  hosts:
    master-0:
      ansible_host: 10.64.0.10
    master-1:
      ansible_host: 10.64.0.11
    master-2:
      ansible_host: 10.64.0.12

rke_worker:
  vars:
    rke2_type: agent
  hosts:
    worker-0:
      ansible_host: 10.64.0.100
    worker-1:
      ansible_host: 10.64.0.101
    worker-2:
      ansible_host: 10.64.0.102
    worker-3:
      ansible_host: 10.64.0.103
    worker-4:
      ansible_host: 10.64.0.104

rke:
  children:
    rke_master:
    rke_worker:

@kubilaykaptanoglu

If possible, can you do a clean installation?

Before installation, you should run these commands on all servers:

/usr/local/bin/rke2-uninstall.sh
rm -rf /etc/systemd/system/rke2-server.service

My result:
[screenshot]

My example config:

- hosts: rke
  become: true
  remote_user: ubuntu
  roles:
    - role: lablabs.rke2
  vars:
    rke2_ha_mode: true
    rke2_ha_mode_keepalived: true
    rke2_version: v1.27.4+rke2r1
    rke2_install_bash_url: https://get.rke2.io
    rke2_api_ip: 10.55.0.53
    rke2_cni: canal
    rke2_cluster_group_name: rke
    rke2_servers_group_name: rke_master
    # Ansible group including worker nodes
    rke2_agents_group_name: rke_worker
    rke2_server_options:
      - "disable-cloud-controller: true"

@moray95

moray95 commented Aug 31, 2023

So, I performed a clean install on a brand new VM and here are my findings:

Using 10.34.10.2 as the VIP and 10.34.10.3 as the node IP, the same issue still appears. Using 10.34.10.2 as the node IP and 10.34.10.3 as the VIP, the issue disappears.

On the initial setup, I changed the VIP from 10.64.0.9 to 10.64.0.19 and the issue got fixed.

My guess would be that RKE2 uses the lowest IP when multiple IPs are available, but your example seems to contradict this. Maybe it's an OS difference? I am using Ubuntu Server 22.04, for the record.
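
One way to check this guess (assuming keepalived adds the VIP as an additional address on the same interface, which is how I understand the role's setup) would be to compare the addresses present on the leader, since the kubelet's auto-detected node IP is picked from this set:

ip -4 addr show
# On an affected leader, both the node's own address and the VIP appear on the same interface.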

@kubilaykaptanoglu

I used Ubuntu 20.04. I will try next week with Ubuntu 22.04 and I will write the result here.
