[Bug]: nginx stuck deploying when not scheduling on control-plane #1343

Closed
WebSpider opened this issue May 8, 2024 · 0 comments · Fixed by #1349
Labels
bug Something isn't working

@WebSpider (Contributor)

Description

When deploying a new cluster with:

allow_scheduling_on_control_plane = false
ingress_controller = "nginx"
agent_nodepools = []
autoscaler_nodepools = [
  {
    ... some valid autoscaler config ...
  }
]

the cluster deploy gets stuck with nginx waiting on a load balancer.

The nginx-controller pod never launches, since there are no valid nodes to schedule it on.
This also does not trigger an autoscaler scale-up, because the cluster-autoscaler is only deployed after nginx.

A possible solution would be to handle the autoscaler before nginx, so that it can trigger a scale-up and place the nginx-controller pod.
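
For illustration only (this is not the module's actual internal code, and the resource names are hypothetical), the proposed ordering could be expressed with an explicit depends_on between the two releases, so the autoscaler exists before nginx starts waiting for a schedulable node:

# Hypothetical sketch, not the module's real resources: deploy the
# cluster-autoscaler first, then make the nginx ingress release depend on it.
resource "helm_release" "cluster_autoscaler" {
  name       = "cluster-autoscaler"
  repository = "https://kubernetes.github.io/autoscaler"
  chart      = "cluster-autoscaler"
  namespace  = "kube-system"
}

resource "helm_release" "nginx_ingress" {
  name       = "ingress-nginx"
  repository = "https://kubernetes.github.io/ingress-nginx"
  chart      = "ingress-nginx"
  namespace  = "ingress-nginx"

  # nginx is only applied once the autoscaler is in place, so its pending
  # controller pod can trigger a scale-up instead of deadlocking.
  depends_on = [helm_release.cluster_autoscaler]
}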

Workaround:

I've found the following to work (see the sketch below):

  1. Deploy the cluster with ingress_controller = "none".
  2. Change the ingress controller to "nginx" after the initial deploy is complete.
     Since cluster-autoscaler is installed at that point, a scale-up is triggered, and all is good.
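
A minimal sketch of the two applies, assuming only the ingress_controller argument changes between them (the rest of the module block stays as in the kube.tf below):

# First apply: bring the cluster up without an ingress controller
ingress_controller = "none"

# Second apply, after the first one completes: switch to nginx.
# cluster-autoscaler is already running at this point, so a scale-up
# is triggered for the nginx-controller pod.
ingress_controller = "nginx"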

Kube.tf file

locals {
  hcloud_token = "xxxxxxxxxxx"
}
module "kube-hetzner" {
  providers = {
    hcloud = hcloud
  }
  hcloud_token = var.hcloud_token != "" ? var.hcloud_token : local.hcloud_token
  source = "kube-hetzner/kube-hetzner/hcloud"
  version = "2.13.5"
  ssh_port = 65530
  ssh_public_key = file("./kube-test-key.pub")
  ssh_private_key = file("./kube-test-key")
  ssh_hcloud_key_label = "role=admin"
  control_plane_nodepools = [
    {
      name            = "cp-fsn1",
      server_type     = "cpx11",
      location        = "fsn1",
      labels          = [],
      taints          = [],
      count           = 1,
      swap_size       = "2G",
      kubelet_args    = ["kube-reserved=cpu=250m,memory=300Mi,ephemeral-storage=1Gi", "system-reserved=cpu=250m,memory=100Mi"]
      placement_group = "cp-def"
    },
    {
      name            = "cp-nbg1",
      server_type     = "cpx11",
      location        = "nbg1",
      labels          = [],
      taints          = [],
      count           = 1,
      swap_size       = "2G",
      kubelet_args    = ["kube-reserved=cpu=250m,memory=300Mi,ephemeral-storage=1Gi", "system-reserved=cpu=250m,memory=100Mi"]
      placement_group = "cp-def"
    },
    {
      name            = "cp-hel1",
      server_type     = "cpx11",
      location        = "hel1",
      labels          = [],
      taints          = [],
      count           = 1,
      swap_size       = "2G",
      kubelet_args    = ["kube-reserved=cpu=250m,memory=300Mi,ephemeral-storage=1Gi", "system-reserved=cpu=250m,memory=100Mi"]
      placement_group = "cp-def"
    }
  ]
  agent_nodepools = [
    {
      name        = "agent-small",
      server_type = "cpx11",
      location    = "fsn1",
      labels = [
        "node.kubernetes.io/role=worker"
      ],
      taints = [],
      count  = 0
      placement_group = "nodepool-default"
    },
  ]
  load_balancer_type     = "lb11"
  load_balancer_location = "fsn1"
  base_domain = "example.nl"
  autoscaler_nodepools = [
    {
      name        = "asg-tiny-fsn"
      server_type = "cpx11"
      location    = "fsn1"
      min_nodes   = 0
      max_nodes   = 5
      labels = {
        "node.kubernetes.io/role" : "worker"
      }
      kubelet_args = ["kube-reserved=cpu=100m,memory=200Mi,ephemeral-storage=1Gi", "system-reserved=cpu=100m,memory=100Mi"]
      placement_group = "asg-tiny-fsn"
    }
  ]
  cluster_autoscaler_extra_args = [
    "--ignore-daemonsets-utilization=true",
    "--enforce-node-group-min-size=true",
    "--skip-nodes-with-local-storage=false",
  ]
  enable_delete_protection = {
    load_balancer = true
    volume        = false
  }
  enable_csi_driver_smb = true
  ingress_controller = "nginx"
  allow_scheduling_on_control_plane = false
  kured_options = {
    "reboot-days" : "su",
    "start-time" : "3am",
    "end-time" : "8am",
    "time-zone" : "Local",
    "lock-ttl" : "30m",
  }
  cluster_name = "cutiepie"
  k3s_global_kubelet_args = [
    "kube-reserved=cpu=100m,memory=200Mi,ephemeral-storage=1Gi", "system-reserved=cpu=100m,memory=200Mi", "image-gc-high-threshold=50",
    "image-gc-low-threshold=40"
  ]
  cni_plugin = "flannel"
  dns_servers = [
    "1.1.1.1",
    "8.8.8.8",
    "2606:4700:4700::1111",
  ]
  use_control_plane_lb = true
  create_kustomization = false
  export_values = true
}
provider "hcloud" {
  token = var.hcloud_token != "" ? var.hcloud_token : local.hcloud_token
}
terraform {
  required_version = ">= 1.5.0"
  required_providers {
    hcloud = {
      source  = "hetznercloud/hcloud"
      version = ">= 1.43.0"
    }
  }
}
output "kubeconfig" {
  value     = module.kube-hetzner.kubeconfig
  sensitive = true
}
variable "hcloud_token" {
  sensitive = true
  default   = ""
}

Screenshots

No response

Platform

Linux

WebSpider added the bug label on May 8, 2024
WebSpider added a commit to WebSpider/terraform-hcloud-kube-hetzner that referenced this issue on May 15, 2024