[Bug]: nginx stuck deploying when not scheduling on control-plane #1343

Closed
WebSpider opened this issue May 8, 2024 · 0 comments · Fixed by #1349
Labels
bug Something isn't working

@WebSpider (Contributor)

Description

When deploying a new cluster with:

allow_scheduling_on_control_plane = false
ingress_controller = "nginx"
agent_nodepools = []
autoscaler_nodepools = [
  {
    ... some valid autoscaler config ...
  }
]

the cluster deploy gets stuck with nginx waiting on a load balancer.

The nginx-controller pod never launches, since there are no valid nodes to schedule it on.
This also does not trigger an autoscaler scale-up, because the cluster-autoscaler is only deployed after nginx.

A possible solution would be to handle the autoscaler before nginx, so that it can trigger a scale-up and place the nginx-controller pod.
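
For illustration only (this is not the module's actual internal code, and the resource names are hypothetical), the proposed ordering could be expressed with an explicit depends_on between the two releases, so the autoscaler exists before nginx starts waiting for a schedulable node:

# Hypothetical sketch, not the module's real resources: deploy the
# cluster-autoscaler first, then make the nginx ingress release depend on it.
resource "helm_release" "cluster_autoscaler" {
  name       = "cluster-autoscaler"
  repository = "https://kubernetes.github.io/autoscaler"
  chart      = "cluster-autoscaler"
  namespace  = "kube-system"
}

resource "helm_release" "nginx_ingress" {
  name       = "ingress-nginx"
  repository = "https://kubernetes.github.io/ingress-nginx"
  chart      = "ingress-nginx"
  namespace  = "ingress-nginx"

  # nginx is only applied once the autoscaler is in place, so its pending
  # controller pod can trigger a scale-up instead of deadlocking.
  depends_on = [helm_release.cluster_autoscaler]
}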

Workaround:

I've found the following to work (see the sketch below):

  1. Deploy the cluster with ingress_controller = "none".
  2. Change the ingress controller to "nginx" after the initial deploy is complete.
     Since cluster-autoscaler is installed at that point, a scale-up is triggered, and all is good.
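
A minimal sketch of the two applies, assuming only the ingress_controller argument changes between them (the rest of the module block stays as in the kube.tf below):

# First apply: bring the cluster up without an ingress controller
ingress_controller = "none"

# Second apply, after the first one completes: switch to nginx.
# cluster-autoscaler is already running at this point, so a scale-up
# is triggered for the nginx-controller pod.
ingress_controller = "nginx"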

Kube.tf file

locals {
  hcloud_token = "xxxxxxxxxxx"
}
module "kube-hetzner" {
  providers = {
    hcloud = hcloud
  }
  hcloud_token = var.hcloud_token != "" ? var.hcloud_token : local.hcloud_token
  source = "kube-hetzner/kube-hetzner/hcloud"
  version = "2.13.5"
  ssh_port = 65530
  ssh_public_key = file("./kube-test-key.pub")
  ssh_private_key = file("./kube-test-key")
  ssh_hcloud_key_label = "role=admin"
  control_plane_nodepools = [
    {
      name            = "cp-fsn1",
      server_type     = "cpx11",
      location        = "fsn1",
      labels          = [],
      taints          = [],
      count           = 1,
      swap_size       = "2G",
      kubelet_args    = ["kube-reserved=cpu=250m,memory=300Mi,ephemeral-storage=1Gi", "system-reserved=cpu=250m,memory=100Mi"]
      placement_group = "cp-def"
    },
    {
      name            = "cp-nbg1",
      server_type     = "cpx11",
      location        = "nbg1",
      labels          = [],
      taints          = [],
      count           = 1,
      swap_size       = "2G",
      kubelet_args    = ["kube-reserved=cpu=250m,memory=300Mi,ephemeral-storage=1Gi", "system-reserved=cpu=250m,memory=100Mi"]
      placement_group = "cp-def"
    },
    {
      name            = "cp-hel1",
      server_type     = "cpx11",
      location        = "hel1",
      labels          = [],
      taints          = [],
      count           = 1,
      swap_size       = "2G",
      kubelet_args    = ["kube-reserved=cpu=250m,memory=300Mi,ephemeral-storage=1Gi", "system-reserved=cpu=250m,memory=100Mi"]
      placement_group = "cp-def"
    }
  ]
  agent_nodepools = [
    {
      name        = "agent-small",
      server_type = "cpx11",
      location    = "fsn1",
      labels = [
        "node.kubernetes.io/role=worker"
      ],
      taints = [],
      count  = 0
      placement_group = "nodepool-default"
    },
  ]
  load_balancer_type     = "lb11"
  load_balancer_location = "fsn1"
  base_domain = "example.nl"
  autoscaler_nodepools = [
    {
      name        = "asg-tiny-fsn"
      server_type = "cpx11"
      location    = "fsn1"
      min_nodes   = 0
      max_nodes   = 5
      labels = {
        "node.kubernetes.io/role" : "worker"
      }
      kubelet_args = ["kube-reserved=cpu=100m,memory=200Mi,ephemeral-storage=1Gi", "system-reserved=cpu=100m,memory=100Mi"]
      placement_group = "asg-tiny-fsn"
    }
  ]
  cluster_autoscaler_extra_args = [
    "--ignore-daemonsets-utilization=true",
    "--enforce-node-group-min-size=true",
    "--skip-nodes-with-local-storage=false",
  ]
  enable_delete_protection = {
    load_balancer = true
    volume        = false
  }
  enable_csi_driver_smb = true
  ingress_controller = "nginx"
  allow_scheduling_on_control_plane = false
  kured_options = {
    "reboot-days" : "su",
    "start-time" : "3am",
    "end-time" : "8am",
    "time-zone" : "Local",
    "lock-ttl" : "30m",
  }
  cluster_name = "cutiepie"
  k3s_global_kubelet_args = [
    "kube-reserved=cpu=100m,memory=200Mi,ephemeral-storage=1Gi", "system-reserved=cpu=100m,memory=200Mi", "image-gc-high-threshold=50",
    "image-gc-low-threshold=40"
  ]
  cni_plugin = "flannel"
  dns_servers = [
    "1.1.1.1",
    "8.8.8.8",
    "2606:4700:4700::1111",
  ]
  use_control_plane_lb = true
  create_kustomization = false
  export_values = true
}
provider "hcloud" {
  token = var.hcloud_token != "" ? var.hcloud_token : local.hcloud_token
}
terraform {
  required_version = ">= 1.5.0"
  required_providers {
    hcloud = {
      source  = "hetznercloud/hcloud"
      version = ">= 1.43.0"
    }
  }
}
output "kubeconfig" {
  value     = module.kube-hetzner.kubeconfig
  sensitive = true
}
variable "hcloud_token" {
  sensitive = true
  default   = ""
}

Screenshots

No response

Platform

Linux

WebSpider added the bug label on May 8, 2024
WebSpider added a commit to WebSpider/terraform-hcloud-kube-hetzner that referenced this issue on May 15, 2024