
Mega Issue: Supported AKS Kubelet Configuration #196

Open
Bryce-Soghigian opened this issue Mar 12, 2024 · 3 comments
Labels
needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments

@Bryce-Soghigian
Contributor

Bryce-Soghigian commented Mar 12, 2024

Tell us about your request

Karpenter core plans to move kubelet configuration out of the core API; instead, each cloud provider will maintain its own set of supported kubelet configuration in the v1 API. The AKS provider needs a migration plan so that we can keep our core version closely synced with upstream.

AKS also supports the following kubelet configuration: https://learn.microsoft.com/en-us/azure/aks/custom-node-configuration?tabs=linux-node-pools#kubelet-configuration

We need to start implementing propagation for all of the AKS-supported kubelet configuration so Karpenter has feature parity. Perhaps we can also use this to drive discussion of other kubelet configuration features customers want, letting them configure those through Karpenter before rolling them out to the wider AKS audience.

This issue has been created to track:

  1. the migration plan to move away from cloud neutral core configuration
  2. each of the supported kubelet config fields we want to expose
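For illustration, the migration in (1) would move the kubelet block from the cloud-neutral NodePool API into a provider-owned resource. A hypothetical sketch of what that could look like on the AKS side (the resource kind matches the provider's AKSNodeClass, but the kubelet field names shown here are illustrative, not final):

```yaml
apiVersion: karpenter.azure.com/v1alpha2
kind: AKSNodeClass
metadata:
  name: default
spec:
  # Hypothetical: kubelet settings owned by the AKS provider rather than
  # karpenter core; field names drawn from the AKS custom node configuration
  # docs linked above, not from a finalized API.
  kubelet:
    cpuManagerPolicy: static
    imageGcHighThreshold: 85
    imageGcLowThreshold: 80
```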

Attachments

See: kubernetes-sigs/karpenter#758 (comment)
Thread on Slack we talked about moving from cloud neutral configuration: https://kubernetes.slack.com/archives/C04JW2J5J5P/p1709226455964629

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@Bryce-Soghigian Bryce-Soghigian changed the title [Mega Issue] Supported AKS Kubelet Configuration Mega Issue: Supported AKS Kubelet Configuration Mar 12, 2024
@Bryce-Soghigian Bryce-Soghigian added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Mar 14, 2024
@Tomasz-Kluczkowski

Is the kubelet configuration via karpenter node pool even working?
I tried setting this on one of the node pools:

spec:
  kubelet:
    systemReserved:
      cpu: 1000m
    kubeReserved:
      cpu: 1000m

then inspected the kubelet settings on the node and found that they are not applied at all.

I am not sure if ps aufx | grep kubelet is the right command to check, but considering it scheduled a pod requesting 15000m CPU onto a node that can allocate at most 15750m with no reservations, when I wanted 2000m reserved, it definitely does not work as expected...

root@aks-cpu-reserved-9bcbd:/# ps aufx | grep kubelet
root        3198  1.8  0.3 3039332 119056 ?      Ssl  10:12   0:10 
/usr/local/bin/kubelet --enable-server
--node-labels=karpenter.azure.com/sku-memory=32768,kubernetes.io/os=linux,kubernetes.azure.com/cluster=MC_karpenter-trial_karpenter-trial_uksouth,kubernetes.azure.com/mode=user,karpenter.azure.com/sku-gpu-count=0,kubernetes.azure.com/nodenetwork-vnetguid=f82592a3-317e-4e60-9d4d-23bcbf4f0e60,karpenter.azure.com/sku-storage-ephemeralos-maxsize=274.877906944,karpenter.azure.com/sku-name=Standard_F16s_v2,kubernetes.azure.com/role=agent,kubernetes.azure.com/podnetwork-type=overlay,karpenter.sh/nodepool=cpu-reserved,karpenter.azure.com/sku-encryptionathost-capable=true,karpenter.sh/capacity-type=spot,node.kubernetes.io/instance-type=Standard_F16s_v2,karpenter.azure.com/sku-family=F,topology.kubernetes.io/region=uksouth,karpenter.azure.com/sku-networking-accelerated=true,kubernetes.azure.com/network-subnet=aks-subnet,kubernetes.azure.com/ebpf-dataplane=cilium,kubernetes.io/arch=amd64,karpenter.azure.com/sku-storage-premium-capable=true,karpenter.azure.com/sku-version=2,karpenter.azure.com/sku-cpu=16
--v=2 
--volume-plugin-dir=/etc/kubernetes/volumeplugins
--kubeconfig /var/lib/kubelet/kubeconfig 
--bootstrap-kubeconfig /var/lib/kubelet/bootstrap-kubeconfig 
--runtime-request-timeout=15m 
--container-runtime-endpoint=unix:///run/containerd/containerd.sock 
--runtime-cgroups=/system.slice/containerd.service 
--cgroup-driver=systemd 
--max-pods=250 
--authentication-token-webhook=true 
--rotate-certificates=true 
--authorization-mode=Webhook 
--pod-max-pids=-1 
--event-qps=0 
--register-with-taints=cpu-reserved=true:NoSchedule 
--kube-reserved=cpu=260m,memory=3645Mi 
--cluster-dns=10.0.0.10 
--image-gc-high-threshold=85 
--tls-private-key-file=/etc/kubernetes/certs/kubeletserver.key 
--node-status-update-frequency=10s 
--keep-terminated-pod-volumes=false 
--kubeconfig=/var/lib/kubelet/kubeconfig 
--pod-manifest-path=/etc/kubernetes/manifests 
--tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_128_GCM_SHA256 
--enforce-node-allocatable=pods 
--azure-container-registry-config=/etc/kubernetes/azure.json 
--protect-kernel-defaults=true 
--cluster-domain=cluster.local 
--client-ca-file=/etc/kubernetes/certs/ca.crt 
--cloud-config=/etc/kubernetes/azure.json 
--pod-infra-container-image=mcr.microsoft.com/oss/kubernetes/pause:3.6 
--tls-cert-file=/etc/kubernetes/certs/kubeletserver.crt 
--cloud-provider=external 
--eviction-hard=memory.available<750Mi 
--cgroups-per-qos=true 
--address=0.0.0.0 
--streaming-connection-idle-timeout=4h 
--resolv-conf=/run/systemd/resolve/resolv.conf 
--read-only-port=0 
--system-reserved=memory=0,cpu=0 
--anonymous-auth=false 
--image-gc-low-threshold=80
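The arithmetic behind that observation can be sanity-checked with the standard Kubernetes node-allocatable formula (a sketch; the 16000m capacity figure assumes a 16-vCPU Standard_F16s_v2, and CPU has no hard-eviction component, unlike memory):

```python
def allocatable_cpu_m(capacity_m: int, kube_reserved_m: int = 0,
                      system_reserved_m: int = 0) -> int:
    """Allocatable CPU in millicores: capacity minus kubeReserved and
    systemReserved (hard eviction thresholds only apply to memory/storage)."""
    return capacity_m - kube_reserved_m - system_reserved_m

# If the NodePool's requested reservations (1000m each) had been applied,
# the 15000m pod would not have fit:
print(allocatable_cpu_m(16000, kube_reserved_m=1000, system_reserved_m=1000))
# → 14000

# Reservations actually observed on the node
# (--kube-reserved=cpu=260m, --system-reserved=cpu=0):
print(allocatable_cpu_m(16000, kube_reserved_m=260, system_reserved_m=0))
# → 15740
```

So the node's observed ~15750m allocatable is consistent with the AKS-default flags on the command line above, not with the values requested through the NodePool.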

@Bryce-Soghigian
Contributor Author

Azure Karpenter does reference the data structure internally when passing kubelet configuration around, but we do not yet allow customers to set the values.
