Automated rotation of node bootstrap token (e.g., after token changes from node image upgrade) #247
Labels
area/bootstrap
Issues or PRs related to bootstrap
area/provisioning
Issues or PRs related to provisioning (instance provider)
area/security
Issues or PRs related to security
Tell us about your request
Automatically rotate the node bootstrap token whenever it becomes outdated.
Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
When deploying Karpenter, the node bootstrap token is one of the required parameters, because a provisioned node uses it to join the cluster.
However, user operations such as a node image upgrade can replace that token.
Unless Karpenter is manually redeployed with the new token, nodes it provisions will fail to join the cluster.
This feature request calls for an automated process that removes this routine redeployment burden from the user.
Are you currently working around this issue?
No.
But a potential design I have in mind is some kind of fetching loop inside Karpenter that watches the bootstrap token and updates Karpenter's copy accordingly.
There is a potential race condition, though: the token is deleted, Karpenter then tries to provision a node with the stale token, and only afterwards fetches the new one. The longer the delay between iterations, the more likely this becomes. However, provisioning should eventually succeed once that attempt times out and the replacement node is created with the updated token (not verified).
Additional Context
This is primarily for self-hosted Karpenter.
For AKS Node Auto Provisioning (NAP), it is by design that all unnecessary parameter concerns are abstracted away from users, to ensure a fully automated experience and ease of use.
In this case, an automated node bootstrap token rotation mechanism (or something equivalent) is always in place, so the token never becomes a visible user concern in the first place.
That said, there was a similar issue where NAP-provisioned nodes could use an outdated token after a node image upgrade and therefore fail to join the cluster. The apparent symptom could be similar to #248. At this time, the fix is being rolled out on the AKS side. Please file a separate issue if similar problems still occur later on.
Attachments
No response