Automated rotation of node bootstrap token (e.g., after token changes from node image upgrade) #247

comtalyst · 2024-04-03T05:35:19Z

Tell us about your request

Automatically rotate the node bootstrap token whenever it becomes outdated.

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?

When deploying Karpenter, the node bootstrap token is one of the required parameters, because a provisioned node uses it to join the cluster.
However, that token can be replaced by user operations such as a node image upgrade.
Unless Karpenter is manually redeployed with the new token, nodes it provisions will fail to join the cluster.
This feature request calls for an automated process that removes this routine redeployment burden from the user.

Are you currently working around this issue?

No.
But a potential design I have in mind is a fetching loop inside Karpenter that watches the bootstrap token and updates it accordingly.
There is a potential race condition, though: the token is deleted, then Karpenter tries to provision a node, and only then does Karpenter fetch the new token. The longer the delay between iterations, the more likely this is to occur. However, it should eventually succeed once that provisioning attempt times out and the replacement is created with the updated token(?).
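To make the idea concrete, here is a minimal sketch in Python (not Karpenter's actual Go code). It swaps the background loop for a closely related variant: a cached token with a maximum age, plus a forced refresh and retry when a join attempt fails, which narrows the race window described above. All names here (BootstrapTokenCache, provision_with_retry, fetch_token, join_cluster) are hypothetical and do not correspond to any real Karpenter or AKS API.

```python
import time
from typing import Callable, Optional


class BootstrapTokenCache:
    """Caches a bootstrap token and re-fetches it once it is older than max_age_seconds.

    fetch_token is a hypothetical callable standing in for however the
    current token would actually be read (an assumption, not a real API).
    """

    def __init__(
        self,
        fetch_token: Callable[[], str],
        max_age_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_token = fetch_token
        self._max_age = max_age_seconds
        self._clock = clock
        self._token: Optional[str] = None
        self._fetched_at = 0.0

    def get(self, force_refresh: bool = False) -> str:
        """Return the cached token, re-fetching if missing, stale, or forced."""
        now = self._clock()
        stale = self._token is None or now - self._fetched_at >= self._max_age
        if force_refresh or stale:
            self._token = self._fetch_token()
            self._fetched_at = now
        return self._token


def provision_with_retry(
    cache: BootstrapTokenCache,
    join_cluster: Callable[[str], bool],
    max_attempts: int = 2,
) -> bool:
    """Try to join with the cached token; after a failure, force a refresh and retry.

    join_cluster is a hypothetical stand-in for provisioning a node and
    waiting for it to register (or time out).
    """
    for attempt in range(max_attempts):
        token = cache.get(force_refresh=attempt > 0)
        if join_cluster(token):
            return True
    return False
```

With this shape, a shorter max_age_seconds makes it less likely that a stale token is used, and the forced refresh covers the case where the token is rotated between two refreshes, at the cost of one failed provisioning attempt.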

Additional Context

This is primarily for self-hosted Karpenter.

For AKS Node Auto Provisioning (NAP), it is by design that all unnecessary parameter concerns are abstracted away from users. This ensures a fully automated experience and ease of use.
In this case, an automated node bootstrap token rotation mechanism (or something equivalent) is always in place, so that it is not a visible user concern in the first place.

That said, there was a similar issue where NAP-provisioned nodes could use an outdated token after a node image upgrade, leaving them unable to join the cluster. The apparent symptom could be similar to #248. At this time, the fix is being rolled out on the AKS side. Please file a separate issue if similar problems still occur later on.

Attachments

No response

Community Note

Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
Please do not leave "+1" or "me too" comments; they generate extra noise for issue followers and do not help prioritize the request
If you are interested in working on this issue or have submitted a pull request, please leave a comment

Bryce-Soghigian · 2024-04-03T15:21:50Z

Useful to note that the fix has not finished rolling out for NAP yet; it is still in progress. A simple empty PUT on the managed cluster will mitigate the issue:
az aks update -g rg -n clusterName

Bryce-Soghigian · 2024-05-17T07:37:06Z

The fix was rolled out a while ago, so closing this one.

Bryce-Soghigian mentioned this issue Apr 3, 2024

GPU Nodepool Node not Registered #181

Closed

Bryce-Soghigian mentioned this issue Apr 3, 2024

NAP is creating NodeClaims and VMs but not registering Nodes to the cluster #248

Closed

tallaxes added the area/bootstrap, area/security, and area/provisioning labels Apr 3, 2024

comtalyst changed the title ~~Automated rotation of node bootstrap token~~ Automated rotation of node bootstrap token (e.g., after token changes from node image upgrade) Apr 3, 2024

Bryce-Soghigian closed this as completed May 17, 2024
