Automated rotation of node bootstrap token (e.g., after token changes from node image upgrade) #247

Closed
comtalyst opened this issue Apr 3, 2024 · 2 comments
Labels
area/bootstrap Issues or PRs related to bootstrap area/provisioning Issues or PRs related to provisioning (instance provider) area/security Issues or PRs related to security

Comments

@comtalyst
Collaborator

comtalyst commented Apr 3, 2024

Tell us about your request

Automatically rotate the node bootstrap token whenever it becomes outdated.

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?

When deploying Karpenter, the node bootstrap token is one of the required parameters, as a provisioned node uses it to join the cluster.
However, that token can be replaced by user operations such as a node image upgrade.
Unless Karpenter is manually redeployed with the new token, nodes it provisions will fail to join the cluster.
This feature request calls for an automated process that removes this routine redeployment burden from the user.

Are you currently working around this issue?

No.
But a potential design I have in mind is some kind of fetching loop inside Karpenter that watches the bootstrap token and updates it accordingly.
There is a potential race condition, though: the token is deleted, then Karpenter tries to provision a node, and only then does Karpenter fetch the new token. The longer the delay between iterations, the more likely this is to occur. However, it should eventually succeed once that provisioning attempt times out and the replacement node is created with the updated token(?).
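
To make the idea concrete, here is a rough sketch of such a fetching loop, written in Python purely for illustration (Karpenter itself is written in Go, and this is not its actual implementation). Every name here is hypothetical: `BootstrapTokenCache`, `fetch_token` (stands in for however the token is read from the cluster), and `try_join` (stands in for provisioning a node and checking whether it joined). The retry helper also assumes that refreshing the token immediately after a failed join is acceptable, which narrows the race window described above.

```python
import threading
from typing import Callable, Optional


class BootstrapTokenCache:
    """Holds the most recently fetched bootstrap token (hypothetical helper)."""

    def __init__(self, fetch_token: Callable[[], str]) -> None:
        # Assumption: fetch_token reads the current token from the cluster.
        self._fetch_token = fetch_token
        self._lock = threading.Lock()
        self._token: Optional[str] = None

    def refresh_once(self) -> bool:
        """Fetch the token; return True if the cached value changed."""
        new_token = self._fetch_token()
        with self._lock:
            changed = new_token != self._token
            self._token = new_token
        return changed

    def current(self) -> Optional[str]:
        """Return the cached token, or None if nothing was fetched yet."""
        with self._lock:
            return self._token

    def run(self, stop: threading.Event, interval_seconds: float) -> None:
        """Poll until `stop` is set, waiting `interval_seconds` between polls."""
        while not stop.is_set():
            try:
                self.refresh_once()
            except Exception:
                # Keep the last known token if a fetch fails; retry next tick.
                pass
            stop.wait(interval_seconds)


def provision_with_retry(
    cache: BootstrapTokenCache,
    try_join: Callable[[str], bool],
    max_attempts: int = 3,
) -> bool:
    """Try to join with the cached token; refresh and retry on failure."""
    for _ in range(max_attempts):
        token = cache.current()
        if token is not None and try_join(token):
            return True
        # The join failed, possibly because the token went stale between
        # polls, so refresh before the replacement attempt.
        cache.refresh_once()
    return False
```

In a real deployment the polling loop would run in the background (for example `threading.Thread(target=cache.run, args=(stop, 30.0), daemon=True)`), and the poll interval trades API load against how long a stale token can linger.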

Additional Context

This is primarily for self-hosted Karpenter.

For AKS Node Auto Provisioning (NAP), it is by design that all unnecessary parameter concerns are abstracted away from users, to ensure a fully automated experience and ease of use.
In this case, an automated node bootstrap token rotation mechanism (or something equivalent) is always in place, so that this never becomes a visible user concern in the first place.

That said, there was a similar issue where NAP-provisioned nodes could use an outdated token after a node image upgrade, leaving them unable to join the cluster. The apparent symptom could be similar to #248. At this time, the fix is being rolled out on the AKS side. Please file a separate issue if similar problems still occur later on.

Attachments

No response

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@Bryce-Soghigian
Contributor

Useful to note that the fix has not been fully rolled out for NAP yet; it is still rolling out. A simple empty PUT on the managed cluster (mc) will mitigate the issue: az aks update -g rg -n clusterName

@tallaxes tallaxes added area/bootstrap Issues or PRs related to bootstrap area/security Issues or PRs related to security area/provisioning Issues or PRs related to provisioning (instance provider) labels Apr 3, 2024
@comtalyst comtalyst changed the title Automated rotation of node bootstrap token Automated rotation of node bootstrap token (e.g., after token changes from node image upgrade) Apr 3, 2024
@Bryce-Soghigian
Contributor

The fix was rolled out a while ago, closing this one.
