-
Notifications
You must be signed in to change notification settings - Fork 389
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve git credential handling in Kubernetes #1577
base: main
Are you sure you want to change the base?
Conversation
One note: we might also want to consider making the pod owner of the secret which would automatically trigger its deletion along side the pod. However, if for some reason the pod fails to be created the secret would remain so there's a need to clean it up manually in that case. |
186c740
to
eb0d9ee
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A chart managed secret / a dynamically created secret?
Is the git credentials the same for all build pods? If it is, we can just have a Helm chart template create a k8s Secret with a key in it representing the credentials.
Single / Multiple k8s Secrets
Have it been considered to use just one single secret that has several keys, one per build pod? Could be better, could be worse. It would feel good to know we have reflected on it a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line 79 in eb0d9ee
git_credentials = Unicode( |
I think the way to go is a much smaller change, where the pod we dynamically create just references a Helm chart managed k8s Secret. If this is to create a new k8s Secret template, or to reference an existing k8s Secret, is not something I've explored yet.
The k8s Secret needs to be located in the same namespace as the build pods, but a Helm chart can specify a custom namespace if needed in a k8s resource's metadata.
@consideRatio Thanks for your suggestions. To answer your questions/remarks: Currently the credentials are static by provider but my implementation was done so on purpose: it takes into account #1169 (Allow dynamic repository credentials for authenticated Binderhub instances) that I am interested in as well. ###A chart managed secret / a dynamically created secret? If we take the static credential use case, having one single chart managed secret is a good idea. The If we take the #1169 use case into account, we would need to ensure that we cleanup the content of the secret to avoid keeping tokens (and a growing secret) for more time than necessary (i.e. user session length + builder lifetime). From the top of my head, this could be done in the following way:
The thing we need to determine now is how to differentiate a static credential VS a user provided one. ###Single / Multiple k8s Secrets In the static case, I think one single secret would make sense. For the #1169 use case, it's a different story. We would likely need to benchmark things a bit. Kubernetes secrets cannot be larger than 1MIB but one should also not create too many little Secrets in a given namespace as it could also exhausts memory. The documentation also suggest to use resource quotas in order to limit their number. |
Ah okay! This fix becomes more complicated if planning for #1169 at the same time, to support runtime decided git credentials, but we can go for it if you want. I'd appreciate if another maintainer could opinie about the questions below though, as I'm not comfortable approve/merging this without further agreement otherwise. Question 1: Is there agreement to add support for runtime decided git credentials?Doing it imples managing k8s Secrets via the binderhub software itself, as compared to the BinderHub Helm chart. Question 2: Should we support only runtime decided git credentials, or a mix?If using binderhub software managed k8s Secrets by default, it means that we take some performance hit as pod must wait for the k8s Secret creation, which follows the pod creation in sequence, and burden the k8s api-server a bit by additional requests. It may be miniscule though. So, if we switch to this where its not used, this is a compromise even though it may possibly be very small. At the same time, if we start supporting both options with a single Helm chart managed k8s Secret and runtime created k8s Secrets, we increase the complexity of configuration/documentation and the complexity of the code base as a whole. Implementation criterias for runtime decided git credentials
|
My own response to the questions above:
@minrk you worked to provide the kubespawner implementation dynamically creating k8s Secret resources for internal_ssl, what do you think? |
On the implementation side, the secret can be created before the pod and the ownership applied through patching. I did a small test locally. It can be done as soon as the pod enters the pending state. I haven't benchmarked it but it would likely reduce the performance hit of having the pod waiting on the secret presence to be properly started. The downside is that if the pod creation fails the secret needs to be manually cleaned up. |
Yepp, a quite significant downside i think. I'd rather make sure we dont emit credentials we fail to clean up than optimize performance to this degree - based on guessed impact scale. |
The git credentials are now stored in a temporary secret following recommended best practice. The secret will have the same lifespan as the builder pod.
This makes the code simpler and lets k8s encode the data
d62951e
to
fc24709
Compare
The final implementation comes from the following design decision: Only create the secret once the pod is ready so we reduce the surface were sensitive data is managed and avoid its management if the pod fails for some reason.
fc24709
to
a81abe3
Compare
@consideRatio I was wondering one thing: should the secret creation code be put in its own function ? |
Note the git credential is also needed by BinderHub before the build pod is created, since BinderHub checks the registry for an existing image so the Git SHA is required: binderhub/binderhub/builder.py Lines 269 to 297 in 8351995
This may complicated things if implementing dynamic per-user/repo Git credentials. I've hit a similar issue whilst trying to disentangle another property: If we go down the route of dynamic docker config it would make sense to keep these two aligned, so if you have any thoughts on that issue please share them! |
I've been thinking about this because I want to get temporary registry credentials into the build image to push to ECR (push tokens are time-limited so a static secret won't work #1623). Using secrets is definitely best practice, but on balance I think managing dynamic secrets overcomplicates things for BinderHub due to the clean-up issue- it's a shame you can't atomically create a secret+pod. ownerReferences may help, but there's a circular dependency between the pod and secret and therefore a race condition. In the worst case the implementation in https://github.com/jupyterhub/kubespawner/blob/0f22abf096459b5424325fa93ff4b30bef568391/kubespawner/spawner.py#L2714-L2732 will lead to If we want to pursue this I think a compromise of using a Helm chart managed secret for global credentials, and plain env-vars for dynamic credentials, is reasonable. It requires additional logic in the build class to choose between mounting the global secret vs ignoring it and setting an environment variable, but we'll need additional logic to support dynamic credentials anyway. The small risk of using env vars can be mitigated by ensuring any dynamic credentials are time limited- they most likely will be anyway, e.g. OAuth tokens expire, as do ECR registry credentials. You could probably make the case that a dynamic expiring env var is more secure than a one year old static secret 😃 Finally if designed correctly this will be an implementation detail so there's nothing stopping us from switching in future. |
Related to your post from past December (that I was sure I answered to so sorry for the delay): For your new setup, it sounds a bit like the forge use case. In between, one related thing I have discovered that could be of interest: the Secrets Store CSI driver. Note that I haven't tested it yet. As for the orphaned secrets, I am wondering whether having a cleanup CronJob would help in that case ? Note that the current implementation should properly clean everything in case of failure. |
The git credentials are now stored in a secret following recommended best practice.
The secret will have the same lifespan as the builder pod.