Waiting for CNI driver delays Spegel startup - mirror not used #116
So I have observed this in some AKS clusters that I am running that also use Calico. I think the issue stems from the priority class being too high, which results in Spegel being scheduled before the Calico DaemonSet. Spegel then experiences exponential backoff because the node is not ready. A good solution would be to make sure that Spegel's priority class is lower than whatever Calico has set. Checking my AKS cluster, I can see that both are set to … It is interesting that setting host network lets you ignore the dependency on the CNI plugin. In the future it would be interesting to see whether a solution that does not rely on the CNI is possible. I do not think enabling host networking is the solution though, as it comes with other implications.
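A minimal sketch of the priority suggestion above, assuming you manage Spegel's manifests yourself. It shows two fragments: a PriorityClass whose value sits below the built-in system-node-critical (2000001000) and system-cluster-critical (2000000000) classes that CNI DaemonSets commonly use, and a DaemonSet patch that references it. The name spegel-priority and the value 1000000 are hypothetical; check which class your Calico pods actually use before picking a value.

```yaml
# Hypothetical PriorityClass for Spegel; name and value are placeholders.
# User-defined classes must stay at or below 1000000000, and this one is
# deliberately below both built-in critical classes.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: spegel-priority          # hypothetical name
value: 1000000                   # placeholder value
globalDefault: false
description: "Lower than the CNI DaemonSet so Spegel is not scheduled ahead of it."
---
# Strategic-merge patch fragment for the Spegel DaemonSet (not a full object).
spec:
  template:
    spec:
      priorityClassName: spegel-priority
```

Note that priority mainly orders the scheduler queue and decides preemption; it does not by itself guarantee the order in which pods start on a node.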
Interesting about the … Reading the docs, I'm not sure … In fact, I've not really been able to find anything that controls the order in which pods are scheduled to the degree that I suppose Spegel would ideally want: to come up before any other pod on the node. It seems the only way would be to have the mirror config on the nodes already, before Kubernetes even starts, which implies some platform-specific bootstrap/init magic. Furthermore, I suppose that you might want to use Spegel as the source for your CNI images, in which case … What implications could …
It's been a while since I read through the scheduling docs in depth, so you may be right. Now that I think about it, mirroring the CNI image will be an impossible challenge. The reason I have not looked too far into it is that the Calico images are pre-pulled on nodes when running AKS, so it has been less of an issue for me. Currently Spegel relies on both container host ports and node port services to work. The host port creates a port mapping on the node to the local container; this is how containerd is able to reach the container. The node port service does the same, but on a separate port that is load balanced onto a random Spegel instance in the cluster. There are two separate ports so that Spegel can still be used when the local Spegel instance is down. I have been forced to think about the implications of this a bit more thanks to this issue, which is great. Both of these node-to-container port mappings are implemented by the CNI, which explains the dependency on the CNI that other applications may not have. Enabling host network would in turn solve some of these problems, as it would skip the CNI altogether and use the host's network namespace, with the downside of basically being able to intercept all networking on the node. I am not really sure this is something that I want to allow, as it pushes things even further into the realm of privileged Pods, but it is still something I can consider as an option. It would most likely require refactoring some of the logic, as Spegel would need to avoid container networking altogether and instead proxy at the node IP level, which would have some benefits.
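The two node-to-container paths described above can be sketched with plain Kubernetes fields. This is an illustration, not Spegel's actual chart: the container name, pod label, containerPort 5000 and nodePort 30021 are assumptions, while 30020 is the host port mentioned in the original report. A hostPort is typically wired up through the CNI's port-mapping support (for example the portmap plugin), so it only works once the CNI is ready.

```yaml
# Path 1 (sketch): hostPort on the Spegel container, patch fragment for the DaemonSet.
# containerd on the same node reaches the local Spegel via <node IP>:30020.
spec:
  template:
    spec:
      containers:
        - name: registry             # assumed container name
          ports:
            - name: registry
              containerPort: 5000    # assumed in-container listen port
              hostPort: 30020        # host port named in the original report
              protocol: TCP
---
# Path 2 (sketch): NodePort Service load-balancing onto any Spegel pod in the cluster.
apiVersion: v1
kind: Service
metadata:
  name: spegel-registry              # hypothetical name
spec:
  type: NodePort
  selector:
    app.kubernetes.io/name: spegel   # assumed pod label
  ports:
    - name: registry
      port: 5000
      targetPort: registry           # refers to the named container port above
      nodePort: 30021                # assumed; must lie in the cluster's NodePort range
```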
You could include a node taint that is removed when Spegel reaches a healthy status. Cilium uses this model to ensure its DaemonSet is up before workloads can be scheduled.
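A sketch of the taint approach, assuming a hypothetical taint key spegel.example.com/not-ready (Spegel does not define one). The kubelet registers the node with the taint, so ordinary workloads cannot be scheduled there, while the Spegel DaemonSet tolerates it. The CNI DaemonSet must tolerate it too (Calico's manifests usually tolerate all taints); otherwise neither would start.

```yaml
# Kubelet config fragment: the node joins the cluster already tainted.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
registerWithTaints:
  - key: spegel.example.com/not-ready   # hypothetical taint key
    value: "true"
    effect: NoSchedule
---
# Spegel DaemonSet patch fragment: tolerate the taint so Spegel can start first.
spec:
  template:
    spec:
      tolerations:
        - key: spegel.example.com/not-ready
          operator: Exists
          effect: NoSchedule
```

Something still has to remove the taint once Spegel is healthy, either Spegel itself or a small controller. Done by hand, that would be kubectl taint nodes <node> spegel.example.com/not-ready:NoSchedule- (the trailing dash removes the taint).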
I will amend the toleration to avoid scheduling. I think that fixing #22 could close this issue in the future. Having a single prefer-local service would remove the need for host ports while offering the same feature.
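For reference, the closest built-in Kubernetes field to a "prefer local" Service is internalTrafficPolicy: Local, sketched below with a hypothetical name and label. It is local-only rather than prefer-local: if the node's Spegel pod is down, in-cluster traffic from that node is dropped instead of falling back to another instance, so on its own it would not replace the NodePort fallback described earlier.

```yaml
# Sketch only: a Service that routes in-cluster traffic to the Spegel pod on the same node.
apiVersion: v1
kind: Service
metadata:
  name: spegel-local                 # hypothetical name
spec:
  selector:
    app.kubernetes.io/name: spegel   # assumed pod label
  internalTrafficPolicy: Local       # node-local endpoints only; no cross-node fallback
  ports:
    - name: registry
      port: 5000                     # assumed port
      targetPort: registry
```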
Hi,
Spegel looks great! But I'm on a test cluster and have noticed that it's not starting up quickly enough on a fresh node to give any benefit: by the time Spegel has started and registered the mirrors, the pods that I want to pull from Spegel have already started pulling from the default registry.
The startup process seems to be blocking on my CNI driver (Calico) starting up:
This more or less happens in parallel with my pods starting up, so by the time Spegel is running my pods have pulled from the default registry and bypassed Spegel.
My workaround so far is to enable hostNetwork: true in the Spegel DaemonSet spec. This removes its dependency on Calico, and it starts more or less immediately (before my pods, at least). I assume not everyone uses Calico, so changing Spegel to use hostNetwork by default may not be desired. Could a flag be added to the Helm chart to enable this instead? Bear in mind that service.registry.port needs to be set to 30020 for this to work, but otherwise it does seem to work in some limited testing.
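The workaround above as two fragments: a DaemonSet patch enabling host networking, and a Helm values override for the key named in this report. Both are sketches; the Helm key comes from this issue, and the chart is not assumed to offer a dedicated hostNetwork flag.

```yaml
# Strategic-merge patch fragment for the Spegel DaemonSet (sketch).
spec:
  template:
    spec:
      hostNetwork: true   # use the node's network namespace, so no CNI is needed
---
# Helm values override; service.registry.port is the key named in this issue.
service:
  registry:
    port: 30020
```

The need for 30020 is presumably related to the Kubernetes rule that, for hostNetwork pods, a declared hostPort must equal its containerPort. A DaemonSet pod using the host network also automatically tolerates the node.kubernetes.io/network-unavailable taint, which fits with it starting before the CNI is ready.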