
Waiting for CNI driver delays Spegel startup - mirror not used #116

Open
spmason opened this issue Jun 12, 2023 · 5 comments
Labels: bug (Something isn't working)


spmason commented Jun 12, 2023

Hi,

Spegel looks great! But on my test cluster I've noticed that it doesn't start up quickly enough on a fresh node to provide any benefit: by the time Spegel has started and registered the mirrors, the pods I want to pull through Spegel have already started pulling from the default registry.

The startup process seems to be blocking on my CNI driver (Calico) starting up:

```
Warning  NetworkNotReady    16s (x12 over 38s)  kubelet             network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
```

This happens more or less in parallel with my pods starting up, so by the time Spegel is running my pods have already pulled from the default registry and bypassed Spegel.

My workaround so far is to enable hostNetwork: true in the Spegel DaemonSet spec. This removes the dependency on Calico, and Spegel starts more or less immediately (before my pods, at least).

I assume not everyone uses Calico, so changing Spegel to use hostNetwork by default may not be desirable. Could a flag be added to the Helm chart to enable this instead? Bear in mind that service.registry.port needs to be set to 30020 for this to work, but otherwise it does seem to work in some limited testing (see the sketch below).
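For reference, this is roughly what I mean (a sketch, not the chart's actual templates; the dnsPolicy line is my assumption about what host-network pods usually also need):

```yaml
# DaemonSet pod spec fragment for the workaround described above
spec:
  template:
    spec:
      hostNetwork: true                     # use the node's network namespace; no CNI dependency
      dnsPolicy: ClusterFirstWithHostNet    # assumption: keeps cluster DNS working with hostNetwork
---
# Matching Helm value (nesting inferred from the dotted path quoted above)
service:
  registry:
    port: 30020
```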

@phillebaba (Member)

I have observed this in some AKS clusters that I am running which also use Calico. I think the issue derives from the priority class being too high, resulting in Spegel being scheduled before the Calico DaemonSet; it then hits exponential backoff because the node is not ready. A good solution would be to make sure that the priority class is lower than whatever Calico has set. Checking my AKS cluster, I can see that both are set to system-node-critical. I will find a default priority class that is lower than this, which will hopefully solve your problems.
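For illustration, a hypothetical PriorityClass below system-node-critical (value 2000001000); the name and exact value here are my assumptions, not chart defaults:

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: spegel-priority          # hypothetical name
value: 1000000000                # below system-node-critical (2000001000)
globalDefault: false
description: "Schedules Spegel after node-critical components such as the CNI."
```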

It is interesting that setting host networking lets you sidestep the dependency on the CNI plugin. In the future it would be worth exploring a solution that does not rely on the CNI at all. I do not think enabling host networking is the solution, though, as it comes with other implications.


spmason commented Jun 13, 2023

Interesting about the priorityClassName. I just tried setting it to system-cluster-critical on the Spegel DaemonSet pods (which according to the docs is lower priority than system-node-critical) and saw the same behaviour as before.

Reading the docs, I'm not sure priorityClassName really controls the order of scheduling; it seems to control whether scheduling is guaranteed. In my testing, my own regular-priority pods get scheduled at exactly the same time as critical ones, presumably because my nodes have enough space for them all.

In fact, I've not been able to find anything that controls pod scheduling order to the degree Spegel would ideally want, i.e. to come up before any other pod on the node. It seems the only way would be to have the mirror config on the nodes before Kubernetes even starts, which implies some platform-specific bootstrap/init magic.

Furthermore, I suppose you might want to use Spegel as the source for your CNI images, in which case hostNetwork would be your only option, I assume?

What implications could hostNetwork: true have? My test environment isn't used much, but I can't say I've spotted anything bad happening with this setting. Any ideas for things I could look for?

@phillebaba (Member)

It's been a while since I read through the scheduling docs in depth, so you may be right.

Now that I think about it, mirroring the CNI image is probably an impossible challenge. The reason I have not looked too far into it is that the Calico images are pre-pulled on nodes when running AKS, so it has been less of an issue for me. Currently Spegel relies on both container host ports and NodePort services to work. The host port creates a port mapping on the node to the local container; this is how containerd is able to reach the container. The NodePort service does the same, but on a separate port that is load-balanced onto a random Spegel instance in the cluster. There are two separate ports to allow Spegel to be used even when the local Spegel instance is down. A sketch of the two mappings is below.
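Roughly, the two mappings look like this (a sketch; all names and port numbers other than 30020, which was mentioned earlier in the thread, are illustrative assumptions):

```yaml
# (a) hostPort on the DaemonSet container: the CNI maps this node port to the
# local Spegel pod, which is how containerd on the node reaches its mirror.
containers:
  - name: registry
    ports:
      - containerPort: 5000      # illustrative container port
        hostPort: 30020          # node-level port quoted earlier in this thread
---
# (b) NodePort Service: a second entry point, load-balanced across all Spegel
# instances, used as a fallback when the local instance is down.
apiVersion: v1
kind: Service
metadata:
  name: spegel-registry          # illustrative name
spec:
  type: NodePort
  selector:
    app.kubernetes.io/name: spegel
  ports:
    - port: 5000
      targetPort: 5000
      nodePort: 30021            # illustrative second port
```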

This issue has forced me to think about the implications a bit more, which is great. Both of these node-to-container port mappings are implemented by the CNI, which explains the dependency on the CNI that other applications may not have. Enabling host networking would solve some of these problems, as it skips the CNI altogether and uses the host's networking namespace, with the downside of essentially being able to intercept all networking on the node. I am not really sure this is something I want to allow, as it pushes things even further into the realm of privileged Pods, but it is still something I can consider as an option. It would most likely require a refactor of some of the logic, as Spegel would need to avoid container networking altogether and instead proxy at the node IP level, which would have some benefits.


danielloader commented Jul 31, 2023

You could include a node taint that is removed once Spegel reaches a healthy status. Cilium uses this model to ensure its daemonset is up before workloads can be scheduled (sketch below).

https://docs.cilium.io/en/stable/installation/taints/
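Something like this, assuming a hypothetical taint key modelled on Cilium's node.cilium.io/agent-not-ready:

```yaml
# Nodes are registered with a startup taint (hypothetical key), e.g. via
# kubelet --register-with-taints or the cloud provider's node config:
#   spegel.dev/agent-not-ready=true:NoSchedule
#
# Spegel's own DaemonSet tolerates the taint so it can schedule first:
tolerations:
  - key: spegel.dev/agent-not-ready   # hypothetical key
    operator: Exists
    effect: NoSchedule
# Once Spegel reports healthy on a node, a controller removes the taint and
# ordinary workloads can schedule (and pull through the mirror).
```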

@phillebaba (Member)

I will amend the toleration to avoid scheduling. I think that fixing #22 could close this issue in the future: having a single prefer-local service would remove the need for host ports while offering the same feature (rough sketch below).
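For illustration, one way such a service might look. Note that internalTrafficPolicy: Local is strictly "local only" (it fails rather than falls back when the local pod is down), so it only approximates the prefer-local behaviour discussed in #22; the names here are assumptions:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: spegel                    # illustrative name
spec:
  selector:
    app.kubernetes.io/name: spegel
  internalTrafficPolicy: Local    # route only to the Spegel pod on the same node
  ports:
    - port: 5000                  # illustrative registry port
      targetPort: 5000
```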

phillebaba added the bug label on Jan 4, 2024