
Add cpu pinning to nodes configuration #3138

Open
hedibouattour opened this issue Mar 24, 2023 · 10 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@hedibouattour

What would you like to be added:
The ability to pin kind nodes to specific CPUs.
Why is this needed:
For large clusters (scale-testing purposes), the nodes overload the host CPU, so we need to pin kind nodes to sets of specific CPUs during their creation.
As an example, on a 96-thread machine, kind clusters with Calico start to show instability above 30 nodes; with this patch, pinning every node to a CPU keeps a cluster of 60 kind workers stable.
Docker allows this using the --cpuset-cpus flag. The easiest way is to add it to the node configuration, as in #3131.
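For illustration, a per-node field (modeled on how #3131 adds per-node settings) might look like the sketch below; the `cpuSet` field name is hypothetical and does not exist in kind today:

```yaml
# Hypothetical sketch only -- kind has no cpuSet field; the name is invented
# here to illustrate the proposed per-node configuration.
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  cpuSet: "0-3"   # CPUs this node's container may run on (docker --cpuset-cpus syntax)
- role: worker
  cpuSet: "4"     # pin this worker to a single CPU
- role: worker
  cpuSet: "5"
```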

@hedibouattour hedibouattour added the kind/feature Categorizes issue or PR as related to a new feature. label Mar 24, 2023
@BenTheElder
Member

As an example, on a 96-thread machine, kind clusters with Calico start to show instability above 30 nodes; with this patch, pinning every node to a CPU keeps a cluster of 60 kind workers stable.

Wow! Can you elaborate more on the use case?

While kind technically supports multi-node, that is for https://kind.sigs.k8s.io/docs/contributing/project-scope/#p0-support-testing-kubernetes, where we need to test some rolling-node behavior.

It's not intended as a reasonable solution for performance testing.

Docker allows this using the --cpuset-cpus flag

Mmm, is there a comparable flag in podman etc? We've tried to minimize coupling to docker going forward so we can attempt to meet demands to support other tools (e.g. also nerdctl, ignite...)

What's the expected user experience here? Users need to select CPUs for each node?
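One conceivable answer to this question, sketched here as a hypothetical helper rather than anything kind implements, is to compute the per-node CPU sets automatically instead of making users pick them:

```python
# Hypothetical sketch -- not kind API. Distribute host CPUs across kind nodes
# round-robin, producing the string each node's container would pass to
# docker's --cpuset-cpus flag.

def assign_cpusets(num_nodes, num_cpus):
    """Return one --cpuset-cpus string per node, spreading CPUs evenly."""
    if num_nodes > num_cpus:
        raise ValueError("more nodes than CPUs: one-CPU-per-node pinning is impossible")
    per_node = num_cpus // num_nodes  # CPUs each node gets (at least 1)
    cpusets = []
    for node in range(num_nodes):
        # node i gets CPUs i, i + num_nodes, i + 2*num_nodes, ...
        cpus = range(node, num_cpus, num_nodes)[:per_node]
        cpusets.append(",".join(str(c) for c in cpus))
    return cpusets

# On the 96-thread machine from the issue, 60 workers each get exactly one CPU:
print(assign_cpusets(60, 96)[:3])  # ['0', '1', '2']
```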

@BenTheElder
Member

Is this correctly transitively applied to the "nested" containers? Under what environment have you tested this? (It's been a while since looking at this part of the system and we have podman/docker, runc/crun, rootless/rootful, ...)

@BenTheElder
Member

Kubernetes usually scale-tests with multiple VMs or physical machines; this is a new one 😅

@dims
Member

dims commented Mar 28, 2023

xref: #3131

@sknat

sknat commented Mar 31, 2023

Hi @BenTheElder

Trying to give a bit of context on why we thought about doing this:

It's not intended as a reasonable solution for performance testing.

We are working on a CNI integration (Calico/VPP), and wanted a way to do functional testing of clusters of honest size. The idea is not to do scale testing in the sense of measuring performance, but rather to make sure things keep working when we push cluster sizes beyond the usual 4 nodes. This also allows a variety of test scenarios (e.g. how the CNI reacts to many nodes going down at the same time, etc.).

What's the expected user experience here? Users need to select CPUs for each node?

I guess so; I don't have a strong opinion there. We went that way just because it gave the most flexibility.

Is this correctly transitively applied to the "nested" containers

iirc in our testing this was transitively applied (@hedibouattour keep us honest if it wasn't the case 😄)

Under what environment have you tested this?

It was a pretty standard env:

Ubuntu 22.04.1 LTS (5.15.0 kernel)
Docker 20.10 / containerd 1.6.15
Rootful / runc

Kubernetes usually scale tests with multiple VMs or physical machines

I guess that's the appeal of container-ception 😂
Do you know the usual sizing used for scale testing? (# of vCPUs / memory per VM?)

@aojea
Contributor

aojea commented Mar 31, 2023

We are working on a CNI integration (Calico/VPP), and wanted a way to do functional testing of clusters of honest size

I have done this in the past; it is an easy and cheap way to find "scale" bugs, and containers allow you to avoid developing mocks: you just kind of "miniaturize everything"...

That said, I can't see how this can be generalized. In my experience, if you are building a scale-testing framework, it is going to evolve to support different options. I think you'd be better off using kind as an API to create the containers, building your framework on top of it so you can customize it.
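Along the lines of that suggestion, a test framework could treat kind as the cluster-creation layer and apply pinning afterwards from the outside, e.g. via docker's `docker update --cpuset-cpus` command on each node container. A minimal sketch (the container names assume kind's default naming for a cluster named "kind"; adjust to your setup):

```python
# Sketch of pinning kind nodes from *outside* kind, after `kind create cluster`:
# build `docker update --cpuset-cpus` commands (a real docker flag) for each
# node container, so no kind changes are needed.

def pin_commands(containers, num_cpus):
    """One `docker update` command per node container, one CPU per node."""
    cmds = []
    for i, name in enumerate(containers):
        cpu = i % num_cpus  # wrap around if nodes outnumber host CPUs
        cmds.append(["docker", "update", f"--cpuset-cpus={cpu}", name])
    return cmds

nodes = ["kind-control-plane", "kind-worker", "kind-worker2"]
for cmd in pin_commands(nodes, 96):
    print(" ".join(cmd))
```

The commands could then be executed with `subprocess.run`, which keeps the pinning policy entirely in the framework rather than in kind.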

@BenTheElder
Member

We can't move forward with something like this without digging in deeper to the interactions with the various container environments etc to make sure we're not introducing a new footgun.

See also: #3169 #2999

@sknat

sknat commented Apr 24, 2023

Right, I agree that we probably don't have a full picture of the consequences such an option would have, and yes, using it would probably break some of kind's usual guarantees.

Would a more generic option, e.g. extraContainerRuntimeArgs, then be a viable alternative? This would allow building more complex behaviors like this one, while making it clear that the option is highly dependent on the runtime implementation and that the consequences are up to the caller to handle.
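For concreteness, the proposed option might have looked like the following; the `extraContainerRuntimeArgs` field is hypothetical and was never implemented in kind:

```yaml
# Hypothetical sketch of the proposed option -- not implemented in kind.
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: worker
  extraContainerRuntimeArgs:   # would be passed verbatim to `docker run`/`podman run`
  - "--cpuset-cpus=4"
```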

@BenTheElder
Member

Would a more generic option, e.g. extraContainerRuntimeArgs, then be a viable alternative?

No. It's an implementation detail that we exec docker, and it would be a nightmare to support this in the future. Consider what happens if kind needs to start setting an argument that a user previously set, if the arguments conflict, or if we opt to switch to a client library.

@BenTheElder
Member

We can support pinning nodes, but we need more investigation and some thought about the abstraction before committing. Typically we try to borrow from CRI where possible.

We've been bitten many times by thin abstractions and compat nightmares. See e.g. #2875, where we're in a bind because we didn't shim this the way kOps did.
