-
Notifications
You must be signed in to change notification settings - Fork 502
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Flaky Tests] Update/revert kindest/node
to v1.32.0
#11532
[Flaky Tests] Update/revert kindest/node
to v1.32.0
#11532
Conversation
🌧️
|
runc
overriderunc
override to v1.2.5
/test all |
cb4050b
to
8c95a05
Compare
runc
override to v1.2.5
containerd
to include fixed go-cni
version
/assign |
/test all |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great find! 😮 🦅
Just asking to be sure: Do we need to apply the containerd
override for the cluster node(s) spawned through kind-up.sh
as well?
Lines 364 to 384 in 934b267
# workarounds for KinD issues | |
# TODO(marc1404): Remove once kindest/node uses runc >= v1.2.4 | |
if [[ -n "${CI:-}" ]]; then | |
cp /get-runc/runc "$(dirname "$0")/../pkg/provider-local/node/runc" | |
else | |
"$(dirname "$0")/../pkg/provider-local/node/get-runc.sh" | |
fi | |
for node in $nodes; do | |
# workaround https://kind.sigs.k8s.io/docs/user/known-issues/#pod-errors-due-to-too-many-open-files | |
docker exec "$node" sh -c "sysctl fs.inotify.max_user_instances=8192" | |
# TODO(marc1404): Remove once kindest/node uses runc >= v1.2.4 | |
# workaround issue with runc v1.2.3 provided by kindest/node:v1.32.0 by installing runc v1.2.4 manually (https://github.com/opencontainers/runc/pull/4555) | |
if [ -n "${CI:-}" ]; then | |
echo "Installing runc on node $node from container filesystem" | |
docker cp /get-runc/runc "$node":/usr/local/sbin/runc | |
else | |
docker cp "$(dirname "$0")/../pkg/provider-local/node/runc" "$node":/usr/local/sbin/runc | |
fi | |
done |
As I understand it the issue with go-cni
seems to affect our tests primarily.
/test all |
In that case we would need to pre-build the image, as it's only referenced there. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this light, do we need all this at all, i.e., can't we go back v1.31 and wait for a proper patch for v1.32 to be released, before we upgrade to it again? This way, we don't need to introduce these workarounds? Or is there something in v1.32 that we desperately need now?
247431d
to
7772fbe
Compare
Also addresse @marc1404's concern and override containerd in kind node, as local failure(s) have also been observed there by now.
The proposed changes make the problem + fix more explicit. I personally would vote for this "forward-fix". |
I'm a fan of "fix-forward" if there is a real problem :) However, if there is nothing we gain out of v1.32 right now, I don't quite understand why we spend all this time to fix a problem that we wouldn't had if we just waited for the next release. Now we have workarounds and TODOs in our code and gain basically nothing out of it. 😕 |
… version The former go-cni version had a racy behavior that causes issues on our side.
7772fbe
to
e620a17
Compare
This includes the still supported versions for runc (v1.1.13) and containerd (v1.7.18).
e620a17
to
9ec3e11
Compare
containerd
to include fixed go-cni
versionkindest/node
to v1.32.0
After another round of discussion we concluded that it might be best to revert/update to |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/approve
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
Thank you! After a lot of back and forth I'm glad we can remove this workaround again 😅
I've added a commit to, temporarily, require a Renovate dashboard approval for patch updates of kindest/node
. I'll take care of checking future kindest/node
releases on whether they ship with the correct version of runc
and containerd
in order to remove this restriction again.
2221f70
(#11532)
LGTM label has been added. Git tree hash: dd7c93a89e65a457eebe3fd4692dbe7f539f2d82
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: marc1404, rfranzke The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
How to categorize this PR?
/area testing
/kind flake
What this PR does / why we need it:
Fix the flakes introduced with #11510.
Due to a racy locking behavior in go-cni, which has been included in containerd v2.0.2, upon the cni restart, our tests can get stuck and flake.
The image
kindest/node v1.32.0
used in this PR is based onkind v0.25.0
includes both still-supported versions for runc (v1.1.13
) and containerd (v1.7.18
).Which issue(s) this PR fixes:
Flakes:
Special notes for your reviewer:
/cc @ScheererJ
Release note: