One conformance test fails on v1.32.1+k3s1 #11774

Closed
pnagy-cldr opened this issue Feb 12, 2025 · 3 comments

pnagy-cldr commented Feb 12, 2025

Environmental Info:
K3s Version:
k3s version v1.32.1+k3s1 (6a322f1)
go version go1.23.4

Node(s) CPU architecture, OS, and Version:
Linux HOSTNAME 5.4.0-1041-aws #43-Ubuntu SMP Fri Mar 19 22:06:16 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration:
NAME      STATUS   ROLES                  AGE     VERSION
HOSTNAME  Ready    control-plane,master   3m18s   v1.32.1+k3s1
HOSTNAME  Ready    <none>                 2m50s   v1.32.1+k3s1
HOSTNAME  Ready    <none>                 2m51s   v1.32.1+k3s1
HOSTNAME  Ready    <none>                 2m51s   v1.32.1+k3s1

Describe the bug:
We run conformance tests with sonobuoy to ensure that the k3s cluster where we test our products is CNCF conformant.
On v1.30.1 there were two failing tests. After upgrading k3s to v1.32.1, only a single test case fails:

"Servers with support for API chunking should support continue listing from the last key if the original version has been compacted away, though the list is inconsistent"

Steps To Reproduce:
We installed k3s using an older version of the script from https://get.k3s.io/ (the only apparent difference is that the latest version of the script can also install k3s from a GitHub PR, which our copy does not support yet).

The master node gets the following parameters:
INSTALL_K3S_EXEC="--default-local-storage-path=/data/0 --disable=servicelb --disable=traefik --write-kubeconfig-mode=644 --node-label topology.kubernetes.io/zone=k3s_zone0" ARCH=amd64 INSTALL_K3S_VERSION=v1.32.1+k3s1

Three additional nodes are added to the cluster with the same script, using the following parameters:
K3S_URL=https://HOSTNAME:6443/ K3S_TOKEN=TOKEN ARCH=amd64 INSTALL_K3S_VERSION=v1.32.1+k3s1 INSTALL_K3S_EXEC="--node-label topology.kubernetes.io/zone=k3s_zone_2"

So there are 4 nodes in the cluster.
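
For reference, a minimal sketch of how those parameters are passed to the install script (HOSTNAME, TOKEN, and the zone labels are the placeholders from above, not real values):

# server / control-plane node
curl -sfL https://get.k3s.io | \
  ARCH=amd64 INSTALL_K3S_VERSION="v1.32.1+k3s1" \
  INSTALL_K3S_EXEC="--default-local-storage-path=/data/0 --disable=servicelb --disable=traefik --write-kubeconfig-mode=644 --node-label topology.kubernetes.io/zone=k3s_zone0" \
  sh -s -

# each of the three agent nodes (zone label varies per node)
curl -sfL https://get.k3s.io | \
  ARCH=amd64 INSTALL_K3S_VERSION="v1.32.1+k3s1" \
  K3S_URL="https://HOSTNAME:6443/" K3S_TOKEN="TOKEN" \
  INSTALL_K3S_EXEC="--node-label topology.kubernetes.io/zone=k3s_zone_2" \
  sh -s -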

Run all conformance tests:
sonobuoy run --plugin e2e --wait --mode certified-conformance
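
When the run finishes, the failing test can be inspected from the results tarball. A sketch assuming the standard sonobuoy CLI workflow (flag spellings may vary between sonobuoy releases):

results=$(sonobuoy retrieve)        # download the results tarball from the cluster
sonobuoy results "$results"         # summarize pass/fail counts per plugin
sonobuoy results "$results" --mode=detailed --plugin=e2e | grep -i compacted   # locate the failing chunking test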

Expected behavior:
Since k3s should be a CNCF-conformant Kubernetes distribution, we expect all tests to pass.

Actual behavior:
One conformance test fails.

Additional context / logs:
At this point we are unsure whether something is missing from our installation or there is a real conformance bug.
We executed the same test on other distributions and it did not fail there, so the test itself is probably okay.

brandond (Member) commented Feb 12, 2025

I believe this is a flaky test, or at least one that depends on etcd-specific compaction intervals that are not guaranteed by alternative datastores like kine.

The test assumes that a key will be compacted within a specific timeframe, and that precondition is not necessarily met by kine. This isn't a bug - kine does support the behavior being tested here (continuing a list from the last key even if the original resource version has been compacted away) - but kine's compaction behavior is slightly different from etcd's.

Note that the test specifically waits 2*storagebackend.DefaultCompactInterval:
https://github.com/kubernetes/kubernetes/blob/v1.32.1/test/e2e/apimachinery/chunking.go#L168

Kine is not guaranteed to compact a key within this timeframe, as the compact requests from the apiserver are ignored. It has its own internal compaction interval, AND always retains a minimum number of recent revisions in order to ensure that all SQL clients have a consistent view of revisions created in the SQL datastore.

pnagy-cldr (Author) commented Feb 12, 2025

@brandond I have executed the test several times on several versions and it fails consistently.

I have also tried to deploy k3s with the same command that is documented here, which is used for conformance certification:

https://github.com/cncf/k8s-conformance/pull/3555/files#diff-7e1ef9f9a93abef036acbf7a854a2d2aec40d65353e1f7de33b9ce8b9c10bcf0

It also fails on that setup. How do you certify it?

Could you tell me what we should change in our setup? If etcd is the problem, should I use another datastore? I haven't explicitly configured etcd to be used; is it the default? (I thought SQLite is the default.)

brandond (Member) commented Feb 12, 2025

The feature being tested works. It's just that kine probably won't compact within the expected timeframe. The test really shouldn't be hardcoded to require that compaction occurs within a specific timeframe, or perhaps it should just allow for a longer timeframe. For kine, there also needs to be a minimum number of new revisions committed to the datastore in order for an old revision to be compacted - so there needs to be other writes going on, in addition to waiting for compaction to occur.

I would probably open an issue with the Kubernetes project.

If you need the test to pass on your cluster 100% of the time, use embedded etcd instead of kine.
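
For example (a sketch, not the exact certification setup), embedded etcd can be enabled on a fresh server with the --cluster-init flag, and the datastore currently in use can be checked under the default k3s data directory:

# fresh server using embedded etcd instead of the default SQLite-backed kine
curl -sfL https://get.k3s.io | \
  INSTALL_K3S_VERSION="v1.32.1+k3s1" \
  INSTALL_K3S_EXEC="--cluster-init --disable=servicelb --disable=traefik --write-kubeconfig-mode=644" \
  sh -s -

# state.db means SQLite via kine; an etcd/ directory means embedded etcd
ls /var/lib/rancher/k3s/server/db/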
