DRA: convert periodics to use pre-built kind images #33980

Open
pohly opened this issue Dec 17, 2024 · 17 comments
Labels
kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. wg/device-management Categorizes an issue or PR as relevant to WG Device Management.

Comments

@pohly
Contributor

pohly commented Dec 17, 2024

What should be cleaned up or changed:

Using pre-built kind images is cheaper than each job compiling Kubernetes from source.

https://kind.sigs.k8s.io/docs/user/quick-start/#building-images

Provide any links for context:

https://kubernetes.slack.com/archives/C2C40FMNF/p1734418756244779?thread_ts=1734417601.687079&cid=C2C40FMNF

cc @BenTheElder

@pohly added the kind/cleanup label Dec 17, 2024
@k8s-ci-robot added the needs-sig label Dec 17, 2024
@pohly
Contributor Author

pohly commented Dec 17, 2024

@BenTheElder: on Slack you linked to https://kind.sigs.k8s.io/docs/user/quick-start/#building-images as documentation, but I don't see there how to find a pre-built kind image that matches the current Kubernetes source. Can you elaborate how that would work?

If a periodic job runs for a certain revision of the k/k repo, then I'd like to use a kind image built exactly for that revision. Otherwise detecting the exact revision which introduced a regression will be much harder. In particular, the revisions shown by testgrid would be misleading.

@pohly
Contributor Author

pohly commented Dec 17, 2024

/wg device-management

@k8s-ci-robot added the wg/device-management label and removed the needs-sig label Dec 17, 2024
@BenTheElder
Member

> @BenTheElder: on Slack you linked to https://kind.sigs.k8s.io/docs/user/quick-start/#building-images as documentation, but I don't see there how to find a pre-built kind image that matches the current Kubernetes source. Can you elaborate how that would work?

I meant we can use pre-built Kubernetes binaries. The cloud e2es consume them from a shared job that publishes Kubernetes CI builds as binaries, container images, and tarballs.

kind has support to build a node image from those, so you skip compiling kubernetes and just download it instead, but you still pack that onto the base image.

@BenTheElder
Member

Right now we only have super convenient support for releases, where you can just do `kind build node-image v1.32.0`. But kind can also consume manually supplied Kubernetes server tarball URLs from dl.k8s.io, and you can compute such a URL by looking up the latest CI build.

@BenTheElder
Member

BenTheElder commented Dec 17, 2024

Like this shell one-liner: `kind build node-image 'https://dl.k8s.io/ci/'"$(curl -sSL https://dl.k8s.io/ci/latest.txt)"'/kubernetes-server-linux-amd64.tar.gz'`
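The one-liner above can be split into steps so that a job keeps the looked-up version around for later use. This is only a sketch: the function names and the optional architecture argument are hypothetical, and it assumes `curl` and `kind` are on the PATH.

```shell
# Sketch: build a kind node image from the latest Kubernetes CI build
# instead of compiling Kubernetes from source.
# Function names and the ARCH argument are illustrative assumptions.

# ci_server_tarball_url VERSION [ARCH]
# Prints the dl.k8s.io URL of the server tarball for a CI build.
ci_server_tarball_url() {
    local version="$1"
    local arch="${2:-amd64}"
    printf 'https://dl.k8s.io/ci/%s/kubernetes-server-linux-%s.tar.gz\n' "$version" "$arch"
}

# build_node_image_from_latest_ci
# Looks up the latest CI version, builds the node image from its tarball,
# and prints the version on stdout so the caller can record it.
build_node_image_from_latest_ci() {
    local version
    version="$(curl -sSL --fail https://dl.k8s.io/ci/latest.txt)" || return 1
    # Send kind's own output to stderr so stdout carries only the version.
    kind build node-image "$(ci_server_tarball_url "$version")" >&2 || return 1
    printf '%s\n' "$version"
}
```

A job could then run `version="$(build_node_image_from_latest_ci)"` and reuse `$version` when it records which build was tested.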

@BenTheElder
Member

BenTheElder commented Dec 17, 2024

> If a periodic job runs for a certain revision of the k/k repo, then I'd like to use a kind image built exactly for that revision. Otherwise detecting the exact revision which introduced a regression will be much harder. In particular, the revisions shown by testgrid would be misleading.

It's possible to record the tested commit without the prowjob cloning the repo. A lot of the prowjobs using kubetest do this: they don't clone k/k at all, they just fetch a build and record the commit from the build metadata.

e.g. https://prow.k8s.io/view/gs/kubernetes-ci-logs/logs/ci-kubernetes-e2e-gci-gce/1868903398972067840 doesn't clone but you can see commits in testgrid: https://testgrid.k8s.io/sig-release-master-blocking#gce-cos-master-default

this is via https://storage.googleapis.com/kubernetes-ci-logs/logs/ci-kubernetes-e2e-gci-gce/1868903398972067840/artifacts/metadata.json IIRC

@BenTheElder
Member

https://docs.prow.k8s.io/docs/metadata-artifacts/ (it's finished.json, which contains a superset of the test-runner-written metadata.json)

@pohly
Contributor Author

pohly commented Dec 17, 2024

Thanks, that should get me started.

/assign

@pohly
Contributor Author

pohly commented Feb 13, 2025

> https://docs.prow.k8s.io/docs/metadata-artifacts/ (it's finished.json, which contains a superset of the test-runner-written metadata.json)

What wasn't clear to me was how I can derive the "repo-version" from the data in https://dl.k8s.io/ci/latest.txt or the kubernetes-server-linux-amd64.tar.gz. Looking at how https://testgrid.k8s.io/sig-release-master-blocking#gce-cos-master-default does it, it seems... that I don't need to?

https://storage.googleapis.com/kubernetes-ci-logs/logs/ci-kubernetes-e2e-gci-gce/1890029131647684608/finished.json contains only "revision", which matches https://dl.k8s.io/ci/latest.txt:

{"timestamp":1739455473,"passed":true,"metadata":{"control_plane_node_os_image":"cos-109-17800-436-33","job-version":"v1.33.0-alpha.1.158+2642d8222d8524","kubetest-version":"v20250212-16f67660c2","revision":"v1.33.0-alpha.1.158+2642d8222d8524","worker_node_os_image":"cos-109-17800-436-33"},"result":"SUCCESS"}
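The "revision" value is a Kubernetes version string whose suffix after the `+` is the abbreviated commit hash of the build. If a job needs that hash on its own, for example to look up the commit, plain parameter expansion is enough. A minimal sketch, with a hypothetical function name:

```shell
# commit_from_version VERSION
# Prints the abbreviated commit hash that follows the "+" in a CI version
# string. Returns non-zero and prints nothing if the string has no "+"
# (for example a plain release tag such as v1.32.0).
# The function name is an illustrative assumption.
commit_from_version() {
    case "$1" in
        *+*) printf '%s\n' "${1##*+}" ;;
        *)   return 1 ;;
    esac
}
```

For the run above, `commit_from_version v1.33.0-alpha.1.158+2642d8222d8524` prints `2642d8222d8524`.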

@BenTheElder
Member

> What wasn't clear to me was how I can derive the "repo-version" from the data in https://dl.k8s.io/ci/latest.txt or the kubernetes-server-linux-amd64.tar.gz. Looking at how https://testgrid.k8s.io/sig-release-master-blocking#gce-cos-master-default does it, it seems... that I don't need to?

Yeah, I think you only need to make sure that it's persisted in metadata.json (finished.json will merge this)

This format is organic and hacky, with limited docs ... but it's mostly relevant to testgrid at this point, and we mostly care about logging the commit hash somewhere so we can get the diff links (IMHO)

There are some current docs at https://docs.prow.k8s.io/docs/metadata-artifacts/
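For a job that does not use kubetest(2), persisting the version could look roughly like the sketch below. It assumes the job runs under Prow with the `ARTIFACTS` environment variable pointing at the uploaded artifacts directory, and it writes only the "revision" and "job-version" keys seen in the finished.json example earlier in this thread. The function name is hypothetical, and the version string is assumed to contain no characters that need JSON escaping.

```shell
# write_metadata_json VERSION [DIR]
# Writes a minimal metadata.json that records the tested version.
# DIR defaults to $ARTIFACTS, falling back to the current directory.
# Assumption: VERSION contains no characters that need JSON escaping,
# which holds for strings like v1.33.0-alpha.1.158+2642d8222d8524.
write_metadata_json() {
    local version="$1"
    local dir="${2:-${ARTIFACTS:-.}}"
    mkdir -p "$dir" || return 1
    printf '{"revision":"%s","job-version":"%s"}\n' "$version" "$version" \
        > "$dir/metadata.json"
}
```

Called as `write_metadata_json "$(curl -sSL https://dl.k8s.io/ci/latest.txt)"`, this would record the same version that the node image was built from, provided both lookups happen in the same step.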

@BenTheElder
Member

You will be the first in a while to do this without using kubetest(2); there are not many tools generating these files.

@pohly
Contributor Author

pohly commented Feb 14, 2025

Most of the changes were made in #34330, with one small fix in #34331.

https://prow.k8s.io/view/gs/kubernetes-ci-logs/logs/ci-kind-dra/1890450663226216448 and https://prow.k8s.io/view/gs/kubernetes-ci-logs/logs/ci-kind-dra-all/1890409894180294656 ran with those changes and completed fine. They were significantly faster, too: 17 minutes instead of 35 minutes when building from source.

Left to do before closing the issue: watch resource usage and lower the requested CPU/RAM in those jobs.

@BenTheElder
Member

> Left to do before closing the issue: watch resource usage and lower the requested CPU/RAM in those jobs.

You may run into issues doing that, because requesting the CPUs is a proxy for "be the sole tenant", which is how these jobs reserve I/O.

(There are other options for I/O though, like kubernetes-sigs/kind#845 (comment))

@pohly
Contributor Author

pohly commented Feb 14, 2025

Do we have some empirical minimum values that a kind-based job should request, even if the actual usage turns out to be lower?

@BenTheElder
Member

Am I reading correctly that it actually only requests 2 cores already? That would explain the particularly large difference in compile time.

> Do we have some empirical minimum values that a kind-based job should request, even if the actual usage turns out to be lower?

Most of them just went with "I'll be the single tenant" (which ... #34139). If you haven't been seeing flakes with 2 cores, then you may be the one finding the recommendation.

It's going to depend on workload though ...

@pohly
Contributor Author

pohly commented Feb 19, 2025

/milestone v1.33

@k8s-ci-robot
Contributor

@pohly: The provided milestone is not valid for this repository. Milestones in this repository: [someday, v1.24, v1.25]

Use /milestone clear to clear the milestone.

In response to this:

/milestone v1.33


Projects
Status: 🏗 In progress