
Following the installation instructions and testing the simple example in the README fails in the imagecheck-0 step #498

Open
aressem opened this issue Feb 6, 2025 · 6 comments
Labels
documentation Improvements or additions to documentation

Comments

@aressem

aressem commented Feb 6, 2025

Installing v0.22.0 of the stack with the Helm chart completes successfully, and the controller pod is running. However, creating a pipeline with the example from the README fails.

Example used:

```yaml
steps:
- label: Hello World!
  agents:
    queue: kubernetes
  plugins:
  - kubernetes:
      podSpec:
        containers:
        - image: alpine:latest
          command:
          - echo Hello World!
```

Error observed:

```
Warning  Failed            4s    kubelet            Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: exec: "/workspace/tini-static": stat /workspace/tini-static: no such file or directory: unknown
```

It seems to me that the imagecheck-0 container uses the alpine image, which does not contain the tini-static binary:

```
Init Containers:
  imagecheck-0:
    Container ID:  containerd://8259ee370a235ee7e1a9ecac2dfac7b0264399e1dc8e68696ab0152e245ae560
    Image:         alpine:latest
    Image ID:      alpine@sha256:56fa17d2a7e7f168a043a2712e63aed1f8543aeafdcee47c58dcffe38ed51099
    Port:          <none>
    Host Port:     <none>
    Command:
      /workspace/tini-static
    Args:
      --version
    State:          Terminated
      Reason:       StartError
      Message:      failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: exec: "/workspace/tini-static": stat /workspace/tini-static: no such file or directory: unknown
      Exit Code:    128
      Started:      Thu, 01 Jan 1970 01:00:00 +0100
      Finished:     Thu, 06 Feb 2025 23:55:32 +0100
```

So have I misunderstood something here, or does the example simply no longer work?

@DrJosh9000
Contributor

Hi @aressem, thanks for raising the issue.

/workspace/tini-static (along with /workspace/buildkite-agent) should be copied in from the copy-agent init container, which by default uses the buildkite-agent image. Have you changed the image config value when deploying the stack?

@aressem
Author

aressem commented Feb 12, 2025

This is the error shown when the image can't be pulled, whether because of network timeouts or security policies. The root cause here was a network issue that caused an image pull backoff in the kubelet. I suspect that the imagecheck container doesn't retry on image pull failure. I think this can be closed.

@aressem
Author

aressem commented Feb 12, 2025

After experimenting a bit more, it seems that imagecheck-0 can actually be scheduled to run as an init container before the copy-agent container. This results in the reported failure, because copy-agent is responsible for copying in the tini-static binary.

@DrJosh9000
Contributor

That's an interesting theory, @aressem. Because init containers run sequentially, it would mean that imagecheck-* is somehow being inserted into the slice of containers before copy-agent.
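To make the dependency concrete, here is a minimal Go sketch under simplifying assumptions: each init container is modeled as a function acting on a shared map that stands in for the /workspace volume, init containers run one at a time in slice order, and the run stops at the first error. The names initContainer, copyAgent, imageCheck and runInitContainers are hypothetical and do not come from agent-stack-k8s. The sketch shows that the reported stat error appears only when imagecheck-0 is ordered before copy-agent.

```go
package main

import "fmt"

// initContainer is a hypothetical, simplified model of an init container:
// a name plus a step that reads or writes the shared workspace volume.
type initContainer struct {
	Name string
	Run  func(workspace map[string]bool) error
}

// copyAgent models copy-agent: it copies the agent binaries into /workspace.
func copyAgent(workspace map[string]bool) error {
	workspace["/workspace/buildkite-agent"] = true
	workspace["/workspace/tini-static"] = true
	return nil
}

// imageCheck models imagecheck-N: it runs /workspace/tini-static --version
// inside the user's image, so the binary must already be in /workspace.
func imageCheck(workspace map[string]bool) error {
	const path = "/workspace/tini-static"
	if !workspace[path] {
		return fmt.Errorf("exec: %q: stat %s: no such file or directory", path, path)
	}
	return nil
}

// runInitContainers runs init containers one at a time in slice order,
// starting from an empty workspace and stopping at the first error.
// (Simplification: real Kubernetes may restart a failed init container,
// depending on the pod's restartPolicy.)
func runInitContainers(containers []initContainer) error {
	workspace := map[string]bool{}
	for _, c := range containers {
		if err := c.Run(workspace); err != nil {
			return fmt.Errorf("%s: %w", c.Name, err)
		}
	}
	return nil
}

func main() {
	copyFirst := []initContainer{{"copy-agent", copyAgent}, {"imagecheck-0", imageCheck}}
	checkFirst := []initContainer{{"imagecheck-0", imageCheck}, {"copy-agent", copyAgent}}
	fmt.Println(runInitContainers(copyFirst))
	fmt.Println(runInitContainers(checkFirst))
}
```

Running it prints <nil> for the copy-agent-first order and imagecheck-0: exec: "/workspace/tini-static": stat /workspace/tini-static: no such file or directory for the reversed order.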

@aressem
Author

aressem commented Feb 14, 2025

You have to ensure that copy-agent runs before the imagecheck-* init container. I can now reproduce the reported error every time. This started when we preloaded the image used in the build container (container-0) into our AMIs, so on a new machine that image is available much faster than the agent image, which still has to be pulled. The bottom line is that the resource definition must ensure that copy-agent runs first.

@DrJosh9000
Contributor

That's what's interesting, @aressem: copy-agent is always the first init container:

```go
initContainers := []corev1.Container{w.createWorkspaceSetupContainer(podSpec, workspaceVolume)}
```

imagecheck-* containers are only appended to the end of initContainers:

```go
initContainers = append(initContainers, corev1.Container{
```

Finally, any user-supplied init containers are appended to the end of initContainers before it is used in the job spec:

```go
podSpec.InitContainers = append(initContainers, podSpec.InitContainers...)
```
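The three quoted steps can be sketched as a single function. This is a simplified, hypothetical version, not the controller's code: Container is a local stand-in for corev1.Container, buildInitContainers is an invented name, the copy-agent command is omitted, and creating one imagecheck-N per entry in userImages is an assumption of the sketch. The point is that, when the slice is built this way, copy-agent can only ever be at index 0.

```go
package main

import "fmt"

// Container is a hypothetical stand-in for corev1.Container, reduced to the
// fields this sketch needs.
type Container struct {
	Name    string
	Image   string
	Command []string
	Args    []string
}

// buildInitContainers is a hypothetical, simplified version of the ordering
// quoted above: copy-agent first, then imagecheck-N containers appended,
// then user-supplied init containers appended last.
func buildInitContainers(agentImage string, userImages []string, userInit []Container) []Container {
	// Step 1: copy-agent is always the first element.
	initContainers := []Container{{Name: "copy-agent", Image: agentImage}}

	// Step 2: imagecheck-N containers are only ever appended to the end.
	for i, image := range userImages {
		initContainers = append(initContainers, Container{
			Name:    fmt.Sprintf("imagecheck-%d", i),
			Image:   image,
			Command: []string{"/workspace/tini-static"},
			Args:    []string{"--version"},
		})
	}

	// Step 3: user-supplied init containers come after everything else.
	return append(initContainers, userInit...)
}

func main() {
	// "example/agent-image" is a placeholder, not the stack's real default.
	got := buildInitContainers("example/agent-image", []string{"alpine:latest"}, []Container{{Name: "user-init"}})
	for _, c := range got {
		fmt.Println(c.Name)
	}
}
```

Running main prints copy-agent, imagecheck-0 and user-init, one per line, in that order.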

petetomasik added the documentation label (Improvements or additions to documentation) on Feb 25, 2025