Pipeline failing with OOM Killed, invalid memory request in modify-windows-iso-file #572

Open
govindkailas opened this issue Dec 13, 2024 · 3 comments

@govindkailas

What happened:
I am trying to create the PipelineRun for Win2k22 as explained here: https://artifacthub.io/packages/tekton-pipeline/kubevirt-tekton-pipelines/windows-efi-installer. I noticed that the pod for the Task `modify-windows-iso-file` exited with OOMKilled.

windows2k22-installer-run-x2wgq-create-vm-root-disk-pod           0/1     Completed   0          2m33s
windows2k22-installer-run-x2wgq-delete-imported-configmaps-pod    0/1     Completed   0          28s
windows2k22-installer-run-x2wgq-delete-imported-iso-pod           0/1     Completed   0          28s
windows2k22-installer-run-x2wgq-import-win-iso-pod                0/1     Completed   0          2m47s
windows2k22-installer-run-x2wgq-modify-windows-iso-file-pod       2/3     OOMKilled   0          2m21s
windows2k22-installer-run-xd4b0a76bd261bbdd1b1831a65c85839f-pod   0/1     Completed   0          2m47s

Upon examining the pod, I noticed that it has an unexpected value for requests.memory:

  step-convert-iso-file:
    Container ID:  containerd://ba43a82059f4af3f7fa26bd652ad97ce0b326359645563dacbcb20596f8d269d
    Image:         quay.io/kubevirt/tekton-tasks-disk-virt:v0.23.0
    Image ID:      quay.io/kubevirt/tekton-tasks-disk-virt@sha256:c7563bd9a5a9b09d15922823f5514ed17173db27362ab7b63960591c0bb2835e
    Port:          <none>
    Host Port:     <none>
    Command:
      /tekton/bin/entrypoint
    Args:
      -wait_file
      /tekton/run/1/out
      -post_file
      /tekton/run/2/out
      -termination_path
      /tekton/termination
      -step_metadata_dir
      /tekton/run/2/status
      -entrypoint
      /tekton/scripts/script-2-ltggf
      --
    State:          Running
      Started:      Thu, 12 Dec 2024 17:16:40 -0800
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     100m
      memory:  256Mi
    Requests:
      cpu:        16m
      memory:     44739242 ##<<=== This is incorrect 

Is there a way to override this? I couldn't determine where it is sourcing the values from.
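
A note on the number itself: 44739242 is exactly 128Mi in bytes divided by 3, with the remainder dropped. One possible reading, assuming (this is not confirmed anywhere in the thread) that a LimitRange in the namespace supplies a 128Mi default memory request and that Tekton splits it across the pod's three containers, can be checked with a small shell sketch. The variable names are illustrative only.

```shell
#!/bin/sh
# Assumption (not confirmed in this thread): a namespace LimitRange provides
# a default memory request of 128Mi, and it is divided across the pod's
# 3 containers (the pod status above shows 2/3).
default_request_bytes=$((128 * 1024 * 1024))   # 128Mi expressed in bytes
containers=3

# Shell arithmetic is integer-only, so the fractional part is dropped.
per_container_bytes=$((default_request_bytes / containers))

echo "$per_container_bytes"
```

If that assumption holds, `kubectl get limitrange -n <namespace> -o yaml` should show where the default comes from.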


Environment:

  • KubeVirt version (use virtctl version): v1.3.0
  • Kubernetes version (use kubectl version): v1.29.8
  • VM or VMI specifications: N/A
  • Cloud provider or hardware configuration: N/A
  • OS (e.g. from /etc/os-release): Ubuntu 22.04
@govindkailas changed the title from "Pipeline failing with OOM Killed," to "Pipeline failing with OOM Killed, invalid memory request in modify-windows-iso-file" on Dec 13, 2024
@govindkailas
Author

I also noticed that the pod is exiting with permission errors on the PVC:

+ guestfish -a tmp/target-pvc/disk.img -m /dev/sda tar-out / -
++ id -u
++ id -g
+ tar xvf - -C /tmp/extracted-iso-files -m --no-overwrite-dir --owner=107 --group=0 --no-same-permissions
./
tar: .: Cannot change mode to rwxrwxrwx: Operation not permitted

This happens despite the following config:

spec:
  taskRunSpecs:
    - pipelineTaskName: modify-windows-iso-file
      podTemplate:
        securityContext:
          fsGroup: 107
          runAsUser: 107

@ksimon1
Member

ksimon1 commented Jan 6, 2025

Hi,
thanks for reporting this.
Our tasks/pipelines do not set any memory request in their manifests; the value comes from Tekton: https://tekton.dev/docs/pipelines/compute-resources/#requests.
You should be able to set compute resources via the PipelineRun:

spec:
  taskRunSpecs:
    - pipelineTaskName: modify-windows-iso-file
      computeResources: 
        requests:
          memory: ...

More info:

How much memory does your node have? I have never encountered this error.

The error in the log

+ guestfish -a tmp/target-pvc/disk.img -m /dev/sda tar-out / -
++ id -u
++ id -g
+ tar xvf - -C /tmp/extracted-iso-files -m --no-overwrite-dir --owner=107 --group=0 --no-same-permissions
./
tar: .: Cannot change mode to rwxrwxrwx: Operation not permitted

should not cause any trouble.

@govindkailas
Author

Thanks for your input. I have adjusted the task accordingly:

    taskRunSpecs:
      - pipelineTaskName: modify-windows-iso-file
        podTemplate:
          securityContext:
            fsGroup: 107
            runAsUser: 107
        computeResources:
          requests:
            memory: 256Mi
          limits:
            memory: 1Gi

I could see CPU throttling, and memory usage reached the limit as well.

k top pods 
NAME                                                          CPU(cores)   MEMORY(bytes)         
windows2k25-installer-run-9c6m4-modify-windows-iso-file-pod   100m         1033Mi         

Worker nodes have plenty of resources (~1.5 TB of memory and 100 CPUs).

I will increase the memory and CPU limits to see how it affects performance.
