Overwrite registry image ENTRYPOINT to avoid panic when scheduler-state.json file is corrupted. #292

Contributor

@dimitar-kostadinov dimitar-kostadinov commented Nov 21, 2024

How to categorize this PR?

/area robustness
/kind enhancement

What this PR does / why we need it:
Implements mitigation for distribution/distribution#4478.
The registry-cache container is extended with a command that checks scheduler-state.json and, if the file is corrupted (present but empty), cleans up the cache data on the persistent volume mounted under /var/lib/registry: the scheduler-state.json file itself and the docker directory.
The command that performs the check and cleanup is:

      repoRoot=/var/lib/registry
      if [ -f "${repoRoot}/scheduler-state.json" ]; then
          if [ -s "${repoRoot}/scheduler-state.json" ]; then
              echo "The scheduler-state.json file is OK"
          else
              echo "Cleanup corrupted scheduler-state.json file"
              rm -f "${repoRoot}/scheduler-state.json"
              echo "Cleanup docker directory"
              rm -rf "${repoRoot}/docker"
          fi
      else
          echo "The scheduler-state.json file is not created yet"
      fi

      source /entrypoint.sh /etc/distribution/config.yml
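
The check above can also be tried outside the cluster. The following is a minimal sketch, assuming a hypothetical function name (cleanup_corrupted_state) and a root directory passed as an argument instead of the hard-coded /var/lib/registry. It mirrors the logic of the command in this PR but is not the code that ships in the extension, and it does not start the registry.

```shell
#!/bin/sh
# Sketch only: cleanup_corrupted_state is a hypothetical name, and the
# repository root is a parameter so the logic can run against a temp dir.
cleanup_corrupted_state() {
    repoRoot="$1"
    stateFile="${repoRoot}/scheduler-state.json"
    if [ -f "${stateFile}" ]; then
        if [ -s "${stateFile}" ]; then
            # File exists and is non-empty: nothing to do.
            echo "ok"
        else
            # File exists but is empty: treat it as corrupted and drop
            # both the state file and the cached blobs.
            rm -f "${stateFile}"
            rm -rf "${repoRoot}/docker"
            echo "cleaned"
        fi
    else
        # File has not been created yet.
        echo "absent"
    fi
}

# Simulate the corrupted state in a temporary directory.
tmp="$(mktemp -d)"
mkdir -p "${tmp}/docker/registry"
: > "${tmp}/scheduler-state.json"   # zero-length file
cleanup_corrupted_state "${tmp}"    # prints "cleaned"
rm -rf "${tmp}"
```

Note that the -s test only detects a zero-length file, which is the case produced by the reproduction steps in this PR; a non-empty file is always reported as OK.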

Which issue(s) this PR fixes:
Fixes #291

Special notes for your reviewer:
Steps to reproduce the issue:

  1. Exec into the registry container, e.g. k -n kube-system exec -it registry-docker-io-0 -- sh, and corrupt the file by replacing it with an empty one:
    rm /var/lib/registry/scheduler-state.json
    touch /var/lib/registry/scheduler-state.json

  2. Exec into the node, e.g. k -n shoot--local--local exec -it machine-shoot--local--local-local-6cffc-lxvbn -- bash, and kill the registry container:
    ctr -n k8s.io task kill -s SIGKILL c16425e8b798203b9d011b6445b0a2c483ae74b13f98f1c7c415c3868d240250

    Use the container ID from the registry pod's status.containerStatuses[0].containerID field.
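
The container ID lookup in step 2 can be scripted. This is a minimal sketch, assuming kubectl is available (the steps above use the alias k) and that the pod name and namespace match the example; both helper names are hypothetical. Kubernetes reports the container ID with a runtime prefix such as containerd://, which has to be stripped before the ID is passed to ctr.

```shell
#!/bin/sh
# Hypothetical helper: strip the "<runtime>://" prefix from a containerID value.
strip_runtime_prefix() {
    echo "${1#*://}"
}

# Hypothetical helper: read the first container's ID from the pod status.
# Requires access to the cluster, so it is only defined here, not called.
registry_container_id() {
    namespace="$1"
    pod="$2"
    raw="$(kubectl -n "${namespace}" get pod "${pod}" \
        -o jsonpath='{.status.containerStatuses[0].containerID}')"
    strip_runtime_prefix "${raw}"
}

# Example with the ID from the steps above: prints the bare ID.
strip_runtime_prefix "containerd://c16425e8b798203b9d011b6445b0a2c483ae74b13f98f1c7c415c3868d240250"
```

With cluster access, registry_container_id kube-system registry-docker-io-0 would return the value to pass to ctr -n k8s.io task kill on the node.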

Release note:

The following Distribution issue [distribution/distribution#4478](https://github.com/distribution/distribution/issues/4478) is now mitigated.

@gardener-prow gardener-prow bot added area/robustness Robustness, reliability, resilience related kind/enhancement Enhancement, improvement, extension cla: yes Indicates the PR's author has signed the cla-assistant.io CLA. labels Nov 21, 2024
@gardener-prow gardener-prow bot requested a review from ialidzhikov November 21, 2024 13:07
@gardener-prow gardener-prow bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Nov 21, 2024
@dimitar-kostadinov
Contributor Author

Known flake: #290

/test pull-gardener-extension-registry-cache-e2e-kind

/test pull-gardener-extension-registry-cache-unit

@ialidzhikov
Member

/assign

@dimitar-kostadinov
Contributor Author

/hold

@gardener-prow gardener-prow bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 22, 2024
@dimitar-kostadinov dimitar-kostadinov changed the title from "Add cleanup-volume init container to avoid panic when scheduler-state.json file is corrupted." to "Overwrite registry image ENTRYPOINT to avoid panic when scheduler-state.json file is corrupted." Nov 25, 2024
@dimitar-kostadinov
Contributor Author

/unhold

@gardener-prow gardener-prow bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 25, 2024
Member

@ialidzhikov ialidzhikov left a comment

/lgtm

@gardener-prow gardener-prow bot added the lgtm Indicates that a PR is ready to be merged. label Nov 25, 2024
Contributor

gardener-prow bot commented Nov 25, 2024

LGTM label has been added.

Git tree hash: a32d790a8fcfaa7e359d572635d9114af3b40b87

Contributor

gardener-prow bot commented Nov 25, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ialidzhikov

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gardener-prow gardener-prow bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 25, 2024
@gardener-prow gardener-prow bot merged commit ec07ef3 into gardener:main Nov 25, 2024
6 checks passed
Development

Successfully merging this pull request may close these issues.

Implement a mitigation for panic: unexpected end of JSON input
2 participants