Skip to content

Refactor gcs cat retries into gcs_restapi.sh#4880

Open
chandramerla wants to merge 2 commits into
kubevirt:mainfrom
chandramerla:refactor-gcs-cat-retry-s390x
Open

Refactor gcs cat retries into gcs_restapi.sh#4880
chandramerla wants to merge 2 commits into
kubevirt:mainfrom
chandramerla:refactor-gcs-cat-retry-s390x

Conversation

@chandramerla
Copy link
Copy Markdown
Member

This avoids code duplication in prow jobs and keeps retry logic in a new wrapper function of cat operation

What this PR does / why we need it:
This PR refactors the GCS file reading retry logic by introducing a new cat_gcs_file_with_retry() function in gcs_restapi.sh. This eliminates code duplication across Prow job configurations by centralizing the retry mechanism for handling GCS eventual consistency issues.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

  • The retry logic behavior remains functionally equivalent to the previous implementation
  • Error messages are now consistently sent to stderr while content goes to stdout

Checklist

This checklist is not enforcing, but it's a reminder of items that could be relevant to every PR.
Approvers are expected to review this list.

Release note:

NONE

@kubevirt-bot
Copy link
Copy Markdown
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@kubevirt-bot kubevirt-bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. dco-signoff: no Indicates the PR's author has not DCO signed all their commits. size/M labels Mar 27, 2026
This avoids code duplication in prow jobs and keeps retry logic in a new wrapper function of cat operation

Signed-off-by: chandramerla <Chandra.Merla@ibm.com>
@chandramerla chandramerla force-pushed the refactor-gcs-cat-retry-s390x branch from 9237e35 to 49613f5 Compare March 30, 2026 14:00
@kubevirt-bot kubevirt-bot added dco-signoff: yes Indicates the PR's author has DCO signed all their commits. and removed dco-signoff: no Indicates the PR's author has not DCO signed all their commits. labels Mar 30, 2026
@chandramerla chandramerla marked this pull request as ready for review March 30, 2026 14:04
@kubevirt-bot kubevirt-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 30, 2026
@kubevirt-bot kubevirt-bot requested review from dollierp and enp0s3 March 30, 2026 14:04
Copy link
Copy Markdown

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've left some high level feedback:

  • In the prow job scripts, you can simplify the retry call pattern by using command negation instead of checking $?, e.g. if ! KUBEVIRTCI_TAG=$(cat_gcs_file_with_retry kubevirt-prow "$GCS_FILE_PATH" "false"); then ...; fi, which avoids a separate assignment and status check.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In the prow job scripts, you can simplify the retry call pattern by using command negation instead of checking `$?`, e.g. `if ! KUBEVIRTCI_TAG=$(cat_gcs_file_with_retry kubevirt-prow "$GCS_FILE_PATH" "false"); then ...; fi`, which avoids a separate assignment and status check.

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@chandramerla
Copy link
Copy Markdown
Member Author

@dhiller As discussed here, I’ve created this follow-up PR.

Since this change affects images/bootstrap/gcs_restapi.sh, I believe we’ll first need to build and publish the updated bootstrap image, and then reference it in the jobs we use for rehearsals.

@dollierp — could you please help with this?

Copy link
Copy Markdown
Contributor

@dollierp dollierp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

/lgtm

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Mar 30, 2026
@chandramerla
Copy link
Copy Markdown
Member Author

@dollierp Thanks for the lgtm!
Could you please help me rehearsal on this PR? Since this change affects images/bootstrap/gcs_restapi.sh, I think we first need to build and publish the updated bootstrap image before referencing it in rehearsal jobs.

@dollierp
Copy link
Copy Markdown
Contributor

Hi @chandramerla,

I crafted rehearsal jobs modified to use the gcs_restapi.sh from this PR:

Please have a look.

Regards,

The rm_gcs_file() function was silently failing to delete files from
GCS storage, causing files to accumulate at:
https://gcsweb.ci.kubevirt.io/gcs/kubevirt-prow/release/kubevirt/kubevirtci/

Root Cause:
- Function checked if response body was empty to determine success
- Empty response was assumed to mean successful deletion
- However, auth failures, rate limits, and other errors can also
  return empty responses, causing false success reports

Evidence of Intermittent Failures:
- Failed: https://prow.ci.kubevirt.io/view/gs/kubevirt-prow/logs/publish-kubevirtci-s390x/2038751228271267840
- Worked: https://prow.ci.kubevirt.io/view/gs/kubevirt-prow/logs/publish-kubevirtci-s390x/2038925877974142976
- Worked: https://prow.ci.kubevirt.io/view/gs/kubevirt-prow/logs/publish-kubevirtci-s390x/2037221243811270656

The intermittent nature suggests auth rate limiting or transient
network issues were causing failures that went undetected.

Fix:
- Capture HTTP status code from curl response
- HTTP 204 (No Content) = successful deletion
- HTTP 404 (Not Found) = already deleted (also success)
- Any other status = actual error, log details and return failure
- Preserve original 'exit 1' behavior on auth failure

This ensures deletion failures are properly detected and reported,
preventing accumulation of stale GCS files.
@kubevirt-bot kubevirt-bot added dco-signoff: no Indicates the PR's author has not DCO signed all their commits. and removed lgtm Indicates that a PR is ready to be merged. labels Mar 31, 2026
@kubevirt-bot
Copy link
Copy Markdown
Contributor

New changes are detected. LGTM label has been removed.

@kubevirt-bot kubevirt-bot added size/L and removed dco-signoff: yes Indicates the PR's author has DCO signed all their commits. size/M labels Mar 31, 2026
@chandramerla
Copy link
Copy Markdown
Member Author

@dollierp
Thanks for running the reharsal jobs.

While periodic-kubevirtci-bump-centos-base-s390x went fine. I saw publish-kubevirtci-s390x failed because the gcs file release/kubevirt/kubevirtci/amd64-d38e43dc created by x86 publish job isn't deleted by the complementing s390x job (though it said deleted at the end File release%2Fkubevirt%2Fkubevirtci%2Famd64-d38e43dc deleted successfully it didn't). This seems like some common problem seeing the https://gcsweb.ci.kubevirt.io/gcs/kubevirt-prow/release/kubevirt/kubevirtci/, as many files there are not deleted intermittently (failed, Worked, Worked)
So, I've update the rm_gcs_file() to be based on right response code than treating empty response as success and also to capture more details when it fails.

Could you please trigger both x86 publish job and rehearse-publish-kubevirtci-s390x-dollierp (ensuring it uses updated gcs_restapi.sh from this PR) on a same commit?

@dollierp
Copy link
Copy Markdown
Contributor

dollierp commented Apr 1, 2026

Hi @chandramerla,

New rehearsal jobs:

1 similar comment
@dollierp
Copy link
Copy Markdown
Contributor

dollierp commented Apr 1, 2026

Hi @chandramerla,

New rehearsal jobs:

Copy link
Copy Markdown
Contributor

@dhiller dhiller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

@kubevirt-bot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dhiller

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 14, 2026
@brianmcarey
Copy link
Copy Markdown
Member

/check-dco

@kubevirt-bot
Copy link
Copy Markdown
Contributor

Thanks for your pull request. Before we can look at it, you'll need to add a 'DCO signoff' to your commits.

📝 Please follow instructions in the contributing guide to update your commits with the DCO

Full details of the Developer Certificate of Origin can be found at developercertificate.org.

The list of commits missing DCO signoff:

  • 73cc28d Fix GCS file deletion by checking HTTP status codes
Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Copy link
Copy Markdown
Member

@brianmcarey brianmcarey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this @chandramerla

The second commit is missing a sign off but apart from that it looks good to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: no Indicates the PR's author has not DCO signed all their commits. size/L

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants