Refactor gcs cat retries into gcs_restapi.sh#4880
Conversation
|
Skipping CI for Draft Pull Request. |
This avoids code duplication in prow jobs and keeps retry logic in a new wrapper function of cat operation Signed-off-by: chandramerla <Chandra.Merla@ibm.com>
9237e35 to
49613f5
Compare
There was a problem hiding this comment.
Hey - I've left some high level feedback:
- In the prow job scripts, you can simplify the retry call pattern by using command negation instead of checking
$?, e.g.if ! KUBEVIRTCI_TAG=$(cat_gcs_file_with_retry kubevirt-prow "$GCS_FILE_PATH" "false"); then ...; fi, which avoids a separate assignment and status check.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In the prow job scripts, you can simplify the retry call pattern by using command negation instead of checking `$?`, e.g. `if ! KUBEVIRTCI_TAG=$(cat_gcs_file_with_retry kubevirt-prow "$GCS_FILE_PATH" "false"); then ...; fi`, which avoids a separate assignment and status check.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
|
@dollierp Thanks for the lgtm! |
|
Hi @chandramerla, I crafted rehearsal jobs modified to use the Please have a look. Regards, |
The rm_gcs_file() function was silently failing to delete files from GCS storage, causing files to accumulate at: https://gcsweb.ci.kubevirt.io/gcs/kubevirt-prow/release/kubevirt/kubevirtci/ Root Cause: - Function checked if response body was empty to determine success - Empty response was assumed to mean successful deletion - However, auth failures, rate limits, and other errors can also return empty responses, causing false success reports Evidence of Intermittent Failures: - Failed: https://prow.ci.kubevirt.io/view/gs/kubevirt-prow/logs/publish-kubevirtci-s390x/2038751228271267840 - Worked: https://prow.ci.kubevirt.io/view/gs/kubevirt-prow/logs/publish-kubevirtci-s390x/2038925877974142976 - Worked: https://prow.ci.kubevirt.io/view/gs/kubevirt-prow/logs/publish-kubevirtci-s390x/2037221243811270656 The intermittent nature suggests auth rate limiting or transient network issues were causing failures that went undetected. Fix: - Capture HTTP status code from curl response - HTTP 204 (No Content) = successful deletion - HTTP 404 (Not Found) = already deleted (also success) - Any other status = actual error, log details and return failure - Preserve original 'exit 1' behavior on auth failure This ensures deletion failures are properly detected and reported, preventing accumulation of stale GCS files.
|
New changes are detected. LGTM label has been removed. |
|
@dollierp While periodic-kubevirtci-bump-centos-base-s390x went fine. I saw publish-kubevirtci-s390x failed because the gcs file release/kubevirt/kubevirtci/amd64-d38e43dc created by x86 publish job isn't deleted by the complementing s390x job (though it said deleted at the end Could you please trigger both x86 publish job and rehearse-publish-kubevirtci-s390x-dollierp (ensuring it uses updated gcs_restapi.sh from this PR) on a same commit? |
|
Hi @chandramerla, New rehearsal jobs: |
1 similar comment
|
Hi @chandramerla, New rehearsal jobs: |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dhiller The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/check-dco |
|
Thanks for your pull request. Before we can look at it, you'll need to add a 'DCO signoff' to your commits. 📝 Please follow instructions in the contributing guide to update your commits with the DCO Full details of the Developer Certificate of Origin can be found at developercertificate.org. The list of commits missing DCO signoff:
DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
brianmcarey
left a comment
There was a problem hiding this comment.
Thanks for this @chandramerla
The second commit is missing a sign off but apart from that it looks good to me.
This avoids code duplication in prow jobs and keeps retry logic in a new wrapper function of cat operation
What this PR does / why we need it:
This PR refactors the GCS file reading retry logic by introducing a new cat_gcs_file_with_retry() function in gcs_restapi.sh. This eliminates code duplication across Prow job configurations by centralizing the retry mechanism for handling GCS eventual consistency issues.
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)format, will close the issue(s) when PR gets merged):Fixes #
Special notes for your reviewer:
Checklist
This checklist is not enforcing, but it's a reminder of items that could be relevant to every PR.
Approvers are expected to review this list.
Release note: