record-aip-id-for-lambda-response-exceptions #159

Merged
merged 1 commit into from
Jun 13, 2025

Conversation

ehanson8
@ehanson8 ehanson8 commented Jun 6, 2025

Purpose and background context

If exceptions occur while parsing the lambda response, neither the aip_uuid nor the aip_s3_uri would be recorded in the output CSV. This can be caused by large bags that raise JSONDecodeError exceptions when the lambda fails to validate the bag.
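
To illustrate the failure mode, here is a minimal sketch, assuming the failing lambda returns a plain-text body instead of JSON. The payload string and the `parse_lambda_response` helper are hypothetical and not taken from this repository. Parsing raises `json.JSONDecodeError` before any identifier can be read from the response:

```python
import json

# Hypothetical body returned when the lambda fails before producing JSON.
raw_body = "Task timed out after 900.00 seconds"


def parse_lambda_response(body: str) -> dict:
    """Parse a lambda response body; raises json.JSONDecodeError on non-JSON input."""
    return json.loads(body)


try:
    parse_lambda_response(raw_body)
except json.JSONDecodeError as exc:
    # Nothing from the response is usable here, so any identifiers
    # have to come from the caller's own inputs.
    print(f"Could not parse lambda response: {exc}")
```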

How can a reviewer manually see the effects of these changes?

An input file that exposes the issue will be shared on Slack. After setting the challenge secret for Prod, run the following command with the main branch:

pipenv run cli bulk-validate -i output/errors.csv -o output/errors_results_broken.csv

In the output CSV, see that the aip_uuid column is not set for the AIPs that experienced errors.

Fetch the lambda-response-exception branch and run the following command:

pipenv run cli bulk-validate -i output/errors.csv -o output/errors_results_fixed.csv

In the output CSV, see that the aip_uuid column is set for the AIPs that experienced errors.
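
As an alternative to inspecting the CSVs by eye, the check can be scripted. This is a rough sketch, assuming the results CSV has an `aip_uuid` header as described above. The helper name `rows_missing_aip_uuid` and the `error` column in the sample data are assumptions for illustration only:

```python
import csv
import io


def rows_missing_aip_uuid(csv_file) -> list[dict]:
    """Return rows of an open results CSV whose aip_uuid column is empty or absent."""
    reader = csv.DictReader(csv_file)
    return [row for row in reader if not (row.get("aip_uuid") or "").strip()]


# Tiny in-memory stand-in for a results file; the "error" column is an assumption.
sample = io.StringIO(
    "aip_uuid,aip_s3_uri,error\n"
    ",,JSONDecodeError\n"
    "abc-123,s3://bucket/abc-123,JSONDecodeError\n"
)
missing = rows_missing_aip_uuid(sample)
print(f"{len(missing)} row(s) missing aip_uuid")
```

To run it against a real file, open it with `open("output/errors_results_fixed.csv", newline="")` and pass the file object to the helper. On the fixed branch the returned list should be empty for the AIPs that experienced errors.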

Includes new or updated dependencies?

YES

Changes expectations for external applications?

NO

Developer

  • All new ENV is documented in README
  • All new ENV has been added to staging and production environments
  • All related Jira tickets are linked in commit message(s)
  • Stakeholder approval has been confirmed (or is not needed)

Code Reviewer(s)

  • The commit message is clear and follows our guidelines (not just this PR message)
  • There are appropriate tests covering any new functionality
  • The provided documentation is sufficient for understanding any new functionality introduced
  • Any manual tests have been performed and verified
  • New dependencies are appropriate or there were no changes

Why these changes are being introduced:
* After the last update to bulk-validate, it was discovered that if an exception occurred while parsing the lambda response, the uuid or S3 URI would not be recorded.

How this addresses that need:
* Add aip_uuid and aip_s3_uri to the dataframe in the except block of validate_aip_bulk_worker to ensure they are recorded in case of exceptions while parsing the lambda response
* Add corresponding CLI tests to show that the uuid and S3 URI are added to the results CSV in case of a lambda response exception
* Update dependencies
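
The pattern behind the first bullet can be sketched as follows. This is not the actual validate_aip_bulk_worker code: the stub lambda call, the function names, the `valid` and `error` fields, and the use of a plain dict in place of the results dataframe are all assumptions made to keep the example self-contained.

```python
import json


def validate_aip_via_lambda_stub(aip_uuid: str) -> dict:
    """Hypothetical stand-in for the real lambda call; returns non-JSON for one UUID."""
    body = "not json" if aip_uuid == "bad-uuid" else '{"valid": true}'
    return json.loads(body)


def bulk_worker_sketch(row_index: int, aip_uuid: str, s3_uri: str, results: dict) -> None:
    """Record one validation result, keeping the identifiers even on failure."""
    try:
        response = validate_aip_via_lambda_stub(aip_uuid)
        results[row_index] = {
            "aip_uuid": aip_uuid,
            "aip_s3_uri": s3_uri,
            "valid": response["valid"],
            "error": None,
        }
    except Exception as exc:  # the real except clause may be narrower
        # The response could not be parsed, so fall back to the worker's own
        # inputs; otherwise the row would have no identifiers at all.
        results[row_index] = {
            "aip_uuid": aip_uuid,
            "aip_s3_uri": s3_uri,
            "valid": None,
            "error": f"{type(exc).__name__}: {exc}",
        }


results: dict[int, dict] = {}
bulk_worker_sketch(0, "good-uuid", "s3://bucket/good-uuid", results)
bulk_worker_sketch(1, "bad-uuid", "s3://bucket/bad-uuid", results)
print(results[1]["aip_uuid"], results[1]["error"])
```

The point of the sketch is that the identifiers written in the except branch come from values the worker already holds, so they do not depend on the lambda response being parseable.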

Side effects of this change:
* None

Relevant ticket(s):
* NA
@ehanson8 ehanson8 requested a review from a team as a code owner June 6, 2025 17:36
@jonavellecuerdo jonavellecuerdo self-assigned this Jun 6, 2025
@ghukill ghukill self-assigned this Jun 6, 2025
@ghukill ghukill left a comment

Looks good! I can see how this will still include the AIP UUID / S3 URI in the bulk response, even if the lambda goes sideways and has an error.

I do get the sense there is a bit of friction between three parts in play, given the last few bugs and PRs have been around this area:

  1. what data the actual, deployed Lambda includes in the response (or cannot include, if it fails to return a response)
  2. the function validate_aip_via_lambda(), that calls the lambda...
  3. the "bulk worker" function validate_aip_bulk_worker(), that calls validate_aip_via_lambda(), that calls the lambda...

Given that things are mostly functional and the edge cases have been exposed, it might be worth a ticket to see if those three things can be simplified or streamlined. Or, perhaps it's just a little finicky like this and these additions are the way to incrementally improve it!

All said, this change makes sense to me. Nice work!

Comment on lines +396 to +397
results_df.loc[row_index, "aip_uuid"] = aip_uuid
results_df.loc[row_index, "aip_s3_uri"] = s3_uri
This makes sense to me.

Since the function validate_aip_via_lambda was updated in a past PR to include the AIP UUID and S3 URI in its response, it makes sense to capture that output in this bulk handler and write it to the output.

@jonavellecuerdo jonavellecuerdo removed their assignment Jun 13, 2025
@ehanson8 ehanson8 merged commit 452144a into main Jun 13, 2025
4 checks passed
@ehanson8 ehanson8 deleted the lambda-response-exception branch June 13, 2025 19:41