Add failed-import batch archiving to aid debugging #24
My goal here is to make it easier to debug failed imports that occur under realtime processing. When an import fails, this code zips up the jurisdiction data directory that was being processed and puts the zip under the `archive/` path of the realtime processing S3 bucket. It also includes the name of the archive zip in the log message, so that whoever is debugging can download and examine exactly the data that was being imported at the time. This is admittedly yet another hack on top of hacks, but hopefully it stays just a logging concern and doesn't further complicate the data flow here.
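For context, here is a minimal sketch of that mechanism in Python. The bucket name, function name, and `jurisdiction_id` parameter are illustrative assumptions, not the actual identifiers in this PR:

```python
# Sketch of the failed-import archiving step. REALTIME_BUCKET and
# archive_failed_import are hypothetical names, not the PR's real code.
import logging
import shutil
from datetime import datetime, timezone
from pathlib import Path

import boto3

logger = logging.getLogger(__name__)

REALTIME_BUCKET = "openstates-realtime-processing"  # assumed bucket name


def archive_failed_import(data_dir: str, jurisdiction_id: str) -> str:
    """Zip the jurisdiction data directory, upload it under the archive/
    prefix of the realtime processing bucket, and return the S3 key so it
    can be included in the failure log message."""
    timestamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    # make_archive appends ".zip" and returns the local path of the archive
    zip_path = shutil.make_archive(f"{jurisdiction_id}-{timestamp}", "zip",
                                   root_dir=data_dir)
    key = f"archive/{Path(zip_path).name}"
    boto3.client("s3").upload_file(zip_path, REALTIME_BUCKET, key)
    return key


# At the failure site, something like:
#   try:
#       do_import(data_dir, jurisdiction_id)
#   except Exception:
#       key = archive_failed_import(data_dir, jurisdiction_id)
#       logger.exception("import failed; data archived at s3://%s/%s",
#                        REALTIME_BUCKET, key)
#       raise
```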
Thinking this through, a more reasonable overall process might look something like the following. I didn't implement it here because it would be more work, including work spanning into openstates-core, but consider it a little mini-EP tagged onto this PR, with your feedback informing future work.
I think that approach would help a few things.
Of course, I still think our original idea of the big SQL Mesh transformation engine is even better, but this proposal is less work than that and probably still a significant improvement.