Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent simultaneous task failures multiple archive calls #1008

Open
wants to merge 2 commits into
base: develop
Choose a base branch
from

Conversation

arhall0
Copy link
Collaborator

@arhall0 arhall0 commented Feb 11, 2025

If multiple tasks fail simultaneously, they will all attempt to call beeflow.wf_manager.resources.wf_update.archive_workflow. This will succeed for the first task that makes the call, but then lead to an endless loop of calls for the other tasks that error since the workflow folder will no longer exist in .beeflow/workflows.

This solves that problem by checking the workflow state does not begin with Archived when attempting to archive a failed workflow. db.workflows.get_workflow_state seems to be the only way to do this once a workflow has been archived.

I put this check where a failed task would call archive_fail_workflow, but potentially this could be in archive_workflow directly.

Needs a test before removing WIP.

@arhall0 arhall0 added the WIP Work in progress label Feb 11, 2025
@arhall0 arhall0 self-assigned this Feb 11, 2025
@arhall0 arhall0 added bug Something isn't working WIP Work in progress and removed WIP Work in progress labels Feb 11, 2025
@arhall0 arhall0 force-pushed the prevent-multi-failed-tasks-archiving branch from 55553f1 to aae765e Compare February 12, 2025 17:31
@arhall0
Copy link
Collaborator Author

arhall0 commented Feb 12, 2025

I adjusted the solution based on discussion. Since the behavior of a failing task failing and archiving the entire workflow may change, I added a more general check to archive_workflow that prevents archiving if the workflow is already in an Archived state and logs a warning about this. I think this is a more robust fix as it should not need to be changed if task failing behavior changes.

@arhall0 arhall0 removed the WIP Work in progress label Feb 12, 2025
@arhall0 arhall0 requested review from rstyd and pagrubel February 12, 2025 21:00
@pagrubel
Copy link
Collaborator

Nice I tried a workflow of cat-grep-tar with a 2nd branch starting with running the cat step without a proper input file and all looks good. One branch fails and the workflow is still archived. Here is the dag. I think I may put the workflow in our data directory. I'll add a readme for it.
2fe1fb

@pagrubel
Copy link
Collaborator

pagrubel commented Feb 15, 2025

So the workflow that caused the error doesn't cause the error anymore, however, the above workflow continued but the final status was not in the archive so because a task failed on one branch the other branch continued to run but the workflow was archived too early. I've added the example. This workflow should be archived and removed from the ~/.beeflow/workflows directory. It was archived early and the status is incorrect. If you do a beeflow query on it, it looks fine because it is accessing the ~/.beeflow/workflow directory.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants