Workers concurrently register execution for one step #786

Open
trihoangvo opened this issue Sep 28, 2022 · 0 comments
Labels
bug Something isn't working


trihoangvo commented Sep 28, 2022

Bug Report

Description

In the yorc logs, we sometimes see two task executions registered for the same step at the same time (see the two duplicate "All previous steps of step" entries in the log below). As a result, several workers concurrently process the registered task executions. They create and remove each other's Ansible files (e.g., overlay files), which causes unexpected errors.

2022/09/20 17:05:26 [DEBUG] All previous steps of step:"Secrule_inbound_RDSMySQL_database_endpoint_install" are done, so it can be registered to be executed
2022/09/20 17:05:26 [DEBUG] All previous steps of step:"Secrule_inbound_RDSMySQL_database_endpoint_install" are done, so it can be registered to be executed
2022/09/20 17:05:26 [DEBUG] Register task execution with ID:"445ecfcc-da88-418e-ac35-1493d3c7a057", taskID:"89f0adf7-4b64-47bc-b76e-7b74e5c58905" and step:"Secrule_inbound_RDSMySQL_database_endpoint_install"
2022/09/20 17:05:26 [DEBUG] Will store runningExecutions with id "445ecfcc-da88-418e-ac35-1493d3c7a057" in txn for task "89f0adf7-4b64-47bc-b76e-7b74e5c58905"
2022/09/20 17:05:26 [DEBUG] Register task execution with ID:"d88fc3f0-6a59-4be1-81b8-20c800ed5b81", taskID:"89f0adf7-4b64-47bc-b76e-7b74e5c58905" and step:"Secrule_inbound_RDSMySQL_database_endpoint_install"
2022/09/20 17:05:26 [DEBUG] Will store runningExecutions with id "d88fc3f0-6a59-4be1-81b8-20c800ed5b81" in txn for task "89f0adf7-4b64-47bc-b76e-7b74e5c58905"

Expected behavior

One task execution for a given step is registered.

Actual behavior

Two duplicated task executions are registered.

Additional information you deem important

This issue happens only occasionally when two or more steps join into one step. For example:

Step 1 ------> Step 3
Step 2 -----------^

  • Worker 1 completes Step 1.
  • Worker 2 completes Step 2 and goes on to register the next step, Step 3.
  • Worker 2 checks that all previous steps of Step 3 are DONE, acquires the lock for the task, registers task execution A for Step 3, and unlocks.
  • Worker 1 checks that all previous steps of Step 3 are DONE, acquires the lock for the task, registers task execution B for Step 3, and unlocks.

The following lock:

https://github.com/ystia/yorc/blob/develop/tasks/workflow/step.go#L621

cannot prevent this issue.

Output of yorc version

develop

Priority

Low

trihoangvo added the bug label Sep 28, 2022
trihoangvo added a commit to opentelekomcloud/yorc that referenced this issue Oct 19, 2022
… in the joined node

* When two or more steps (e.g., Step 1 and Step 2) join into one step (Step 3), it may
  occasionally happen that two workers register duplicate task executions for that next step.
  See bug report in ystia#786
* Fix: Before registering a new task execution, a worker checks whether a task execution has
  already been registered for the given step (at _yorc/tasks/<taskID>/.registeredExecutions/<stepName>).
  When deleting the ".runningExecutions" in notifyEnd(), also delete the ".registeredExecutions".
* Note: If the step status is DONE, ERROR, or CANCELED, we still register a task execution.
  This is the case when a workflow is resumed: the task executions are registered and run again,
  but they are bypassed if the given step is DONE.