Stop fencing actions when node gets healthy #159
Comments
I just meant that we don't need to fence by deleting resources or adding the out-of-service taint if the node is healthy.
I think so too. We can just switch to the fencing completed phase if the SNR CR has the deletion timestamp.
I think I'm leaning toward staying with the original approach.
There is a good chance that a reboot solves the issues on the node and the node becomes healthy again. NHC will delete the SNR CR in that case.
However, once SNR assumes the node has rebooted (by waiting for some time), it still continues fencing by deleting resources or adding the out-of-service taint. This isn't a big issue, because no workloads should be running after the reboot (because of the "normal" NoExecute taint).
Still, it probably makes sense to skip this step, because there is no longer any need to delete the remaining pods which tolerate the NoExecute taint on a healthy node. We can probably switch directly to the "FencingCompleted" code branch, which does the usual cleanup, like removing that NoExecute taint.
@k-keiichi-rh @mshitrit
This was triggered by the discussion here: medik8s/fence-agents-remediation#92 (comment)