Step in plan silently not running when using list of targets #3321
Comments
Using |
So - I ditched using --targets @list and instead used a group in my inventory file, pointing to a plugin which reads the same text file and turns it into JSON. The plugin works fine, and I see the targets listed with bolt inventory show. However, I get exactly the same behaviour from my plan as above: it returns the text files and does nothing at all about the JSON documents. No errors, warnings, or messages, and no instance of the word 'json' in the output or a trace log. I also tried this from a different system, so it's not my OS or installation of Bolt. Whatever this is, it's happening for me regardless of the way I pass targets to Bolt. |
Okay - I think I've worked out what's happening, but I'm failing to understand why. If these two failing nodes are present in the list of targets, the run completes without attempting the second step (the JSON file download) on any of the targets, including the ones that are fine. All the working targets have their text file downloaded via the first step, but there's absolutely nothing in the output (including trace level) about JSON. No failure messages, no downloads, nothing - the download/json folder on the Bolt host isn't even created. The two failing nodes are highlighted in the output as expected and are unable to download the test.txt file. From the rest of the targets I do get the test.txt file, but not the JSON. If I remove the two problematic nodes, I get both the JSON and txt files from all remaining nodes. This also explains why I was getting different output between using --targets @list and --targets host1,host2: in this specific case @list contained the failing nodes, neither of which was host1 or host2. Adding catch_errors: true to both steps in the plan means I get the downloads from all (working) targets, and while this is a usable workaround in this use case, it doesn't seem very satisfactory. This all seems rather counterintuitive to me. Unless I'm misunderstanding something or have set something up very badly wrong, it looks like the failure of the first step on one target prevents subsequent steps being attempted on all targets. This might be by design, though it seems very strange to me, but it doesn't really explain why there is nothing at all in the logs when this occurs. |
In the case where the first step has failures and the second step is not started (which is expected) does the plan report that it has completed successfully? |
The plan reports the failure of the first step on the failed nodes and otherwise reports that the plan succeeded. However, there is absolutely zero mention of the second step in any of the logs. If I remove the two failing nodes from the list of targets, the second step is attempted. Is it really expected that the second step would not be attempted on the nodes where the first step did succeed? I've worked round this for now by setting catch_errors: true on the first step. While this all makes sense for a single node, the idea that a failing step on just one node would prevent the second step from running on any node, including those that succeeded with the first step, and without any indication in the logs of what is happening, is extremely confusing. It seems odd to me that this might be the intended behaviour, and I could find nothing in the documentation discussing how the steps of a multi-step plan are ordered across targets (clearly Bolt must be performing step 1 on all nodes before attempting step 2). But to fail in this manner, with no indication at all in the logs that all subsequent steps have been skipped due to a failure on a single target, is a serious failing. |
Everything appears to be working as intended. In a plan the default behavior is to stop execution on failure (regardless of whether it's YAML or Puppet). We have the concept of
In both cases plan execution halts and the result of the plan is "finished" with an exit code of 1 for the plan run. This is documented here: https://www.puppet.com/docs/bolt/latest/writing_yaml_plans.html#steps |
Hello - I fully understand the above example, but I'm not sure it follows when extended to multiple targets. In my scenario, I have 98 targets. Two targets have conditions that cause a step in my plan to fail. The other 96 targets could execute the plan in its entirety. But when the step fails on one of the targets, subsequent steps are skipped not only on that target but on all targets. This seems unexpected to me. Equally unexpected is the lack of any logging information indicating why step 2 has not been executed on a target where step 1 completed without throwing an error. The lack of any errors regarding the 96 working targets suggested, without further information, that the other steps ran. It is very unexpected that no further steps ran on 'good' targets. E.g., if the targets are node1, node2 and node3.
Because this will fail on node1, step two is never executed on any of the targets, and there's no indication from the logs as to why, or even that it hasn't run, even though, to my mind, it should work on two out of the three targets. Or Bolt should at least provide some hint that it's not even trying. A log entry saying that step two has been skipped due to the failure of step 1 (possibly on a different node) would be helpful. Hope that makes sense. |
Hello - just to clarify - is what I have described above the intended behaviour of bolt? |
Yes, this is the intended behavior. A step fails if any target fails. If you want to add logic for retrying, or for proceeding only on targets on which previous steps have succeeded, then you can do that. When you say there is no logging... I'm not seeing that behavior; both the CLI output and bolt-debug.log with all defaults show the failed step. |
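A minimal sketch of the "proceed only on targets where the previous step succeeded" approach described above, written as a YAML plan. The original plan is not shown in this thread, so the step names (`fetch_txt`, `fetch_json`), the remote paths, and the destination directories are hypothetical. It assumes that a step's result is available as a variable named after the step and that the result set exposes `ok_set.targets`; treat it as an illustration rather than a confirmed fix.

```yaml
# Hypothetical plan: names and paths are illustrative assumptions.
parameters:
  targets:
    type: TargetSpec

steps:
  # Step 1: catch_errors keeps the plan running even if some targets fail.
  - name: fetch_txt
    download: /tmp/test.txt
    destination: txt
    targets: $targets
    catch_errors: true

  # Step 2: run only on the targets where step 1 succeeded.
  - name: fetch_json
    download: /tmp/test.json
    destination: json
    targets: $fetch_txt.ok_set.targets
    catch_errors: true

return: $fetch_json
```

With `catch_errors: true` on the first step, its failures no longer halt the plan, so the second step is attempted on the remaining targets instead of being skipped everywhere.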
I assumed, given that I could find nothing clear in the documentation about this scenario, that there would be some sort of messaging explaining why steps beyond the failed step didn't execute on any of the nodes where the previous step succeeded. I was expecting steps in a plan to be executed on nodes where there are no errors in prior steps, so the total lack of any output regarding the steps which could have run but were never tried was very surprising. That particular behaviour has encouraged me to look at other tooling, since it will leave all of our nodes in an inconsistent state, with a partially executed plan on every node, even if a failure occurs on only one. We'll either switch to some other tool or write a wrapper script to execute Bolt on one target at a time to make it behave more safely. Thanks, I'll close this. |
Describe the Bug
I have a simple plan that does two downloads from targets. This plan works as expected when I list two targets by hostname when invoking the plan. If I provide a list of targets via -t @list.txt, then only the first download step appears to run. With log-level set to trace, there's no indication that anything is attempting to execute the second step. I have probably missed something obvious and stupid as I am new to Bolt, but I cannot see what.
Expected Behavior
The second step in my plan should execute.
Steps to Reproduce
My plan is
This works as I would expect if I use
But it only performs the first step (downloading the .txt file, which completes successfully) and produces output only for that step if I use a target file - a flat text file of hostnames, one per line, passed with -t @list
The output of trace log does not contain the text 'json' at all.
Environment
Bolt 3.29.0 on Ubuntu mint.