New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Config validation for approver operate permissions can block all new config changes #11497
Comments
My gut feeling is that this should not be a fatal validation error for the reason you note. A previously valid config shouldn't be able to become invalid independently. It'd be better if there was a way to communicate 'likely pipeline misconfiguration' warnings like this independent of failure but that's likely a much bigger challenge. On the other hand I haven't thought through the consequences of this in any detail, so would need more analysis. |
I believe the problem exists in the "Step 6" of removing the role from the user. Ideally, GoCD should not allow removal of the role from the user, as this role removal will lead to the mentioned error. To fix this particular problem, I believe, we can add a specific validation while roles are being updated to make sure that config is valid. |
Thanks @GaneshSPatil With an external rbac system you kind of have to let that happen or assume it will happen (either due to some kind of 'use it or lose it' policy or perhaps even an employee/user leaving) My perspective is that with these types of things I think one probably has to ensure it does the right kind of thing at runtime (if the user logs in they can't actually do anything which likely already works so it 'fails safe') but generally expect those with permissions to commit to the config repository or overall admins to correct the problem when detected rather than do this kind of 'complete' validation. |
I've started to run into this issue as well. Surprisingly, the config repo containing the pipeline with the problematic role isn't the repo GoCD is erroring on, instead it's failing on 2 other config repos, causing both to update. Fixing the role issue doesn't seem to have an immediate affect either, so I'm in a limbo state. |
I looked into this a bit more:
So I have two simpler proposals:
|
My inclination is towards option 1 rather than adding complexity, but I also haven't thought through in much detail what this might mean and a bit worried about unintended consequences and making it harder to configure things correctly. Mitigation would be to ensure the error messages are good when there is a missing permission, and/or ensuring something on the UI or other config repo plugin docs highlight the need for the operate permission when configuring stage approval. |
Issue Type
Summary
The config validation which checks that a stage approver has operate permissions on the pipeline group can cause a previously valid config to become invalid, which prevents any further changes of the config. When this happens, changes from any config repo (even without any approver lists) are rejected by GoCD, effectively blocking all new changes until the validation problem (which only affects a specific config repo) is resolved.
Basic environment details
Steps to Reproduce
Expected Results
The change from step 7 is valid and merged in the config, since repoB doesn't even set any approval lists.
Actual Results
The change fails to merge and we get an error for repoB that is actually about repoA:
Any other info
In my rudimentary understanding, the core problem is that a previously valid config can become invalid due to "outside" circumstances (removing the role from a user, which could also happen in e.g. LDAP). When this happens, any new change from other config repos will make GoCD perform a validation of the existing config, which fails due to these circumstances.
I'm not sure how this could be fixed - the config merge & validation logic looks quite complex. The simplest solution would be dropping that approver-has-operate-permission validation. It shouldn't cause security problems, since - well - the user doesn't have operate permissions anyway, but it makes it harder to find subtle misconfiguration problems, I suppose.
The current situation is regularly causing problems for our instance, as we started implementing stricter permission handling. Since we have hundreds of config repos managed by separate teams it's not as simple as quickly fixing the config to remove the faulty approver entry. It's made even worse when using and referencing "ephemeral" roles like LDAP or SAML roles, which users can intermittently lose until they log in again. This (among other reasons) forced us to switch to using only local GoCD roles and implementing our own role sync mechanism.
The text was updated successfully, but these errors were encountered: