cctray reporting old failures for stages which no longer exist on the pipeline #12006

Open
ryandutton opened this issue Sep 19, 2023 · 10 comments

Comments

@ryandutton

ryandutton commented Sep 19, 2023

Issue Type
  • Bug Report
Summary

I believe there may be a bug in CCTray: the XML still reports stage failures for a pipeline which no longer has those stages. I don't know whether this is by design or whether it is a bug.

Environment
Basic environment details
  • GoCD Version: 23.3.0
  • JAVA Version: 17.0.8.1
  • OS: Linux 5.15.109+
Additional Environment Details
Steps to Reproduce
  1. Set a pipeline to use a particular template
  2. The pipeline fails one or more stages from that template
  3. CCTray reports the stage failure in the XML
  4. Change the pipeline to use a different template without the failed stages
  5. Because the failures were never resolved, the XML still reports them
Expected Results

I would expect those stage failures to resolve

Actual Results

XML still reports stage failures 24 hours after stages have been removed from pipeline
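
For anyone who wants to check their own feed, here is a minimal sketch of how the lingering entries can be listed. It assumes the usual CCTray layout in which GoCD names stage entries "pipeline :: stage" and job entries "pipeline :: stage :: job"; the pipeline, stage and job names in the sample are made up, and the sample XML stands in for whatever the server actually returns.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed contents shaped like a CCTray document.
# The pipeline, stage and job names are invented for illustration.
SAMPLE_FEED = """\
<Projects>
  <Project name="my-pipeline :: build" activity="Sleeping"
           lastBuildStatus="Success" lastBuildLabel="2036"/>
  <Project name="my-pipeline :: deploy" activity="Sleeping"
           lastBuildStatus="Failure" lastBuildLabel="2029"/>
  <Project name="my-pipeline :: deploy :: run-job" activity="Sleeping"
           lastBuildStatus="Failure" lastBuildLabel="2029"/>
</Projects>
"""


def failed_stages(feed_xml, pipeline):
    """Return (stage, label) pairs for stage-level entries reporting Failure."""
    failures = []
    for project in ET.fromstring(feed_xml).iter("Project"):
        parts = [p.strip() for p in project.get("name", "").split("::")]
        # Stage entries have two parts (pipeline, stage); job entries have three.
        if len(parts) != 2 or parts[0] != pipeline:
            continue
        if project.get("lastBuildStatus") == "Failure":
            failures.append((parts[1], project.get("lastBuildLabel")))
    return failures


print(failed_stages(SAMPLE_FEED, "my-pipeline"))  # [('deploy', '2029')]
```

A failing entry whose label is older than the labels of the stages that still run is a hint that it belongs to a stage that has since been removed.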

Possible Fix

Check to see if stage still exists in pipeline before collecting status of pipeline
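
A rough sketch of that idea, written in Python rather than GoCD's Java and not based on the real implementation: `prune_removed_stages`, `cached_statuses` and `current_stages` are hypothetical names, and the "pipeline :: stage" naming is assumed from the feed format.

```python
def prune_removed_stages(cached_statuses, current_stages):
    """Keep only cached entries whose stage still exists in the pipeline config.

    cached_statuses: dict mapping a feed entry name ("pipeline :: stage" or
        "pipeline :: stage :: job") to its last known status.
    current_stages: dict mapping a pipeline name to the set of stage names
        in its current configuration.
    """
    kept = {}
    for name, status in cached_statuses.items():
        parts = [p.strip() for p in name.split("::")]
        if len(parts) < 2:
            continue  # not a stage or job entry; ignore it in this sketch
        pipeline, stage = parts[0], parts[1]
        if stage in current_stages.get(pipeline, set()):
            kept[name] = status
    return kept


# Hypothetical data: "deploy" was removed when the template was switched.
cached = {
    "my-pipeline :: build": "Success",
    "my-pipeline :: deploy": "Failure",
    "my-pipeline :: deploy :: run-job": "Failure",
}
current = {"my-pipeline": {"build"}}
print(prune_removed_stages(cached, current))
```

Only the "build" entry survives, so the removed stage and its job would no longer appear in the output.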

Any other info

To be clear I'm not sure whether this was by design or whether it's an actual issue with the CCTray implementation.

@ryandutton ryandutton changed the title CCtray reporting old failures for stages which no longer exist on the pipeline cctray reporting old failures for stages which no longer exist on the pipeline Sep 19, 2023
@chadlwilson
Member

chadlwilson commented Sep 19, 2023

If I understand your use case correctly, it's possibly by design (although I don't understand all the logic related to how pipeline config changes are reflected in previous runs/instances). If I recall correctly, the GoCD dashboard will also still show the removed-but-previously-failed stage until the pipeline is re-triggered with the new config.

What is confusing for me (personally) is that on the dashboard and the "pipeline activity/history view", when you add a new stage, it appears even without running the pipeline, and on older runs. If you delete that same stage, it disappears from the historical runs.

Personally, I don't really understand the logic behind this as it seems to contrast with the deletion behaviour. Possibly it is that when adding a new stage, for each historical run GoCD is not sure if it expects that stage to be there, so it is in "unknown state", or "not yet run" state.

In the below, the "stage2" stage only actually existed at the time of run instance 357 (and future run 359)
image

When deleted:
image

That would probably point more to a design flaw in GoCD: it does not fully capture the "plan" for a pipeline (the set of expected stages) at the time the instance is started, so when there are changes it is indeterminate which stages were expected later on. If that's the case, the flaw/weakness is more in the way additions are handled than in the way deleting a stage works on the dashboard and CCTray, but I am not sure of all the reasoning behind some of these decisions as they were likely made a long time ago. :-)
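
To illustrate what capturing the plan could look like, here is a toy Python sketch. It is not how GoCD models runs; `PipelineRun`, `planned_stages` and `stage_view` are hypothetical names, and run 356 is an invented earlier run added for contrast.

```python
from dataclasses import dataclass, field


@dataclass
class PipelineRun:
    counter: int
    # Stage names copied from the config at the moment the run started.
    planned_stages: tuple
    # Stage name -> result, filled in as stages finish.
    results: dict = field(default_factory=dict)

    def stage_view(self):
        """Stages to show for this run, independent of later config edits."""
        return [(s, self.results.get(s, "Not run")) for s in self.planned_stages]


# "stage2" only existed in the config when run 357 started.
run_356 = PipelineRun(356, ("stage1",), {"stage1": "Passed"})
run_357 = PipelineRun(357, ("stage1", "stage2"), {"stage1": "Passed"})

print(run_356.stage_view())  # [('stage1', 'Passed')]
print(run_357.stage_view())  # [('stage1', 'Passed'), ('stage2', 'Not run')]
```

With a snapshot like this, adding or deleting a stage later would not change what older runs display, because each run answers from its own recorded plan.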

@ryandutton
Author

ryandutton commented Sep 19, 2023

> If I recall correctly, the GoCD dashboard also will still show the removed-but-previously-failed stage, until the pipeline re-triggered with the new config.

Even after triggering a new pipeline run with the updated template, which no longer has the failed stages, it is still reporting the failed stages in the XML. Some added context: the failed stages were in pipeline run 2029, whereas the latest run of the stage which is still present is 2036.

A pipeline's configuration could mean that future runs succeed on a certain stage but aren't promoted to the next stage, in which case an old failure on the later stage is a legitimate result. In our case, though, the failure will never resolve.

> What is confusing for me (personally) is that on the dashboard and the "pipeline activity/history view", when you add a new stage, it appears even without running the pipeline, and on older runs. If you delete that same stage, it disappears from the historical runs.

Did you add the stage to the template the pipeline uses or did you change to use a different template with that new stage?

@chadlwilson
Member

chadlwilson commented Sep 21, 2023

Err OK.

Yeah in my case I was lazy, and just used a regular pipeline. It's possible things work differently/inconsistently with templated pipelines, so I guess I'd need to try that separately.

In your case, does the stage disappear from the GoCD dashboard and Pipeline Activity etc (especially after the re-run) but is still there in CCTray? If so, that does sound like a bug.

@ryandutton
Author

> does the stage disappear from the GoCD dashboard and Pipeline Activity

Old runs still show the stages which ran previously. However, pipeline runs made after the pipeline was changed to use a different template do not show the stage which no longer exists on the new template. Once that stage no longer exists, I'd expect it to be removed from CCTray.

gocd-template-change

@chadlwilson
Member

Yeah that seems a fair expectation.

@ryandutton
Author

Java isn't a strong point of mine, but let me know if there's anything I can help with.

@chadlwilson
Member

I don't know this area of the code, but I guess it'll be somewhere around how these classes work with each other and the GoConfig: https://github.com/search?type=code&q=repo%3Agocd%2Fgocd+cctray

@ryandutton
Author

CCTray is no longer reporting these stage failures; it cleared roughly 50 hours after the pipeline template was changed. It's possible this is somewhat related to the cache.

@chadlwilson
Member

The cache doesn't have any time expiry associated with it, and is supposed to be notified when a config change occurs. My guess is that a notification got missed and some other change later caused the entries to go away, perhaps by completely reloading the cache :-/ Some changes seem likely to completely reload the cache (e.g. security permission changes of some sort), some not.
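
As a toy model of that guess (not GoCD's actual cache; `StatusCache` and all of its method names are hypothetical), the Python sketch below shows a cache with no time-based expiry that only changes when it receives an event. If the config-change notification is missed, the stale failure stays until something triggers a full reload.

```python
class StatusCache:
    """Toy cache with no time expiry; entries change only through events."""

    def __init__(self):
        self._entries = {}  # (pipeline, stage) -> last known status

    def on_stage_status(self, pipeline, stage, status):
        self._entries[(pipeline, stage)] = status

    def on_pipeline_config_changed(self, pipeline, stages):
        """Drop entries for stages that are no longer in this pipeline."""
        keep = set(stages)
        stale = [k for k in self._entries if k[0] == pipeline and k[1] not in keep]
        for key in stale:
            del self._entries[key]

    def reload(self, all_configs):
        """Full reload: keep only entries that match the current config."""
        self._entries = {
            k: v for k, v in self._entries.items()
            if k[1] in set(all_configs.get(k[0], ()))
        }

    def failures(self):
        return sorted(k for k, v in self._entries.items() if v == "Failure")


cache = StatusCache()
cache.on_stage_status("my-pipeline", "build", "Success")
cache.on_stage_status("my-pipeline", "deploy", "Failure")

# The template is switched, but suppose the notification never reaches the cache:
print(cache.failures())  # the removed "deploy" stage is still reported

# A later full reload (or a delivered notification) clears it:
cache.reload({"my-pipeline": ["build"]})
print(cache.failures())  # []
```

That would be consistent with the failures disappearing after an unrelated change rather than after a fixed amount of time.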

Just to clarify the change made: when you say "Change to use a different template", do you mean switching the template like in the below, rather than editing the underlying template's stage configuration?

image

@ryandutton
Author

I changed to use a completely different template. The original template had 3 stages; the second template had just one stage, which was identical to the first stage in the original template.

@chadlwilson chadlwilson added apis and removed FeedsAPI labels Sep 21, 2023