-
Notifications
You must be signed in to change notification settings - Fork 71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
improve Coordinated Omission documentation and debug output #300
Conversation
I would add how long Goose expects it to take not just the abnormally long (42 seconds) normal is 13 seconds. etc. |
It doesn’t actually know this. It’s comparing an individual request against the entire time to run ALL requests. (There’s logic in git history to track all requests, but it’s too much overhead). |
These are the requests that will trigger Coordinated Omission Mitigation, but they are not themselves generated by Coordinated Omission Mitigation. In other words, these are actual requests that were made to the server, and report the actual time it took. You'd have to look further in the requests file to find the subsequent back-filled "requests" generated by Coordinated Omission Mitigation. (But there's not a 1:1 match, as Goose is monitoring the per GooseUser cadence, not the per request cadence.) |
Okay. This sort of spelunking and cross correlating just to get data to analyze if what issue #305 aims to eliminate. |
@@ -527,9 +527,27 @@ If something causes the response to a request to take abnormally long, raw Goose | |||
|
|||
### Mitigation | |||
|
|||
Goose attempts to mitigate Coordinated Omission by back-filling the metrics with the statistically expected requests. To do this, it tracks the normal "cadence" of each `GooseUser`, timing how long it takes to loop through all `GooseTasks` in the assigned `GooseTaskSet`. By default, Goose will trigger Coordinated Omission Mitigation if the time to loop through a `GooseTaskSet` takes more than twice as long as the average time of all previous loops. In this case, on the next loop through the `GooseTaskSet` when tracking the actual metrics for each subsequent request in all `GooseTasks` it will also add in statistically generated "requests" with a `response_time` starting at the unexpectedly long request time, then again with that `response_time` minus the normal "cadence", continuing to generate a metric then subtract the normal "cadence" until arriving at the expected "respone_time". In this way, Goose is able to estimate the actual effect of a slowdown. | |||
Goose attempts to mitigate Coordinated Omission by back-filling the metrics with the statistically expected requests. To do this, it tracks the normal "cadence" of each `GooseUser`, timing how long it takes to loop through all `GooseTasks` in the assigned `GooseTaskSet`. By default, Goose will trigger Coordinated Omission Mitigation if the time to loop through a `GooseTaskSet` takes more than twice as long as the average time of all previous loops. In this case, on the next loop through the `GooseTaskSet` when tracking the actual metrics for each subsequent request in all `GooseTasks` it will also add in statistically generated "requests" with a `response_time` starting at the unexpectedly long request time, then again with that `response_time` minus the normal "cadence", continuing to generate a metric then subtract the normal "cadence" until arriving at the expected `response_time`. In this way, Goose is able to estimate the actual effect of a slowdown. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
takes more than twice as long as the average time of all previous loops
I expected this to refer to the configured CO threshold, minimum, average, maximum?
|
||
If the `--requests-file` is enabled, requests back-filled by Coordinated Omission Mitigation show up in the generated log file, even though they were not actually sent to the server. In the following example, Coordinated Omission Mitigation was triggered when the server took 11,965 milliseconds to loop through all requests, instead of the average cadence of 3,162 milliseconds. This causes it to backfill a block of requests that statistically should have happened, with a `response_time` decreasing by the expected request cadence. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3,162 milliseconds
Typo? 3,161 milliseconds
GooseUser
thread's cadence (the time it takes to run through allGooseTasks
) in the requests file (if enabled)