improve Coordinated Omission documentation and debug output #300

jeremyandrews · 2021-07-02T11:28:54Z

Always log the GooseUser thread's cadence (the time it takes to run through all GooseTasks) in the requests file (if enabled)

Generate an INFO level log message if Coordinated Omission detects an abnormally slow request, for example:

10:10:02 [INFO] coordinated omission alert 6.957s into goose attack: "GET http://apache/node/8848" [200] took abnormally long (2932 ms), task name: "(Anon) node page"

update the README with improved documenation

LionsAd · 2021-07-02T11:44:49Z

I would add how long Goose expects it to take not just the abnormally long (42 seconds) normal is 13 seconds. etc.

jeremyandrews · 2021-07-02T12:03:46Z

It doesn’t actually know this. It’s comparing an individual request against the entire time to run ALL requests. (There’s logic in git history to track all requests, but it’s too much overhead).

jeremyandrews · 2021-07-06T18:24:50Z

These are the requests that will trigger Coordinated Omission Mitigation, but they are not themselves generated by Coordinated Omission Mitigation.

In other words, these are actual requests that were made to the server, and report the actual time it took. You'd have to look further in the requests file to find the subsequent back-filled "requests" generated by Coordinated Omission Mitigation. (But there's not a 1:1 match, as Goose is monitoring the per GooseUser cadence, not the per request cadence.)

bbros-dev · 2021-07-07T08:23:16Z

Okay. This sort of spelunking and cross correlating just to get data to analyze if what issue #305 aims to eliminate.

bbros-dev · 2021-07-06T14:59:23Z

README.md

@@ -527,9 +527,27 @@ If something causes the response to a request to take abnormally long, raw Goose

 ### Mitigation

-Goose attempts to mitigate Coordinated Omission by back-filling the metrics with the statistically expected requests. To do this, it tracks the normal "cadence" of each `GooseUser`, timing how long it takes to loop through all `GooseTasks` in the assigned `GooseTaskSet`. By default, Goose will trigger Coordinated Omission Mitigation if the time to loop through a `GooseTaskSet` takes more than twice as long as the average time of all previous loops. In this case, on the next loop through the `GooseTaskSet` when tracking the actual metrics for each subsequent request in all `GooseTasks` it will also add in statistically generated "requests" with a `response_time` starting at the unexpectedly long request time, then again with that `response_time` minus the normal "cadence", continuing to generate a metric then subtract the normal "cadence" until arriving at the expected "respone_time". In this way, Goose is able to estimate the actual effect of a slowdown.
+Goose attempts to mitigate Coordinated Omission by back-filling the metrics with the statistically expected requests. To do this, it tracks the normal "cadence" of each `GooseUser`, timing how long it takes to loop through all `GooseTasks` in the assigned `GooseTaskSet`. By default, Goose will trigger Coordinated Omission Mitigation if the time to loop through a `GooseTaskSet` takes more than twice as long as the average time of all previous loops. In this case, on the next loop through the `GooseTaskSet` when tracking the actual metrics for each subsequent request in all `GooseTasks` it will also add in statistically generated "requests" with a `response_time` starting at the unexpectedly long request time, then again with that `response_time` minus the normal "cadence", continuing to generate a metric then subtract the normal "cadence" until arriving at the expected `response_time`. In this way, Goose is able to estimate the actual effect of a slowdown.


takes more than twice as long as the average time of all previous loops

I expected this to refer to the configured CO threshold, minimum, average, maximum?

bbros-dev · 2021-07-06T15:11:32Z

README.md


-If the `--requests-file` is enabled, requests back-filled by Coordinated Omission Mitigation show up in the generated log file, even though they were not actually sent to the server. In the following example, Coordinated Omission Mitigation was triggered when the server took 11,965 milliseconds to loop through all requests, instead of the average cadence of 3,162 milliseconds. This causes it to backfill a block of requests that statistically should have happened, with a `response_time` decreasing by the expected request cadence.


3,162 milliseconds

Typo? 3,161 milliseconds

Jeremy Andrews added 2 commits July 2, 2021 09:12

info level message if request likely triggered CO

9e3305f

improve reporting and consistency

56d0dd6

jeremyandrews merged commit a15914e into tag1consulting:main Jul 2, 2021

jeremyandrews deleted the co-info branch July 3, 2021 05:40

bbros-dev reviewed Jul 8, 2021

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

improve Coordinated Omission documentation and debug output #300

improve Coordinated Omission documentation and debug output #300

jeremyandrews commented Jul 2, 2021

LionsAd commented Jul 2, 2021

jeremyandrews commented Jul 2, 2021

jeremyandrews commented Jul 6, 2021

bbros-dev commented Jul 7, 2021

bbros-dev Jul 6, 2021

bbros-dev Jul 6, 2021


		If the `--requests-file` is enabled, requests back-filled by Coordinated Omission Mitigation show up in the generated log file, even though they were not actually sent to the server. In the following example, Coordinated Omission Mitigation was triggered when the server took 11,965 milliseconds to loop through all requests, instead of the average cadence of 3,162 milliseconds. This causes it to backfill a block of requests that statistically should have happened, with a `response_time` decreasing by the expected request cadence.

improve Coordinated Omission documentation and debug output #300

improve Coordinated Omission documentation and debug output #300

Conversation

jeremyandrews commented Jul 2, 2021

LionsAd commented Jul 2, 2021

jeremyandrews commented Jul 2, 2021

jeremyandrews commented Jul 6, 2021

bbros-dev commented Jul 7, 2021

bbros-dev Jul 6, 2021

Choose a reason for hiding this comment

bbros-dev Jul 6, 2021

Choose a reason for hiding this comment