Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Flaky Test] pull-gardener-extension-shoot-rsyslog-relp-e2e-kind due to logs still getting sent after disabling the extension #198

Open
plkokanov opened this issue Nov 21, 2024 · 0 comments
Labels
area/testing Testing related kind/flake Tracking or fixing a flaky test priority/3 Priority (lower number equals higher priority)

Comments

@plkokanov
Copy link
Collaborator

How to categorize this issue?

/area testing
/kind flake

Which test(s)/suite(s) are flaking:
The Shoot Rsyslog Relp Extension Tests shoot-rsyslog-relp extension with tls and openssl enabled [It] Create Shoot, enable shoot-rsyslog-relp extension then disable it and delete Shoot [tls-enabled] part of pull-gardener-extension-shoot-rsyslog-relp-e2e-kind test fails with the following error:

CI link:
https://prow.gardener.cloud/view/gs/gardener-prow/pr-logs/pull/gardener_gardener-extension-shoot-rsyslog-relp/193/pull-gardener-extension-shoot-rsyslog-relp-e2e-kind/1859607390651944960

Reason for failure:

Shoot Rsyslog Relp Extension Tests shoot-rsyslog-relp extension with tls and openssl enabled [It] Create Shoot, enable shoot-rsyslog-relp extension then disable it and delete Shoot [tls-enabled]
...
...
  [FAILED] Timed out after 30.001s.
  Expected to successfully generate logs for node machine-shoot--local--e2e-rslog-tls-local-68576-blcwr and logs to NOT be present in rsyslog-relp-echo-server
  The function passed to Eventually failed at /home/prow/go/src/github.com/gardener/gardener-extension-shoot-rsyslog-relp/test/common/verifier.go:176 with:
  Expected
      <[]string | len:2, cap:2>: [
          " local e2e-rslog-hib f01243ba-7ae8-420b-ac59-70d0664ac13d machine-shoot--local--e2e-rslog-hib-local-7bcd6-pvgzm 9 test-program[11541]: 2024-11-21T15:11:46.977421+00:00 11541 -  this should get sent to echo server ",
          " -- MARK --",
      ]

The log " local e2e-rslog-hib f01243ba-7ae8-420b-ac59-70d0664ac13d machine-shoot--local--e2e-rslog-hib-local-7bcd6-pvgzm 9 test-program[11541]: 2024-11-21T15:11:46.977421+00:00 11541 - this should get sent to echo server " should be sent when checking if the rsyslog-relp service is properly configured on the shoot's nodes:

v.verifyLogsAreForwardedToEchoServer(
ctx,
logEntry{program: "test-program", severity: "1", message: "this should get sent to echo server", shouldBeForwarded: true},
logEntry{program: "test-program", severity: "3", message: "this should not get sent to echo server", shouldBeForwarded: false},
logEntry{program: "other-program", severity: "1", message: "this should not get sent to echo server", shouldBeForwarded: false},
)

However, it appears to have been either delayed or resent after the extension was disabled, and was included in the logs fetched during the verification that no logs should be sent:

func (v *Verifier) verifyThatLogsAreNotForwardedToEchoServer(ctx context.Context, logEntries ...logEntry) {
timeBeforeLogGeneration := metav1.Now()
EventuallyWithOffset(2, func() error {
return v.generateLogs(ctx, logEntries)
}).WithTimeout(30*time.Second).WithPolling(10*time.Second).WithContext(ctx).Should(Succeed(), fmt.Sprintf("Expected to successfully generate logs for node %s", v.nodeName))
By("Wait 30 seconds before checking for logs")
timer := time.NewTimer(30 * time.Second)
select {
case <-ctx.Done():
Fail("context deadline exceeded while waiting to check for logs")
case <-timer.C:
}
By("Verify that there are no logs")
EventuallyWithOffset(2, func(g Gomega) {
logLines, err := v.getLogs(ctx, timeBeforeLogGeneration)
g.Expect(err).NotTo(HaveOccurred())
// Rsyslog outputs a "-- MARK --" message as a form of heartbeat each 1200 seconds (by default) to indicate that it is working properly.
// This log comes from the rsyslog server itself and must be ignored when checking if there were no logs sent from the rsyslog clients.
// ContainSubstring is used as there could be spaces at the start or end of the log message depending on the template with which the
// rsyslog server is configured.
g.Expect(logLines).To(Or(BeEmpty(), ConsistOf(ContainSubstring("-- MARK --"))))
}).WithTimeout(30*time.Second).WithPolling(10*time.Second).WithContext(ctx).Should(Succeed(), fmt.Sprintf("Expected to successfully generate logs for node %s and logs to NOT be present in rsyslog-relp-echo-server", v.nodeName))
}

Checking the test logs, the extension gets disabled at 15:10:45.742, however the time of this log entry is from 15:11:46.977

  STEP: Disable the shoot-rsyslog-relp extension @ 11/21/24 15:10:45.742
  {"level":"info","ts":"2024-11-21T15:10:45.777Z","logger":"test","msg":"Shoot is not yet reconciled","shoot":{"name":"e2e-rslog-tls","namespace":"garden-local"},"reason":"shoot generation did not equal observed generation"}
  {"level":"info","ts":"2024-11-21T15:10:45.777Z","logger":"test","msg":"Last Operation","shoot":{"name":"e2e-rslog-tls","namespace":"garden-local"},"lastOperation":"&LastOperation{Description:Shoot cluster has been successfully reconciled.,LastUpdateTime:2024-11-21 15:09:58 +0000 UTC,Progress:100,State:Succeeded,Type:Reconcile,}"}
  {"level":"info","ts":"2024-11-21T15:11:15.782Z","logger":"test","msg":"Shoot is not yet reconciled","shoot":{"name":"e2e-rslog-tls","namespace":"garden-local"},"reason":"condition type ControlPlaneHealthy is not true yet, had message Deployment \"gardener-resource-manager\" is unhealthy: condition \"Available\" has invalid status False (expected True) due to MinimumReplicasUnavailable: Deployment does not have minimum availability. with reason DeploymentUnhealthy"}
  {"level":"info","ts":"2024-11-21T15:11:15.782Z","logger":"test","msg":"Last Operation","shoot":{"name":"e2e-rslog-tls","namespace":"garden-local"},"lastOperation":"&LastOperation{Description:Shoot cluster has been successfully reconciled.,LastUpdateTime:2024-11-21 15:11:12 +0000 UTC,Progress:100,State:Succeeded,Type:Reconcile,}"}
  {"level":"info","ts":"2024-11-21T15:11:45.789Z","logger":"test","msg":"Shoot was successfully updated","shoot":{"name":"e2e-rslog-tls","namespace":"garden-local"}}
  STEP: Verify that shoot-rsyslog-relp extension is disabled @ 11/21/24 15:11:45.789
  {"level":"info","ts":"2024-11-21T15:11:45.813Z","logger":"test","msg":"Waiting for Pod to be ready","pod":{"name":"rootpod-r5m","namespace":"kube-system"}}
  {"level":"info","ts":"2024-11-21T15:11:50.816Z","logger":"test","msg":"Pod is ready now","pod":{"name":"rootpod-r5m","namespace":"kube-system"}}
  STEP: Wait 30 seconds before checking for logs @ 11/21/24 15:11:51.461
  STEP: Verify that there are no logs @ 11/21/24 15:12:21.461

Anything else we need to know:
N/A

@gardener-prow gardener-prow bot added area/testing Testing related kind/flake Tracking or fixing a flaky test labels Nov 21, 2024
@plkokanov plkokanov added the priority/3 Priority (lower number equals higher priority) label Dec 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/testing Testing related kind/flake Tracking or fixing a flaky test priority/3 Priority (lower number equals higher priority)
Projects
None yet
Development

No branches or pull requests

1 participant