
Rollout stuck when a scale event happens during a service switch delay #3412

Open
2 tasks done
BrunoTarijon opened this issue Feb 29, 2024 · 2 comments · May be fixed by #3413
Labels: bug (Something isn't working)

Comments

BrunoTarijon commented Feb 29, 2024

Checklist:

  • I've included steps to reproduce the bug.
  • I've included the version of argo rollouts.

Describe the bug

The rollout controller gets stuck in an infinite loop trying to reconcile a ReplicaSet that is still referenced.

Sequence: Deploy 1 -> Deploy 2 -> delay on the service switch (while the new pods are starting) -> scale event.

From then on, every reconcile of this rollout takes the scale-event path (`isScalingEvent`), so the service switch never happens.
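A toy model of the loop described above may help reviewers. It is a hypothetical, stdlib-only sketch, not Argo Rollouts code: `rollout`, `replicaSet`, `scaleEventDetected`, `syncReplicasOnly` and `newStuckScenario` are illustrative names. The root cause it encodes (the scale-event check inspects a still-referenced ReplicaSet whose desired-replicas annotation the scale-only path never updates) is an assumption for illustration, not a confirmed diagnosis. What it shows is why an early return on a scale event blocks the service switch forever if the check can never become false.

```go
package main

import "fmt"

// replicaSet is a hypothetical, heavily simplified stand-in for a ReplicaSet:
// only the value of its desired-replicas annotation matters in this model.
type replicaSet struct {
	name            string
	desiredReplicas int32
}

// rollout is a hypothetical model of the controller's view of one Rollout.
type rollout struct {
	replicas        int32         // spec.replicas after the scale event
	serviceSwitched bool          // true once the service switch has happened
	updatedByScale  []*replicaSet // ReplicaSets the scale-only path rewrites
	checkedForScale []*replicaSet // ReplicaSets the scale-event check inspects
}

// scaleEventDetected reports a scale event when any inspected ReplicaSet's
// annotation disagrees with the rollout's replica count (assumed behaviour).
func (r *rollout) scaleEventDetected() bool {
	for _, rs := range r.checkedForScale {
		if rs.desiredReplicas != r.replicas {
			return true
		}
	}
	return false
}

// syncReplicasOnly refreshes the annotation, but only on updatedByScale.
func (r *rollout) syncReplicasOnly() {
	for _, rs := range r.updatedByScale {
		rs.desiredReplicas = r.replicas
	}
}

// reconcile returns early on a scale event, so the service switch below is
// reached only once the scale-event check stops firing.
func (r *rollout) reconcile() {
	if r.scaleEventDetected() {
		r.syncReplicasOnly()
		return
	}
	r.serviceSwitched = true
}

// newStuckScenario builds the assumed state: the rollout was scaled to 3 and a
// still-referenced ReplicaSet is inspected by the check but never updated.
func newStuckScenario() *rollout {
	current := &replicaSet{name: "current", desiredReplicas: 1}
	stillReferenced := &replicaSet{name: "still-referenced", desiredReplicas: 1}
	return &rollout{
		replicas:        3,
		updatedByScale:  []*replicaSet{current},
		checkedForScale: []*replicaSet{current, stillReferenced},
	}
}

func main() {
	r := newStuckScenario()
	for i := 1; i <= 5; i++ {
		r.reconcile()
		fmt.Printf("reconcile %d: serviceSwitched=%v\n", i, r.serviceSwitched)
	}
}
```

In this model, if the still-referenced ReplicaSet is also in `updatedByScale`, the first reconcile only syncs replicas and the second one reaches the service switch.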

To Reproduce

I wrote the following e2e test. It needs `E2E_POD_DELAY=3` to produce the service switch delay:

func (s *CanarySuite) TestRolloutScalingWhenChanged() {
	s.Given().
		RolloutObjects(`@functional/alb-canary-rollout.yaml`).
		When().
		ApplyManifests().
		WaitForRolloutStatus("Healthy").
		UpdateSpec().
		WaitForRolloutStatus("Paused").
		UpdateSpec().
		Sleep(2*time.Second). // must be shorter than E2E_POD_DELAY so the scale lands during the service switch delay
		ScaleRollout(3).
		Sleep(30*time.Second). // keep the rollout around long enough to observe the stuck state before it is deleted
		Then().
		ExpectRevisionPodCount("1", 3).
		ExpectRevisionPodCount("2", 0).
		ExpectRevisionPodCount("3", 1)
}

Expected behavior

The controller should be able to reconcile the rollout and its services, completing the service switch.

Version

1.6.6



BrunoTarijon added the bug label on Feb 29, 2024
BrunoTarijon (author) commented:

I wrote a better e2e test in the PR.

wizardist commented Apr 17, 2024

@BrunoTarijon, is this issue accompanied by the following event on the Endpoints object of the "stable" service?

Reason: FailedToUpdateEndpoint
Message: Failed to update endpoint ns/ep-name:
         Operation cannot be fulfilled on endpoints "ep-name":
         the object has been modified; please apply your changes to the latest version and try again
