During the Canary deployment, the Stable ReplicaSet temporarily drops to zero and then recovers, causing brief downtime. #3565

Open
2 tasks done
y0ngha opened this issue May 9, 2024 · 0 comments
Labels
bug Something isn't working

y0ngha commented May 9, 2024

Checklist:

  • I've included steps to reproduce the bug.
  • I've included the version of argo rollouts.

Describe the bug

When the Rollout object is synced from ArgoCD during a canary deployment, the stable ReplicaSet is scaled down to 0 replicas.

To Reproduce

  1. In ArgoCD, set the AutoSync option to False.
  2. Prepare the YAML file that ArgoCD tracks.
  3. Set spec.replicas to 2 in the Rollout object.
  4. Create an HPA and set the service's minReplicaCount to 3.
    : In the ArgoCD manifest for the service, set RespectIgnoreDifferences to False and do not define any ignoreDifferences.
  5. Change a setting in the YAML file from step 2 and push the change.
    : At this point, spec.replicas in the YAML file is still 2.
  6. Press Sync in ArgoCD (prune: false, replace: false, force: false).
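
For readers who want to reproduce this, the manifests from steps 3 and 4 might look roughly like the sketch below. Only spec.replicas: 2 and the HPA minimum of 3 come from this report. The resource names, image, canary steps, maxReplicas, and CPU metric are placeholders, and the sketch assumes a plain HorizontalPodAutoscaler (field minReplicas) that targets the Rollout. The Istio traffic routing visible in the logs is left out for brevity.

```yaml
# Illustrative only: names, image, steps, and metric values are assumptions.
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: example-rollout            # hypothetical name
spec:
  replicas: 2                      # value committed in Git (step 3)
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      containers:
        - name: app
          image: example/app:1.0.0 # placeholder image
  strategy:
    canary:
      steps:                       # placeholder steps
        - setWeight: 20
        - pause: {}
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-rollout            # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: argoproj.io/v1alpha1
    kind: Rollout
    name: example-rollout
  minReplicas: 3                   # higher than spec.replicas in Git (step 4)
  maxReplicas: 6                   # assumed
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # assumed
```

With this setup the live Rollout runs with 3 replicas (set by the HPA) while Git declares 2, so a manual Sync without ignoreDifferences writes spec.replicas back to 2.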

After the sync, the stable ReplicaSet is scaled down to 0 (from 3 to 0 in the logs below) before it recovers.

Expected behavior

I expected the existing stable ReplicaSet to remain unaffected (or to scale to 2 replicas, as defined in the original YAML).

Screenshots

image

Version

quay.io/argoproj/argo-rollouts:v1.6.5

Logs

time="2024-04-26T07:39:20Z" level=info msg="Started syncing rollout" generation=655 namespace=passorder resourceVersion=151340900 rollout=point-service-production
time="2024-04-26T07:39:20Z" level=error msg="The Rollout \"point-service-production\" is invalid: spec.strategy.canary.trafficRouting.istio.virtualServices.name: Invalid value: \"point-service-production-virtualservice\": Istio VirtualService has invalid HTTP routes. Error: HTTP Route 'weighted' is not found in the defined Virtual Service." namespace=passorder rollout=point-service-production
time="2024-04-26T07:39:20Z" level=info msg="Reconciliation completed" generation=655 namespace=passorder resourceVersion=151340900 rollout=point-service-production time_ms=2.29211
time="2024-04-26T07:39:20Z" level=info msg="Started syncing rollout" generation=655 namespace=passorder resourceVersion=151340900 rollout=point-service-production
time="2024-04-26T07:39:20Z" level=info msg="Syncing replicas only due to scaling event" namespace=passorder rollout=point-service-production
time="2024-04-26T07:39:20Z" level=info msg="Enqueueing parent of passorder/point-service-production-7b74795c68: Rollout passorder/point-service-production"
time="2024-04-26T07:39:20Z" level=info msg="Event(v1.ObjectReference{Kind:\"Rollout\", Namespace:\"passorder\", Name:\"point-service-production\", UID:\"6e46b387-6798-4cdd-aa0a-3c3efeb78306\", APIVersion:\"argoproj.io/v1alpha1\", ResourceVersion:\"151340900\", FieldPath:\"\"}): type: 'Normal' reason: 'ScalingReplicaSet' Scaled down ReplicaSet point-service-production-7b74795c68 (revision 11) from 3 to 0"
time="2024-04-26T07:39:20Z" level=info msg="Scaled down ReplicaSet point-service-production-7b74795c68 (revision 11) from 3 to 0" event_reason=ScalingReplicaSet namespace=passorder rollout=point-service-production
time="2024-04-26T07:39:20Z" level=info msg="Not finished reconciling stableRS" namespace=passorder rollout=point-service-production
time="2024-04-26T07:39:20Z" level=info msg="No status changes. Skipping patch" generation=655 namespace=passorder resourceVersion=151340900 rollout=point-service-production
time="2024-04-26T07:39:20Z" level=info msg="Reconciliation completed" generation=655 namespace=passorder resourceVersion=151340900 rollout=point-service-production time_ms=19.385227
time="2024-04-26T07:39:20Z" level=info msg="Started syncing rollout" generation=655 namespace=passorder resourceVersion=151340900 rollout=point-service-production
time="2024-04-26T07:39:20Z" level=info msg="Canary steps change detected (new: 84575ff995, old: 6c94bfbdd6)" namespace=passorder rollout=point-service-production
time="2024-04-26T07:39:20Z" level=info msg="Assuming 6dd487bcb9 for new replicaset pod hash" namespace=passorder rollout=point-service-production
time="2024-04-26T07:39:20Z" level=info msg="Canary steps change detected (new: 84575ff995, old: 6c94bfbdd6)" namespace=passorder rollout=point-service-production
time="2024-04-26T07:39:20Z" level=info msg="Rollout not completed, started update to revision 12 (6dd487bcb9)" event_reason=RolloutNotCompleted namespace=passorder rollout=point-service-production
time="2024-04-26T07:39:20Z" level=info msg="Event(v1.ObjectReference{Kind:\"Rollout\", Namespace:\"passorder\", Name:\"point-service-production\", UID:\"6e46b387-6798-4cdd-aa0a-3c3efeb78306\", APIVersion:\"argoproj.io/v1alpha1\", ResourceVersion:\"151340900\", FieldPath:\"\"}): type: 'Normal' reason: 'RolloutNotCompleted' Rollout not completed, started update to revision 12 (6dd487bcb9)"
time="2024-04-26T07:39:20Z" level=info msg="Patched: {\"status\":{\"conditions\":[{\"lastTransitionTime\":\"2024-04-02T12:02:32Z\",\"lastUpdateTime\":\"2024-04-02T12:02:32Z\",\"message\":\"Rollout is paused\",\"reason\":\"RolloutPaused\",\"status\":\"False\",\"type\":\"Paused\"},{\"lastTransitionTime\":\"2024-04-26T05:45:47Z\",\"lastUpdateTime\":\"2024-04-26T05:45:47Z\",\"message\":\"Rollout has minimum availability\",\"reason\":\"AvailableReason\",\"status\":\"True\",\"type\":\"Available\"},{\"lastTransitionTime\":\"2024-04-26T07:39:20Z\",\"lastUpdateTime\":\"2024-04-26T07:39:20Z\",\"message\":\"Rollout is not healthy\",\"reason\":\"RolloutHealthy\",\"status\":\"False\",\"type\":\"Healthy\"},{\"lastTransitionTime\":\"2024-04-02T12:02:32Z\",\"lastUpdateTime\":\"2024-04-26T07:39:20Z\",\"message\":\"Rollout \\\"point-service-production\\\" is progressing.\",\"reason\":\"ReplicaSetUpdated\",\"status\":\"True\",\"type\":\"Progressing\"},{\"lastTransitionTime\":\"2024-04-26T07:39:20Z\",\"lastUpdateTime\":\"2024-04-26T07:39:20Z\",\"message\":\"RolloutCompleted\",\"reason\":\"RolloutCompleted\",\"status\":\"False\",\"type\":\"Completed\"}],\"currentPodHash\":\"6dd487bcb9\",\"currentStepHash\":\"84575ff995\",\"currentStepIndex\":0,\"message\":\"more replicas need to be updated\",\"phase\":\"Progressing\",\"updatedReplicas\":null}}" generation=655 namespace=passorder resourceVersion=151340900 rollout=point-service-production
time="2024-04-26T07:39:20Z" level=info msg="persisted to informer" generation=655 namespace=passorder resourceVersion=151340905 rollout=point-service-production

Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.

The current solution

As a workaround, I set RespectIgnoreDifferences to True for the ArgoCD sync and defined the following values in ignoreDifferences:

ignoreDifferences:
  - group: apps
    kind: Deployment
    jsonPointers:
      - /spec/replicas
  - group: argoproj.io
    kind: Rollout
    jsonPointers:
      - /spec/replicas
  - group: autoscaling
    kind: HorizontalPodAutoscaler
    jsonPointers:
      - /spec/minReplicas
      - /spec/maxReplicas

With these settings, syncing no longer causes any service downtime.
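
For context, here is a minimal sketch of where these settings could sit in an ArgoCD Application manifest. The application name, project, repository URL, path, and destination are hypothetical. Only the RespectIgnoreDifferences option and the ignoreDifferences entry for the Rollout are taken from the workaround above, and the other ignoreDifferences entries from that list would go in the same place.

```yaml
# Illustrative only: metadata, source, and destination are assumptions.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: example-app                # hypothetical name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://example.com/org/repo.git   # placeholder repository
    targetRevision: HEAD
    path: manifests                # placeholder path
  destination:
    server: https://kubernetes.default.svc
    namespace: example             # placeholder namespace
  syncPolicy:
    # No "automated" block, so syncs stay manual as in the reproduction steps.
    syncOptions:
      - RespectIgnoreDifferences=true
  ignoreDifferences:
    - group: argoproj.io
      kind: Rollout
      jsonPointers:
        - /spec/replicas           # leave the replica count to the HPA
```

Without the RespectIgnoreDifferences=true sync option, ignoreDifferences only affects how ArgoCD computes diff status; the ignored fields are still applied during a sync.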

Ref
GitHub issue: #3543
Slack: https://cloud-native.slack.com/archives/C01U781DW2E/p1714353130134499

@y0ngha y0ngha added the bug Something isn't working label May 9, 2024