Upgrade doesn't work on EKS #2197
I just had this happen to me in production yesterday, breaking everything for 1h30m until I finished switching back to the AWS VPC CNI. I even tried to revert the changes. When I list the Cilium Helm values for the affected production k8s cluster, I get:

```json
{
  "affinity": {
    "nodeAffinity": {
      "requiredDuringSchedulingIgnoredDuringExecution": {
        "nodeSelectorTerms": [
          {
            "matchExpressions": [
              {
                "key": "io.cilium/aws-node-enabled",
                "operator": "NotIn",
                "values": [
                  "true"
                ]
              }
            ]
          }
        ]
      }
    }
  },
  "eni": {
    "awsEnablePrefixDelegation": true
  },
  "updateStrategy": {
    "type": "OnDelete"
  }
}
```

But when I list the values for an unaffected staging cluster created with nearly-identical Helm values, I get:

```json
{
  "affinity": {
    "nodeAffinity": {
      "requiredDuringSchedulingIgnoredDuringExecution": {
        "nodeSelectorTerms": [
          {
            "matchExpressions": [
              {
                "key": "io.cilium/aws-node-enabled",
                "operator": "NotIn",
                "values": [
                  "true"
                ]
              }
            ]
          }
        ]
      }
    }
  },
  "cluster": {
    "name": "<cluster name>"
  },
  "egressMasqueradeInterfaces": "eth0",
  "eni": {
    "awsEnablePrefixDelegation": true,
    "enabled": true
  },
  "hubble": {
    "relay": {
      "enabled": true
    },
    "ui": {
      "enabled": true
    }
  },
  "ipam": {
    "mode": "eni"
  },
  "operator": {
    "replicas": 1
  },
  "routingMode": "native",
  "serviceAccounts": {
    "cilium": {
      "name": "cilium"
    },
    "operator": {
      "name": "cilium-operator"
    }
  }
}
```

In both clusters, I've run the same commands.
But this is the command that broke my production cluster while I was trying to enable LocalRedirectPolicy:
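As a sketch of the likely shape, assuming Helm was driven directly against a release named cilium in kube-system (all flags illustrative, not the exact command):

```sh
# Illustrative sketch, not the reporter's exact command. Running helm upgrade
# with only a partial values file (and no --reuse-values) replaces the values
# stored in the release wholesale, so anything not restated (ipam, routing
# mode, masquerading settings) falls back to chart defaults.
helm upgrade cilium cilium/cilium \
  --namespace kube-system \
  --values values.yaml   # the eni/affinity/updateStrategy file shown below
```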
I ran this a few different times with slightly different settings, resulting in the addition of the commented-out lines below. Just for the sake of completeness, here's the final version of that values file:

```yaml
eni:
  awsEnablePrefixDelegation: true
affinity:
  nodeAffinity: # added to prevent conflicts with aws-node
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: io.cilium/aws-node-enabled
          operator: NotIn
          values:
          - 'true'
# localRedirectPolicy: true
# rollOutCiliumPods: true
updateStrategy:
  type: OnDelete
```

I've never set most of the Helm values that ended up in the staging cluster; they were applied automatically by the Cilium CLI.
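To see which values cilium-cli injects on its own, newer versions of the CLI can print the Helm values they would apply without touching the cluster. A minimal sketch, assuming a cilium-cli version that supports the flag:

```sh
# Prints the generated Helm values (including the EKS auto-detected settings)
# without installing anything; flag availability depends on the CLI version.
cilium install --dry-run-helm-values
```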
Installed Cilium with the CLI:
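A representative invocation, with an illustrative version pin rather than the one actually used:

```sh
# Representative invocation only; the real flags and version may have differed.
cilium install --version 1.14.5
```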
It auto-detected that it was installing on EKS.
Then I decided to upgrade with:
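Again only a representative shape; cilium-cli's upgrade path goes through Helm under the hood, so something like:

```sh
# Representative only; the target version here is illustrative.
cilium upgrade --version 1.14.5
```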
However, after the upgrade, the ipam / endpoint-routes / egress-masquerade-interfaces / routing-mode / etc. values were reset to their defaults (i.e., cluster-pool, disabled, nil, tunnel, etc.), which broke the cluster.
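A pattern that avoids this class of breakage, assuming a Helm-managed release named cilium in kube-system, is to snapshot the live values and feed the full set back on upgrade, so nothing silently reverts to chart defaults:

```sh
# Dump the values currently stored in the release.
helm -n kube-system get values cilium -o yaml > cilium-values.yaml

# Upgrade with the complete snapshot plus the one new setting, instead of a
# partial values file that would wipe everything else.
helm -n kube-system upgrade cilium cilium/cilium \
  -f cilium-values.yaml \
  --set localRedirectPolicy=true
```

Helm's --reuse-values flag is an alternative, with the usual caveats about how it merges newly supplied values.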