Clusters became broken after upgrading to 1.14.0 #2852
Downgrading to 1.11.0 resolved my issues.
Update: after killing the failed pod (the commands I used for each restart cycle are sketched below), it booted up OK. Logs:

After one more restart it hangs early:

So eventually, after a series of pod restarts, the whole cluster will be dead. Reverted to 1.13.0.
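A minimal sketch of the restart-and-inspect cycle described above, assuming the operator's usual `<cluster>-N` pod naming; the namespace `dbs` and the pod name are placeholders, not taken from the report:

```bash
# Delete the stuck pod; the StatefulSet recreates it immediately.
# ('dbs' and 'brandadmin-pg-0' are placeholders for illustration.)
kubectl -n dbs delete pod brandadmin-pg-0

# Follow the startup logs of the recreated pod's postgres container.
kubectl -n dbs logs -f brandadmin-pg-0 -c postgres
```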
@FxKu, may I have your attention? Something weird is happening here.
First of all, sorry for the long logs and the unstructured message. To write a clean issue you need at least some understanding of what is happening, but I have no idea yet. I read the release notes for 1.12, 1.13 and 1.14 and decided I could upgrade straight to 1.14.0. But...
After upgrading postgres-operator from 1.11.0 to 1.14.0, my clusters won't start up:
Three clusters started successfully with the updated Spilo image (`payments-pg`, `asana-automate-db` and `develop-postgresql`) and two did not (`brandadmin-pg` and `games-aggregator-pg`); a way to check which pods actually picked up the new image is sketched below. Before I noticed that not all clusters were updated, I had initiated an upgrade from 16 to 17 on cluster `develop-postgresql`, and it got stuck with the same symptoms (at first I thought that was the reason, but now I don't think so, see below):

...and no more logs.
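A sketch of one way to verify which database pods are actually running the new Spilo image. It assumes the `application=spilo` label that the operator normally puts on database pods; the namespace is a placeholder:

```bash
# List every Spilo pod together with the image it is currently running.
# ('dbs' is a placeholder namespace.)
kubectl -n dbs get pods -l application=spilo \
  -o custom-columns=NAME:.metadata.name,IMAGE:.spec.containers[0].image
```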
Some clusters managed to start, but there is the same error:
After I deleted this pod, it got stuck too!
Processes inside the failed clusters:
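The process listing itself was lost in extraction; it was presumably captured with something like the following (pod name and namespace are placeholders):

```bash
# Show the process tree inside the stuck pod's postgres container.
kubectl -n dbs exec brandadmin-pg-0 -c postgres -- ps auxf
```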
After one more deletion it managed to start.
I noticed one thing in the logs: sometimes the container starts with the WAL-E variables set, sometimes not. The operator shows the cluster status as OK, but it's not:
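One way to check whether the WAL-E/WAL-G variables are actually present in a given container. The exact variable names depend on the backup setup, so the grep pattern here is just an assumption:

```bash
# Dump the backup-related environment of the running postgres container.
kubectl -n dbs exec brandadmin-pg-0 -c postgres -- env | grep -E 'WAL|AWS'
```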
While I was writing this issue about an hour passed; in despair I restarted this failed pod one more time and it STARTED (container `postgres` became `Ready`), but it's still not working: all my clusters consisting of two nodes can't start the replica node. The problem is probably with the WAL variables...
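For a replica that won't start, Patroni's own view of the cluster can be checked from inside any member pod; `patronictl` ships in the Spilo image (pod name and namespace remain placeholders):

```bash
# Ask Patroni which members it sees and their current roles and states.
kubectl -n dbs exec brandadmin-pg-0 -c postgres -- patronictl list
```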
It's a complete mess!
The operator is installed with Helm and Terraform, and configured with a ConfigMap:
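The ConfigMap contents were not captured above. To reproduce them, the operator's configuration can be dumped like this; the ConfigMap name `postgres-operator` is the Helm chart's usual default and is an assumption here:

```bash
# Print the operator's ConfigMap-based configuration.
# ('operators' and 'postgres-operator' are assumed names.)
kubectl -n operators get configmap postgres-operator -o yaml
```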