Units stuck in reinitialising replica and awaiting for cluster to start #684
Comments
Thank you for reporting your feedback! The internal ticket has been created: https://warthogs.atlassian.net/browse/DPE-5335.
Hi, @kelkawi-a! Do you know if the cluster was restarted or upgraded in some way? I see the following hook being fired on Unit 1 in the logs that you shared: 03 Sep 2024 15:27:45Z juju-unit executing running upgrade-charm hook

Could you share some logs from Unit 1 so we can understand what's happening?

juju show-unit postgresql-k8s/1
juju ssh --container postgresql postgresql-k8s/1 pebble services
juju ssh --container charm postgresql-k8s/1 curl localhost:8008/cluster
juju ssh --container charm postgresql-k8s/0 curl localhost:8008/cluster
juju ssh --container charm postgresql-k8s/0 curl localhost:8008/history
juju ssh --container postgresql postgresql-k8s/1 cat /var/log/postgresql/patroni.log /var/log/postgresql/patroni.log.1 /var/log/postgresql/patroni.log.2
juju ssh --container postgresql postgresql-k8s/1 "find /var/log/postgresql/ -name postgresql*.log -not -empty -exec ls {} \; -exec cat {} \;"

If you're using TLS, you should use https instead of http in those curl commands.

The following error on Unit 2 has been fixed in revisions 332 and 333 of the charm:

PermissionError: [Errno 13] Permission denied: '/var/lib/postgresql/data/pgdata'

Right now, to fix Unit 2, you can run the following command:

juju ssh --container postgresql postgresql-k8s/2 chown postgres:postgres /var/lib/postgresql/data
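For convenience, the same diagnostics can be gathered from every unit in one pass. This is only a sketch, assuming a three-unit deployment named postgresql-k8s in the current model and Juju SSH access to the pods:

```shell
#!/usr/bin/env bash
# Collect Patroni/PostgreSQL diagnostics from every unit of postgresql-k8s.
# Assumes a 3-unit deployment (postgresql-k8s/0..2) in the current Juju model.
set -euo pipefail

for unit in 0 1 2; do
    echo "===== postgresql-k8s/${unit} ====="
    juju show-unit "postgresql-k8s/${unit}"
    juju ssh --container postgresql "postgresql-k8s/${unit}" pebble services
    # Patroni REST API: cluster topology and recent failover/switchover history.
    juju ssh --container charm "postgresql-k8s/${unit}" curl -s localhost:8008/cluster
    juju ssh --container charm "postgresql-k8s/${unit}" curl -s localhost:8008/history
done
```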
@marceloneppel thanks for investigating. The cluster is not managed by our team, so I don't have visibility on whether or not the cluster was restarted. Below are the requested logs:
Thanks for the details, @kelkawi-a! Do you still have the Juju debug logs that show something (like the stack trace) from the errors shown in the Unit 1 status log? I mean, the errors in the start and update-status hooks. Those will be useful to understand what happened before the unit reached its current state.

Do you know if there are a lot of clients connecting to the database, especially through the read-only endpoints (replicas)? If so, we can try to stop the PostgreSQL service in the replica by issuing the following command:

juju ssh --container postgresql postgresql-k8s/1 pebble stop postgresql

Then, after some seconds, we can start it again to see if it starts correctly:

juju ssh --container postgresql postgresql-k8s/1 pebble start postgresql

Also, did the …
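If you want to script that restart, a minimal sketch (assuming the affected replica is postgresql-k8s/1 and that a short pause is enough for client connections to drain) could look like this:

```shell
# Restart the PostgreSQL service on the stuck replica via Pebble.
# Assumes the affected replica is postgresql-k8s/1; adjust the unit as needed.
juju ssh --container postgresql postgresql-k8s/1 pebble stop postgresql
sleep 30   # give existing client connections a moment to go away
juju ssh --container postgresql postgresql-k8s/1 pebble start postgresql

# Confirm the service came back up and check the replica's role in the cluster.
juju ssh --container postgresql postgresql-k8s/1 pebble services postgresql
juju ssh --container charm postgresql-k8s/1 curl localhost:8008/cluster
```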
Unfortunately I don't have visibility on the logs that far back since this issue came up. Since reporting this initial bug, the units have re-configured themselves as follows:

Note: the …

I can confirm that there are 5 applications (3 units each) connecting to the … I've sent you an invite to try and debug this live on the environment if possible.
Steps to reproduce
Expected behavior
The units remain in an active state
Actual behavior
After running fine for a while (i.e. all three units were active and functional), two of the three units became stuck in a waiting/maintenance state with the following status:
Versions
Operating system: Ubuntu 22.04.4 LTS
Juju CLI: 3.5.3-ubuntu-amd64
Juju agent: 3.5.3
Charm revision: 281, channel 14/stable
kubectl:
Client Version: v1.30.4
Server Version: v1.26.15
Log output
Juju debug log:
Output of juju debug-log --include postgresql-k8s/<unit_number>:
postgresql-1.log
postgresql-2.log
Output of juju show-status-log of unit 1:
Output of juju show-status-log of unit 2:
Patroni logs:
Unit 1:
Unit 2:
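For reference, a hedged sketch of how the attached logs can be regenerated, assuming the unit numbers above and the Patroni log path used earlier in this thread:

```shell
# Capture Juju debug logs and status history for the affected units.
juju debug-log --include postgresql-k8s/1 --replay --no-tail > postgresql-1.log
juju debug-log --include postgresql-k8s/2 --replay --no-tail > postgresql-2.log
juju show-status-log postgresql-k8s/1
juju show-status-log postgresql-k8s/2

# Patroni logs from inside the workload container.
juju ssh --container postgresql postgresql-k8s/1 cat /var/log/postgresql/patroni.log
juju ssh --container postgresql postgresql-k8s/2 cat /var/log/postgresql/patroni.log
```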