Steps to reproduce
Deploy 3 units of the postgresql-k8s charm from channel 14/stable, revision 281.
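For reference, a deployment matching these steps would look roughly like this (the model name is a placeholder and the exact flags may have differed):

juju add-model db-test                                    # placeholder model name
juju deploy postgresql-k8s -n 3 --channel 14/stable --revision 281 --trust
juju status --watch 5s                                    # wait for all three units to reach active/idle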
Expected behavior
The units remain in an active state
Actual behavior
After running fine for a while (i.e. all three units were active and functional), two of the three units became stuck in a waiting/maintenance state with the following status:
There were no reported outages in the cluster and no node restarts. Upon further debugging, it appears that the Patroni K8s service disappeared for an unknown reason. I do not have the logs for it, but during debugging the cluster itself could not identify a primary, with all three units reporting themselves as replicas.
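For context, the checks during debugging were roughly along these lines (the namespace and unit IP are placeholders, and the exact commands may have differed):

# The K8s service in front of Patroni/PostgreSQL was no longer listed:
kubectl -n <model-namespace> get svc | grep postgresql-k8s
# Patroni's own cluster view showed no leader, only replicas:
curl -s -k https://<unit_ip>:8008/cluster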
It is worth noting that as part of the recovery process, we tried to re-initialize each of the units, but could not due to the following:
Cluster has no leader, can not reinitialize
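The re-initialization was attempted against Patroni's REST API, roughly like this (unit IP is a placeholder):

# Every member returned the same error, since no leader was known:
curl -s -k https://<unit_ip>:8008/reinitialize -X POST
# -> Cluster has no leader, can not reinitialize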
We also tried to trigger a failover, with the following output:
root@postgresql-k8s-1:/# curl -s -k https://<unit_ip>:8008/failover -X POST -d '{"candidate":"postgresql-k8s-1"}'
failover is not possible: no good candidates have been found
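For completeness, Patroni's per-member health endpoints can be used to confirm that no unit is answering as leader (unit IPs are placeholders; this is a sketch of the kind of check we ran, not a verbatim transcript):

# /leader returns 200 only on the current leader, /replica returns 200 on a healthy replica
for ip in <unit0_ip> <unit1_ip> <unit2_ip>; do
  echo "$ip leader:  $(curl -s -k -o /dev/null -w '%{http_code}' https://$ip:8008/leader)"
  echo "$ip replica: $(curl -s -k -o /dev/null -w '%{http_code}' https://$ip:8008/replica)"
done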
Versions
Operating system: Ubuntu 22.04.4 LTS
Juju CLI: 3.6.0-ubuntu-amd64
Juju agent: 3.4.4
Charm revision: 281, channel 14/stable
kubectl:
Client Version: v1.31.3
Server Version: v1.26.15
Log output
Juju debug log:
Patroni logs:
Unit 1:
2024-12-02 06:06:20 UTC [1481760]: INFO: waiting for leader to bootstrap
2024-12-02 06:06:30 UTC [1481760]: INFO: Lock owner: None; I am postgresql-k8s-0
2024-12-02 06:06:30 UTC [1481760]: INFO: waiting for leader to bootstrap
2024-12-02 06:06:40 UTC [1481760]: INFO: Lock owner: None; I am postgresql-k8s-0
2024-12-02 06:06:40 UTC [1481760]: INFO: waiting for leader to bootstrap
2024-12-02 06:06:50 UTC [1481760]: INFO: Lock owner: None; I am postgresql-k8s-0
2024-12-02 06:06:50 UTC [1481760]: INFO: waiting for leader to bootstrap
2024-12-02 06:07:00 UTC [1481760]: INFO: Lock owner: None; I am postgresql-k8s-0
2024-12-02 06:07:00 UTC [1481760]: INFO: waiting for leader to bootstrap
2024-12-02 06:07:10 UTC [1481760]: INFO: Lock owner: None; I am postgresql-k8s-0
2024-12-02 06:07:10 UTC [1481760]: INFO: waiting for leader to bootstrap
2024-12-02 06:07:20 UTC [1481760]: INFO: Lock owner: None; I am postgresql-k8s-0
2024-12-02 06:07:20 UTC [1481760]: INFO: waiting for leader to bootstrap
2024-12-02 06:07:30 UTC [1481760]: INFO: Lock owner: None; I am postgresql-k8s-0
2024-12-02 06:07:30 UTC [1481760]: INFO: waiting for leader to bootstrap
2024-12-02 06:07:40 UTC [1481760]: INFO: Lock owner: None; I am postgresql-k8s-0
2024-12-02 06:07:40 UTC [1481760]: INFO: waiting for leader to bootstrap
Unit 2:
2024-12-02 06:07:13 UTC [1463028]: INFO: restarting after failure in progress
2024-12-02 06:07:23 UTC [1463028]: INFO: Lock owner: None; I am postgresql-k8s-1
2024-12-02 06:07:23 UTC [1463028]: INFO: not healthy enough for leader race
2024-12-02 06:07:23 UTC [1463028]: INFO: restarting after failure in progress
2024-12-02 06:07:33 UTC [1463028]: INFO: Lock owner: None; I am postgresql-k8s-1
2024-12-02 06:07:33 UTC [1463028]: INFO: not healthy enough for leader race
2024-12-02 06:07:33 UTC [1463028]: INFO: restarting after failure in progress
2024-12-02 06:07:43 UTC [1463028]: WARNING: Failed to determine PostgreSQL state from the connection, falling back to cached role
2024-12-02 06:07:43 UTC [1463028]: INFO: Error communicating with PostgreSQL. Will try again later
2024-12-02 06:07:43 UTC [1463028]: INFO: Lock owner: None; I am postgresql-k8s-1
2024-12-02 06:07:43 UTC [1463028]: INFO: Still starting up as a standby.
2024-12-02 06:07:43 UTC [1463028]: INFO: establishing a new patroni connection to the postgres cluster
2024-12-02 06:07:43 UTC [1463028]: INFO: establishing a new patroni connection to the postgres cluster
2024-12-02 06:07:43 UTC [1463028]: WARNING: Retry got exception: connection problems
2024-12-02 06:07:43 UTC [1463028]: WARNING: Failed to determine PostgreSQL state from the connection, falling back to cached role
Unit 3:
2024-12-02 06:06:20 UTC [534630]: INFO: Lock owner: None; I am postgresql-k8s-2
2024-12-02 06:06:20 UTC [534630]: INFO: waiting for leader to bootstrap
2024-12-02 06:06:30 UTC [534630]: INFO: Lock owner: None; I am postgresql-k8s-2
2024-12-02 06:06:30 UTC [534630]: INFO: waiting for leader to bootstrap
2024-12-02 06:06:40 UTC [534630]: INFO: Lock owner: None; I am postgresql-k8s-2
2024-12-02 06:06:40 UTC [534630]: INFO: waiting for leader to bootstrap
2024-12-02 06:06:50 UTC [534630]: INFO: Lock owner: None; I am postgresql-k8s-2
2024-12-02 06:06:50 UTC [534630]: INFO: waiting for leader to bootstrap
2024-12-02 06:07:00 UTC [534630]: INFO: Lock owner: None; I am postgresql-k8s-2
2024-12-02 06:07:00 UTC [534630]: INFO: waiting for leader to bootstrap
2024-12-02 06:07:10 UTC [534630]: INFO: Lock owner: None; I am postgresql-k8s-2
2024-12-02 06:07:10 UTC [534630]: INFO: waiting for leader to bootstrap
2024-12-02 06:07:20 UTC [534630]: INFO: Lock owner: None; I am postgresql-k8s-2
2024-12-02 06:07:20 UTC [534630]: INFO: waiting for leader to bootstrap
2024-12-02 06:07:30 UTC [534630]: INFO: Lock owner: None; I am postgresql-k8s-2
2024-12-02 06:07:30 UTC [534630]: INFO: waiting for leader to bootstrap
2024-12-02 06:07:40 UTC [534630]: INFO: Lock owner: None; I am postgresql-k8s-2
2024-12-02 06:07:40 UTC [534630]: INFO: waiting for leader to bootstrap