What happened:
Today we had a production outage of a couple of minutes, which we suspect happened because our stolon cluster did not fail over correctly.
I manually fixed the problem by restarting a keeper, but another keeper is still in an unhealthy state from the cluster's perspective, while the keeper itself seems to think that it is fine.
`stolonctl status` shows keeper `node_6` as `HEALTHY = false`, `PG HEALTHY = false`, and `PG LISTENADDRESS = (unknown)`:
Click to expand full `stolonctl status` output
For reading convenience, `Keepers` formatted nicely:
Click to expand full `stolonctl spec` output
Click to expand full `stolonctl clusterdata read | jq .` output
Relevant here is
The `stolon-keeper` seems to be running fine according to `systemctl status stolon-keeper.service`:
Its logs show some earlier errors while `pg_rewind` was copying data over, but no indication that anything failed permanently:
Relevant sections:
What you expected to happen:
`stolonctl status` finds that the keeper is up and working, or
How to reproduce it (as minimally and precisely as possible):
Unclear.
Stolon had been running uninterrupted for 2 months until this happened.
Environment:
`master` commit 4bb4107