We observed an issue where a Prometheus StatefulSet with 2 replicas was stuck in a non-running state, with its pods crashing continuously.
In a discussion it turned out that there is probably a 5-minute timeout before the PDB is deleted.
The argument is that if all pods matched by a PDB are crashing, the PDB can safely be deleted to speed up recovery.
With this bug fixed, I suggest we try the 5-minute TTL and see how effective it is. We could also lower it a bit, but the reason we may not want to remove it completely is that we decide whether a PDB should be removed by looking at the pod ready state, and pods with a slow startup may take a while to become ready. We could of course also look at a more specific signal like CrashLoopBackOff, but I would rather stay with the simple, generic signal of PodReady state plus a TTL unless we really need a very specific check.
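To make the proposed check concrete, here is a minimal sketch of the "PodReady state plus TTL" decision. It is not the controller's actual code: `PodStatus`, `should_delete_pdb`, and the `last_ready_transition` field are hypothetical names for illustration, and the behaviour for a PDB that matches no pods is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable


@dataclass(frozen=True)
class PodStatus:
    """Hypothetical, simplified view of a pod matched by a PDB."""
    name: str
    ready: bool
    # Time at which the pod's ready state last changed (hypothetical field).
    last_ready_transition: datetime


def should_delete_pdb(
    pods: Iterable[PodStatus],
    now: datetime,
    non_ready_ttl: timedelta = timedelta(minutes=5),
) -> bool:
    """Return True if every matched pod has been non-ready for at least the TTL."""
    pods = list(pods)
    if not pods:
        # Assumption: with no matched pods there is nothing to recover, keep the PDB.
        return False
    if any(p.ready for p in pods):
        # At least one pod is serving, so the PDB still protects something.
        return False
    # The TTL gives slow-starting pods time to become ready before we act.
    most_recent = max(p.last_ready_transition for p in pods)
    return now - most_recent >= non_ready_ttl


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
crashing = [
    PodStatus("prometheus-0", False, now - timedelta(minutes=10)),
    PodStatus("prometheus-1", False, now - timedelta(minutes=7)),
]
slow_start = [
    PodStatus("prometheus-0", False, now - timedelta(minutes=10)),
    PodStatus("prometheus-1", False, now - timedelta(minutes=2)),
]
print(should_delete_pdb(crashing, now))
print(should_delete_pdb(slow_start, now))
```

In this sketch, lowering the TTL speeds up recovery for crashing pods but raises the risk of deleting the PDB while slow-starting pods are still coming up, which is the trade-off described above.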