specified neither primary_conninfo nor restore_command on restore from scratch #803
Comments
@Hannsre Thanks. Please attach You can also try running the playbook in debug mode ( |
Hey, sure, here's the log, it's the same on both replicas, despite the timestamps:
Note: I replaced the S3 Endpoint and Stanza. And on the Leader:
As well as
I ran ansible with single verbosity.
Edit/Addition:
Both initial replicas are fine.
Another Update: On another restore run it worked fine until it tried to bring the leader up after restoring. This is from the
and in ansible:
and it keeps on counting |
Thanks! I just tried a PITR and it worked perfectly fine: no errors, and all replicas and the leader came back up without any issues. I also tried restoring from a backup after destroying the cluster, and that still failed. Here are the logs regarding the restore:
Ansible status:
cc @SDV109 |
Hey everyone,
sorry to be back yet again. This is more of a report though, as I already got it working. But since it's unpredictable behavior I'd like to let you know.
I destroyed my previous cluster to rebuild it from scratch using a backup. This eventually works, but it took me a total of 4 runs to get it fully restored and started - without changing any settings.
First I got this on node 3 and then, after another try, on node 2 (both are replicas):
This was during the task
name: Wait for the PostgreSQL start command to complete
I could mitigate it by running
sudo -u postgres /usr/lib/postgresql/16/bin/pg_ctl start -D /var/lib/postgresql/16/main -o '-c hot_standby=off'
After that I got "server started", but the playbook had already failed, so I had to start over. There was nothing else in the logs, so there is nothing more to post. Ansible failed because the exit code was not 0.
On my third run this worked fine, but another issue on the leader came up:
I guess the config wasn't properly populated/passed to the restore command?
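One way to test that guess before starting the server is to check whether the restored configuration actually sets either parameter named in the error message. The following is only a minimal sketch: `has_recovery_source` is a hypothetical helper, not part of the playbook, and it inspects a single file only. It does not follow include directives or look at postgresql.auto.conf, and it treats an empty value as "set".

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the playbook): succeeds if the given
# config file sets restore_command or primary_conninfo on an uncommented
# line. Assumption: the "name = value" form is used; included files and
# postgresql.auto.conf are NOT checked.
has_recovery_source() {
    local conf="$1"
    grep -Eq '^[[:space:]]*(restore_command|primary_conninfo)[[:space:]]*=' "$conf"
}

# Usage (path taken from this report):
#   has_recovery_source /var/lib/postgresql/16/main/postgresql.conf \
#       || echo "neither restore_command nor primary_conninfo is set"
```

If the check fails right after a restore, that would at least be consistent with the settings never having been written to this file.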
Ansible would be stuck at "waiting for port 8008 to become available".
My "fix" was to add
at the end of /var/lib/postgresql/16/main/postgresql.conf, then run
sudo -u postgres /usr/lib/postgresql/16/bin/pg_ctl start -D /var/lib/postgresql/16/main -o '-c hot_standby=off' -w -t 180
This led to
So at least some progress? At this point I decided to run the playbook again because the steps above that worked on the replicas did not work here.
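For anyone reproducing this by hand, the "waiting for port 8008" step can be approximated outside of Ansible with a small polling loop. This is a rough sketch, not what the playbook does internally: `wait_for_port` is a hypothetical helper, and it relies on bash's `/dev/tcp` redirection, which assumes a bash build with network redirections enabled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a TCP port once per second until it accepts a
# connection or the timeout (in seconds, default 180) is reached.
# Assumption: bash with /dev/tcp support.
wait_for_port() {
    local host="$1" port="$2" timeout="${3:-180}" waited=0
    # The subshell opens the connection on fd 3 and closes it again on exit.
    while ! (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
}

# Usage (port taken from this report):
#   wait_for_port 127.0.0.1 8008 180 || echo "port 8008 did not come up"
```

A non-zero return only says that nothing accepted a connection in time; it does not explain why the service behind the port failed to start.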
That fourth run has now finished without issues and the cluster is back up; our app is also working and not complaining.
The config is basically the same as in #770; I only changed the ansible command to
ansible-playbook deploy_pgcluster.yml -e '@pgcn.yml' --ask-vault-pass -v
Here's what I set for pgbackrest in the vars/main.yml
Let me know if you need any more information to debug this.