[bitnami/postgresql-ha] ERROR: cannot execute INSERT in a read-only transaction #23219

Open
rohitkgupta opened this issue Feb 7, 2024 · 45 comments

@rohitkgupta

rohitkgupta commented Feb 7, 2024

Name and Version

bitnami/postgresql-ha 6.5.5

What architecture are you using?

None

What steps will reproduce the bug?

  1. Deploy PostgreSQL with the HA architecture (using Replication Manager (repmgr) and Pgpool) via the Bitnami Helm charts,
    with PostgreSQL replicas=2 and Pgpool replicas=2.
  2. A C++ application connects to the DB via Pgpool using the libpq library and executes SELECT, INSERT and UPDATE queries.
  3. Occasionally, when the master PostgreSQL node switches to slave and a slave node is promoted to master, due to a restart or any other reason, Pgpool may continue to send all queries to the previous master, which is now acting as a slave. This results in the following error: ERROR: cannot execute INSERT in a read-only transaction
    This error occurs because Pgpool attempts to execute INSERT and UPDATE queries on a PostgreSQL instance that is in a read-only state; a quick way to confirm this is shown below.
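
A quick way to check where Pgpool is routing writes when this happens is pgpool's show pool_nodes command (the host below is a placeholder for the chart's Pgpool service; adjust the host and user to your deployment):

echo "show pool_nodes;" | psql -U postgres -h <release>-postgresql-ha-pgpool -p 5432

The role and status columns show which backend Pgpool currently considers primary; when the error appears, the node reported as primary is the old master that is now read-only.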

Are you using any custom parameters or values?

----------------------------
pgPool-II configuration file
----------------------------

backend_clustering_mode = 'streaming_replication'

listen_addresses = '*'
port = '5432'
socket_dir = '/opt/build/pgpool/tmp'
reserved_connections = 0



pcp_listen_addresses = '*'
pcp_port = 9898
pcp_socket_dir = '/opt/build/pgpool/tmp'
listen_backlog_multiplier = 2
serialize_accept = off


backend_hostname0 = 'host1'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/data'
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_application_name0 = 'server0'


enable_pool_hba = 'on'
authentication_timeout = '30'

allow_clear_text_frontend_auth = 'off'


ssl = 'on'
ssl_key = '/opt/build/pgpool/certs/tls.key'
ssl_cert = '/opt/build/pgpool/certs/tls.crt'

ssl_ciphers = 'HIGH:MEDIUM:+3DES:!aNULL'
ssl_prefer_server_ciphers = off
ssl_ecdh_curve = 'prime256v1'
ssl_dh_params_file = ''



num_init_children = '200'
max_pool = '15'


child_life_time = 5min
child_max_connections = 0
connection_life_time = 0
client_idle_limit = 0




log_destination = 'stderr'



log_connections = 'off'
log_disconnections = off
log_hostname = 'on'
log_statement = off
log_per_node_statement = 'off'
log_client_messages = off
log_standby_delay = 'if_over_threshold'


syslog_facility = 'LOCAL0'
syslog_ident = 'pgpool'



pid_file_name = '/opt/build/pgpool/tmp/pgpool.pid'
logdir = '/opt/build/pgpool/logs'



connection_cache = on

reset_query_list = 'ABORT; DISCARD ALL'



replicate_select = off

insert_lock = off
lobj_lock_table = ''


replication_stop_on_mismatch = off

failover_if_affected_tuples_mismatch = off



load_balance_mode = 'off'
ignore_leading_white_space = on
read_only_function_list = ''
write_function_list = ''

primary_routing_query_pattern_list = ''

database_redirect_preference_list = ''

app_name_redirect_preference_list = ''
allow_sql_comments = off

disable_load_balance_on_write = 'transaction'

dml_adaptive_object_relationship_list= ''

statement_level_load_balance = 'off'



sr_check_period = '30'
sr_check_user = 'repmgr'

sr_check_database = 'postgres'
delay_threshold = 10000000


follow_primary_command = ''


health_check_period = '30'
health_check_timeout = '10'
health_check_user = 'repmgr'

health_check_database = ''
health_check_max_retries = '5'
health_check_retry_delay = '5'
connect_timeout = 10000



failover_command = 'echo ">>> Failover - that will initialize new primary node search!"'
failback_command = ''

failover_on_backend_error = 'off'

detach_false_primary = off

search_primary_node_timeout = '0'


recovery_user = 'nobody'
recovery_password = ''

recovery_1st_stage_command = ''
recovery_2nd_stage_command = ''
recovery_timeout = 90
client_idle_limit_in_recovery = 0

auto_failback = off
auto_failback_interval = 1min



use_watchdog = off


trusted_servers = ''
ping_path = '/bin'


hostname0 = ''
wd_port0 = 9000
pgpool_port0 = 9999



wd_priority = 1

wd_authkey = ''

wd_ipc_socket_dir = '/tmp'



delegate_IP = ''
if_cmd_path = '/sbin'
if_up_cmd = '/usr/bin/sudo /sbin/ip addr add $_IP_$/24 dev eth0 label eth0:0'
if_down_cmd = '/usr/bin/sudo /sbin/ip addr del $_IP_$/24 dev eth0'
arping_path = '/usr/sbin'
arping_cmd = '/usr/bin/sudo /usr/sbin/arping -U $_IP_$ -w 1 -I eth0'


clear_memqcache_on_escalation = on
wd_escalation_command = ''
wd_de_escalation_command = ''


failover_when_quorum_exists = on

failover_require_consensus = on

allow_multiple_failover_requests_from_node = off


enable_consensus_with_half_votes = off




wd_lifecheck_method = 'heartbeat'
wd_interval = 10


heartbeat_hostname0 = ''
heartbeat_port0 = 9694
heartbeat_device0 = ''


wd_heartbeat_keepalive = 2
wd_heartbeat_deadtime = 30


wd_life_point = 3
wd_lifecheck_query = 'SELECT 1'
wd_lifecheck_dbname = 'template1'
wd_lifecheck_user = 'nobody'
wd_lifecheck_password = ''

relcache_expire = 0
relcache_size = 256

check_temp_table = catalog

check_unlogged_table = on
enable_shared_relcache = on

memory_cache_enabled = off
memqcache_method = 'shmem'
memqcache_memcached_host = 'localhost'
memqcache_memcached_port = 11211
memqcache_total_size = 64MB
memqcache_max_num_cache = 1000000
memqcache_expire = 0
memqcache_auto_cache_invalidation = on
memqcache_maxcache = 400kB
memqcache_cache_block_size = 1MB
memqcache_oiddir = '/var/log/pgpool/oiddir'
cache_safe_memqcache_table_list = ''
cache_unsafe_memqcache_table_list = ''
backend_hostname0 = 'postgresql-ha-postgresql-0.postgresql-ha-postgresql-headless.ns1.svc.cluster.local'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/opt/build/pgpool/data'
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_hostname1 = 'postgresql-ha-postgresql-1.postgresql-ha-postgresql-headless.ns1.svc.cluster.local'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/opt/build/pgpool/data'
backend_flag1 = 'ALLOW_TO_FAILOVER'

What is the expected behavior?

Pgpool should consistently direct queries to the master node and promptly switch to the new master node whenever a change occurs.

What do you see instead?

Pgpool doesn't switch to the new master node until we restart the Pgpool deployment.

@rohitkgupta rohitkgupta added the tech-issues The user has a technical issue about an application label Feb 7, 2024
@github-actions github-actions bot added the triage Triage is needed label Feb 7, 2024
@rohitkgupta
Author

rohitkgupta commented Feb 7, 2024


-----------------------------

PostgreSQL configuration file

-----------------------------

listen_addresses = '*'
port = '5432'
max_connections = '512'

ssl = 'on'
ssl_cert_file = '/certs/tls.crt'
ssl_key_file = '/certs/tls.key'

wal_level = 'hot_standby'
fsync = 'on'

max_wal_size = '400MB'

archive_mode = 'on'
archive_command = '/bin/true'

max_wal_senders = '16'
wal_keep_size = '128MB'

max_replication_slots = '10'

primary_conninfo = 'host=postgresql-ha-postgresql-1.postgresql-ha-postgresql-headless.ns1.svc.cluster.local port=5432 user=repmgr
promote_trigger_file = '/tmp/postgresql.trigger.5432'
hot_standby = 'on'

logging_collector = 'on'

log_directory = '/opt/build/postgresql/logs'
log_filename = 'postgresql.log'

log_connections = 'false'
log_disconnections = 'false'
log_hostname = 'true'

client_min_messages = 'error'

shared_preload_libraries = 'repmgr, pgaudit, repmgr'

include_dir = 'conf.d'

pgaudit.log_catalog = 'off'

@github-actions github-actions bot removed the triage Triage is needed label Feb 7, 2024
@github-actions github-actions bot assigned migruiz4 and unassigned carrodher Feb 7, 2024

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

@github-actions github-actions bot added the stale 15 days without activity label Feb 23, 2024
@rohitkgupta
Author

ping

@github-actions github-actions bot removed the stale 15 days without activity label Feb 26, 2024
@migruiz4
Member

Hi @rohitkgupta,

I'm sorry for the late response.

I may need more information to help you with this issue. Does your issue persist over time when the master/slave switch happens? Or is this issue transient?

Please note that short downtimes may happen when the master is switched, because in case of failure the chart behaves as follows:

  • Repmgr will notice that the master node is not available and select a new one.
  • Pgpool-II's health_check will detect that the master node is not available and stop accepting new requests.
  • Because new requests are being refused, the readiness probe will fail first and pgpool-II will no longer receive traffic.
  • Eventually, pgpool-II will be restarted by the liveness probe failure and will connect to the new master node.

If the error ERROR: cannot execute INSERT in a read-only transaction is indeed transient, I would recommend fine-tuning both the pgpool-II health check settings and the chart's liveness/readiness probe settings for quicker recovery from failure.
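
For reference, this is roughly what such tuning could look like through the chart values. This is only a minimal sketch: it assumes the standard Bitnami probe structure under the pgpool section, and the numbers are illustrative rather than recommended defaults:

pgpool:
  configuration: |-
    health_check_period = 10
    health_check_timeout = 5
    health_check_max_retries = 3
    health_check_retry_delay = 2
  livenessProbe:
    initialDelaySeconds: 30
    periodSeconds: 10
    timeoutSeconds: 5
    failureThreshold: 3
  readinessProbe:
    initialDelaySeconds: 5
    periodSeconds: 5
    timeoutSeconds: 5
    failureThreshold: 3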

@rohitkgupta
Author

@migruiz4
Thanks for your response.
It is not a transient issue. Pgpool didn't restart and there were no probe failures for many hours; we had to restart Pgpool manually in this case.


This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

@github-actions github-actions bot added the stale 15 days without activity label Mar 15, 2024
@BK-STAR77

BK-STAR77 commented Mar 17, 2024

Same issue for me with pgpool:4.5.1-debian-12-r0; downgrading to 4.5.0-debian-12-r10 fixed it.

@github-actions github-actions bot removed the stale 15 days without activity label Mar 18, 2024
@migruiz4
Member

Thank you very much for your feedback @BK-STAR77!

The container logic has not received any change from 4.5.0 to 4.5.1 that could cause this behavior. Checking pgpool release notes, it appears that it could be related to the change of the default value for failover_on_backend_shutdown.

Could you please confirm if setting failover_on_backend_shutdown=on fixes the issue? If that is the case, we should set it as the default value because that may be desired for Kubernetes.

@apoca

apoca commented Mar 27, 2024

I have the same issue here, and manually reverting the deployment to 4.5.0-debian-12-r10 is working now... but this could be very dangerous for production...

@BK-STAR77

Thank you very much for your feedback @BK-STAR77!

The container logic has not received any change from 4.5.0 to 4.5.1 that could cause this behavior. Checking pgpool release notes, it appears that it could be related to the change of the default value for failover_on_backend_shutdown.

Could you please confirm if setting failover_on_backend_shutdown=on fixes the issue? If that is the case, we should set it as the default value because that may be desired for Kubernetes.

Is it a variable in the values.yaml ?

@apoca

apoca commented Mar 28, 2024

I don't see this variable in values.yaml :/

@Stevenpc3
Contributor

Stevenpc3 commented Apr 8, 2024

We are experiencing very bad failover issues also.

We updated to the newer charts from a very old version, and the latest charts appear to not fail over correctly at all. We are using Kubernetes.

We deploy the chart, then deploy our other pods to connect, and all is fine.
Next we delete the primary node pod using kubectl delete pod.
Everything "appears" fine, but then we delete our other connected pods; they start a db-migration using Liquibase and we get this error.

[screenshot: Liquibase migration error]

with a corresponding error in pgpool of the following

[screenshot: corresponding pgpool error]

There have been other errors too stating that it is in read-only mode:

Caused by: org.postgresql.util.PSQLException: ERROR: cannot execute UPDATE in a read-only transaction

We have attempted to downgrade the pgpool image to 4.5.0 while still using the current charts, but pgpool fails to start with the following error:

cp: cannot stat '/opt/bitnami/pgpool/etc/pgpool.conf.sample': No such file or directory

So currently, it appears that the PostgreSQL 16 charts (version 14+) do not fail over properly, which makes HA useless and dangerous.

We were hoping to update since previous issues with pgpool and the chart caused pgpool to restart because of liveness probes; updates to the probe script were added for that, and we hoped all was fixed. There was also a previous issue where the DB would go into split brain pretty often, which was super dangerous because it wouldn't fail outright: you would have one outdated DB and one current one, and reads would hit both... so we had to turn off load balancing.

Edit:

After restarting PGPool they all EVENTUALLY came back up. Will test again to see if it works eventually WITHOUT restart and see if it replicates again.

I am also poring through the other issues to see if any are related.

@apoca

apoca commented Apr 8, 2024

@Stevenpc3 Exactly the same thing is happening to me, but as I only started using postgresql-ha about a month ago I thought it was a problem with some bad configuration on my part. I had this problem for weeks, and only by deleting the extra primary did it start to work, because that recreated everything; it does not recover automatically. In my case I ended up with different random data and noticed that I had two primaries.

@Stevenpc3
Contributor

@apoca the issue you are describing is split-brain (#20998); there is more about it on the internet too. The issue I mentioned at the bottom of my comment is more like replication stopping with no errors or issues detected, which is super dangerous and makes failover pointless.

Neither of those issues is related to the current issue in this thread, though. This thread appears to be an issue with Pgpool and/or the Bitnami chart configs that use it.

@Stevenpc3
Contributor

Stevenpc3 commented Apr 9, 2024

You can add the flag

postgresql-ha:
  pgpool:
    configuration: |
      connection_cache = false
      failover_on_backend_error = on
      failover_on_backend_shutdown = on
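
If you want to verify that the overrides actually reached pgpool, the show pool_status command lists the effective values (host and credentials are placeholders; adjust them to your release):

echo "show pool_status;" | psql -U postgres -h <release>-postgresql-ha-pgpool | grep -E 'connection_cache|failover_on_backend'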

@RaddishIoW

You can add the flag

postgresql-ha:
  pgpool:
    configuration: |
      connection_cache = false
      failover_on_backend_error = on
      failover_on_backend_shutdown = on

Is the "postgresql-ha:" name correct? My config file has the name as "postgresql:" - also, the pgpool: section is a top-level item in the values.yaml... Sorry if I'm missing something here - I've just started using this.

Is it not meant to be:

postgresql:
...
pgpool:
  configuration: |-
    connection_cache = false
    failover_on_backend_error = on
    failover_on_backend_shutdown = on

?

@Stevenpc3
Contributor

My apologies, your way is correct for using the chart directly. I am using a wrapper chart. We have a chart that uses this chart as a dependency so we can template a few things.

@apoca

apoca commented Apr 20, 2024

Since you mention this, I have a question here... I have software as a service and I create a schema/database and a user/password, as well as the necessary permissions, in real time... Sometimes I have a problem with Kind saying that the database does not exist (when it does exist). My question is whether setting this cache to "false" solves this type of problem in pgpool?

@Stevenpc3
Contributor

Stevenpc3 commented Jun 27, 2024

We had someone do a bit of testing...

We tried to reproduce the steps described in a comment by user "rafariossaa" (#23219 (comment)).

Summary:

  • The psql command from rafariossaa generally does not reproduce the pgpool issue.
  • We wrote a different psql command that generally does cause the pgpool issue (against the current beast configuration).
  • On recommendation, we tested against the current "vanilla" bitnami postgresql-ha (14.2.7) and were able to reproduce the problem again.
  • Setting some pgpool config items (connection_cache = false, failover_on_backend_error = on, failover_on_backend_shutdown = on) against the 14.2.7 version seems to fix it.

The longer version:

The comment from rafariossaa sets up the database, adds a separate pod to run psql from (which is a good trick), and then runs a command to connect to pgpool and continuously insert into a test table. The watch is inside psql, so there's one db connection that is kept open.

echo "insert into test values ('1234'); \watch" | psql -U postgres -h  mypg-postgresql-ha-pgpool

Generally, failing the primary pod here causes the psql connection to break and you can restart the command and things work seem fine.

You can change the psql command so that it creates a db connection for each command (here, an insert and a select).

while true; do echo "insert into test values ('123'); select count(*) from test;" | psql -t -U postgres -h mypg-postgresql-ha-pgpool ; sleep 3; done

Generally, this reproduces the pgpool issue. Failing the primary pod causes a few failed connections (connection refused while things fail over) and then an error about the read-only transaction ("ERROR: cannot execute INSERT in a read-only transaction"), where pgpool has apparently failed to point to the correct postgres instance.

On suggestion, we installed the "vanilla" bitnami charts. In the as-is configuration, this will reproduce the read-only transaction issue.

After adding the settings from the Github thread, I've been unable to reproduce the issue so far:

pgpool:
  configuration: |-
   connection_cache = false
   failover_on_backend_error = on
   failover_on_backend_shutdown = on

Will do more testing.

@rafariossaa
Contributor

Hi,
Thanks for your feedback.

@rafariossaa
Contributor

Not sure if this issue could be closed after that feedback.
@rohitkgupta WDYT?

@gtudan
Contributor

gtudan commented Jul 19, 2024

We are still experiencing this issue. Disabling the connection cache feels like a bad move, as it pretty much defeats the purpose of using a pooler in the first place.

@rafariossaa
Contributor

rafariossaa commented Jul 24, 2024

@gtudan, consider that if you lose the master, you are going to have some connections cut and re-established. Could you elaborate a bit more on your case or open a new issue? In that case, feel free to link to this one if you consider it appropriate.

@gtudan
Contributor

gtudan commented Jul 24, 2024

@rafariossaa Sure, if the master is gone, the connection cache for that node has to be flushed.

In our scenario we observed writes on the slave even when both nodes were up - I assume that happened after a helm upgrade when a failover/failback was not properly performed, but I'm not sure.

I totally understand that disabling the connection cache prevents those issues, as the connections are reestablished every time. But I don't quite see why the cache should have to be disabled in normal operations to prevent write operations ending up on the slave.

Unfortunately, we had to roll back to the non-HA chart because we could not figure out this issue, so I'm currently not able to reproduce it. We plan to retry in a few weeks; hopefully I'll be able to provide further insights then.

@rafariossaa
Contributor

I don't like the idea of disabling the cache either.
Thanks, and don't hesitate to share your findings.


This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

@github-actions github-actions bot added the stale 15 days without activity label Aug 17, 2024

Due to the lack of activity in the last 5 days since it was marked as "stale", we proceed to close this Issue. Do not hesitate to reopen it later if necessary.

@bitnami-bot bitnami-bot closed this as not planned Won't fix, can't repro, duplicate, stale Aug 22, 2024
@yukha-dw
Contributor

yukha-dw commented Nov 7, 2024

You can add the flag

postgresql-ha:
  pgpool:
    configuration: |
      connection_cache = false
      failover_on_backend_error = on
      failover_on_backend_shutdown = on

This config does not work for me. One of the 3 Pgpool nodes keeps setting the previous replica node as the primary node when doing a helm upgrade.

@juan131 juan131 reopened this Nov 26, 2024
@github-actions github-actions bot added triage Triage is needed and removed solved labels Nov 26, 2024
@juan131 juan131 added on-hold Issues or Pull Requests with this label will never be considered stale and removed triage Triage is needed stale 15 days without activity labels Nov 26, 2024
@juan131 juan131 assigned juan131 and unassigned carrodher, fmulero and migruiz4 Nov 26, 2024