
Lower max connections in postgres from 300 to 125 on India #6383

Open · wants to merge 2 commits into base: master

Conversation

@gherceg (Contributor) commented Sep 6, 2024

https://dimagi.atlassian.net/browse/SAAS-15954

We saw a spike in db connections on India that almost resulted in the instance running out of memory. We typically sit at around 50 connections, and the spike reached 156, so we certainly couldn't handle 300 connections at the current machine size. I think we should decrease the max connections in this env to accommodate the RDS instance size.

I think that because we run pgbouncer in front of the postgres instance, we are already set up so that this change won't have an impact on users. If we do reach 125 connections, a client might need to wait a little longer for a request to complete, since pgbouncer will wait for a connection to be freed up, but given that we aren't hitting 125 connections regularly, I don't anticipate this being an issue. We can monitor this graph to see if there is an increase in wait times.
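The "clients wait rather than fail" behavior described above can be modeled with a toy sketch. This is not our pgbouncer configuration or code, just a minimal illustration of a fixed pool where excess clients block until a connection frees up (pool size and timings here are made up; the real pool would be 115):

```python
import threading
import time

# Toy model of pgbouncer pooling: a fixed pool of server connections;
# clients that arrive while the pool is exhausted wait until a
# connection is freed, rather than being rejected.
POOL_SIZE = 3  # stand-in for pgbouncer_default_pool (115 in this change)

pool = threading.BoundedSemaphore(POOL_SIZE)
completed = []

def run_query(client_id, duration):
    with pool:  # blocks while all pooled connections are in use
        time.sleep(duration)
        completed.append(client_id)

threads = [threading.Thread(target=run_query, args=(i, 0.05)) for i in range(6)]
start = time.monotonic()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.monotonic() - start

# 6 clients through a pool of 3 run in roughly two waves: every client
# still completes, some just wait longer for a free connection.
print(len(completed), round(elapsed, 3))
```

All six simulated clients finish; the second wave simply queues, which is the "wait a little longer" behavior rather than a hard error.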

Environments Affected

India

@@ -4,15 +4,15 @@ REPORTING_DATABASES:
   aaa-data: aaa-data

 pgbouncer_override:
-  pgbouncer_default_pool: 290
+  pgbouncer_default_pool: 115
@gherceg (Contributor, Author) commented on the diff:
This leaves room for the 5 reserved connections and 5 additional connections if we ever need direct connections to a database for debugging/firefighting.
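The arithmetic behind the two numbers can be made explicit. Variable names below are illustrative only; the values come from this PR:

```python
# Connection budget for the India RDS instance after this change
# (illustrative names; only the numbers come from the PR).
max_connections = 125        # new postgres max_connections
reserved_superuser = 5       # reserved connections mentioned above
direct_debug_headroom = 5    # ad-hoc sessions for debugging/firefighting

pgbouncer_default_pool = max_connections - reserved_superuser - direct_debug_headroom
print(pgbouncer_default_pool)  # 115, matching the diff above
```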

@gherceg (Contributor, Author) commented Sep 6, 2024

Will need to run cchq --control india ap deploy_postgres.yml --tags=pgbouncer --limit=pgbouncer to apply this change.

@millerdev (Contributor) commented Sep 9, 2024

Do we have any way of measuring whether the memory used per connection is reasonably consistent, or is there wide variation? If it is consistent, then this seems reasonable; but if not, maybe there are circumstances where the previous limit of 300 could be useful and reasonable? Or possibly the inverse, where even a lower number of connections could cause an OOM?

Seems like the ideal would be to have a setting to limit based on free memory (reserving enough to establish a few emergency connections) and otherwise allow unlimited connections as memory permits. But I'm guessing that's not possible.
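For context on why that ideal isn't directly available: postgres's max_connections is a static setting, not something it adjusts against free memory at runtime. The closest practical substitute is an offline sizing heuristic along these lines. Everything here is a hypothetical sketch with made-up numbers, not measurements from India:

```python
def max_conns_for_memory(total_mb, os_and_shared_mb, per_conn_mb, emergency_conns=5):
    """Rough static sizing: subtract memory held back for the OS and
    shared buffers, divide the remainder by an estimated per-connection
    footprint, and reserve a few emergency connections.
    All inputs are estimates, not measurements."""
    usable = total_mb - os_and_shared_mb
    return max(usable // per_conn_mb - emergency_conns, 0)

# Hypothetical example: 8 GB instance, 4 GB held for OS + shared
# buffers, ~25 MB per backend on average.
print(max_conns_for_memory(8192, 4096, 25))  # → 158
```

The wide-variation concern raised above maps directly onto per_conn_mb: if that estimate swings a lot between workloads, any single static limit is a compromise.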

@gherceg (Contributor, Author) commented Sep 9, 2024

Yeah, that is a good point, and a good reminder that this change won't necessarily solve any potential problems we may encounter in the future. I can't say how much memory on average was held by each connection when we saw the spike to 156, but given the limited memory on this machine, it seems prudent to lower the limit at the very least, though this is admittedly a lot of hand waving. Based on historical trends, I don't think there is any need for a limit higher than 125 connections, so I'm content with this change, while acknowledging that it does not guarantee we will avoid OOMs on this RDS instance in the future.

2 participants