feat: With this PR we add the possibility to have multiple connection pools in Orca #4619

Open · wants to merge 9 commits into master

Conversation

ovidiupopa07 (Contributor)

With this PR we add the possibility to have multiple connection pools in Orca: one for writes and one for reads.

Orca’s operations are primarily READ operations from the database (almost 85% of the SQL transactions are SELECT statements). For high-scale customers with hundreds of applications/pipelines and big execution contexts, this translates into extreme pressure on the backend database and high network utilization, with large volumes of data transferred from the database to the Orca pods.

We noticed that for high-scale customers we have had to double the database instance type twice, and we have now reached a situation where the writer endpoint of the database is being throttled on network bandwidth (hitting the 20 Gbps maximum).

On the flip side, the reader instance of the Orca database sits idle, since Orca doesn't support read-only operations through its SQL connection pools. Having Orca split traffic between the read and write database endpoints will dramatically improve performance and utilization, and also provide cost savings for high-scale customers.

We have tested this in our environment; below you can find the statistics
[Attached screenshots: before/after database statistics]
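
To make the split concrete, here is a minimal, generic HikariCP + jOOQ sketch (not the PR's actual implementation); the `ReadWriteRouter` name, pool sizes, and dialect are illustrative assumptions.

```kotlin
// Minimal, generic sketch of a read/write pool split (not the PR's actual code).
// The ReadWriteRouter name, pool sizes, and dialect are illustrative assumptions.
import com.zaxxer.hikari.HikariConfig
import com.zaxxer.hikari.HikariDataSource
import org.jooq.DSLContext
import org.jooq.SQLDialect
import org.jooq.impl.DSL

class ReadWriteRouter(writeUrl: String, readUrl: String) {
  private fun pool(url: String, readOnly: Boolean) = HikariDataSource(
    HikariConfig().apply {
      jdbcUrl = url
      isReadOnly = readOnly
      maximumPoolSize = if (readOnly) 50 else 20 // readers can be sized independently of writers
    }
  )

  // "default" pool: all writes, plus any read that must immediately see its own writes
  val write: DSLContext = DSL.using(pool(writeUrl, readOnly = false), SQLDialect.MYSQL)

  // reader pool: bulk SELECTs that can tolerate replication lag
  val read: DSLContext = DSL.using(pool(readUrl, readOnly = true), SQLDialect.MYSQL)
}
```

The open question, raised in the reviews below, is which reads can safely go to the read pool.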

@spinnakerbot (Contributor)

The following commits need their title changed:

Please format your commit title into the form:

`<type>(<scope>): <subject>`, e.g. `fix(kubernetes): address NPE in status check`

This allows us to easily generate changelogs & determine semantic version numbers when cutting releases. You can read more about commit conventions here.

@kkotula (Contributor) left a comment


LGTM 🚀 Thank you!

@dogonthehorizon (Contributor) left a comment


Let's make sure to document this change alongside the SQL configs for other services. Otherwise LGTM, and it would be good to get @dbyron0's take as well.

@dbyron-sf (Contributor) left a comment


I appreciate the effort here. Sounds like we've been running into similar struggles with the scalability of orca's database. We have a somewhat different way of solving this... also with multiple connection pools, but configured in a way that works with the current mechanism (i.e. the "write" connection pool is still named default).

Besides that difference though, we found that using the read replica for some read operations doesn't work... or at least it doesn't work without some other significant changes to teach orca to be aware of replication lag. When one task writes something to the database and another task runs immediately afterwards and reads, the current code expects to read back exactly what was written, but with replication lag that doesn't always happen.

We're still working through the steps to get the changes that handle that rolled out in prod and to gain confidence in them.

We have been using a read replica for a subset of operations though. Lemme see if I can get a PR for that.
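
To illustrate the failure mode dbyron-sf describes, here is a hedged jOOQ sketch; the table, columns, function names, and the fall-back-to-writer mitigation are illustrative assumptions, not Orca's actual code.

```kotlin
// Illustrative only: why a read routed to the replica right after a write can
// miss the row, and one possible mitigation (fall back to the writer pool).
// Table, columns, and function names are assumptions, not Orca's actual API.
import org.jooq.DSLContext
import org.jooq.impl.DSL.field
import org.jooq.impl.DSL.table

fun saveStageContext(write: DSLContext, stageId: String, body: String) {
  write.insertInto(table("stage_context"))
    .columns(field("stage_id"), field("body"))
    .values(stageId, body)
    .execute()
}

fun loadStageContext(read: DSLContext, write: DSLContext, stageId: String): String {
  // The replica may not have replayed the insert yet, so a read issued
  // "immediately afterwards" can come back empty even though the write succeeded.
  val fromReplica = read.select(field("body", String::class.java))
    .from(table("stage_context"))
    .where(field("stage_id").eq(stageId))
    .fetchOne()
  // Mitigation: treat a miss as "possibly stale" and re-read from the writer.
  return fromReplica?.value1()
    ?: write.select(field("body", String::class.java))
        .from(table("stage_context"))
        .where(field("stage_id").eq(stageId))
        .fetchOne()
        ?.value1()
    ?: error("stage $stageId not found on the writer either")
}
```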

@jasonmcintosh (Member)

Curious about the places you've found that don't handle async reads after writes. I know the correlation ids are one we hit. Our solution was to modify those to "ignore it if there's a unique constraint violation when it tries to insert after the read". I haven't traced all the places that do a similar read-after-write operation that would be impacted, but I THOUGHT most of those stages either did "ignore constraint failures" OR "retry the read on the next queue operation". That said, it would NOT surprise me if there are more such places.
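
A hedged sketch of the "ignore the unique constraint on insert" workaround described above, using jOOQ's onDuplicateKeyIgnore; the table and column names are illustrative assumptions, not Orca's actual schema.

```kotlin
// Illustrative only: tolerate the unique-constraint violation instead of failing
// when a stale replica read missed an existing row. Table and column names are
// assumptions, not Orca's actual schema.
import org.jooq.DSLContext
import org.jooq.impl.DSL.field
import org.jooq.impl.DSL.table

fun recordCorrelationId(ctx: DSLContext, correlationId: String, executionId: String) {
  ctx.insertInto(table("correlation_ids"))
    .columns(field("id"), field("execution_id"))
    .values(correlationId, executionId)
    .onDuplicateKeyIgnore() // duplicate key => another worker already inserted it; not an error
    .execute()
}
```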
