Investigate Spanner's reported session count #1493

Open
data-sync-user opened this issue Oct 19, 2023 · 1 comment

Comments

@data-sync-user
Collaborator

The Spanner Session metric (the number of sessions reported by Spanner itself) differs radically from syncstorage’s own internal session count. We’ve previously had an issue where the session count was high enough to degrade Spanner’s performance, so it’s important that we can trust these numbers.

We should contact GCP support for advice and to look into why these metrics don’t line up.

┆Issue is synchronized with this Jira Task

@data-sync-user
Collaborator Author

➤ Philip Jenvey commented:

Per https://mozilla-hub.atlassian.net/browse/SYNC-3350

The support case for when we had too many open sessions (from 2020-10):

https://console.cloud.google.com/support/cases/detail/v2/25438594?authuser=0&cloudshell=false&organizationId=442341870013&project=moz-fx-sync-prod-3f0c

excerpts from it:

question from us:

“Our metrics and Spanner's session metrics mostly track pretty well, but not always. Can you give us some more information on what the Spanner session metrics represent? Yesterday evening, for instance, we showed Spanner sessions in the 5k to 6k range, while our connection pool metrics show 1100 to 1200 connections.

Also, how long does a closed out gRPC connection take to be reflected in the Spanner session metrics? What about an abandoned / improperly closed gRPC connection? We have theorized that we may not be closing them correctly in pod eviction events.”

answer:

“The sessions metric counts each "communication channel" regardless of the number of session caches (connection pools) they have. Generally, Stackdriver metrics use a delta window of 1 minute to capture incremental values.

The answer to the second question is more open-ended and depends on a variety of factors (networking, latency, how the client handles connections and session pools, etc.), including how the client is implemented. Correct me if I'm wrong, but I see that you're using a Rust client [1]. Based on other client implementations, the following logic is used for managing session pools:

Use BatchCreateSessions to init the pool with min sessions.

For subsequent session increases (when min is not enough), increase these in batches as well. In some clients, these are increased in batches of 10. By doing this, you also have to ensure sessions are roughly equally distributed across the num channels (gRPC channels).”
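
The pool logic described in the support answer could be sketched roughly as follows. This is an illustrative model only, not syncstorage's code or any Spanner client's API: `SessionPool`, `Channel`, `grow`, and `GROWTH_BATCH` are hypothetical names, and `_batch_create` merely simulates what a per-channel BatchCreateSessions call would do by handing out integer session ids.

```python
from dataclasses import dataclass, field
from itertools import count

# Batch size the support answer mentions for "some clients" (assumption here).
GROWTH_BATCH = 10


@dataclass
class Channel:
    """Stand-in for one gRPC channel; records the session ids created on it."""
    sessions: list = field(default_factory=list)


class SessionPool:
    """Hypothetical pool: init with min sessions, then grow in batches."""

    def __init__(self, num_channels, min_sessions, max_sessions):
        self.channels = [Channel() for _ in range(num_channels)]
        self.max_sessions = max_sessions
        self._ids = count(1)
        # Step 1: fill the pool up to the minimum with batched creation.
        self._batch_create(min_sessions)

    def total(self):
        return sum(len(c.sessions) for c in self.channels)

    def _batch_create(self, n):
        """Simulate one batched create per channel, least-loaded channels first."""
        n = max(0, min(n, self.max_sessions - self.total()))
        order = sorted(self.channels, key=lambda c: len(c.sessions))
        base, extra = divmod(n, len(order))
        for i, channel in enumerate(order):
            share = base + (1 if i < extra else 0)
            if share:
                # A real client would issue BatchCreateSessions on this channel.
                channel.sessions.extend(next(self._ids) for _ in range(share))

    def grow(self):
        # Step 2: when min is not enough, add another batch, capped at max.
        self._batch_create(GROWTH_BATCH)
```

Under this model, comparing `pool.total()` against the session count Spanner reports is the kind of check this issue asks for; a persistent gap would suggest sessions that the client no longer tracks, such as ones left behind by pod evictions.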
