
feat(sentryapps) Add RPC caching to installation for organization call #80488

Merged: 6 commits into master from chore-installed-for-org-caching, Nov 19, 2024

Conversation

markstory (Member) commented:

get_installed_for_organization is our highest-volume RPC call, as it is part of post_process_group. Its results also rarely change, as sentry-app installation rates are low compared to ingest volumes.

Caching the response of this RPC call should help reduce tail latencies for post_process_group and free up control silo webworkers.

This is our highest-frequency RPC call right now. The results of this
RPC call don't often change (as orgs infrequently set up new sentry apps).

These changes add a new RPC call with caching wrappers that removes
a significant number of RPC calls from error ingestion tasks.
The sentry_apps method requires a new-style caching adapter.
Move the option read into the wrapper method and update call sites to
call the wrapper method. This will simplify cleanup.
@markstory markstory requested review from a team as code owners November 8, 2024 21:45
@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Nov 8, 2024
Comment on lines +82 to +85
# Route an option-controlled fraction of traffic through the new cached wrapper.
if in_random_rollout("app_service.installations_for_org.cached"):
    return get_installations_for_organization(organization_id)
else:
    return self.get_installed_for_organization(organization_id=organization_id)
markstory (Member, Author) commented:

Once these changes are fully deployed, I plan on using this option to shift load onto caching and hope to observe changes in RPC throughput and post_process task duration.
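For context on how this kind of gate behaves, here is a minimal sketch of an option-backed random rollout check. This is an illustrative stand-in only; the OPTIONS dict and the implementation details are assumptions, not Sentry's actual in_random_rollout.

```python
import random

# Hypothetical options store; Sentry reads rollout rates from its options system.
OPTIONS: dict[str, float] = {"app_service.installations_for_org.cached": 0.25}


def in_random_rollout(option_name: str) -> bool:
    """Return True for roughly the configured fraction of calls.

    The option value is a rollout rate between 0.0 (fully off) and
    1.0 (fully on), so raising the option shifts more traffic onto
    the cached code path.
    """
    rate = OPTIONS.get(option_name, 0.0)
    return random.random() < rate
```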


codecov bot commented Nov 8, 2024

Codecov Report

Attention: Patch coverage is 93.93939% with 4 lines in your changes missing coverage. Please review.

✅ All tests successful. No failed tests found.

Files with missing lines                          Patch %   Lines
src/sentry/hybridcloud/rpc/caching/service.py     93.18%    1 Missing and 2 partials ⚠️
src/sentry/sentry_apps/services/app/service.py    92.30%    1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           master   #80488       +/-   ##
===========================================
+ Coverage   57.53%   78.46%   +20.93%     
===========================================
  Files        7199     7221       +22     
  Lines      319053   319642      +589     
  Branches    43912    43971       +59     
===========================================
+ Hits       183561   250815    +67254     
+ Misses     130738    62441    -68297     
- Partials     4754     6386     +1632     

When the cache read returns no data, the wrapped function will be
invoked, and its result is then stored in the cache.

Ideal for 'get many X for organization' style methods
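For context, the quoted docstring describes a read-through cache with a single key per id. A minimal sketch of that pattern, assuming a Django-style cache backend; the decorator name and key scheme here are illustrative, not the PR's actual adapter:

```python
import functools
from typing import Callable, TypeVar

from django.core.cache import cache  # illustrative backend choice

R = TypeVar("R")


def back_with_list_cache(
    key_prefix: str, timeout: int = 300
) -> Callable[[Callable[[int], list[R]]], Callable[[int], list[R]]]:
    """Cache 'get many X for organization' style methods under one key per id."""

    def decorator(func: Callable[[int], list[R]]) -> Callable[[int], list[R]]:
        @functools.wraps(func)
        def wrapper(object_id: int) -> list[R]:
            key = f"{key_prefix}:{object_id}"
            cached = cache.get(key)
            if cached is not None:
                return cached
            # Cache miss: invoke the wrapped function (the RPC call)
            # and store the whole result list for next time.
            result = func(object_id)
            cache.set(key, result, timeout)
            return result

        return wrapper

    return decorator
```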
Contributor commented:

Q: How is this class different from SiloCacheManyBackedCallable? It seems like they have the same purpose.

Could we adapt the result processing in SiloCacheManyBackedCallable to process a list of objects? Would that lead to a lot of downstream refactoring? I guess I'm trying to understand the +/- of each approach. :thonk

markstory (Member, Author) commented:

> Q: How is this class different from SiloCacheManyBackedCallable? It seems like they have the same purpose.

SiloCacheManyBackedCallable has a similar return value, but different parameters and behavior. The CacheMany decorator takes a list of ids and attempts a multi-get on the cache. If the results from cache have 'holes', it emits a single RPC request to fetch the balance. Each id maps to a different cache key so that we can share cache results with the get_by_id style cache.

The new caching decorator takes a single id as a parameter and returns a list of results. There is only a single cache key, and the result is always a list. The two approaches could probably be combined, but I was concerned that a shared code path would just move the complexity into if/else logic.
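To make the contrast concrete, here is a rough sketch of the CacheMany-style flow described above: per-id keys, one multi-get, and a single RPC for the holes. The function and parameter names are hypothetical, not the PR's code:

```python
from typing import Callable

from django.core.cache import cache  # illustrative backend choice


def cached_get_many(
    ids: list[int],
    fetch_many: Callable[[list[int]], list[dict]],
    key_prefix: str,
    timeout: int = 300,
) -> list[dict]:
    """Multi-get from cache, then fetch only the missing ids with one RPC."""
    # One cache key per id, so entries can be shared with get_by_id caching.
    keys = {object_id: f"{key_prefix}:{object_id}" for object_id in ids}
    cached = cache.get_many(list(keys.values()))
    results = {
        object_id: cached[key] for object_id, key in keys.items() if key in cached
    }
    # The 'holes': ids the cache could not satisfy.
    missing = [object_id for object_id in ids if object_id not in results]
    if missing:
        # Single RPC for the balance; assumes results come back in
        # the same order as `missing`.
        for object_id, record in zip(missing, fetch_many(missing)):
            cache.set(keys[object_id], record, timeout)
            results[object_id] = record
    return [results[object_id] for object_id in ids]
```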

Contributor commented:

I don't think we'd have to add too much complexity to CacheMany? If we input a list of one org_id (list[int]), then CacheMany would return a list[list[SentryAppInstallation]]; the added complexity is that we'd need to serialize a list of lists instead of calling record.json. Am I misunderstanding how CacheMany works, and/or where would the if/else logic be added?

Or is it kind of janky because they have separate "definitions", i.e. CacheMany is list[X] -> list[Y] whereas the new decorator is more X -> list[Y]?

markstory (Member, Author) commented:

> Or is it kind of janky because they have separate "definitions", i.e. CacheMany is list[X] -> list[Y] whereas the new decorator is more X -> list[Y]?

I think this could contribute to jank/complexity. The internals of resolve_from would become a big if/else block, which led me to keep them as separate classes.
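For illustration, a combined class would need resolve_from to branch on its input shape, roughly like this (hypothetical sketch; _resolve_many and _resolve_single are made-up helper names, not code from the PR):

```python
class CombinedCacheCallable:
    """Hypothetical merge of SiloCacheManyBackedCallable and the new
    list-caching decorator, showing the branching being discussed."""

    def resolve_from(self, ids_or_id: int | list[int]) -> list:
        if isinstance(ids_or_id, list):
            # CacheMany behaviour: per-id cache keys, multi-get,
            # then a single RPC to fill any holes.
            return self._resolve_many(ids_or_id)
        # New decorator behaviour: one cache key holding the whole
        # list of results for a single id.
        return self._resolve_single(ids_or_id)

    def _resolve_many(self, ids: list[int]) -> list: ...

    def _resolve_single(self, object_id: int) -> list: ...
```

Keeping the two classes separate avoids this branching at the cost of some duplicated cache plumbing.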

Contributor commented:

I kind of see, but the logic between resolve_from and get_many is not that different if we're only working with one id (??). I do agree that trying to refactor one function to work with another definition is not the best, though.

markstory (Member, Author) commented:

> I kind of see, but the logic between resolve_from and get_many is not that different if we're only working with one id

Yeah, the logic there would be similar. I felt that duplicating the code would result in logic that is easier to read and change than sharing it through inheritance, though.

An outdated review comment on src/sentry/hybridcloud/rpc/caching/service.py was resolved.
@markstory markstory merged commit 15cb8de into master Nov 19, 2024
50 checks passed
@markstory markstory deleted the chore-installed-for-org-caching branch November 19, 2024 15:15
andrewshie-sentry pushed a commit that referenced this pull request Nov 19, 2024
markstory added a commit that referenced this pull request Nov 21, 2024
With all traffic using the cached method call, we can remove the deprecated RPC call.

Refs #80488

sentry-io bot commented Nov 22, 2024

Suspect Issues

This pull request was deployed and Sentry observed the following issues:

  • ‼️ OperationalError: OperationalError('server closed the connection unexpectedly\n\tThis probably means the server ter... sentry.sentry_apps.tasks.sentry_apps.process_re...

