ref(transactions): add functionality to sample transactions in save_event_transaction #81077
Conversation
What are the implications of transactions sampling not happening exactly once? Won't this potentially sample some messages in flight twice or not at all, depending on the direction of the option change?
I think #81079 could probably be part of this same PR, otherwise all this option does is double sample.
@lynnagara the goal of the PRs is to be reviewed separately, not to be deployed separately. I would never turn the option on without the other one deployed.
From reading about the transactions sampler, I believe it is a feature that helps with parameterizing high-cardinality transaction names. I don't believe double sampling (which wouldn't happen as I've implemented it) or briefly having no samples (which would be the case for a few moments during this deploy) would have a large effect on the product.
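To make that concrete, here is a minimal sketch of how a Redis-backed transaction-name sampler could work, purely for illustration: the key format, cap, and function name below are hypothetical and not Sentry's actual clusterer.

```python
# Hypothetical illustration of Redis-backed transaction-name sampling; the key
# format, cap, and function name are not Sentry's real implementation.
import redis

MAX_SAMPLES_PER_PROJECT = 100  # illustrative cap on stored names


def record_transaction_name_sample(client: redis.Redis, project_id: int, name: str) -> None:
    """Keep a bounded sample of raw transaction names for a project.

    A background job can later scan these samples to derive parameterization
    rules for high-cardinality names (e.g. /users/123/orders/456 -> /users/*/orders/*).
    """
    key = f"txn-name-samples:{project_id}"
    # SADD is idempotent, so sampling the same event twice (the "double
    # sampling" concern above) only costs a little extra work.
    client.sadd(key, name)
    # Keep the set bounded; SPOP evicts a random member once over the cap.
    if client.scard(key) > MAX_SAMPLES_PER_PROJECT:
        client.spop(key)
```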
We originally put the "record transaction" logic in post process to move it off the critical path. But if that task is going to disappear completely, this makes sense.
Yeah, that makes sense. Unfortunately, at this point the load we're putting on RabbitMQ and the potential instability it causes aren't worth having a whole other task for transactions. I think once we have more durable execution with task workers, it's certainly worth investigating breaking it up again.
@@ -1540,6 +1541,133 @@ def test_transaction_event_span_grouping(self) -> None:
        # the basic strategy is to simply use the description
        assert spans == [{"hash": hash_values([span["description"]])} for span in data["spans"]]

    @override_options({"transactions.do_post_process_in_save": True})
What value does this test add in contrast with the next one? I think they're the same but the next one tests for the two calls.
There is a comment below, but the next test ensures that the functions get called with the correct mocks, while this one ensures that the functions themselves don't error and run correctly. (I didn't see a great way to validate the side effects those functions cause, so we have two tests.)
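For illustration, the two-test split described here could look roughly like the sketch below; apart from override_options and record_transaction_name_for_clustering, which appear in this PR's diff, the base class, store helper, and import paths are assumptions.

```python
# Sketch only: override_options and record_transaction_name_for_clustering are
# the names used in this PR's diff; the base class, store helper, and import
# paths are assumptions.
from unittest import mock

from sentry.testutils.helpers import override_options  # import path assumed


class TransactionPostProcessInSaveTest(TransactionTestCase):  # base class assumed
    @override_options({"transactions.do_post_process_in_save": True})
    def test_post_process_in_save_runs(self) -> None:
        # End-to-end: store a real transaction with the option on so that any
        # exception in the clustering/signal code paths fails the test.
        event = self.store_transaction_event()  # hypothetical helper
        assert event is not None

    @override_options({"transactions.do_post_process_in_save": True})
    @mock.patch("sentry.event_manager.record_transaction_name_for_clustering")
    def test_post_process_in_save_calls_clustering(self, mock_record) -> None:
        # Mocked: assert the save path wires the call up correctly, since the
        # side effects (Redis writes) are hard to observe directly in a test.
        event = self.store_transaction_event()  # hypothetical helper
        mock_record.assert_called_once_with(event.project, event.data)
```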
Looks good with #81079 so that we aren't double sampling.
Merged that PR into this one so it's easier to read and we can ensure they go out together. (It won't do anything until the option is modified.)
Rollout strategy: #81065 (comment)
src/sentry/event_manager.py (outdated)
        ):
            continue
        project = job["event"].project
        record_transaction_name_for_clustering(project, job["event"].data)
I think that this is probably fine, although I don't really know how this sampling is used. If this feature relies on this transaction being indexed in snuba then there could be downstream problems.
Confirmed it does not rely on the transaction being indexed -- this system uses Redis only.
We also log here: sentry/src/sentry/tasks/post_process.py (lines 653 to 659 in d25bb29).
We have to be very careful about how this impacts the performance of save event. I think we should look into a couple of things first before making this change:
I agree we should time it to be safe, but I think it's unlikely that it really takes very long compared to save_transaction_events (sentry/src/sentry/event_manager.py, line 2635 in 9ee399c), which does a lot of relatively expensive analysis on all the details of the transaction.
TODO: let's use two options for the rollout
We spent some time looking at the performance of the signals and transaction clustering on current telemetry in post_process, and both looked to be sub-10ms on average, with no large outliers. This PR includes timing telemetry so we can monitor perf in S4S to confirm before any wider rollout takes place. Also confirmed that transaction clustering doesn't look at anything in ClickHouse; it only uses Redis. I've split the option used to roll this out into two, so we can roll out by turning on the first; after we confirm it looks good, we can flip on the second.
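A rough sketch of what that timing telemetry could look like is below; metrics.timer is the usual Sentry helper, but the metric key and the clusterer import path are assumptions rather than what this PR actually ships.

```python
# Illustrative only: the metric key is made up, and the clusterer import path
# is assumed rather than taken from this PR.
from sentry.ingest.transaction_clusterer.datasource.redis import (
    record_transaction_name_for_clustering,  # import path assumed
)
from sentry.utils import metrics


def _sample_transaction_names(jobs) -> None:
    for job in jobs:
        project = job["event"].project
        # Emit a timer so we can compare against post_process's existing
        # telemetry before widening the rollout.
        with metrics.timer("save_event_transaction.record_clustering"):
            record_transaction_name_for_clustering(project, job["event"].data)
```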
Codecov Report
All modified and coverable lines are covered by tests ✅
✅ All tests successful. No failed tests found.
Additional details and impacted files:

@@            Coverage Diff             @@
##           master   #81077      +/-   ##
==========================================
+ Coverage   78.48%   80.36%    +1.87%
==========================================
  Files        7215     7217        +2
  Lines      319812   319896       +84
  Branches    44045    20741    -23304
==========================================
+ Hits       251012   257078     +6066
- Misses      62411    62426       +15
+ Partials     6389      392     -5997
Currently there are 3 things that happen in post processing:
I think instead of splitting these under different options, you can combine them under a single one (sentry/src/sentry/tasks/post_process.py, line 528 in 2e30c63).
I think there are a few benefits of this approach:
What do you think?
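To illustrate the suggestion, a sketch of a single-option gate is below, reusing the in_rollout_group check that appears later in this PR; the import path and the step stubs are placeholders, not Sentry's real helpers.

```python
# Sketch only: in_rollout_group and the option name appear in this PR's diff;
# the import path and the three step stubs are assumptions for illustration.
from sentry.features.rollout import in_rollout_group  # import path assumed


def _run_transaction_clustering_sample(event) -> None: ...  # placeholder step
def _send_transaction_processed_signals(event) -> None: ...  # placeholder step
def _cleanup_processing_store(event) -> None: ...  # placeholder step


def _post_process_transaction_in_save(job) -> None:
    event = job["event"]
    # One deterministic check per event: the same event_id always lands in the
    # same rollout group, so a given event is handled in exactly one place.
    if not in_rollout_group("transactions.do_post_process_in_save", event.event_id):
        return
    # All of the work that used to run in post_process is gated together, so it
    # either all moves into save_event_transaction or all stays behind.
    _run_transaction_clustering_sample(event)
    _send_transaction_processed_signals(event)
    _cleanup_processing_store(event)
```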
I've gone ahead and implemented this -- it's much cleaner. Thank you for the suggestion. Closing the follow-up PR.
Updated rollout plan: #81065 (comment)
if (
    consumer_type == ConsumerType.Transactions
    and event_id
    and in_rollout_group("transactions.do_post_process_in_save", event_id)
):
    # we won't use the transaction data in post_process
    # so we can delete it from the cache now.
    if cache_key:
        processing_store.delete_by_key(cache_key)
Just double checking: should this be in the `finally` block, or in an `else` block? Do we want to run it if we hit the save event exception above?
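To make the question concrete, here is a simplified sketch of the else-block variant with stand-in names for the surrounding consumer code; only cache_key and processing_store.delete_by_key come from the snippet above.

```python
# Simplified stand-ins for the consumer code; only cache_key and
# processing_store.delete_by_key come from the snippet above.
def _save_event_transaction(message) -> None: ...  # placeholder for the real save


def _process_message(message, processing_store, cache_key) -> None:
    try:
        _save_event_transaction(message)
    except Exception:
        # With this `else` structure, a failed save leaves the cache entry in
        # place so a retry (or post_process) can still read it. A `finally`
        # block would delete it even on failure -- that is the behavior the
        # question above is about.
        raise
    else:
        # Only reached when the save succeeded and post_process won't need the data.
        if cache_key:
            processing_store.delete_by_key(cache_key)
```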
First PR of #81065.
No behavior changes until the option is turned on. We need to change post_process as well before we can turn the option on. This is the equivalent of sentry/src/sentry/tasks/post_process.py, line 614 in b84d3d9.