Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RPC handles same name attributes #6581

Merged
merged 5 commits into from
Nov 21, 2024
Merged

RPC handles same name attributes #6581

merged 5 commits into from
Nov 21, 2024

Conversation

xurui-c
Copy link
Member

@xurui-c xurui-c commented Nov 19, 2024

Copy link

codecov bot commented Nov 19, 2024

❌ 1 Tests Failed:

Tests completed Failed Passed Skipped
2666 1 2665 5
View the top 1 failed tests by shortest run time
tests.web.rpc.test_common.TestCommon::test_expression_trace_id
Stack Traces | 0.003s run time
Traceback (most recent call last):
  File ".../web/rpc/test_common.py", line 11, in test_expression_trace_id
    assert attribute_key_to_expression(
AssertionError: assert CAST(\n  trace_id,\n  'String'\n) AS `sentry.trace_id TYPE_STRING` == CAST(\n  trace_id,\n  'String'\n) AS `sentry.trace_id`
  
  Matching attributes:
  ['function_name', 'parameters']
  Differing attributes:
  ['alias']
  
  Drill down into differing attribute alias:
    alias: 'sentry.trace_id TYPE_STRING' != 'sentry.trace_id'
    - sentry.trace_id
    + sentry.trace_id TYPE_STRING
  Full diff:
    CAST(
      trace_id,
      'String'
  - ) AS `sentry.trace_id`
  + ) AS `sentry.trace_id TYPE_STRING`
  ?                      ++++++++++++

To view more test analytics, go to the Test Analytics Dashboard
Got feedback? Let us know on Github

@xurui-c
Copy link
Member Author

xurui-c commented Nov 20, 2024

This is what CH looks like after generating the message:
Screenshot 2024-11-19 at 11 46 53 PM

This is what columns look like in the RPC request:
{ "key": { "type": "TYPE_INT", "name": "foo" }, "label": "tags[foo,number]" }, { "key": { "type": "TYPE_STRING", "name": "foo" }, "label": "tags[foo,string]" }, { "key": { "type": "TYPE_STRING", "name": "foo" }, "label": "tags[foo]" },

after processing the first column (label=tags[foo,number]), the query pipeline caches the name foo in ParsingContext() here. So when processing the second column (label=tags[foo,string]) and the third column (label=tags[foo]), the query pipeline sees that the name (or alias in this part of the codebase) is the same (foo), so it will use the cached result here.

What this caching alias does is turn the CH query into this: SELECT (name AS sentry.name), (CAST(arrayElement(attr_num_3, 'foo'), 'Int64') AS foo), foo, foo, (lower(hex(span_id)) AS sentry.span_id), (transform(CAST((project_id AS sentry.project_id), 'String'), ['4555075531898880'], ['bar'], 'unknown') AS project.name) FROM eap_spans_2_local WHERE in(project_id, [1]) AND equals(organization_id, 1) AND less(_sort_timestamp, toDateTime(1732107600)) AND greaterOrEquals(_sort_timestamp, toDateTime(1732104000)) AND true GROUP BY sentry.name, foo, foo, foo, sentry.span_id, project.nameORDER BYsentry.name ASC LIMIT 10000 OFFSET 0. So the second and third foo columns (which should've been string "five") are cached to be the same as (CAST(arrayElement(attr_num_3, 'foo'), 'Int64') AS foo), which is integer 5.

Potential solutions:

  1. What if we cache at the end of a query, instead of as we process each column? I tried that in a previous commit of this PR, and made the CH query look like this:

SELECT (name AS `sentry.name`), (CAST(arrayElement(attr_num_3, 'foo'), 'Int64') AS foo), (arrayElement(attr_str_3, 'foo') AS foo), (arrayElement(attr_str_3, 'foo') AS foo), (lower(hex(span_id)) AS `sentry.span_id`), (transform(CAST((project_id AS `sentry.project_id`), 'String'), ['4555075531898880'], ['bar'], 'unknown') AS `project.name`) FROM eap_spans_2_local WHERE in(project_id, [1]) AND equals(organization_id, 1) AND less(_sort_timestamp, toDateTime(1732107600)) AND greaterOrEquals(_sort_timestamp, toDateTime(1732104000)) AND true GROUP BY `sentry.name`, foo, foo, foo, `sentry.span_id`, `project.name` ORDER BY `sentry.name` ASC LIMIT 10000 OFFSET 0

But, clickhouse does not allow the same alias (foo). The error is snuba.clickhouse.errors.ClickhouseError: DB::Exception: Different expressions with the same alias foo: attr_str_3['foo'] AS foo and CAST(attr_num_3['foo'], 'Int64') AS foo : While processing attr_str_3['foo'] AS foo. Stack trace:

  1. We don't allow attributes with the same name but different types

  2. Change the query pipeline so that we use the label as the alias in the CH query, so groupby will use label as well. My understanding is that label is always unique (and would thus avoid alias collisions). This effectively changes (CAST(arrayElement(attr_num_3, 'foo'), 'Int64') AS foo), (arrayElement(attr_str_3, 'foo') AS foo), (arrayElement(attr_str_3, 'foo') AS foo) in potential solution 1 to (CAST(arrayElement(attr_num_3, 'foo'), 'Int64') AS tags[foo,number]), (arrayElement(attr_str_3, 'foo') AS tags[foo,string]), (arrayElement(attr_str_3, 'foo') AS tags[foo])

@volokluev
Copy link
Member

Great job reproducing this @xurui-c

I think what would make sense is to have the expression be generated by us here to avoid alias collisions.

And the SelectedExpression alias should have the label the client assigned in the API call. Does that make sense?

@xurui-c xurui-c changed the title reproduce RPC handles same name attributes Nov 20, 2024
@xurui-c xurui-c marked this pull request as ready for review November 21, 2024 23:05
@xurui-c xurui-c requested review from a team as code owners November 21, 2024 23:05
Copy link
Member

@volokluev volokluev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

almost there, just change the delimiter and then you are good to merge. Also @davidtsuk does this mess up your extrapolation aliasing business?

snuba/web/rpc/common/common.py Show resolved Hide resolved
@xurui-c xurui-c merged commit c7bb869 into master Nov 21, 2024
30 checks passed
@xurui-c xurui-c deleted the rachel/reproduce branch November 21, 2024 23:14
xurui-c added a commit that referenced this pull request Nov 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants