StrawberryIntegration slows down query performance dramatically #3670

Open

pcraciunoiu opened this issue Oct 18, 2024 · 11 comments

@pcraciunoiu

How do you use Sentry?

Sentry Saas (sentry.io)

Version

2.17.0

Steps to Reproduce

  1. Set up a Django project with Strawberry async.
  2. Set up a query for an endpoint that returns many objects. I tested with 2,000 objects and only 2 fields for each - an ID and a string, e.g.
```python
import strawberry
import strawberry_django

# `models` and `types` refer to the project's own Django models / Strawberry types modules
@strawberry_django.type(models.Item, fields=["id", "value"], pagination=True)
class Item:
    pass


@strawberry.type
class Query:
    items: list[types.Item] = strawberry_django.field()
```
Then query it with:

```graphql
query Items {
  items(pagination: { limit: 2000 }) {
    id
    value
  }
}
```
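For context, the async view wiring is just the standard Strawberry Django setup, roughly like this (module and schema names here are placeholders):

```python
# urls.py -- illustrative wiring; names are placeholders
import strawberry
from django.urls import path
from strawberry.django.views import AsyncGraphQLView

schema = strawberry.Schema(query=Query)  # Query from the snippet above

urlpatterns = [
    path("graphql/", AsyncGraphQLView.as_view(schema=schema)),
]
```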

Expected Result

Not much performance difference with or without StrawberryIntegration

Actual Result

With StrawberryIntegration, this took 3-4x longer. In my case, about 800ms. Without it, I got the results back in 200-250ms.

@sentrivana
Contributor

Hey @pcraciunoiu, thanks for reporting.

I'm thinking this might be due to the SDK in general having a lot of room for improvement in the async department. Might be the transport blocking. (There are plans for a fully async transport/SDK, but this will take some time.)

Couple things to check:

  • If you don't configure the DSN (i.e., make the SDK not actually send anything out), do you still observe this? If not, my guess is that it's the transport.
  • Do you by any chance have profiling on and can see where we spend the extra time?

@pcraciunoiu
Author

> If you don't configure the DSN (i.e., make the SDK not actually send anything out), do you still observe this? If not, my guess is that it's the transport.

It doesn't make a difference if the dsn value is passed in or not. Only the StrawberryIntegration makes a difference, and I have to set auto_enabling_integrations=False to ensure it doesn't get added automatically.

> Do you by any chance have profiling on and can see where we spend the extra time?

I did try with pyinstrument, but nothing stood out. It took a few hours of trial and error to figure out it was Sentry, and then which integration.

I'll attach it in case it helps. If you have recommendations for other profiling tools that might do better, I'd like to learn about them.

1.15s - middleware.py_54 - pyinstrument.zip

@szokeasaurusrex
Member

@pcraciunoiu Interesting, I also did not see anything in the profile you attached, although I am not really sure how pyinstrument works (sampling or deterministic).

I think @sentrivana was suggesting that you could try using Sentry's built-in profiler. You can enable the profiler by setting profiles_sample_rate to a value above zero when initializing the Sentry SDK. You also need to set traces_sample_rate to a value above zero. The traces_sample_rate controls the fraction of Sentry transactions which are sent to Sentry, and the profiles_sample_rate controls the fraction of transactions which are profiled. Here is an example of how to set these.
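Something along these lines (the DSN and rates here are just illustrative):

```python
import sentry_sdk

sentry_sdk.init(
    dsn="your-dsn-here",
    # Fraction of transactions sent to Sentry
    traces_sample_rate=1.0,
    # Fraction of those sampled transactions that are also profiled
    profiles_sample_rate=1.0,
)
```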

Then, you can view the generated profiles directly in Sentry linked to the corresponding transaction.

@pcraciunoiu
Author

@szokeasaurusrex for the profiling I sent, I do not have sentry set up yet as it's a new project.

I do have another project that was running the Strawberry integration until late last week, when I found this issue and disabled it. I confirmed on that project, too, that it was adding 150ms or more to GraphQL requests when I tested locally. I have some profiling from last week, before I disabled the integration, but it's a private repository and I'm not comfortable including links or details here. If you want, I can share a Sentry link privately via email or otherwise, and/or you can let me know what you need to look into this further.

@sentrivana
Contributor

@pcraciunoiu Happy to look at the Sentry link -- you can send it to ivana.kellyer (at) sentry.io.

I will try to repro this today. Also want to compare the performance with a sync Django/Strawberry app. (My money is still on this being a general problem with the SDK blocking.) Will report back.

@sentrivana
Contributor

So I'm seeing an increase too, and only with the async extension. I assume you're setting StrawberryIntegration(async_execution=True) explicitly? If so, can you try with async_execution=False? This makes the extension much faster for me even when using strawberry-django's AsyncGraphQLView.
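That is, something like:

```python
import sentry_sdk
from sentry_sdk.integrations.strawberry import StrawberryIntegration

sentry_sdk.init(
    # ... your other options ...
    integrations=[
        StrawberryIntegration(async_execution=False),
    ],
)
```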

Also, could you post your sentry_sdk.init? Are you using performance/tracing?

@pcraciunoiu
Author

> @pcraciunoiu Happy to look at the Sentry link -- you can send it to ivana.kellyer (at) sentry.io.

Just sent you an email

> So I'm seeing an increase too, and only with the async extension. I assume you're setting StrawberryIntegration(async_execution=True) explicitly? If so, can you try with async_execution=False? This makes the extension much faster for me even when using strawberry-django's AsyncGraphQLView.

I can confirm it's much faster with async_execution=False, but it's still 200ms slower. My test case is 2000 items with a related nested item fetched through a data loader; let's pretend it's this:

```graphql
query Articles {
  articles(pagination: { limit: 2000 }) {
    id
    title
    status
    author {  # data loader here
      id
      firstName
    }
  }
}
```
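The nested author field goes through a Strawberry DataLoader, conceptually something like this (the model and helper names below are placeholders, not my actual code):

```python
# Conceptual sketch only -- AuthorModel is a placeholder for the real Django model
from strawberry.dataloader import DataLoader

async def load_authors(author_ids: list[int]):
    # One batched query for all requested author IDs (Django 4.1+ async iteration)
    authors = {a.id: a async for a in AuthorModel.objects.filter(id__in=author_ids)}
    return [authors.get(author_id) for author_id in author_ids]

author_loader = DataLoader(load_fn=load_authors)
```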

This is what I'm seeing (rough averages from running each case 10 times):

  • No integration - average 700ms
  • With integration but no async execution - average 900ms
  • With integration and async execution - average 1,500ms

I assumed that async_execution was required in an async environment, but I guess that's not true? Still, there is a noticeable slowdown.

@pcraciunoiu
Author

> Also, could you post your sentry_sdk.init? Are you using performance/tracing?

Here it is. I've tried both with/without sample rates for traces/profiles. Doesn't seem to matter, but that's probably because the DSN string is empty.

```python
import logging

import sentry_sdk
from sentry_sdk.integrations.celery import CeleryIntegration
from sentry_sdk.integrations.django import DjangoIntegration
from sentry_sdk.integrations.logging import LoggingIntegration
# from sentry_sdk.integrations.strawberry import StrawberryIntegration

# SENTRY_DSN, GIT_SHA, and ENV come from the project settings/environment
sentry_sdk.init(
    dsn=SENTRY_DSN,
    integrations=[
        DjangoIntegration(),
        CeleryIntegration(),
        LoggingIntegration(level=logging.INFO),
        # StrawberryIntegration(async_execution=True),
    ],
    auto_enabling_integrations=False,
    release=GIT_SHA,
    environment=ENV,
    send_default_pii=True,
    # traces_sample_rate=0.5,
    # relative to the traces sample rate
    # profiles_sample_rate=0.5,
)
```

@szokeasaurusrex
Member

Does setting the SENTRY_DSN to your actual DSN make a difference?

@pcraciunoiu
Author

> Does setting the SENTRY_DSN to your actual DSN make a difference?

It does not seem to make a difference from my quick tests

@sentrivana
Contributor

The profile in Sentry looks thoroughly uneventful, but the pyinstrument profile you attached is interesting. It looks like a lot of time is spent waiting on threading locks/conditions. I don't know if that's what's causing the slowness, but we can verify that.

Would you be able to get us a pyinstrument profile like the one above for the query:

  1. without Sentry
  2. with Sentry and the Strawberry integration enabled in sync mode (async_execution=False)
  3. with Sentry and the Strawberry integration enabled in async mode (async_execution=True)
  4. optionally, also with Sentry but with the Strawberry integration disabled (via disabled_integrations=[StrawberryIntegration()], see the sketch below) so that you still get all the other integrations that would be enabled in options 2 and 3
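A minimal sketch of option 4 (the DSN is a placeholder):

```python
import sentry_sdk
from sentry_sdk.integrations.strawberry import StrawberryIntegration

sentry_sdk.init(
    dsn="your-dsn-here",
    # Auto-enabled integrations stay on; only the Strawberry one is turned off
    disabled_integrations=[StrawberryIntegration()],
)
```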

We could then see what stands out.

Feel free to attach here or send them to me via email.
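If it helps, one convenient way to capture these is pyinstrument's Django middleware (assuming pyinstrument 4.x), which can write an HTML profile for every request:

```python
# settings.py
MIDDLEWARE = [
    "pyinstrument.middleware.ProfilerMiddleware",
    # ... the rest of your middleware ...
]

# Save an HTML profile of each request into ./profiles/
PYINSTRUMENT_PROFILE_DIR = "profiles"
```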
