
API Freezing When Using Sentry ASP.NET Core with Quartz Scheduler #3774

Closed
Victorvhn opened this issue Nov 20, 2024 · 7 comments

Comments

@Victorvhn
Contributor

Package

Sentry.AspNetCore

.NET Flavor

.NET Core

.NET Version

8.0.0

OS

Linux

SDK Version

4.13.0

Self-Hosted Sentry Version

No response

Steps to Reproduce

Context

I recently added Sentry to all APIs in my company. For most services, it works as expected. However, in one specific service, I encountered a problem. This service uses Quartz to schedule a worker that runs every second to process outbox messages. After adding Sentry, the API freezes and becomes completely unresponsive. For example:

  • Calling /healthz results in an infinite load.
  • Nothing is logged to the console.

Workaround

After multiple tests, I added a "warmup" period to the Quartz job. Instead of starting immediately on API startup, the job now starts 15 seconds after the application starts. With this change, the API works normally, and the issue no longer occurs.
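
In Quartz terms, the change boils down to delaying the trigger's start time, roughly like this (a minimal sketch; the full job configuration is in step 5 of the repro below):

options
    .AddJob<ProcessOutboxJob>(configure => configure.WithIdentity(jobName))
    .AddTrigger(configure =>
        configure
            .ForJob(jobName)
            // Start 15 seconds after application startup instead of immediately
            .StartAt(DateTimeOffset.UtcNow.AddSeconds(15))
            .WithSimpleSchedule(schedule =>
                schedule.WithIntervalInSeconds(1).RepeatForever()));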

Hypothesis

I suspect this issue might be related to excessive logging or resource contention during startup, potentially causing the Sentry SDK to block the main thread. However, I lack sufficient knowledge of the internals to confirm this.

Steps to Reproduce

  1. Create an ASP.NET Core API with the following dependencies:
    • Sentry.AspNetCore 4.13.0
    • Serilog 4.1.0
    • OpenTelemetry 1.9.0
    • Quartz 3.13.1
  2. Configure Serilog
       webApplicationBuilder.Host
           .UseSerilog((_, configureLogger) =>
           {
               configureLogger.WriteTo
                   .Sentry(
                       minimumEventLevel: LogEventLevel.Verbose,
                       minimumBreadcrumbLevel: LogEventLevel.Verbose,
                       formatProvider: CultureInfo.InvariantCulture
                   );
    
               // The rest of serilog config...
           });
  3. Configure Sentry
       webApplicationBuilder.Services
           .AddOpenTelemetry()
           .ConfigureResource(resourceBuilder =>
           {
               resourceBuilder
                  .AddService("MyApi")
                  .AddTelemetrySdk();
           })
          .WithMetrics(meterProviderBuilder =>
          {
              meterProviderBuilder
                  .AddAspNetCoreInstrumentation()
                  .AddMeter(
                      "Microsoft.AspNetCore.Hosting",
                      "Microsoft.AspNetCore.Server.Kestrel"
                   )
                  .AddHttpClientInstrumentation()
                  .AddRuntimeInstrumentation();
          })
         .WithTracing(tracerProviderBuilder =>
         {
             tracerProviderBuilder
                 .AddAspNetCoreInstrumentation()
                 .AddHttpClientInstrumentation()
                 .AddAWSInstrumentation()
                 .AddXRayTraceId()
                 .AddSentry();
        });
    
       webApplicationBuilder.WebHost
           .UseSentry(options =>
           {
               options.Dsn = "my-dsn";
    
               options.SendDefaultPii = true;
               options.TracesSampleRate = 1.0f;
               options.ProfilesSampleRate = 1.0f;
    
               options.AddDiagnosticSourceIntegration();
               options.UseOpenTelemetry();
    
               options.ExperimentalMetrics = new ExperimentalMetricsOptions
               {
                    EnableCodeLocations = true,
                    CaptureSystemDiagnosticsMeters = BuiltInSystemDiagnosticsMeters.All
               };
               options.AddIntegration(new ProfilingIntegration());
           });
  4. Configure Quartz
            webApplicationBuilder.Services.AddQuartz(configurator =>
            {
                var scheduler = Guid.NewGuid();
                configurator.SchedulerId = $"default-id-{scheduler}";
                configurator.SchedulerName = $"default-name-{scheduler}";
            });
    
            webApplicationBuilder.Services.AddQuartzHostedService(options => options.WaitForJobsToComplete = true);
    
            webApplicationBuilder.Services.ConfigureOptions<ConfigureProcessOutboxJob>();
  5. Create worker
       internal sealed class ConfigureProcessOutboxJob() : IConfigureOptions<QuartzOptions>
       {
           public void Configure(QuartzOptions options)
           {
               var jobName = typeof(ProcessOutboxJob).FullName!;
    
               options
                   .AddJob<ProcessOutboxJob>(configure => configure.WithIdentity(jobName))
                   .AddTrigger(configure =>
                        configure
                            .ForJob(jobName)
                            // Below is the solution I found to get things working
                            .StartAt(DateTimeOffset.UtcNow.AddSeconds(15))
                            .WithSimpleSchedule(schedule =>
                                 schedule.WithIntervalInSeconds(1).RepeatForever()));
           }
        }

Expected Result

The API should respond to requests normally, and all logs should be written to the console and to Sentry.

Actual Result

The API becomes unresponsive, with /healthz loading indefinitely. No logs are written to the console during this state.

@jamescrosswell
Collaborator

Thanks for the report @Victorvhn.

There are quite a few moving parts there (OpenTelemetry, Serilog, ASP.NET Core and Quartz). It'd be interesting to know whether the issue manifests without one/some of those. For example, if not using OpenTelemetry, do you still get it? And if so, how about if also not using Serilog... etc.
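
For example, a stripped-down test setup that keeps only Sentry and Quartz from the configuration above might look something like this (a hypothetical sketch for isolating the problem, not a recommended final config):

// Sentry only, without the Serilog sink or the OpenTelemetry integration
webApplicationBuilder.WebHost
    .UseSentry(options =>
    {
        options.Dsn = "my-dsn";
        options.TracesSampleRate = 1.0f;
    });

// Quartz registration unchanged from the original repro
webApplicationBuilder.Services.AddQuartz(configurator =>
{
    var scheduler = Guid.NewGuid();
    configurator.SchedulerId = $"default-id-{scheduler}";
    configurator.SchedulerName = $"default-name-{scheduler}";
});
webApplicationBuilder.Services.AddQuartzHostedService(options => options.WaitForJobsToComplete = true);
webApplicationBuilder.Services.ConfigureOptions<ConfigureProcessOutboxJob>();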

In any event, we'll see if we can reproduce as a starting point....

@Victorvhn
Contributor Author

Victorvhn commented Nov 21, 2024

I agree with you. We tried to make it as platform-agnostic as possible, and we might have overcomplicated things. Anyway, while doing my tests, I tried removing Serilog and it worked: with just OpenTelemetry + Sentry exporter + Quartz, the system ran fine. I also removed OpenTelemetry and Sentry, leaving just Serilog + Quartz, and that worked too (though, of course, without writing traces to Sentry). One important thing to note, as I mentioned, is that this issue only occurs in the service that starts performing background tasks as soon as it starts. For now, the approach that worked best, without needing any changes to Serilog, OpenTelemetry, or Sentry, was to delay the start of the Quartz worker. I'll try to do more testing when I have time.

@jamescrosswell
Collaborator

@Victorvhn I'm having a bit of difficulty reproducing this. Where does the ProcessOutboxJob type come from?

@Victorvhn
Contributor Author

Victorvhn commented Nov 25, 2024

Oh, sorry about that, I omitted this information. It's basically the worker that runs every second: it gets a batch of entries from the database and processes them. You can see it in the code below:

[DisallowConcurrentExecution]
internal sealed class ProcessOutboxJob(
	IServiceScopeFactory serviceScopeFactory,
	DbContext dbContext,
	IUnitOfWork unitOfWork,
	IOptions<OutboxOptions> outboxOptions,
	ILogger<ProcessOutboxJob> logger)
	: IJob
{
	public async Task Execute(IJobExecutionContext context)
	{
		logger.LogDebug("Beginning to process outbox messages");

		await unitOfWork.BeginTransactionAsync(context.CancellationToken);

		var outboxMessages = await GetOutboxMessagesAsync(context.CancellationToken);

		foreach (var outboxMessage in outboxMessages)
		{
			var domainEvent = JsonConvert.DeserializeObject<IDomainEvent>(
				outboxMessage.Content,
				SerializerSettings.Instance)!;

			var handlerTypes = DomainEventHandlersFactory.GetOutboxMessageHandlerTypes(
				domainEvent.GetType(),
				Application.AssemblyReference.Assembly);

			await Parallel.ForEachAsync(
				handlerTypes,
				context.CancellationToken,
				async (handlerType, cancellationToken) =>
				{
					await using var scope = serviceScopeFactory.CreateAsyncScope();

					var handler = DomainEventHandlersFactory.GetHandler(handlerType, scope.ServiceProvider);
					try
					{
						await handler.Handle(domainEvent, cancellationToken);
					}
					catch (Exception exception)
					{
						logger.LogError(
							exception,
							"Exception while processing outbox message {MessageId} - {Handler}",
							outboxMessage.Id, handlerType.FullName);
					}
				});

			await UpdateOutboxMessageAsync(outboxMessage, context.CancellationToken);
		}

		await unitOfWork.CommitTransactionAsync(context.CancellationToken);

		logger.LogDebug("Completed processing outbox messages");
	}

	private async Task<IReadOnlyList<OutboxMessageResponse>> GetOutboxMessagesAsync(CancellationToken cancellationToken = default)
	{
		var outboxMessages = await dbContext
			.OutboxMessages
			.FromSql(
				$"""
                         SELECT
                             Id AS Id,
                             Content AS Content
                         FROM outbox_messages
                         WHERE ProcessedOnUtc IS NULL
                         ORDER BY OccurredOnUtc
                         LIMIT {outboxOptions.Value.BatchSize}
                         FOR UPDATE SKIP LOCKED
                 """)
			.Select(entity => new OutboxMessageResponse(entity.Id, entity.Content))
			.ToListAsync(cancellationToken);

		return outboxMessages.AsReadOnly();
	}

	private async Task UpdateOutboxMessageAsync(OutboxMessageResponse outboxMessage, CancellationToken cancellationToken = default)
	{
		await dbContext
			.OutboxMessages
			.Where(message => message.Id == outboxMessage.Id)
			.ExecuteUpdateAsync(setters => setters.SetProperty(s => s.ProcessedOnUtc, DateTime.UtcNow), cancellationToken: cancellationToken);

		await dbContext.SaveChangesAsync(cancellationToken);
	}

	private sealed record OutboxMessageResponse(Guid Id, string Content);
}

@jamescrosswell
Collaborator

jamescrosswell commented Nov 25, 2024

OK thanks @Victorvhn - it's a bit hard to replicate everything from databases to AWS etc. I've created a dummy job so that I can get it to compile. I'm not able to reproduce the error you're running into though (with the API freezing). See: https://github.com/jamescrosswell/ApiFreeze

And in particular this line, where I've disabled your workaround:
https://github.com/jamescrosswell/ApiFreeze/blob/70eeb2271345636a496eaa8c9863ba05763cfa1f/ConfigureProcessOutboxJob.cs#L18

Running this locally, however, the API runs fine and continues to serve requests on https://localhost:7082/weatherforecast.

If you're able to tweak that project so that it reproduces the issue you're describing, or create another one that I can easily run here, I can take a look. Otherwise it's pretty hard to say what might be happening.
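
For reference, a stand-in job for a repro like that can be as simple as the following (hypothetical sketch; the actual code is in the linked ApiFreeze repository):

using Microsoft.Extensions.Logging;
using Quartz;

[DisallowConcurrentExecution]
internal sealed class ProcessOutboxJob(ILogger<ProcessOutboxJob> logger) : IJob
{
    public Task Execute(IJobExecutionContext context)
    {
        // Stand-in for the real outbox processing: just log each run
        logger.LogDebug("ProcessOutboxJob executed at {Now}", DateTimeOffset.UtcNow);
        return Task.CompletedTask;
    }
}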

@Victorvhn
Contributor Author

I created a new Web API and set everything up, and to my surprise, it's working! So the issue is likely very specific to the API where I'm encountering the problem.

For now, I'll continue using the 15-second delay to start the job for this particular API until I have more time to run further tests.

If I need additional assistance, I'll reach out. Thank you so much for your time; I truly appreciate it!

@jamescrosswell
Collaborator

All good - I'll close for now but feel free to reopen if you can provide a way to reproduce the issue.
