Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Combine global/local logic into one unified flush and parallelize sinks. #292

Merged
merged 4 commits into from
Oct 25, 2017

Conversation

cory-stripe
Copy link
Contributor

Summary

Improve the Server's Flush work from calling FlushLocal and FlushGlobal, which was a poorly compartmentalized implementation that led to code duplication and confusion.

Motivation

At the end of the Datadog sink PR we are in a world with a single, hardcoded sink. This isn't the intent! We want to convert plugins to sinks and execute all of them.

This patch aims to do just that, but also takes an opportunity to refactor away the FlushLocal and FlushGlobal functions. This was done by first inlining them, removing duplicate code, and then refactoring the special cases away. Notably, the tallyMetrics size calculation.

Notes

  • This is intended for post 1.7 so >= 1.8
  • At present DD is still the only metric sink, as such I didn't add any tests of consequence because the existing tests do the thing. We'll add "did all sinks run" as a test when we introduce the next patches.
  • I added some comments to improve what flush does. Right now it's a rat's nest. This improves the rat's nest incrementally and avoids some tremendously big PR.
  • I left a half-ass set of context.TODO() sprinkled around because we want to implement a deadline for sinks. I will fix this.

Test plan

Existing tests.

r? @aditya-stripe

flusher.go Outdated
span := tracer.StartSpan("flush").(*trace.Span)
defer span.Finish()

// TODO Move this to an independent ticker routine or something?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I don't think we can do this in a separate goroutine, because we want this to be reporting the memory at the time of maximum usage (right before we clear and flush everything).

The alternate way of looking at it: we need to do it in a separate ticker, and choose our interval for that ticker such that we are sampling the underlying, sinusoidal data accurately.

Either way, this would also be a great place to report softnet data, once this is merged: #291

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is separately addressed in #294

flusher.go Outdated

go s.metricSinks[0].FlushEventsChecks(span.Attach(ctx), events, checks) // we can do all of this separately
go s.flushTraces(span.Attach(ctx))
go s.flushTraces(span.Attach(ctx)) // this too!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment no longer makes sense?

flusher_test.go Outdated
@@ -174,7 +175,7 @@ func testFlushTraceDatadog(t *testing.T, protobuf, jsn io.Reader) {
assert.NoError(t, err)

server.HandleTracePacket(packet)
server.Flush()
server.Flush(context.TODO())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we actually want context.Background() for all of these, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha! Yes, I was doing TODO out of habit for a temporary fix when in fact Background was the permanent answer here. :) Thanks!

flusher.go Outdated
go func(ms metricSink) {
err := ms.Flush(span.Attach(ctx), finalMetrics)
if err != nil {
log.WithError(err).WithField("sink", ms.Name()).Warn("Error flushin sink")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flushing :)

for _, sink := range s.metricSinks {
wg.Add(1)
go func(ms metricSink) {
err := ms.Flush(span.Attach(ctx), finalMetrics)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that this is happening asynchronously, we'll want to make doubly-sure that our interface definition specifies that the Flush method is not allowed to mutate the slice it receives.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely! How do we do that? :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add it to the interface definition in metric_sink.go.

@stripe-ci
Copy link

Gerald Rule: Copy Observability on Veneur and Unilog pull requests

cc @stripe/observability
cc @stripe/observability-stripe

@cory-stripe
Copy link
Contributor Author

re-r? @aditya-stripe

plugin_test.go Outdated
@@ -82,7 +82,7 @@ func TestGlobalServerPluginFlush(t *testing.T) {
})
}

f.server.Flush()
f.server.Flush(context.TODO())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we also want Background() here too. context.TODO is intended for functions that are not yet fully traced, and for static analysis tools that can track whether a context originates with a TODO or a BACKGROUND.

@ChimeraCoder
Copy link
Contributor

One comment. lgtm after that, so feel free to self-approve :)

@cory-stripe
Copy link
Contributor Author

Self-approval post rebase

@cory-stripe cory-stripe merged commit 3163ea5 into master Oct 25, 2017
@cory-stripe cory-stripe deleted the cory-metric-sink-cleanup branch November 22, 2017 19:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants