Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce a new metrics client that supports dependency injection. #577

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

clin-stripe
Copy link
Contributor

@clin-stripe clin-stripe commented Nov 8, 2018

I'd like to propose a metrics client approach that avoids top level functions and allows for dependency injection.

Motivation
We have two prescribed ways of adding metrics today:

a.) creating a span, adding metrics, and finishing it.
b.) creating a SSF.Sample, adding metrics to it, and calling metrics.ReportBatch to submit it.

These approaches are tightly coupled with implementation and impose strict dependencies on clients. It's not possible to replace ReportBatch with a mock object, for example, or a different stats backend. Consider this use case:

type MyTask struct{
    metricsClient metrics
}

type metrics interface {
    Count(string, int, map[string]string)
}

func (t MyTask) DoJob(ctx context.Context) {
    // do things ...
    t.metricsClient.Count("finished", 1)
}

The above code has a couple of nice qualities about it:

a.) It doesn't depend on the metrics implementation in any way. It doesn't even import anything metrics related! We can change out the metrics implementation to a mock for testing, for example. This also facilitates simpler migrations if we need to change out the metrics implementation.

b.) It doesn't need to know anything about tracing. This is important since it's not always practical to start a span in every function. For example, I may want to use runtime.trace instrumentation instead of distributed tracing for in-process traces. Moreover, extracting spans from context is very fragile. If DoJob's caller isn't using veneur's tracing library, for example, we can't use that approach to insert metrics.

c.) We don't lose any of the functionality we had before. If we want to add metrics to a span, whoever constructs MyTask needs to inject a SSFAddingClient configured with the trace we want to attach to. This is entirely opaque to MyTask's code.

This approach is much more idiomatic golang and affords us greater flexibility.

By contrast, this is what today's code looks like:

type MyTask struct{}

type metrics interface {
    Count(string, int, map[string]string)
}

// either this, which can't be mocked
func (t MyTask) DoJob(ctx context.Context) {
    // do things ...
    sample := &ssf.SSFSample{}
    sample.Add(...)
    metrics.BatchReport(...)
}

// or this
func (t MyTask) DoJob(ctx context.Context) {
    span, ok := opentracing.SpanFromContext(ctx).(*trace.Trace)
    if !ok {
        // no access to a span so... do we create a new one? but what if there's good reasons not to here?
    }
}

For now, I've hidden the implementation inside internal/ because I don't want to make API commitments to external users until we have some experience and can finalize the API.

For a bit more context on why this is the preferred idiom in Go, I wrote about dependency injection and interface use here: https://docs.google.com/document/d/1phD0rjOtmOg2nwponuiXmM02YTpYUtW_KmTun_b_UTo/edit#heading=h.z2a5c0bfp590

r? @stripe/observability
r? @ChimeraCoder @asf-stripe @cory-stripe

I'd like to propose a metrics client approach that avoids top level
functions and allows for dependency injection.
@stripe-ci
Copy link

Gerald Rule: Copy Observability on Veneur, Unilog, Falconer pull requests

cc @stripe/observability
cc @stripe/observability-stripe

@asf-stripe
Copy link
Contributor

I like the way you think, but I would much rather go a different direction. Some reasons:

  • I super disagree with the "not having to know about tracing" point - the trace span is the atomic unit of the SSF philosophy, and we should embrace it, or completely throw out SSF (though I think it's the right level of abstraction).

  • I feel pretty attached to this interface, partially because I wrote it (lol) but also because I genuinely think it's the one that gives users the most control what happens with their traces. That may again be because I wrote it, but hm.

  • We do lose some functionality we had before - with the SSF client, you can report a batch of metrics, which (at least as far as I know) reduces the amount of protobuf-encoding and syscalling we do to in order to submit these things. This is not a thing that we would need to worry about in tests, but we do have services currently in use that spew enough data that veneur's syscall volume itself consumes almost a whole cpu core. Adding more wouldn't be great.

  • We can do it without changing the interface out at all - you could just as well write a statsd backend for the currently-existing trace client, for use with NewBackendClient, which is a thing that we already use to construct trace clients for tests. IN FACT, that could be used to really really good effect in the span-processing internals, where we have previously had to be very careful so we don't write-amplify. Something like:

// NewStatsdOnlyBackend returns a statsd backend that can be used with
// veneur's trace.NewBackendClient to construct a trace client that
// reports only metrics, through the statsd protocol.
func NewStatsdOnlyBackend(c *statsd.Client) ClientBackend {
	return &statsdBackend{
		c: c,
	}
}

// Close is a no-op.
func (sb *statsdBackend) Close() error {
	return nil
}

// SendSync reports every sample on the span as a statsd metric, and
// discards the span data itself.
func (sb *statsdBackend) SendSync(ctx context.Context, span *ssf.SSFSpan) error {
	for _, s := range span.Metrics {
		srMultiplier := 1 / s.SampleRate

		tags := make([]string, 0, len(s.Tags))
		for tn, tv := range s.Tags {
			tags = append(tags, fmt.Sprintf("%s:%s", tn, tv))
		}

		switch s.Metric {
		case ssf.SSFSample_COUNTER:
			sb.c.Count(s.Name, int64(s.Value*srMultiplier), tags, 1)
		case ssf.SSFSample_GAUGE:
			sb.c.Gauge(s.Name, float64(s.Value), tags, 1)

			// ...
		}
	}
        return nil
}

...so I think the above would be kinda neat because then we can use trace clients all the way (no more of that statsd lurk) and not worry about write-amplifying.

I should add I'm also not super in love with the interface you wrote up... having to write all these methods seems not-great and adds a lot of overhead for prospective trace client writers, compared the two (with an optional third) that ClientBackend requires...

@clin88
Copy link

clin88 commented Nov 8, 2018

Let's set up some time to discuss, I think some signal is being lost over text.

@cory-stripe
Copy link
Contributor

Coming along a bit late here! :ie: The gist of making this more ergonomic and testable is all marvelous!

If I'm reading this right — and forgive me, I am poor at this when traveling — @asf-stripe's suggestion feels more in line with how Go's HTTP libraries work in that you use the same top level client but swap out the Transport if you want to test. This way clients don't have to make an implementation of the client, only of the backend and we can easily create a testable one you can plop in!

But, to reiterate any improvement here is good!

@clin-stripe
Copy link
Contributor Author

@asf-stripe and I had a good conversation.

We'd like to keep talking about it but we should pace the conversation more deliberately. I don't feel the need to push this now... copying what I wrote in slack:

hey, regarding dependency injection and trace library
asf and i had a good conversation
it sounds like this is a big change for the team
id like to keep having this conversation, but i dont feel the need to introduce this now.
maybe after everyone has commented, we can get the people with the biggest stake into a room and hash it out
the thing i think we need to agree on before hashing out implementation is what problems we’re solving for. what use cases can we anticipate? what does the client have to look like to facilitate this?
ill follow up w people and figure out whats a good pace to have that talk... id like to wait for aditya to be back and i know kris had a proposal for the client lib as well

@asf-stripe asf-stripe assigned clin-stripe and unassigned asf-stripe Nov 12, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants