Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move runtime metrics reporting to a separate timer routine. #294

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

cory-stripe
Copy link
Contributor

@cory-stripe cory-stripe commented Oct 19, 2017

Summary

Move runtime metric reporting to a separate ticker and goroutine.

Motivation

At present our emission of runtime metrics — gc, heap etc — is tied directly to the flusher. This seems like a bad mix of concerns. It also means that our runtime metrics are tied directly to our flush "tick". In a future where this is adjustable or non-existent we need a plan!

Notes

I also added the current goroutine count, which seems like good information in case we leak goroutines.

Test plan

Existing tests. We don't test these now and this doesn't change that.

r? @stripe/observability

@cory-stripe cory-stripe force-pushed the cory-runtime-reporting-thread branch from b374b33 to 80b93e0 Compare October 19, 2017 15:34
@stripe-ci
Copy link

Gerald Rule: Copy Observability on Veneur and Unilog pull requests

cc @stripe/observability
cc @stripe/observability-stripe

@an-stripe
Copy link
Contributor

r? @stripe/observability

I don't think I have the state + focus to do this during /dev/start, sorry!

@an-stripe an-stripe removed their assignment Oct 19, 2017
@an-stripe
Copy link
Contributor

r? @stripe/observability

@cory-stripe cory-stripe force-pushed the cory-runtime-reporting-thread branch from 80b93e0 to 3316668 Compare October 20, 2017 21:11
@cory-stripe
Copy link
Contributor Author

cory-stripe commented Oct 20, 2017

Hey @aditya-stripe, this one is ready. Note that the intervals for this and the flush are ~synchronized and it should do what you suggested in a separate PR with regards to being at a high water mark. I would argue that it's important it be separated so as not to just give us the skewed, flush-time numbers. Having a separate ticker without the "bucketing" behavior should give us something that has a bit of periodicity to it which I think is an improvement.

@cory-stripe cory-stripe changed the title Move runtime metrics reporting to a seprate timer routine. Move runtime metrics reporting to a separate timer routine. Oct 23, 2017
@ChimeraCoder
Copy link
Contributor

At the moment, we're collecting these metrics immediately after each flush is initiated. This may or may not actually be the exact peak (or nadir) for metrics like memory usage, because of timing jitter as goroutines are scheduled. Though it is likely to be close, and at the very least, highly-correlated with the higher end of the distribution.

In reality, what we want is a measurement of the full range of the distribution, because these will change over the span of the 10-second flush cycle. We'll want to choose something that's less than 5 seconds, like 3s or 4s.

And if we're emitting these metrics multiple times per flush cycle, we'll also want to change them to be histograms, not gauges.

So, lgtm for now, but let's file a ticket to update this as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants