Question: statsrelay dropping packets? #25
What are you measuring here, exactly? Make sure you are counting received metrics and not received packets. Some implementations don't make that distinction, and statsrelay tries to pack as many metrics into each UDP packet as it can. There are, frankly, a lot of ways we could be leaking UDP packets. Remember, UDP doesn't guarantee delivery, and the StatsD design aims to collect a statistically significant sample of the data points rather than account for each and every metric end to end. One of the reasons I wrote this was that the Node implementation of Etsy's StatsD is really quite bad about dropping packets. You might want to look at running an implementation that's, uhh, more robust, like Statsite: https://github.com/statsite/statsite Ok, let's figure out where you are dropping packets. Look at
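To make the packets-versus-metrics distinction concrete, here is a minimal sketch that tallies both for a batch of raw datagrams. It assumes the common StatsD convention that several metric lines can share one UDP datagram, separated by newlines; the function names and sample payloads are hypothetical, and this is not statsrelay's own code.

```python
def count_metrics(payload: bytes) -> int:
    """Count StatsD metric lines in one UDP datagram.

    Assumes metrics within a datagram are newline-separated; blank lines
    (such as a trailing newline) are ignored.
    """
    return sum(1 for line in payload.split(b"\n") if line.strip())


def tally(datagrams):
    """Return (packets, metrics) for an iterable of raw datagrams."""
    packets = 0
    metrics = 0
    for payload in datagrams:
        packets += 1
        metrics += count_metrics(payload)
    return packets, metrics


if __name__ == "__main__":
    # Hypothetical traffic: one datagram packed with three metrics (as a
    # relay might send) and one carrying a single metric.
    sample = [
        b"api.2xx:1|c\napi.latency:12|ms\napi.2xx:1|c",
        b"api.2xx:1|c\n",
    ]
    print(tally(sample))  # (2, 4)
```

A packet counter on the receiving side would report 2 here while a metric counter reports 4, which is why comparing packet counts before and after a packing relay can look like loss when nothing was dropped.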
Thanks for your response. In our environment, we have StatsD installed on all machines; each one aggregates locally and sends to a Graphite cluster. Most of the applications are autoscaling, so we don't need per-instance metrics. Currently I'm forwarding a single application's metrics [10-20 EC2 machines] using statsd repeater, so the throughput is not that high [10k-30k per min]. So if you check this graph for API 2xx [application metrics]:
And if I stop statsrelay and forward directly [statsd repeater -> statsd -> graphite], I don't see this drift in the graph. Also, I don't see any drops in
I'd agree that the traffic you have here should be low enough to work even in untuned environments. What expressions are you graphing in the Grafana graphs? How are you running Statsrelay? What script, arguments, options, etc. are you giving Statsrelay?
Graphite expressions are pretty basic, e.g.:
statsrelay startup script:
statsd config:
StatsD binds to 0.0.0.0, but you are binding statsrelay to a specific IP address. I'm wondering whether you are missing packets from a local instance of the application here.
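Why the bind address matters can be shown with a plain UDP socket. This is a minimal sketch, not statsrelay's actual configuration (its startup script isn't shown in this thread); `make_listener` is a hypothetical helper. A socket bound to 0.0.0.0 accepts datagrams addressed to any local interface, including 127.0.0.1, whereas a socket bound to one specific address only accepts datagrams sent to that address.

```python
import socket


def make_listener(host: str = "0.0.0.0", port: int = 0) -> socket.socket:
    """Open a UDP socket the way a StatsD-style daemon would listen.

    host="0.0.0.0" accepts datagrams addressed to any local interface,
    including 127.0.0.1. Binding to one specific address instead only
    accepts datagrams sent to that address, so a local client that sends
    to 127.0.0.1 would not reach it. port=0 lets the OS pick a free port
    for this demo; StatsD conventionally listens on 8125.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


if __name__ == "__main__":
    listener = make_listener("0.0.0.0")
    port = listener.getsockname()[1]
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # A local application sending to localhost reaches a wildcard listener.
        sender.sendto(b"api.2xx:1|c", ("127.0.0.1", port))
        listener.settimeout(2.0)
        data, _addr = listener.recvfrom(65535)
        print(data)  # b'api.2xx:1|c'
    finally:
        sender.close()
        listener.close()
```

If a local app is configured to send to localhost while the relay listens only on a private IP, those datagrams never reach the relay, which would show up as a steady shortfall rather than random loss.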
What would be helpful is to look at the metrics reported by StatsD and Statsrelay themselves and see whether the two daemons are seeing the same number of metrics. That will give us a better idea of where the leak is happening. There should be a Likewise, StatsD has a similar counter that it generates internally and emits, counting the number of metrics it has seen. (I forget what the metric name is; it's been so long since I've used Etsy's StatsD.) Comparing these counters over time is how I would pin down where the leak is.
CJ, those numbers suggest that you are dropping 0.4% of packets, which is a LOT better than the previous numbers, which suggested around a 10% drop. My usual goal in a very high-throughput StatsD setup was to keep UDP and metric drops below 1%. Have you tried running StatsRelay in verbose mode to see whether it is dropping StatsD metrics that do not parse correctly?
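The counter comparison suggested above boils down to simple arithmetic over two totals taken over the same time window. In this sketch the counter values are hypothetical placeholders chosen to reproduce a 0.4% drop (the thread doesn't include the raw numbers), the function names are made up for illustration, and the 1% budget is the goal mentioned in the comment above.

```python
def drop_rate(sent: int, received: int) -> float:
    """Percent of metrics lost between an upstream and a downstream counter.

    Both counts must cover the same time window, e.g. metrics forwarded by
    the relay vs. metrics seen by StatsD over the same hour.
    """
    if sent <= 0:
        raise ValueError("sent must be a positive count")
    return 100.0 * (sent - received) / sent


def within_budget(sent: int, received: int, budget_pct: float = 1.0) -> bool:
    """True if the observed drop rate is below the budget (default 1%)."""
    return drop_rate(sent, received) < budget_pct


if __name__ == "__main__":
    # Hypothetical counter totals, not measurements from this thread.
    print(round(drop_rate(100_000, 99_600), 2))  # 0.4
    print(within_budget(100_000, 99_600))        # True
    print(within_budget(100_000, 90_000))        # False (10% drop)
```

Computing the rate at each hop (repeater to relay, relay to StatsD) rather than only end to end is what localizes the leak to a specific daemon.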
I have started testing statsrelay in our environment using statsd repeater. One thing I noticed is a difference in the metrics received in Graphite vs. the StatsD proxy.
statsd repeater -> statsrelay -> statsd -> graphite [currently using one statsrelay and one statsd]
And the difference is huge when we add more statsd backends.
The same graphs work fine when I replace statsrelay with statsd [statsd repeater -> statsd -> graphite].
Any thoughts on this @jjneely @szibis