duplicate metrics for "auth" system #298

Open
macb opened this issue Oct 6, 2016 · 7 comments

Comments

@macb

macb commented Oct 6, 2016

I would expect dns_error_count_total to be a subset of dns_request_duration_seconds_count, since dns_request_duration_seconds_count should include the non-error requests in addition to the ones that ended in an error. However, dns_error_count_total seems to count about twice as many requests.

I.e., I have:
2.5m auth sum(skydns_skydns_dns_request_duration_seconds_count) by (system)
5m auth sum(skydns_skydns_dns_error_count_total) by (system)

Tracing through the code, I suspect this may be involved: it seems to be the only place where an error is recorded without a duration alongside it, and it would be tagged as system=auth.
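
To make the suspected accounting concrete, here is a minimal Python sketch. It is not SkyDNS code: the function `simulate` and its parameter `extra_error_path` are hypothetical names, and it assumes every request is observed once in the duration histogram while a failed request may be reported to the error counter a second time with no matching duration.

```python
from typing import Tuple


def simulate(requests: int, failed: int, extra_error_path: bool) -> Tuple[int, int]:
    """Return (duration_count, error_count) for a batch of requests.

    duration_count stands in for dns_request_duration_seconds_count,
    error_count stands in for dns_error_count_total.
    """
    duration_count = 0
    error_count = 0
    for i in range(requests):
        # Every handled request is observed once in the duration histogram.
        duration_count += 1
        if i < failed:
            # Normal accounting: one error per failed request.
            error_count += 1
            if extra_error_path:
                # Assumed bug: a second code path reports the same failure
                # again, without a duration observation to go with it.
                error_count += 1
    return duration_count, error_count


print(simulate(5, 5, extra_error_path=False))  # (5, 5)
print(simulate(5, 5, extra_error_path=True))   # (5, 10)
```

With every request failing and the extra path active, the error count comes out at exactly twice the request count, the same ratio as the 5m versus 2.5m above. This only shows that the hypothesis is consistent with the numbers, not that it is the actual cause.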

macb changed the title from "seemingly duplicate metrics" to "duplicate metrics for "auth" system" on Oct 6, 2016
@miekg

miekg commented Oct 7, 2016

Have you seen CoreDNS (github.com/miekg/coredns)? It uses a middleware
approach to build a cleaner SkyDNS (among other things).

CoreDNS is waaaaaay newer than SkyDNS, but the code is cleaner IMHO.

@macb
Author

macb commented Oct 7, 2016

I'm using skydns as part of the kube-dns addon, which is how I came across this (arguably it may be caused by some additional caching that pod does with dnsmasq). I'm not sure if they're planning on switching that piece out any time soon; maybe there's a plan for that!

@xvello

xvello commented Jul 19, 2018

We can replicate this with k8s.gcr.io/k8s-dns-kube-dns-amd64:1.14.10, which vendors in the affected code. So does #337.

I think @macb's investigation is right. From what I understand of the code, for each auth error, we:

@miekg could you please confirm that is the case, or if we are missing something?

@miekg

miekg commented Jul 19, 2018 via email

@xvello

xvello commented Jul 19, 2018

Thanks for your quick answer @miekg. Could you recommend someone else to investigate this issue with?

@miekg

miekg commented Jul 19, 2018 via email

@bketelsen
Member

I'm happy to merge a fix for this if someone wants to send a PR.
