Numbers are crazy in diskstats plugin after reboot #426
Comments
"After a node restart", is "node" the machine running munin-node, or is it the "munin-node" program?
Machine restart. Restarting only munin-node doesn't seem to cause any issues.
It looks like a consequence of how the plugin reports its data. When the machine boots, the disk IO counter resets. The plugin then reports wrong numbers to the munin master. This is visible as large spikes in the graphs. A fix for this plugin may be to:
I'm not totally sure of this diagnosis, but commit 0d7505f, and possibly also 768894f, look suspicious. I don't remember diskstats always doing this, and that commit lies between the 2.0.21 and 2.0.22 releases, which weren't all that long ago. It also roughly matches when the spikes start showing up on the yearly graph. Let me know what I can do to help: I'm able to dive in a bit and can definitely test patches, but I am no munin internals expert.
Loosely related, but I created issue munin-monitoring/munin-c#29 to rewrite that plugin. Coupled with issue munin-monitoring/munin-c#30, it might be very interesting.
* munin-monitoring#426
* check /proc/uptime to detect system reboot
* use uptime seconds instead of interval if uptime < interval
* reset all previous status values to zero on reboot
… reboot)
diskstats ver 2.0.22 and later gives weird numbers on some entries after a system reboot.
How I fixed it:
- check /proc/uptime to detect a system reboot
- use uptime seconds instead of the interval if uptime < interval
- reset all previous status values to zero if uptime < interval
The fundamental solution should be as described in munin-monitoring#426 (comment), but it might require a significant rewrite.
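For anyone following along, here is a rough sketch of the logic those commit messages describe. It is not the actual patch (the real plugin is not written in Python), and `read_uptime`, `compute_rates`, and the counter name `sda_reads` are hypothetical names made up for illustration. It assumes the plugin keeps the previous counter snapshot and knows its polling interval in seconds.

```python
def read_uptime(path="/proc/uptime"):
    """Return system uptime in seconds (first field of /proc/uptime)."""
    with open(path) as fh:
        return float(fh.read().split()[0])


def compute_rates(prev, curr, interval, uptime):
    """Turn two cumulative counter snapshots into per-second rates.

    prev, curr: dicts mapping a (hypothetical) counter name to its value.
    interval:   seconds between plugin runs.
    uptime:     seconds since boot, e.g. from read_uptime().
    """
    if uptime < interval:
        # The machine rebooted since the last run: the kernel counters
        # restarted from zero, so the saved snapshot is meaningless.
        # Use zero as the baseline and the uptime as the elapsed time.
        prev = {name: 0 for name in curr}
        elapsed = uptime
    else:
        elapsed = interval

    if elapsed <= 0:
        return {name: 0.0 for name in curr}

    # A counter with no previous value reports a rate of 0 for this run.
    return {
        name: (curr[name] - prev.get(name, curr[name])) / elapsed
        for name in curr
    }


# Reboot case: stale snapshot from before the reboot, 120 s of uptime.
after_reboot = compute_rates({"sda_reads": 1_000_000}, {"sda_reads": 1200},
                             interval=300, uptime=120)

# Normal case: machine has been up much longer than one interval.
steady_state = compute_rates({"sda_reads": 1000}, {"sda_reads": 4000},
                             interval=300, uptime=5000)
```

With a 300 second interval, both calls give 10 reads per second for `sda_reads`: the reboot case divides 1200 by the 120 seconds of uptime instead of subtracting the stale pre-reboot value.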
Both the monitoring server and the node are running Arch Linux, x86_64, munin version 2.0.25. This is after a node restart.
You can see pretty clearly in the graph below that things aren't acting right. This hasn't always been the case in the 2.0.X series, but I have noticed it doing this for at least a little while:
It looks like the problem is limited to the diskstats plugin? I see issues in these (screenshots of all above):
Not showing it: