Numbers are crazy in diskstats plugin after reboot #426

toofishes · 2015-04-07T21:48:07Z

Both the monitoring server and the node are running Arch Linux, x86_64, munin version 2.0.25. This is after a node restart.

You can see pretty clearly in the graph below that something isn't right. This hasn't always been the case in the 2.0.x series, but I have noticed it doing this for at least a little while:

It looks like the problem is limited to the diskstats plugin. I see issues in these graphs (screenshots of all above):

diskstats_iops
diskstats_utilization
diskstats_throughput

Not showing it:

iostat
if_, if_err_
forks
irqstats

ssm · 2015-04-07T21:50:48Z

"After a node restart", is "node" the machine running munin-node, or is it the "munin-node" program?

toofishes · 2015-04-07T21:55:18Z

Machine restart. Restarting only munin-node doesn't seem to cause any issues.
I do notice that the diskstats plugin seems to be the only thing keeping data in /var/lib/munin/plugin-state/nobody/ on the node machine; not sure if that helps or is just a red herring.
Let me know if dumps or anything else from the RRD files themselves, or the plugin config, would be helpful too; I'm happy to provide whatever is needed.

ssm · 2015-04-07T22:21:06Z

It looks like a consequence of how the plugin reports its data.

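When the machine boots, the kernel's disk I/O counters reset to zero. The plugin, however, still computes its values as a delta against the last numbers stored in its state file, so it reports wrong numbers to the munin master. This is visible as large spikes in the graphs.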
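
To make the failure mode concrete, here is a minimal sketch, assuming the plugin turns counters into a per-second rate as (current - previous) / interval, with `previous` taken from its state file. The helper name `naive_rate` is hypothetical, and this is Python for illustration, not the plugin's actual Perl code:

```python
def naive_rate(previous, current, interval_s):
    """Per-second rate from a delta against the value stored by the previous run."""
    return (current - previous) / interval_s


# Normal 5-minute run: the counter grew by 3000 reads.
print(naive_rate(1_000_000, 1_003_000, 300))  # 10.0

# First run after a reboot: the kernel counter restarted from zero, but the
# state file still holds the pre-reboot value, so the delta is meaningless
# (here a large negative rate).
print(naive_rate(1_003_000, 1_500, 300))
```

Whatever the plugin then does with such a value, it no longer describes real disk activity.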

A fix for this plugin may be to:

Change the RRD Data Source Type from GAUGE to DERIVE
Report the numbers directly instead of calculating a delta from the last number stored in the state file
Remove the use of the state file from the plugin entirely, unless something else in there needs it.
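
A minimal sketch of that direction, assuming a Python plugin rather than the real Perl one. In `config` mode it declares each field as `type DERIVE` with `min 0`. On a normal run it prints the raw counters from /proc/diskstats unchanged, so no state file is needed. `derive_rate` is a hypothetical, simplified model of what rrdtool does with a DERIVE data source: it ignores rrdtool's interpolation to step boundaries. The rate is (new - old) / elapsed, and anything below `min` is stored as unknown, so a reboot should leave a gap instead of a spike. The graph title, field names, and the choice of "reads/writes completed" are illustrative only:

```python
import re
import sys

# Column indexes in a /proc/diskstats line (see the kernel's iostats docs):
# 0 major, 1 minor, 2 device name, 3 reads completed, ..., 7 writes completed.
DEVICE = 2
READS_COMPLETED = 3
WRITES_COMPLETED = 7


def parse_diskstats(text):
    """Map device name -> (reads completed, writes completed), raw counters."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) <= WRITES_COMPLETED:
            continue  # skip blank or short lines
        stats[fields[DEVICE]] = (int(fields[READS_COMPLETED]),
                                 int(fields[WRITES_COMPLETED]))
    return stats


def field_name(device):
    """Munin field names may only use [A-Za-z0-9_] and must not start with a digit."""
    name = re.sub(r"[^A-Za-z0-9_]", "_", device)
    return "_" + name if not name or name[0].isdigit() else name


def config_lines(devices):
    """`config` output: every field is a DERIVE with min 0, so rrdtool computes the rate."""
    lines = ["graph_title Disk IOs per device (sketch)",
             "graph_vlabel IOs per second",
             "graph_category disk"]
    for dev in devices:
        for suffix, what in (("rd", "reads"), ("wr", "writes")):
            field = f"{field_name(dev)}_{suffix}"
            lines += [f"{field}.label {dev} {what}",
                      f"{field}.type DERIVE",
                      f"{field}.min 0"]
    return lines


def fetch_lines(stats):
    """Normal run: report the raw counters as-is; no state file, no delta."""
    lines = []
    for dev, (reads, writes) in stats.items():
        lines.append(f"{field_name(dev)}_rd.value {reads}")
        lines.append(f"{field_name(dev)}_wr.value {writes}")
    return lines


def derive_rate(old, new, elapsed_s, minimum=0.0):
    """Simplified model of an RRD DERIVE data source: below `min` means unknown (None)."""
    rate = (new - old) / elapsed_s
    return None if rate < minimum else rate


if __name__ == "__main__":
    try:
        with open("/proc/diskstats") as fh:
            stats = parse_diskstats(fh.read())
    except OSError:
        stats = {}  # not on Linux: report no devices
    mode = sys.argv[1] if len(sys.argv) > 1 else "fetch"
    out = config_lines(sorted(stats)) if mode == "config" else fetch_lines(stats)
    print("\n".join(out))
```

With this design the only "memory" between runs lives in the RRD file on the master, which is where the counter-reset handling (via `min 0`) then happens.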

toofishes · 2015-04-07T22:35:42Z

I'm not totally sure about this diagnosis, but commit 0d7505f, and possibly also 768894f, look suspicious: I don't remember diskstats always doing this, and that commit lies between the 2.0.21 and 2.0.22 releases, which weren't all that long ago. It also roughly matches when the spikes start showing up on this yearly graph:

Let me know what I can do to help. I'm able to dive in a bit and can definitely test patches, but I'm no expert on munin internals.

steveschnepp · 2015-04-08T06:11:52Z

Loosely related, but I created issue munin-monitoring/munin-c#29 to rewrite that plugin.

Coupled with issue munin-monitoring/munin-c#30, it might be very interesting.

* munin-monitoring#426
* check /proc/uptime to detect system reboot
* use uptime second instead of interval if uptime < interval
* reset all previous status values to zero on reboot

… reboot)

diskstats ver 2.0.22 and later gives weird numbers on some entries after system reboot

how I fix
- check /proc/uptime to detect system reboot
- use uptime second instead of interval if uptime < interval
- reset all previous status values to zero if uptime < interval

fundamental solution should be as munin-monitoring#426 (comment) but it might be significant rewrite
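
The commit message above describes the interim fix. Here is a minimal Python sketch of the same idea, assuming the previous counter values and the time since the last run come from the plugin's state file. The real change is to the Perl diskstats plugin, and the helper names `read_uptime` and `rates_since_last_run` are hypothetical:

```python
def read_uptime(text):
    """Seconds since boot: the first field of /proc/uptime."""
    return float(text.split()[0])


def rates_since_last_run(previous, current, interval_s, uptime_s):
    """Per-second rates per counter, with the reboot handling from the commit message.

    previous:   counter values saved in the state file by the last run
    current:    counter values read now
    interval_s: seconds since the last run
    uptime_s:   seconds since boot
    """
    if uptime_s < interval_s:
        # The machine rebooted since the last run: the kernel counters started
        # again from zero, so treat every previous value as zero and divide by
        # the time the machine has actually been up.
        previous = {name: 0 for name in current}
        interval_s = uptime_s
    return {name: (current[name] - previous.get(name, 0)) / interval_s
            for name in current}


# Last run 300 s ago, reboot 60 s ago: 1500 reads since boot over 60 s.
print(rates_since_last_run({"reads": 1_003_000}, {"reads": 1_500}, 300, 60))  # {'reads': 25.0}
```

Without the reboot check, the same inputs would produce a delta against the stale pre-reboot value of 1003000.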

ssm added the [component] plugins label Apr 7, 2015

ssm added this to the 2.0.26 milestone Apr 7, 2015

ssm added the [type] bug label Apr 7, 2015

mittyorz mentioned this issue Apr 26, 2015

fix issues:426 (Numbers are crazy in diskstats plugin after reboot) #458

Merged

ssm added the [affects] 2.0-stable label Sep 13, 2015

sumpfralle removed this from the 2.0.26 milestone Mar 6, 2018

steveschnepp closed this as completed in bd6f37b Jun 20, 2018
