Add metrics for monitoring the broker and connection status #22

daviddetorres · 2019-11-10T23:23:42Z

As proposed in issue #12, we need a way to check the availability and health of the broker so that alarms can be raised when the broker is down or malfunctioning.

This is needed because the metrics shown are the last ones received. Since the broker pushes them, the exporter acts as a proxy, and after a disconnection it keeps serving the last values it received.

Two scenarios are considered:

- Loss of connection with the broker: new metric "broker_connection_up" (0 if down, 1 if up).
- The connection is OK, but the broker does not send updates because of problems in the queue, a blockage, etc.: new metric "seconds_since_last_update" (-1 if no update has ever been received; otherwise the number of seconds since the last update, resetting to 0 after every update from the broker).

- Added Gauge metric "up" for the state of the connection with the broker
- Added dependency retrieval to the Makefile
- Moved the connection loop into a function that is also called when the connection is lost
- Changed the name of the "up" metric to "broker_connection_up"
- Added the seconds_since_last_update metric
- Added functions to increase that metric and to reset it to zero
- Added an ID to the exporter's MQTT client
- Launch the first connection in an independent thread so that metrics (such as the connection status) can be gathered before the connection is established
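The semantics of the two proposed metrics can be sketched in isolation. This is a minimal illustration, not the code in this pull request: `BrokerHealth`, `manualClock`, and all method names are hypothetical, and plain `float64` return values stand in for the exporter's Prometheus gauges.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// BrokerHealth is a hypothetical stand-in for the two gauges described above.
type BrokerHealth struct {
	mu         sync.Mutex
	now        func() time.Time // injectable clock, so the logic can be tested
	connected  bool
	hasUpdate  bool
	lastUpdate time.Time
}

// NewBrokerHealth creates a tracker that reads the time from the given clock.
func NewBrokerHealth(now func() time.Time) *BrokerHealth {
	return &BrokerHealth{now: now}
}

// SetConnected records whether the connection with the broker is up.
func (h *BrokerHealth) SetConnected(up bool) {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.connected = up
}

// MarkUpdate records that an update from the broker has just arrived.
func (h *BrokerHealth) MarkUpdate() {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.hasUpdate = true
	h.lastUpdate = h.now()
}

// ConnectionUp mirrors broker_connection_up: 1 if up, 0 if down.
func (h *BrokerHealth) ConnectionUp() float64 {
	h.mu.Lock()
	defer h.mu.Unlock()
	if h.connected {
		return 1
	}
	return 0
}

// SecondsSinceLastUpdate mirrors seconds_since_last_update: -1 before the
// first update, otherwise the seconds elapsed since the latest one.
func (h *BrokerHealth) SecondsSinceLastUpdate() float64 {
	h.mu.Lock()
	defer h.mu.Unlock()
	if !h.hasUpdate {
		return -1
	}
	return h.now().Sub(h.lastUpdate).Seconds()
}

// manualClock is a hand-advanced clock used only for the demonstration.
type manualClock struct{ t time.Time }

func (c *manualClock) Now() time.Time { return c.t }

func (c *manualClock) AdvanceSeconds(n int) {
	c.t = c.t.Add(time.Duration(n) * time.Second)
}

func main() {
	clock := &manualClock{t: time.Unix(0, 0)}
	h := NewBrokerHealth(clock.Now)
	fmt.Println(h.ConnectionUp(), h.SecondsSinceLastUpdate()) // before any connection

	h.SetConnected(true)
	h.MarkUpdate()
	clock.AdvanceSeconds(5)
	fmt.Println(h.ConnectionUp(), h.SecondsSinceLastUpdate()) // five seconds after an update
}
```

In an exporter, the second value would typically be computed at scrape time rather than incremented by a timer, which is one possible design choice and not necessarily the one taken here.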

forsberg · 2020-05-26T13:45:54Z

main.go

-		gaugeMetrics["up"].Set(0)
+		gaugeMetrics["broker_connection_up"].Set(0)
+		// try to reconnect
+		mqttConnect()


Is this actually required? When connecting after having used NewClientOptions, AutoReconnect is set to true.

I don't know for sure, but I think the code will reconnect by itself even without this code.
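To make the question concrete, here is a small simulation of the alternative: if the client library reconnects by itself, the connection-lost handler only has to update the gauge. This is a sketch under that assumption. `fakeClient` and `connectionGauge` are hypothetical stand-ins written for this example; they are not the MQTT library's API and do not prove how the real client behaves.

```go
package main

import (
	"errors"
	"fmt"
)

// connectionGauge is a hypothetical stand-in for the broker_connection_up gauge.
type connectionGauge struct{ value float64 }

func (g *connectionGauge) Set(v float64) { g.value = v }

// fakeClient mimics, in a very simplified way, a client library that
// reconnects on its own and notifies the application through callbacks.
type fakeClient struct {
	autoReconnect bool
	onConnect     func()
	onLost        func(err error)
	connected     bool
	reconnects    int
}

// Connect marks the client as connected and fires the connect callback.
func (c *fakeClient) Connect() {
	c.connected = true
	if c.onConnect != nil {
		c.onConnect()
	}
}

// dropConnection simulates a network failure. With autoReconnect enabled,
// the client reconnects without any help from the application code.
func (c *fakeClient) dropConnection(err error) {
	c.connected = false
	if c.onLost != nil {
		c.onLost(err)
	}
	if c.autoReconnect {
		c.reconnects++
		c.Connect()
	}
}

func main() {
	gauge := &connectionGauge{}
	client := &fakeClient{autoReconnect: true}

	// The handlers only record the state; neither of them reconnects.
	client.onConnect = func() { gauge.Set(1) }
	client.onLost = func(err error) { gauge.Set(0) }

	client.Connect()
	client.dropConnection(errors.New("simulated network failure"))
	fmt.Println("gauge:", gauge.value, "automatic reconnects:", client.reconnects)
}
```

If the real client behaves like this, calling mqttConnect() from the handler would be redundant; whether it does is exactly what needs checking against the library.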

jnovack · 2020-06-20

I don't think it's the exporter's responsibility (or even Prometheus's responsibility) to care if the broker is up.

If we're not scraping, there's a problem. If the scrape is old, there's a problem. Don't add dimensions or complexity to it.

It's really an anti-pattern for exporters to use up. In the event THIS exporter breaks, the last value of up is still true.

Really, it's the orchestrator's problem if it's down (e.g. Docker). Your Orchestrator dashboard should see a rise in failing containers.

You don't even have to publish a last_scrape_time: metrics would not be coming in, so set your alerts there!

daviddetorres added 4 commits November 10, 2019 23:22

Added metric "up" for the broker

ff57fb1

- Added Gauge metric "up" for the state of the connection with the broker
- Added dependency retrieval to the Makefile

Added reconnection after losing connection with broker

9b519eb

- Moved the connection loop into a function that is also called when the connection is lost
- Changed the name of the "up" metric to "broker_connection_up"

Added client to connect function

93a4532

Created functions for the registration of the seconds_since_last_update metric

72173bc

daviddetorres changed the title ~~WIP: Add metrics for monitoring the broker connection status~~ WIP: Add metrics for monitoring the broker and connection status Nov 11, 2019

daviddetorres changed the title ~~WIP: Add metrics for monitoring the broker and connection status~~ Add metrics for monitoring the broker and connection status Nov 11, 2019

daviddetorres marked this pull request as ready for review November 11, 2019 22:16

daviddetorres mentioned this pull request Nov 12, 2019

Changing naming of counter metrics to comply with the Prometheus naming conventions #23

Open

5 tasks

Fixed Makefile

dacd42a

forsberg reviewed May 26, 2020

jnovack commented Jun 20, 2020
