Monitoring

benoit74 edited this page Sep 1, 2023 · 7 revisions

The technical monitoring of our infrastructure is based on:

  • UptimeRobot for external monitoring of our web properties
  • Grafana for monitoring of our servers

Grafana

We use a free Grafana Cloud instance, available at https://kiwixorg.grafana.net/. For now, this instance is configured only for k8s logs and metrics.

k8s configuration

The configuration follows https://grafana.com/docs/grafana-cloud/monitor-infrastructure/kubernetes-monitoring/configuration/config-k8s-agent-flow/.

Architecture:

  • Grafana Cloud provides us:
    • a Grafana instance displaying dashboards
    • a Prometheus instance: stores the metrics sent by our agent + responds to queries
    • a Loki instance: stores the logs sent by our agent + responds to queries
  • Our cluster hosts the following in the monitoring namespace:
    • kube-state-metrics: service that listens to the Kubernetes API server and generates metrics about the state of the objects
    • opencost: measures infrastructure costs
    • prometheus-operator-crd: operator to configure Prometheus based on k8s resources
    • prometheus-node-exporter: running on each k8s node, grabs metrics at the node level
    • grafana-agent: agent grabbing metrics (from kube-state-metrics, node-exporter, kubelet, cadvisor, opencost) and sending them to Prometheus
    • grafana-agent-log: same binary as above, but grabbing logs (Pods + Cluster events) and sending them to Loki
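
The data flow in the list above can be summarised with a small Python sketch. It only models which agent collects which source and where the data ends up. The `Collector` class and the `route` helper are illustrative names, not part of any Grafana tooling, and nothing here talks to a real cluster.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Collector:
    """One in-cluster agent, the sources it reads, and where it ships data."""
    name: str
    sources: Tuple[str, ...]
    backend: str  # Grafana Cloud service receiving the data


# Values copied from the architecture list; the structure itself is illustrative.
COLLECTORS = (
    Collector(
        name="grafana-agent",
        sources=("kube-state-metrics", "node-exporter", "kubelet", "cadvisor", "opencost"),
        backend="Prometheus",
    ),
    Collector(
        name="grafana-agent-log",
        sources=("Pods", "Cluster events"),
        backend="Loki",
    ),
)


def route(source: str) -> Tuple[str, str]:
    """Return (agent name, backend) for a documented source."""
    for collector in COLLECTORS:
        if source in collector.sources:
            return collector.name, collector.backend
    raise KeyError(f"no collector is documented for source {source!r}")


if __name__ == "__main__":
    for src in ("opencost", "Pods"):
        agent, backend = route(src)
        print(f"{src} -> {agent} -> {backend}")
```

For instance, `route("kubelet")` returns the metrics agent and Prometheus, while `route("Cluster events")` returns the log agent and Loki.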

⚠️ Since we currently run k8s 1.23, we use kube-state-metrics version 2.4.2; later versions of kube-state-metrics are not aligned with this Kubernetes version.
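
As a reminder of this constraint, here is a minimal Python sketch of a version-pin lookup. The `KSM_PINS` table and the `pinned_ksm_version` helper are hypothetical. Only the 1.23 / 2.4.2 pair comes from this page, and any other entry would have to be checked against the kube-state-metrics compatibility matrix before being added.

```python
# Hypothetical pin table: only the 1.23 -> 2.4.2 pair is taken from this page.
KSM_PINS = {"1.23": "2.4.2"}


def pinned_ksm_version(k8s_version: str) -> str:
    """Return the kube-state-metrics version pinned for a k8s version.

    Accepts forms such as "1.23", "1.23.17" or "v1.23.17" and matches on
    the major.minor part only.
    """
    major_minor = ".".join(k8s_version.lstrip("v").split(".")[:2])
    try:
        return KSM_PINS[major_minor]
    except KeyError:
        raise LookupError(
            f"no kube-state-metrics version pinned for k8s {major_minor}; "
            "check the upstream compatibility matrix"
        ) from None


if __name__ == "__main__":
    print(pinned_ksm_version("v1.23.17"))
```

Failing loudly on an unknown version is deliberate: a cluster upgrade should force a conscious choice of kube-state-metrics version rather than silently keeping the old one.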

other servers (workers, ...)

ToDo