Monitoring

The technical monitoring of our infrastructure is based on:

- UptimeRobot for external monitoring of our web properties
- Grafana for monitoring of our servers

Grafana

We use a free Grafana Cloud instance, available at https://kiwixorg.grafana.net/. For now, this instance is configured only for k8s logs and metrics.

k8s configuration

The configuration follows https://grafana.com/docs/grafana-cloud/monitor-infrastructure/kubernetes-monitoring/configuration/config-k8s-agent-flow/.

Architecture:

Grafana Cloud provides us with:
- a Grafana instance displaying dashboards
- a Prometheus instance: stores the metrics sent by our agent + responds to queries
- a Loki instance for logs: stores logs + responds to queries

Our cluster hosts the following in the monitoring namespace:
- kube-state-metrics: service that listens to the Kubernetes API server and generates metrics about the state of the objects
- opencost: measures infrastructure costs
- prometheus-operator-crd: operator to configure Prometheus based on k8s resources
- prometheus-node-exporter: running on each k8s node, grabs metrics at the node level
- grafana-agent: agent grabbing metrics (from kube-state-metrics, node-exporter, kubelet, cadvisor, opencost) and sending them to Prometheus
- grafana-agent-log: same binary as above, but grabbing logs (Pods + Cluster events) and sending them to Loki
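
To check that metrics from the components above actually reach Prometheus, one option is an instant query against the standard Prometheus HTTP API (`/api/v1/query`). The sketch below only builds the query URL; it is an illustration, not our actual tooling. The base URL is a hypothetical placeholder (the real endpoint and credentials are in the Grafana Cloud portal), and the function name is made up for this example. `kube_pod_status_phase` is a metric exported by kube-state-metrics.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical placeholder: replace with the Prometheus endpoint shown
# in the Grafana Cloud portal for our stack.
PROMETHEUS_BASE_URL = "https://prometheus.example.invalid/api/prom"

# Count pods of the monitoring namespace per phase (Running, Pending, ...).
# kube_pod_status_phase comes from kube-state-metrics.
PODS_BY_PHASE = 'sum by (phase) (kube_pod_status_phase{namespace="monitoring"})'


def build_instant_query_url(base_url: str, promql: str) -> str:
    """Return the URL of a Prometheus instant query (GET /api/v1/query)."""
    # urlencode takes care of escaping braces, quotes and spaces in PromQL.
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


if __name__ == "__main__":
    print(build_instant_query_url(PROMETHEUS_BASE_URL, PODS_BY_PHASE))
```

The resulting URL can be requested with any HTTP client, using the authentication configured for the instance. An empty result for this query would suggest that kube-state-metrics or grafana-agent is not running or not shipping metrics.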

⚠️ Since we currently run k8s 1.23, we use kube-state-metrics version 2.4.2, which matches that k8s version (later versions of kube-state-metrics are not aligned with it).

other servers (workers, ...)

ToDo
