Monitoring

The technical monitoring of our infrastructure is based on:

- UptimeRobot for external monitoring of our web properties
- Grafana for monitoring of our servers

Grafana

We use a free Grafana Cloud instance, available at https://kiwixorg.grafana.net/. This instance is configured only for k8s logs and metrics.

k8s configuration

The configuration follows the Grafana Cloud guide at https://grafana.com/docs/grafana-cloud/monitor-infrastructure/kubernetes-monitoring/configuration/config-k8s-agent-flow/.

The configuration is deployed with Helm; see https://github.com/kiwix/k8s/tree/main/grafana

Architecture:

Grafana Cloud provides us with:
- a Grafana instance: displays dashboards
- a Prometheus instance: receives and stores metrics + responds to queries
- a Loki instance: stores logs + responds to queries

We host in our grafana namespace:
- kube-state-metrics (deployment): service that listens to the Kubernetes API server and generates metrics about the state of its objects
- opencost (deployment): measures infrastructure costs
- prometheus-operator-crd (not used yet): operator to configure Prometheus based on k8s resources
- prometheus-node-exporter (daemonset): runs on each k8s node and collects node-level metrics
- grafana-agent (statefulset): agent that collects metrics (from kube-state-metrics, node-exporter, kubelet, cadvisor, opencost) and sends them to Prometheus
- grafana-agent-logs (daemonset): same binary as above, but collects logs (Pods + Cluster events) and sends them to Loki
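
The data flow listed above can be restated as a small Python model. This is only an illustrative sketch: the names `Collector`, `COLLECTORS`, and `backend_for` are hypothetical and do not exist in the Helm chart; the mapping simply mirrors the two agents described in the list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Collector:
    """One Grafana Agent workload and the data path it implements (illustrative only)."""
    name: str        # workload name as listed on this page
    kind: str        # Kubernetes workload type
    signal: str      # "metrics" or "logs"
    sources: tuple   # what the agent reads from
    backend: str     # Grafana Cloud service that receives the data


# Restates the architecture list; not generated from the real cluster.
COLLECTORS = (
    Collector(
        name="grafana-agent",
        kind="statefulset",
        signal="metrics",
        sources=("kube-state-metrics", "node-exporter", "kubelet", "cadvisor", "opencost"),
        backend="Prometheus",
    ),
    Collector(
        name="grafana-agent-logs",
        kind="daemonset",
        signal="logs",
        sources=("Pods", "Cluster events"),
        backend="Loki",
    ),
)


def backend_for(source: str) -> str:
    """Return the Grafana Cloud backend that stores data coming from `source`."""
    for collector in COLLECTORS:
        if source in collector.sources:
            return collector.backend
    raise KeyError(f"no collector reads from {source!r}")


if __name__ == "__main__":
    for c in COLLECTORS:
        print(f"{c.name} ({c.kind}): {', '.join(c.sources)} -> {c.backend}")
```

For instance, `backend_for("opencost")` resolves to the Prometheus instance, while `backend_for("Pods")` resolves to Loki.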

Grafana Agent is installed in Flow mode.

⚠️ Because we currently run k8s 1.23, kube-state-metrics is pinned to version 2.4.2 (later versions of kube-state-metrics are not aligned with this k8s version).
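
A pin like this is easy to forget during a cluster upgrade, so it may help to encode it as a check. The following is a minimal sketch, assuming the only known pairing is the one stated on this page (k8s 1.23 with kube-state-metrics 2.4.2). The names `KSM_PINS`, `minor_version`, and `is_pinned_version` are hypothetical, and the table would have to be extended by hand from the kube-state-metrics compatibility matrix before any upgrade.

```python
# Pairing taken from this page only; other k8s versions are deliberately absent.
KSM_PINS = {"1.23": "2.4.2"}


def minor_version(k8s_version: str) -> str:
    """Reduce a version such as 'v1.23.17' or '1.23.17' to its minor form '1.23'."""
    parts = k8s_version.lstrip("v").split(".")
    return ".".join(parts[:2])


def is_pinned_version(k8s_version: str, deployed_ksm: str) -> bool:
    """Return True if the deployed kube-state-metrics version matches the pin.

    Raises LookupError when no pin is recorded for the given k8s version,
    so an unreviewed cluster upgrade fails loudly instead of passing silently.
    """
    minor = minor_version(k8s_version)
    if minor not in KSM_PINS:
        raise LookupError(f"no kube-state-metrics pin recorded for k8s {minor}")
    return deployed_ksm.lstrip("v") == KSM_PINS[minor]


if __name__ == "__main__":
    print(is_pinned_version("v1.23.17", "2.4.2"))
```

Raising on an unknown k8s version is a design choice in this sketch: it forces whoever upgrades the cluster to look up and record the matching kube-state-metrics version rather than inherit the old pin.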

other servers (workers, ...)

ToDo
