MOSt

Monitoring Observability Stack



I have been designing this stack (and everything deployed in it) to work as "plug and play monitoring", trying not to depend on a single cloud provider (the famous "vendor lock-in"). That is the key of my focus and where I put the greatest effort.
This stack contains everything necessary for proper visualization and troubleshooting of the infrastructure and analysis of the behavior of your platform.

To use it, you only need to deploy it in your Kubernetes cluster and wait a few minutes until all services have started (Running state); after that you will see all the data in Grafana.
Yes, it's super easy to implement!

In Grafana you will see:

  • All Kubernetes metrics (of all resources)
  • All logs of your pods, plus other logs such as the autoscaler (requires that your pods send logs to STDOUT)
  • Metrics and data from CloudWatch (relax, these metrics are not stored in Prometheus; they are only consumed from CloudWatch)

In my daily work, I use this stack to troubleshoot all infrastructure issues. I use Polaris to keep my deployments aligned with Kubernetes best practices, and finally Chaos-Mesh to run chaos experiments (crashes controlled by myself) and try to analyze new scenarios or, in the best case, "predict" (more or less) what could happen if something like my experiment occurred for real.
It was created based on my own experience working with Kubernetes and on how to do simple troubleshooting or analysis of the most common problems using these dashboards.



Stack

| Name | Info |
|------|------|
| Kubernetes | Where you will deploy this super cool stack |
| Prometheus | The service in charge of consuming and storing the metrics of the infrastructure and the resources of the components (pods, nodes, volumes, etc.) |
| Loki | The service in charge of consuming and storing the logs exposed by the components |
| Cloudwatch | The service through which specific metrics of each AWS component can be obtained |
| Grafana | In charge of visualizing the data stored in the aforementioned services. All dashboards are stored in the repository as code. |
| Polaris | Polaris keeps your clusters sailing smoothly. It runs a variety of checks to ensure that Kubernetes pods and controllers are configured using best practices, helping you avoid problems in the future. |
| Chaos-Mesh | A Powerful Chaos Engineering Platform for Kubernetes |



Steps to deploy it

| Command | Required | Notes |
|---------|----------|-------|
| kubectl apply -f 01-hpa_and_autoscaler/ | Optional | - |
| kubectl apply -f 02-troublePOD/ | Optional | Only if you use a pod to investigate something; not the best way, but it works |
| kubectl apply -f 03-ingress/ | Yes | - |
| kubectl create -f 04-setup-monitoring/ | Yes | Go for a coffee, it takes a while |
| kubectl create -f 05-monitoring/ | Yes | Go for another coffee, it takes a while (this is where the magic happens) |
| kubectl apply -f 06-Loki_log_monitoring/ | Yes | - |
| kubectl apply -f 08-polaris/ | Optional | Useful to keep your deployments aligned with Kubernetes best practices |
| kubectl apply -f 09-kubernetes-dashboard/ | Optional | I don't like using a UI, I prefer the CLI, but I have to admit it's really useful |



Important files:

| File name | Info |
|-----------|------|
| grafana-dashboardDatasources.yaml | Configures all datasources: Prometheus, Loki, CloudWatch, etc. |
| grafana-service.yaml | Configures the AWS load balancer used to access Grafana |
| grafana-deployment.yaml | Configures Grafana credentials and plugins, and adds your dashboard path using volume files |
| grafana-dashboardSources.yaml | Configures Grafana dashboard paths, folders, providers, etc. |
| grafana-dashboardDefinitions.yaml | Basic community dashboards |
| grafana-dashboard-(sub-name).yaml | Dashboards that I have been creating and collecting from our lovely community |
| 06-Loki_log_monitoring | Configure your own deployment and modify the storage size (EBS) |
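
For reference, here is a minimal sketch of what the datasource provisioning inside grafana-dashboardDatasources.yaml typically looks like. The service URLs, namespace and ports below are assumptions, not the repo's real values; check the file itself before changing anything.

```yaml
# Hypothetical Grafana datasource provisioning snippet.
# URLs and namespace are placeholders for the in-cluster services.
apiVersion: 1
datasources:
  - name: prometheus
    type: prometheus
    access: proxy
    url: http://prometheus-k8s.monitoring.svc:9090   # Prometheus service (assumed name/port)
    isDefault: true
  - name: loki
    type: loki
    access: proxy
    url: http://loki.monitoring.svc:3100             # Loki service (assumed name/port)
```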



Architecture





Important Dashboards






Detailed information about pod status, volume status, attached volumes, last logs and more





Detailed information on resources used by each pod (similar to kubectl top pod)







Logs for each pod (similar to: kubectl -n NAMESPACE logs POD )







VPN and tunnel status information







Remember, to consume CloudWatch data you need an AWS user with read-only permissions
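
The CloudWatch access itself is just another Grafana datasource entry. A hedged sketch using the standard Grafana CloudWatch provisioning fields, with the read-only user's keys as placeholders:

```yaml
# Hypothetical CloudWatch datasource entry for the Grafana provisioning file.
# Region and credentials are placeholders; use the keys of an IAM user
# that has CloudWatch read-only permissions only.
- name: cloudwatch
  type: cloudwatch
  jsonData:
    authType: keys
    defaultRegion: us-east-1
  secureJsonData:
    accessKey: <READ_ONLY_ACCESS_KEY>
    secretKey: <READ_ONLY_SECRET_KEY>
```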






What information does each dashboard have?


Default / Logs by App (one of my favourites)

- Application logs (similar to kubectl logs POD)
- Ability to filter up to 3 levels in real time

Default / Nodes

- Real-time information on CPU usage of the node
- Node load average information
- Node memory information
- Disk information (I/O) of the node
- Node disk space information
- Network Information (In/Out)

Default / Kubernetes simple and fast Troubleshooting (I spent a lot of time on this, super useful)

- Total nodes
- Total pods
- Pods in a state other than Running
- Status of the pods (like kubectl get pod)
- Number of restarts of each pod
- The IP of the node on which each pod is hosted
- Volume status, total space, and space used
- EKS version of the Kubernetes node (kubelet + kube-proxy)
- Version of Docker running on the node
- The last 100 log lines if there is a failed pod (not shown if there are no failed pods)
- Detailed status of the volumes (name, type, and capacity)
- Repository and container image used in the pod
- Reasons for the last pod/container failure

Default / AWS VPN

- IN/OUT traffic of each tunnel
- State of the VPN tunnels, broken down by account

Default / AWS RDS

- CPU information for the database
- Disk Queue Depth information
- ReadLatency
- Read Throughput
- Network IN/OUT

Default / kubernetes-pvc-fast-view

- Detailed status of the volumes (name, type, and capacity)
- PVCs size status
- PVCs Above Warning Threshold
- PVCs in Pending State
- PVCs in Lost State
- Disk usage trend
- Disk usage rate

Default / AWS Certificate Manager

- Information on when the certificates expire

Default / AWS Auto Scaling

- Cluster autoscaling information

Autoscaler kubernetes

- Detailed information on what happens to the cluster when it scales up or down

Default / Kubernetes / Compute Resources / Pod

- Information on CPU usage of each pod
- Information on throttling (https://www.geeknetic.es/Throttling/que-es-y-para-que-sirve)
- Information on assigned resources (requests/limits) of each pod
- Memory usage information for each pod
- Network information (In/Out) of each pod
- Storage IO information of each pod

Default / AWS ELB Application Load Balancer (ALB) & AWS Network Load Balancer (NLB)

- Information on AWS load balancers, broken down by account
- HTTPCode_Target information (2xx, 3xx, 4xx, 5xx)
- HTTPCode information (2xx, 3xx, 4xx, 5xx) 



Other super cool tools



Polaris


Polaris keeps your clusters sailing smoothly. It runs a variety of checks to ensure that Kubernetes pods and controllers are configured using best practices, helping you avoid problems in the future.
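
As an illustration only, a Polaris configuration maps check names to severities; the checks below are assumptions based on common defaults, not this repo's configuration:

```yaml
# Hypothetical Polaris config.yaml excerpt: each check name maps to a
# severity (danger / warning / ignore). The selection here is an assumption.
checks:
  runAsRootAllowed: danger
  cpuRequestsMissing: warning
  memoryLimitsMissing: warning
  livenessProbeMissing: warning
  tagNotSpecified: danger
```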






Chaos-Mesh


A Powerful Chaos Engineering Platform for Kubernetes
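
For context, a minimal Chaos-Mesh experiment (a controlled pod kill) looks roughly like this; the namespace and label selector are placeholders, not values from this repo:

```yaml
# Hypothetical PodChaos experiment: kills one pod matching the selector.
# Namespace and labels are placeholders.
apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: pod-kill-example
  namespace: chaos-testing
spec:
  action: pod-kill
  mode: one
  selector:
    namespaces:
      - default
    labelSelectors:
      app: my-app
```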



Access


Grafana:

  • Option A) kubectl -n monitoring get svc | grep grafana-external-service
  • Option B) Create a Route53 alias and attach it to the load balancer
  • admin : SuperPowerPassword (change it in grafana-deployment.yaml #L35)
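
A rough sketch of how grafana-external-service is exposed as a LoadBalancer Service (the annotation, selector and ports are assumptions; see grafana-service.yaml for the real definition):

```yaml
# Hypothetical sketch of grafana-service.yaml: exposes Grafana through an
# AWS-provisioned load balancer. Annotation, selector and ports are assumptions.
apiVersion: v1
kind: Service
metadata:
  name: grafana-external-service
  namespace: monitoring
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
spec:
  type: LoadBalancer
  selector:
    app: grafana
  ports:
    - name: http
      port: 80
      targetPort: 3000
```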

Prometheus custom config

  • prometheus-prometheus.yaml
  • prometheus-additional.yaml
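
In kube-prometheus style setups, prometheus-additional.yaml usually holds extra scrape configs that get wrapped in a Secret and referenced from prometheus-prometheus.yaml via additionalScrapeConfigs. A hedged sketch of such an extra scrape config; the job name and target are purely illustrative:

```yaml
# Hypothetical content of prometheus-additional.yaml: a list of extra
# Prometheus scrape configs. Job name and target are placeholders.
- job_name: external-node
  static_configs:
    - targets:
        - my-external-host.example.com:9100
```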



Roadmap

| Feature | Status |
|---------|--------|
| Open it to the whole community | Done |
| Generate dashboards for simple and fast troubleshooting | Done |
| Generate dashboards for AWS resources | Done |
| Generate documentation | Done |
| Configure Prometheus Federation | Ongoing |
| Configure AlertManager | Pending |
| Document how to encrypt all data at rest | Pending |
| Add videos on how to troubleshoot using this stack | Pending |
