incomplete install instructions? #58
Looking closer, I'm seeing a lot of these errors in the kube-state-metrics pods. Do we know what permissions this container requires? I do not see any serviceAccountName in the deployment configuration. Once we know the permissions, how are they to be assigned?
|
I fixed the permission issues by adding a ServiceAccount to the Kubernetes configuration for kube-state-metrics, with the following configs:
kube-state-metrics-role.yaml
|
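The contents of kube-state-metrics-role.yaml aren't reproduced in this thread. As a rough sketch only, the usual pattern is a ServiceAccount, a ClusterRole granting read-only list/watch access, and a ClusterRoleBinding tying the two together. The Python below emits that as a JSON `List` (which `kubectl apply -f` also accepts). The object names, the `kube-system` namespace, and the resource list are assumptions; widen the rules to whatever resources your kube-state-metrics pods log permission errors for.

```python
import json

NAMESPACE = "kube-system"    # assumption: namespace kube-state-metrics runs in
NAME = "kube-state-metrics"  # hypothetical object name

# Assumption: a read-only subset of resources; extend it to match your errors.
RULES = [
    {"apiGroups": [""],
     "resources": ["nodes", "pods", "services", "endpoints", "namespaces",
                   "persistentvolumes", "persistentvolumeclaims",
                   "replicationcontrollers", "resourcequotas", "limitranges"],
     "verbs": ["list", "watch"]},
    {"apiGroups": ["apps", "extensions"],
     "resources": ["deployments", "daemonsets", "replicasets", "statefulsets"],
     "verbs": ["list", "watch"]},
    {"apiGroups": ["batch"], "resources": ["jobs", "cronjobs"],
     "verbs": ["list", "watch"]},
]


def rbac_manifests(name=NAME, namespace=NAMESPACE):
    """Build ServiceAccount + ClusterRole + ClusterRoleBinding as one v1 List."""
    sa = {"apiVersion": "v1", "kind": "ServiceAccount",
          "metadata": {"name": name, "namespace": namespace}}
    role = {"apiVersion": "rbac.authorization.k8s.io/v1", "kind": "ClusterRole",
            "metadata": {"name": name}, "rules": RULES}
    binding = {"apiVersion": "rbac.authorization.k8s.io/v1",
               "kind": "ClusterRoleBinding",
               "metadata": {"name": name},
               "roleRef": {"apiGroup": "rbac.authorization.k8s.io",
                           "kind": "ClusterRole", "name": name},
               # The subject's namespace must match where the ServiceAccount lives.
               "subjects": [{"kind": "ServiceAccount", "name": name,
                             "namespace": namespace}]}
    return {"apiVersion": "v1", "kind": "List", "items": [sa, role, binding]}


if __name__ == "__main__":
    print(json.dumps(rbac_manifests(), indent=2))
```

Usage would be something like `python3 rbac.py > rbac.json && kubectl apply -f rbac.json`, after which the Deployment's `spec.template.spec.serviceAccountName` is set to the account's name so the pods actually run under it.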
I added a Service for node_exporter to expose port 9100 and set the Prometheus data source in Grafana to it. I don't think this is what I'm supposed to do, and naturally it still doesn't work. |
I also created a Service for kube-state-metrics, but it still doesn't provide the metrics the app is looking for.
If I examine it with the Query Inspector, it's trying to do the following. I assume it is trying to hit the Kubernetes API, although api/v1/query_range is not a k8s route I'm familiar with. Right now my Prometheus data source is set to kube-state-metrics, and obviously this query fails. I'm not sure how to update it to check the Kubernetes data source again. The way this app works is very confusing.
|
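For what it's worth, `/api/v1/query_range` is not a Kubernetes API route: it is the range-query endpoint of the Prometheus HTTP API, which is why the query fails when the data source points at kube-state-metrics (an exporter that only serves a `/metrics` page). A minimal sketch of the kind of request the dashboard issues, assuming a Prometheus server reachable at some base URL (the URL in the usage note is hypothetical):

```python
import json
import urllib.request
from urllib.parse import urlencode


def query_range_url(base: str, query: str, start: float, end: float, step: str) -> str:
    """Build a GET URL for Prometheus' range-query endpoint."""
    params = urlencode({"query": query, "start": start, "end": end, "step": step})
    return f"{base.rstrip('/')}/api/v1/query_range?{params}"


def query_range(base, query, start, end, step, timeout=10.0):
    """Run the query and return the list of series in data.result."""
    url = query_range_url(base, query, start, end, step)
    # urlopen raises HTTPError for 4xx/5xx responses (e.g. a 404 from an exporter).
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = json.load(resp)
    if body.get("status") != "success":
        raise RuntimeError(f"Prometheus returned: {body}")
    return body["data"]["result"]  # resultType is "matrix" for range queries
```

For example, `query_range("http://prometheus.kube-system.svc:9090", "up", 1700000000, 1700003600, "60s")` would only succeed against an actual Prometheus server, not against node_exporter or kube-state-metrics directly.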
w00t, some progress. As I suspected at the start, in addition to kube-state-metrics and node-exporter, you need to manually create the config for a Prometheus pod with the provided configmap, expose that pod via a Service, and then use it as the Prometheus data source in the k8s app config for this cluster. I am now getting metrics for these dashboards:
I would love to get the per-pod metrics working, as these are probably the most useful stats for establishing resource constraints, one of the more challenging tasks in managing a k8s cluster. If I can get this last part figured out, I'll wrap up all these findings in a pull request (improving the README if nothing else). |
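The Service step above is what gives the app's data source a stable address for the Prometheus pod. A sketch, assuming the pod carries a hypothetical `app: prometheus` label and listens on Prometheus' default port 9090; the NodePort value 30690 is borrowed from later in this thread purely as an example.

```python
import json


def prometheus_service(namespace="kube-system", name="prometheus", node_port=30690):
    """Build a NodePort Service manifest for a Prometheus pod (names are hypothetical)."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "type": "NodePort",                 # also reachable from outside the cluster
            "selector": {"app": "prometheus"},  # assumption: must match the pod's labels
            "ports": [{"name": "web", "port": 9090, "targetPort": 9090,
                       "nodePort": node_port}],
        },
    }


def datasource_urls(svc, node_ip):
    """Return (in-cluster DNS URL, node IP + NodePort URL) for the data source."""
    meta, port = svc["metadata"], svc["spec"]["ports"][0]
    in_cluster = f"http://{meta['name']}.{meta['namespace']}.svc:{port['port']}"
    external = f"http://{node_ip}:{port['nodePort']}"
    return in_cluster, external


if __name__ == "__main__":
    print(json.dumps(prometheus_service(), indent=2))
```

Which of the two URLs to use depends on where Grafana runs: the in-cluster DNS name only resolves from inside the cluster, while the NodePort form works from anywhere that can reach a node.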
So far, I have narrowed it down to one of three things: the Prometheus config for scraping cAdvisor is not populating, the naming is off, or possibly it's a permissions issue.
However, when I go directly to /metrics in Prometheus, container_memory_usage_bytes is not among the available metrics, even though it is exposed at http://127.0.0.1:8001/api/v1/nodes/my-hostname/proxy/metrics/cadvisor
Prometheus configuration.
Queries for kube-state appear to be fine. |
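One way to narrow down which of those it is: pull the cAdvisor endpoint through `kubectl proxy` (which listens on 127.0.0.1:8001 by default, matching the URL above) and compare the metric names it exposes with what Prometheus has ingested. If a metric is present at the source but missing in Prometheus, the scrape config or its permissions are the likelier culprit. A sketch, with `my-hostname` standing in for a real node name:

```python
import re
import urllib.request

PROXY = "http://127.0.0.1:8001"  # default listen address of `kubectl proxy`

# Metric names as defined by the Prometheus text exposition format.
_NAME = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def cadvisor_url(node: str, base: str = PROXY) -> str:
    """cAdvisor metrics for one node, routed through the API server proxy."""
    return f"{base}/api/v1/nodes/{node}/proxy/metrics/cadvisor"


def metric_names(text: str) -> set:
    """Collect sample names, skipping blank lines and # HELP / # TYPE comments."""
    names = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        m = _NAME.match(line)
        if m:
            names.add(m.group(0))
    return names


def fetch_text(url: str, timeout: float = 10.0) -> str:
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    names = metric_names(fetch_text(cadvisor_url("my-hostname")))
    print("container_memory_usage_bytes" in names)
```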
yep, permission issue.
|
Oh, this is one of those fun problems that make you question all of your life decisions. This can't be fixed with simple RBAC rules; it requires flags set on the kubelet, which nicely reduce security. The problem and solutions are nicely summarized in prometheus-operator/prometheus-operator#633. While it's easy to accomplish manually on currently running nodes, for the change to persist you will need to dig into whatever tool you use to create and manage clusters.
edit: nope, not quite. Adding those flags may have worked, but they broke kubectl's cert authentication for viewing logs. Reading https://github.com/coreos/prometheus-operator/tree/master/contrib/kube-prometheus#prerequisites carefully, I see some solutions around using an HTTP rather than HTTPS request: kubernetes/kops#5176 (comment) |
After 3 days of troubleshooting, I unfortunately must concede defeat. If anyone gets this to work on a k8s 1.8+ cluster, please do chime in.
edit: I CONCEDE NOTHING! I finally got the K8 Container Dashboard to work and now have memory stats by container... yay! I came across this post again (which, ironically, was one of the first things I read while troubleshooting). Everything will vary a little based on your cluster setup. The material difference appears to be that I used kops rather than kubeadm for cluster setup.
kubeadm does the opposite of both of those, for better or worse, which is why getting a straight answer has been challenging. This is my final configmap for the Prometheus scraper. Only one line is modified, in the kubernetes-cadvisor job:
The ServiceAccount/ClusterRole for Prometheus can be grabbed from:
If you're curious, the prometheus-kubernetes.yml example in that same directory has some conflicting information regarding the scrape configs, depending on the version of k8s deployed. There is a versioning issue, but there is also the question of how your cluster's and kubelet's authorization and authentication flags are set, which is not addressed. I'll aim for a pull request that will hopefully add clarity to the situation. |
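The final configmap isn't reproduced here, so exactly which line changed isn't known. For context, the kubernetes-cadvisor job in the prometheus-kubernetes.yml example mentioned above builds each node's scrape path with a relabel rule: source label `__meta_kubernetes_node_name`, regex `(.+)`, target label `__metrics_path__`, and replacement `/api/v1/nodes/${1}/proxy/metrics/cadvisor`, which routes the scrape through the API server proxy. The sketch below approximates how Prometheus applies such a `replace` step (full-match regex, `$1`/`${1}` group substitution) so you can check what path a given `replacement` produces; `apply_replace` is a hypothetical helper, not part of Prometheus.

```python
import re
from typing import Optional


def apply_replace(value: str, regex: str, replacement: str) -> Optional[str]:
    """Approximate a Prometheus `replace` relabel step on a single source value.

    Prometheus anchors the regex to the whole value. If it does not match,
    the target label is left unchanged, signalled here by returning None.
    """
    m = re.fullmatch(regex, value)
    if m is None:
        return None
    # Expand ${1} and $1 style references; unmatched optional groups become "".
    return re.sub(r"\$\{(\d+)\}|\$(\d+)",
                  lambda g: m.group(int(g.group(1) or g.group(2))) or "",
                  replacement)


if __name__ == "__main__":
    print(apply_replace("my-hostname", "(.+)",
                        "/api/v1/nodes/${1}/proxy/metrics/cadvisor"))
```

This is only an approximation of Prometheus' Go-regexp expansion (for instance, it does not handle named groups), but it is enough to sanity-check a per-node path before reloading the scraper.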
I keep getting "Query support not implemented yet" in the dashboard. But I can see all the pod metrics individually. Any ideas? |
@illectronic that error comes from this repo: kubernetes-app/dist/datasource/datasource.ts, line 134 in ddf616e
Looks like it's related to the Kubernetes API data source, but I'm not sure what triggers it exactly. |
Somewhere in my toiling I lost the node data :( Not having an architecture diagram that describes how the sources are aggregated is driving me nuts. |
Hi, I am stuck and not sure what I have to do with respect to Prometheus. Could you please help me with that? I have a running k8s cluster and the Grafana configuration is set up. What do I have to do for Prometheus? Please give me pointers, starting right from installation. |
How do I install the configmap? Which one? Could you please elaborate? |
@sakthishanmugam02 I'm working on a pull request that should hopefully help you. Give me a few hours. |
@sakthishanmugam02 here ya go: sensay-nelson#1 |
@sensay-nelson when I try to deploy the configuration, |
@sakthishanmugam02 did you create the service account first? What does |
@sensay-nelson |
@sensay-nelson output of describe deploy:
root@hwim-perf-test:~/prom-config# kubectl describe deploy kube-state-metrics -n kube-system
Progressing True NewReplicaSetCreated
Normal ScalingReplicaSet 44s deployment-controller Scaled up replica set kube-state-metrics-6bdd878bd7 to 1 |
Try checking the issue with the ReplicaSet, I guess. Isn't |
No pod is listed in the output of kubectl get pods -n kube-system. |
@sensay-nelson one update: |
I got it working. Since no namespace was specified for the service account, it was created in default; I updated the namespace to kube-system and the pod deployed. @sensay-nelson thanks for your support |
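A pod can only run under a ServiceAccount from its own namespace, so an account created in `default` leaves a `kube-system` ReplicaSet unable to create pods, which plausibly explains the "scaled up, but no pod listed" symptom above. When no namespace is set in a manifest, kubectl uses the current context's namespace (often `default`). A small pre-flight check over manifests parsed into dicts (for example from JSON) could catch this; `namespace_problems` is a hypothetical helper, not part of any tool.

```python
from typing import List


def namespace_problems(manifests: List[dict], expected_ns: str) -> List[str]:
    """Flag ServiceAccount-related objects that would not land in expected_ns."""
    problems = []
    for obj in manifests:
        kind = obj.get("kind")
        meta = obj.get("metadata", {})
        name = meta.get("name", "?")
        # A missing namespace means "whatever the kubectl context uses".
        if kind == "ServiceAccount" and meta.get("namespace") != expected_ns:
            problems.append(f"ServiceAccount/{name} namespace={meta.get('namespace')!r}")
        # Bindings must name the ServiceAccount in the namespace it actually lives in.
        if kind in ("ClusterRoleBinding", "RoleBinding"):
            for subj in obj.get("subjects", []):
                if subj.get("kind") == "ServiceAccount" and subj.get("namespace") != expected_ns:
                    problems.append(f"{kind}/{name} subject {subj.get('name')} "
                                    f"namespace={subj.get('namespace')!r}")
    return problems
```

This deliberately assumes every ServiceAccount in the bundle belongs in one namespace, which matches a single-component setup like kube-state-metrics but would need loosening for mixed bundles.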
@sensay-nelson the Prometheus server is now up and running. How do I configure the Grafana dashboard? I am getting a Bad Gateway HTTP error. |
@sensay-nelson how do I set up the data source and cluster? Detailed steps, please. |
I added a Prometheus data source using the NodePort IP with port 30690, set up the new cluster, and chose the created data source, but there are no metrics: all metrics show as NA and no node or namespace details are listed. An "Unexpected error" pop-up also came up in between, along with the following pop-up: |
@sensay-nelson update: some progress. I am able to see metrics now. |
Thanks a lot! I changed the 'replacement' properties in the configmap and it started working :) |
@sensay-nelson one clarification: the pod-level metrics are slightly higher than what the kubectl top command shows. What could be the reason? |
different data sources will produce different values |
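One likely contributor, offered as a hedged explanation rather than a diagnosis of this particular cluster: `kubectl top` reads from metrics-server, which reports memory as the working set, whereas `container_memory_usage_bytes` also counts page cache. cAdvisor derives the working set as usage minus inactive file-backed memory (clamped at zero), so dashboards built on `container_memory_usage_bytes` will tend to read higher. The arithmetic with made-up numbers, plus two PromQL queries for comparing both views (label names are an assumption: clusters older than 1.16 label these `pod_name`/`container_name` instead of `pod`/`container`):

```python
def working_set_bytes(usage_bytes: int, inactive_file_bytes: int) -> int:
    """cAdvisor-style working set: usage minus inactive file cache, never below zero."""
    return max(usage_bytes - inactive_file_bytes, 0)


# Per-pod comparison queries (assumption: k8s 1.16+ label names).
USAGE_QUERY = 'sum by (namespace, pod) (container_memory_usage_bytes{container!=""})'
WORKING_SET_QUERY = 'sum by (namespace, pod) (container_memory_working_set_bytes{container!=""})'

if __name__ == "__main__":
    usage = 300 * 1024**2     # hypothetical: 300 MiB from container_memory_usage_bytes
    inactive = 80 * 1024**2   # hypothetical: 80 MiB of inactive page cache
    print(working_set_bytes(usage, inactive) // 1024**2)  # 220 (MiB)
```

CPU figures can also diverge simply because metrics-server and a dashboard's `rate()` window average over different intervals.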
@sensay-nelson thank you so much for your detailed writeup! I would've given up long ago if it weren't for this thread! I wanted to add something, since this is the most comprehensive thread on setting up the Grafana Kubernetes App; maybe it helps someone. For people using Grafana Cloud, you can easily make Prometheus write to their hosted Prometheus endpoint by configuring
This way, you don't have to store any metrics in the cluster and don't have to add a data source for each cluster. |
Update:
tip: if the app fails to install and you see a giant list of issues in red, it's probably the YAML.
Unfortunately, node metrics still don't work for me even though kube-state-metrics is installed by
If you have EKS clusters and Rancher, this works too (aside from the node metrics dashboard). Take the URL/credentials from Rancher's kubeconfig file for each cluster. |
It appears that in addition to Node Exporter and Kube State Metrics, a third component (a Prometheus scraper) must be manually added by the user for this to function.
A user must manually do the following for this to work:
Without these steps, almost no metrics will work. These requirements are missing from the README.
Or am I possibly missing something else?