Filtering metrics (metric_relabel_configs) in the scrapeConfigs settings will delay metric collection. #2815

Open
Ari-suhyeon opened this issue Aug 28, 2024 · 1 comment
@Ari-suhyeon

Describe the bug
When I filter metrics with metric_relabel_configs in the scrapeConfigs settings, the metrics are not available at the scrape time I expect.
I send metrics to AWS Prometheus via ADOT (scrape interval: 10 seconds) and run a range query for the metric data every 10 seconds.

I set metric_relabel_configs so that only certain metrics are stored in Prometheus, to reduce AWS Prometheus costs.
However, when I set metric_relabel_configs in multiple scrape configs, metric collection is intermittently delayed.
When I query at the current time, the metrics are intermittently missing, but when I repeat the query a few seconds later for the same timestamp, the metrics are there. So I believe the collection is delayed rather than dropped.
In the image below, the left side shows metrics intermittently missing for a few seconds, followed by a period where all metrics are collected.
The gaps appeared while the metric_relabel_configs condition was present in the config; all metrics were collected when it was absent.
(Image: container_cpu_usage_seconds_total metric)
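
To put a number on the delay described above, a small polling helper along these lines might help. This is only a sketch: `fetch_samples` is a hypothetical callable (not part of ADOT or any AWS SDK) that stands in for whatever query is run against AWS Prometheus and returns the samples found at a given timestamp. The helper repeats the query for the same timestamp until data shows up and reports how long that took.

```python
import time
from typing import Callable, Optional, Sequence


def measure_visibility_lag(
    fetch_samples: Callable[[float], Sequence],  # hypothetical query function
    sample_ts: float,
    poll_every: float = 1.0,
    give_up_after: float = 60.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Re-query the same timestamp until samples appear.

    Returns the seconds waited before fetch_samples(sample_ts) returned
    a non-empty result, or None if nothing appeared within give_up_after.
    """
    started = clock()
    while True:
        if fetch_samples(sample_ts):
            return clock() - started
        if clock() - started >= give_up_after:
            return None
        sleep(poll_every)
```

Running this once with the metric_relabel_configs blocks present and once without them, against the same scrape timestamp, would show whether the lag really differs between the two configs.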

Is the delay caused by slow filtering?
How can I filter metrics with metric_relabel_configs while still collecting data every 10 seconds?
I am running EKS 1.29 with ADOT deployed as a Helm chart, and am using v0.40.0.

Config that works well (no metric_relabel_configs):

scrapeConfigs: |
  - job_name: 'k8s_metrics_scrape'
    scrape_interval: 10s
    scrape_timeout: 10s
    metrics_path: /metrics
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        regex: (.+)
        target_label: __metrics_path__
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $$1:$$2
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_pod_node_name]
        action: keep
        regex: ${K8S_NODE_NAME}
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name
      - source_labels: [__meta_kubernetes_pod_container_name]
        action: replace
        target_label: kubernetes_container_name
      - action: drop
        regex: Pending|Succeeded|Failed
        source_labels:
          - __meta_kubernetes_pod_phase

  # for 'container_*' metric
  - job_name: kubernetes-nodes-cadvisor
    scrape_interval: 10s
    scrape_timeout: 10s
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    kubernetes_sd_configs:
    - role: node
    relabel_configs:
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)
    - replacement: kubernetes.default.svc:443
      target_label: __address__
    - regex: (.+)
      replacement: /api/v1/nodes/$$1/proxy/metrics/cadvisor
      source_labels:
      - __meta_kubernetes_node_name
      target_label: __metrics_path__
    - action: keep
      regex: $K8S_NODE_NAME
      source_labels: [__meta_kubernetes_node_name]
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      insecure_skip_verify: true

Config that shows the collection delay (with metric_relabel_configs):

scrapeConfigs: |
  - job_name: 'k8s_metrics_scrape'
    scrape_interval: 10s
    scrape_timeout: 10s
    metrics_path: /metrics
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        regex: (.+)
        target_label: __metrics_path__
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $$1:$$2
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_pod_node_name]
        action: keep
        regex: ${K8S_NODE_NAME}
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name
      - source_labels: [__meta_kubernetes_pod_container_name]
        action: replace
        target_label: kubernetes_container_name
      - action: drop
        regex: Pending|Succeeded|Failed
        source_labels:
          - __meta_kubernetes_pod_phase
    # metric name filter
    metric_relabel_configs:
    - source_labels: [__name__]
      regex: 'istio_requests_total|istio_request_duration_milliseconds_.*'
      action: keep

  # for 'container_*' metric
  - job_name: kubernetes-nodes-cadvisor
    scrape_interval: 10s
    scrape_timeout: 10s
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    kubernetes_sd_configs:
    - role: node
    relabel_configs:
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)
    - replacement: kubernetes.default.svc:443
      target_label: __address__
    - regex: (.+)
      replacement: /api/v1/nodes/$$1/proxy/metrics/cadvisor
      source_labels:
      - __meta_kubernetes_node_name
      target_label: __metrics_path__
    - action: keep
      regex: $K8S_NODE_NAME
      source_labels: [__meta_kubernetes_node_name]
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      insecure_skip_verify: true
    # metric name filter
    metric_relabel_configs:
    - source_labels: [__name__]
      regex: 'container_cpu_usage_seconds_total|container_memory_working_set_bytes|container_network_receive_bytes_total'
      action: keep
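
For reference, the effect of the two `keep` rules above can be approximated offline. Prometheus anchors relabel regexes at both ends, so a metric name is kept only if the whole name matches. The sketch below uses Python's `re.fullmatch` as a stand-in for that behavior (Prometheus itself uses Go's RE2 syntax, so this is an approximation that holds for simple patterns like these); `kept_by_keep_rule` and the sample metric names are illustrative, not taken from the actual scrape output.

```python
import re
from typing import Iterable, List

# Regexes copied from the metric_relabel_configs blocks above.
ISTIO_KEEP = "istio_requests_total|istio_request_duration_milliseconds_.*"
CADVISOR_KEEP = (
    "container_cpu_usage_seconds_total"
    "|container_memory_working_set_bytes"
    "|container_network_receive_bytes_total"
)


def kept_by_keep_rule(regex: str, metric_names: Iterable[str]) -> List[str]:
    """Return the names a `keep` rule on __name__ would retain.

    fullmatch mimics the implicit ^(?:...)$ anchoring of relabel regexes.
    """
    pattern = re.compile(regex)
    return [name for name in metric_names if pattern.fullmatch(name)]


if __name__ == "__main__":
    # Illustrative names only; a real target exposes many more series.
    cadvisor_sample = [
        "container_cpu_usage_seconds_total",
        "container_memory_working_set_bytes",
        "container_fs_reads_total",
    ]
    istio_sample = [
        "istio_requests_total",
        "istio_request_duration_milliseconds_bucket",
        "istio_request_bytes_sum",
    ]
    print(kept_by_keep_rule(CADVISOR_KEEP, cadvisor_sample))
    print(kept_by_keep_rule(ISTIO_KEEP, istio_sample))
```

This only confirms which series the rules let through; it says nothing about how long the filtering takes inside the collector.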
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.

@github-actions github-actions bot added the stale label Oct 27, 2024