Filtering metrics (metric_relabel_configs) in the scrapeConfigs settings will delay metric collection. #2815

Ari-suhyeon · 2024-08-28T04:14:19Z

Describe the bug
When I filter metrics using metric_relabel_configs in the scrapeConfigs settings, the metrics are not available at the scrape time I expect.
I send metrics to AWS Prometheus via ADOT (scrape interval: 10 seconds) and run a range query on the metrics every 10 seconds.

I set metric_relabel_configs so that only certain metrics are stored in Prometheus, to save on AWS Prometheus costs.
However, when I set metric_relabel_configs in multiple scrape configs, metric collection is intermittently delayed.
If I query for the current time, the metrics are intermittently missing, but when I run the same query for the same time a few seconds later, the metrics are there. So I think collection is delayed.
In the image below, metrics are intermittently missing for a few seconds on the left, and after that all metrics are collected.
metric_relabel_configs was present in the config during the period when metrics were intermittently missing, and was absent during the period when all metrics were collected.
(container_cpu_usage_seconds_total metric image)

Is the delay caused by slow filtering?
How can I filter metrics with metric_relabel_configs while still collecting data every 10 seconds?
I'm running EKS 1.29 with ADOT deployed as a Helm chart, and am using v0.40.0.

Config that works well (no metric_relabel_configs):

scrapeConfigs: |
  - job_name: 'k8s_metrics_scrape'
    scrape_interval: 10s
    scrape_timeout: 10s
    metrics_path: /metrics
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        regex: (.+)
        target_label: __metrics_path__
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $$1:$$2
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_pod_node_name]
        action: keep
        regex: ${K8S_NODE_NAME}
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name
      - source_labels: [__meta_kubernetes_pod_container_name]
        action: replace
        target_label: kubernetes_container_name
      - action: drop
        regex: Pending|Succeeded|Failed
        source_labels:
          - __meta_kubernetes_pod_phase

  # for 'container_*' metric
  - job_name: kubernetes-nodes-cadvisor
    scrape_interval: 10s
    scrape_timeout: 10s
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    kubernetes_sd_configs:
    - role: node
    relabel_configs:
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)
    - replacement: kubernetes.default.svc:443
      target_label: __address__
    - regex: (.+)
      replacement: /api/v1/nodes/$$1/proxy/metrics/cadvisor
      source_labels:
      - __meta_kubernetes_node_name
      target_label: __metrics_path__
    - action: keep
      regex: $K8S_NODE_NAME
      source_labels: [__meta_kubernetes_node_name]
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      insecure_skip_verify: true

Config where collection is delayed (with metric_relabel_configs):

scrapeConfigs: |
  - job_name: 'k8s_metrics_scrape'
    scrape_interval: 10s
    scrape_timeout: 10s
    metrics_path: /metrics
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        regex: (.+)
        target_label: __metrics_path__
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $$1:$$2
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_pod_node_name]
        action: keep
        regex: ${K8S_NODE_NAME}
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name
      - source_labels: [__meta_kubernetes_pod_container_name]
        action: replace
        target_label: kubernetes_container_name
      - action: drop
        regex: Pending|Succeeded|Failed
        source_labels:
          - __meta_kubernetes_pod_phase
    # metric name filter
    metric_relabel_configs:
    - source_labels: [__name__]
      regex: 'istio_requests_total|istio_request_duration_milliseconds_.*'
      action: keep

  # for 'container_*' metric
  - job_name: kubernetes-nodes-cadvisor
    scrape_interval: 10s
    scrape_timeout: 10s
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    kubernetes_sd_configs:
    - role: node
    relabel_configs:
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)
    - replacement: kubernetes.default.svc:443
      target_label: __address__
    - regex: (.+)
      replacement: /api/v1/nodes/$$1/proxy/metrics/cadvisor
      source_labels:
      - __meta_kubernetes_node_name
      target_label: __metrics_path__
    - action: keep
      regex: $K8S_NODE_NAME
      source_labels: [__meta_kubernetes_node_name]
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      insecure_skip_verify: true
    # metric name filter
    metric_relabel_configs:
    - source_labels: [__name__]
      regex: 'container_cpu_usage_seconds_total|container_memory_working_set_bytes|container_network_receive_bytes_total'
      action: keep
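
As a side check, it can help to confirm offline that the keep rules let through exactly the metric names intended, so a missing series is not mistaken for a delayed one. The sketch below is an assumption-laden illustration, not part of the reported setup: Prometheus anchors relabel regexes at both ends, which Python's `re.fullmatch` approximates here (Python's `re` and Go's RE2 differ in some syntax), and the helper `kept` and the `sample` list of metric names are hypothetical. It does not measure or explain the delay itself.

```python
import re

# Keep-regex copied from the kubernetes-nodes-cadvisor job above.
KEEP_REGEX = (
    "container_cpu_usage_seconds_total"
    "|container_memory_working_set_bytes"
    "|container_network_receive_bytes_total"
)


def kept(metric_names, keep_regex):
    """Return the names that a `keep` rule on __name__ would let through.

    Hypothetical helper: relabel regexes are fully anchored in Prometheus,
    so the whole metric name must match, hence re.fullmatch.
    """
    pattern = re.compile(keep_regex)
    return [name for name in metric_names if pattern.fullmatch(name)]


# Hypothetical sample of names a cAdvisor endpoint might expose.
sample = [
    "container_cpu_usage_seconds_total",
    "container_memory_working_set_bytes",
    "container_memory_rss",
    "container_network_receive_bytes_total",
    "container_network_transmit_bytes_total",
]

if __name__ == "__main__":
    for name in kept(sample, KEEP_REGEX):
        print(name)
```

If the names printed match what the dashboards query, the filter is selecting the right series, and the intermittent gaps are more likely a timing question than a filtering mistake.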


github-actions · 2024-10-27T20:02:05Z

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.

github-actions bot added the stale label Oct 27, 2024
