
detected_level detects wrong severity, even when own level label is set and severity_text field exists #15444

jadesoturi opened this issue Dec 17, 2024 · 0 comments
When detected_level (discover_log_levels: true) is enabled, the severity is sometimes labeled wrong. For example, severity_text is set to INFO (or WARN or DEBUG) and the level label is set to INFO (or WARN or DEBUG), but once the logs reach Loki Explore, detected_level is set to ERROR and the level label is also overwritten to ERROR.

If I understand this correctly, detected_level is only supposed to "detect" the severity if none exists. Here the severity already exists, both in the form of a level label and as severity_text/severity_number fields in the Structured Metadata.
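
The mismatch can also be seen without parsing the log body, by filtering on the structured metadata directly (a sketch; this assumes severity_text is kept as structured metadata and that detected_level uses Loki's lowercase values):

{service_name="log-generator-spring-boot"} | severity_text="INFO" | detected_level="error"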

To Reproduce
1. Deploy Loki with the Helm chart (simple scalable) v3.3.1 using the config below.
2. Deploy the OTEL Collector via the operator using the config below.
3. Deploy an app with OTEL instrumentation, either manually instrumented or auto-instrumented via the operator.
4. Ship the logs to Loki.
5. View the logs in Explore (Grafana v11.3.0-pre) with the following query:

{service_name="log-generator-spring-boot"} | logfmt | level="ERROR" | severity_text!="ERROR"

Expected behavior
The level and detected_level labels should match the severity_text field, and the above query should return nothing.

Environment:

  • Infrastructure: Kubernetes 1.30.4
  • Deployment tool: Helm/Terraform

Screenshots, Promtail config, or terminal output

We use the OTEL Collector and its transform processor to add a level resource attribute (indexed as a label by Loki), with the value taken from the log record's severity_text field:

apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: otel-collector
  namespace: monitoring
spec:
  mode: daemonset
  config:
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
    processors:
      memory_limiter:
        check_interval: 1s
        limit_percentage: 75
        spike_limit_percentage: 15
      k8sattributes:
        auth_type: "serviceAccount"
        passthrough: false
        filter:
          node_from_env_var: K8S_NODE_NAME
        pod_association:
        - sources:
          - from: resource_attribute
            name: k8s.pod.ip
        - sources:
          - from: resource_attribute
            name: k8s.pod.uid
        - sources:
          - from: connection
        extract:
          metadata:
            - "k8s.namespace.name"
            - "k8s.deployment.name"
            - "k8s.statefulset.name"
            - "k8s.daemonset.name"
            - "k8s.cronjob.name"
            - "k8s.job.name"
            - "k8s.node.name"
            - "k8s.pod.name"
            - "k8s.pod.uid"
            - "k8s.pod.start_time"
          labels:
            - tag_name: $$1
              key_regex: (.*)
              from: pod
          annotations:
            - tag_name: $$1
              key_regex: (.*)
              from: pod
      transform:
        error_mode: ignore
        log_statements:
          - context: log
            statements:
              - set(resource.attributes["level"], severity_text)
          - context: resource
            statements:
              - set(attributes["namespace"], attributes["k8s.namespace.name"])
              - set(attributes["deployment"], attributes["k8s.deployment.name"])
              - delete_key(attributes, "k8s.namespace.name")
              - delete_key(attributes, "k8s.deployment.name")
      batch:
        send_batch_size: 10000
        timeout: 10s

    exporters:
      otlphttp/logs:
        endpoint: http://loki-gateway.loki:80/otlp
        tls:
          insecure: true
        headers:

    service:
      telemetry:
        metrics:
          level: none
      pipelines:
        logs:
          receivers: [otlp]
          processors: [memory_limiter, k8sattributes, transform, batch]
          exporters: [otlphttp/logs]          

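To verify that the transform actually sets the level resource attribute before the logs leave the collector, the standard debug exporter can be added alongside the Loki exporter (a minimal sketch, not our production config; assumes the debug exporter is bundled in the collector image):

exporters:
  otlphttp/logs:
    # ... as above ...
  debug:
    verbosity: detailed
service:
  pipelines:
    logs:
      exporters: [otlphttp/logs, debug]
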
We also have the following config (Helm values) in Loki:

global:
  clusterDomain: ${kubernetes_cluster_domain}

loki:
  pattern_ingester:  
    enabled: true    
  limits_config:
    allow_structured_metadata: true
    #    discover_log_levels: false
    volume_enabled: true
    retention_period: 744h
    otlp_config:
      resource_attributes:
        ignore_defaults: true
        attributes_config:
          - action: index_label
            regex: namespace
          - action: index_label
            regex: service.name
          - action: index_label
            regex: deployment
          - action: index_label
            regex: level
  compactor:
    retention_enabled: true
    delete_request_store: azure
  schemaConfig:
    configs:
      - from: 2024-04-01
        store: tsdb
        object_store: azure
        schema: v13
        index:
          prefix: loki_index_
          period: 24h
  ingester:
    chunk_encoding: snappy
  tracing:
    enabled: true
  querier:
    # Default is 4, if you have enough memory and CPU you can increase, reduce if OOMing
    max_concurrent: 4
  storage:
    type: azure
    azure:
      # Comprehensive connection string for Azure Blob Storage account (Can be used to replace endpoint, accountName, and accountKey)
      connectionString: $${LOGS_CONNSTRING}
      container_name:   "logs"
    bucketNames:
      chunks: "logs"
      ruler: "logs"
      admin: "admin"

read:
  # Current version of loki-simple-scalable helm chart doesn't include '-config.expand-env=true' option by default. To allow referencing
  # sensitive values in loki config from env vars, this option has to be enabled explicitly
  extraArgs:
    - -config.expand-env=true

  # Environment variables with Azure Storage Account details. Referenced in loki config
  extraEnv:
    - name: LOGS_CONNSTRING
      valueFrom:
        secretKeyRef:
          name: logs
          key: connectionString

write:
  # Current version of loki-simple-scalable helm chart doesn't include '-config.expand-env=true' option by default. To allow referencing
  # sensitive values in loki config from env vars, this option has to be enabled explicitly
  extraArgs:
    - -config.expand-env=true

  # Environment variables with Azure Storage Account details. Referenced in loki config
  extraEnv:
    - name: LOGS_CONNSTRING
      valueFrom:
        secretKeyRef:
          name: logs
          key: connectionString

backend:
  # Current version of loki-simple-scalable helm chart doesn't include '-config.expand-env=true' option by default. To allow referencing
  # sensitive values in loki config from env vars, this option has to be enabled explicitly
  extraArgs:
    - -config.expand-env=true

  # Environment variables with Azure Storage Account details. Referenced in loki config
  extraEnv:
    - name: LOGS_CONNSTRING
      valueFrom:
        secretKeyRef:
          name: logs
          key: connectionString

gateway:
  ingress:
    ingressClassName: "kong"
    enabled: true
    hosts:
      - host: example.domain.com
        paths:
          - path: /loki
            pathType: Prefix
  tls: {}

chunksCache:
  # -- Amount of memory allocated to chunks-cache for object storage (in MB).
  # By default a safe memory limit will be requested based on allocatedMemory value (floor (* 1.2 allocatedMemory)).
  allocatedMemory: 4096

# Zero out replica counts of other deployment modes
singleBinary:
  replicas: 0

ingester:
  replicas: 0
querier:
  replicas: 0
queryFrontend:
  replicas: 0
queryScheduler:
  replicas: 0
distributor:
  replicas: 0
compactor:
  replicas: 0
indexGateway:
  replicas: 0
bloomCompactor:
  replicas: 0
bloomGateway:
  replicas: 0

[screenshot]

If we set discover_log_levels: false, we only get "logs" as the detected_level and all level labels match the severity_text field, but we lose the nice separation of levels for easy filtering.
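
With discovery disabled, filtering by severity still works via the indexed level label set by the transform processor above, e.g.:

{service_name="log-generator-spring-boot", level="ERROR"}

but the detected_level-based separation in Explore is still lost, as described above.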

[screenshot]
