Releases: DataDog/datadog-agent
7.60.1
Agent
7.60.1
Prelude
Released on: 2024-12-19
Security Notes
- Update golang.org/x/crypto to fix CVE-2024-45337.
Datadog Cluster Agent
7.60.1
Prelude
Released on: 2024-12-19
Pinned to datadog-agent v7.60.1: CHANGELOG.
7.60.0
Datadog Agent
Release Notes
7.60.0
Prelude
Released on: 2024-12-16
- Please refer to the 7.60.0 tag on integrations-core for the list of changes on the Core Checks
Upgrade Notes
- Parameter peer_tags_aggregation (a.k.a. environment variable DD_APM_PEER_TAGS_AGGREGATION) is now enabled by default. This means that aggregation of peer-related tags (e.g., peer.service, db.instance, etc.) now happens in the Agent, which enables statistics for Inferred Entities. If you want to disable this feature, set peer_tags_aggregation to false in your Agent configuration.
- Parameter compute_stats_by_span_kind (a.k.a. environment variable DD_APM_COMPUTE_STATS_BY_SPAN_KIND) is now enabled by default. This means spans with an eligible span.kind will have stats computed. If disabled, only top-level and measured spans will have stats computed. If you want to disable this feature, set compute_stats_by_span_kind to false in your Agent configuration.
  Note: When using peer_tags_aggregation and compute_stats_by_span_kind, a high cardinality of peer tags or APM resources can contribute to higher CPU and memory consumption. If enabling both causes the Agent to consume too many resources, try disabling compute_stats_by_span_kind first.
  It is recommended that you update your tracing libraries according to the instructions here and set DD_TRACE_REMOVE_INTEGRATION_SERVICE_NAMES_ENABLED (or dd.trace.remove.integration-service-names.enabled) to true.
- Upgraded JMXFetch to 0.49.5, which adds support for the UnloadedClassCount metric and IBM J9 GC metrics. See 0.49.5 for more details.
New Features
- Inferred Service dependencies are now Generally Available (exiting Beta) and enabled by default. Inferred Services of all kinds now have trace metrics and are available in dependency maps. apm_config.peer_tags_aggregation and apm_config.compute_stats_by_span_kind both now default to true unless explicitly set to false.
- Add the check_tag_cardinality parameter to the check configuration.
  By default, check_tag_cardinality is not set, which leaves the behavior of the checks unchanged. Once it is set in pod annotations, it overrides the cardinality value provided in the base Agent configuration. Example of usage:
  ad.datadoghq.com/redis.checks: |
    {
      "redisdb": {
        "check_tag_cardinality": "high",
        "instances": [
          {
            "host": "%%host%%",
            "port": "6379"
          }
        ]
      }
    }
- Added a new feature flag enable_receive_resource_spans_v2 in DD_APM_FEATURES that gates a refactored implementation of ReceiveResourceSpans for OTLP.
Enhancement Notes
- Added information about where the Agent sourced BTF data for eBPF to the Agent flare. When applicable, this will appear in system-probe/ebpf_btf_loader.log.
- The Agent flare now returns NAT debug information from conntrack in the system-probe directory.
- The flare subcommand includes a --provider-timeout option to set a timeout for each file collection (default is 10s), useful for unblocking slow flare creation.
- This change reduces the number of DNS queries made by Network Traffic based paths in Network Path. A cache of reverse DNS lookups is used to reduce the number of DNS queries. Additionally, reverse DNS lookups are now performed only for private IPs and not for public IPs.
- Agent flare now includes system-probe telemetry data via system-probe/system_probe_telemetry.log.
- The MSI installer uses 7zr.exe to decompress the embedded Python.
- On Windows, the endpoint /windows_crash_detection/check has been modified to report crashes in an asynchronous manner, to allow processing of large crash dumps without blocking or timing out. The first check will return a busy status and continue to do so until the processing is completed.
Deprecation Notes
- Prebuilt eBPF for the network tracer system-probe module has been deprecated in favor of CO-RE and runtime compilation variants on Linux kernel versions 6+ and RHEL kernel versions 5.14+. To continue to use the prebuilt eBPF network tracer, set system_probe_config.allow_prebuilt_fallback in the system-probe config file, or set the environment variable DD_ALLOW_PREBUILT_FALLBACK, to true on these platforms.
- The feature flag service_monitoring_config.enable_http_stats_by_status_code was deprecated and removed. No impact on USM's behavior.
Bug Fixes
- Fixes an issue added in 7.50 that causes the Windows event log tailer to drop events if it cannot open their publisher metadata.
- Fix a bug in the config parser that broke ignored_ip_addresses from working in NDM Autodiscovery.
- Fixes host tags with a configurable duration so the metric's context hash doesn't change, preventing the aggregator from mistaking it as a new metric.
- Fix could not parse voltage fields error in Nvidia Jetson integration when tegrastats output contains mW units.
- Fix building of Python extension containing native code.
- [oracle] Fix broken activity sampling with an external Oracle client.
- Fix nil pointer error on Oracle DBM query when the check's connection is lost before SELECT statement executes.
- Fix a regression that caused the Agent to not be able to run if its capabilities had been modified with the setcap command.
- Fix a bug where single-line truncated logs ending with whitespace characters were not tagged as truncated. Fix an issue where the truncation message occasionally caused subsequent logs to be treated as truncated when they were not (single-line logs only).
Datadog Cluster Agent
Release Notes
7.60.0
Prelude
Released on: 2024-12-16
Pinned to datadog-agent v7.60.0: CHANGELOG.
Bug Fixes
- Fixes bug where incorrect timestamp would be used for unbundled Kubernetes events.
- Fixed an issue in the KSM check when it's configured with the option pod_collection_mode set to node_kubelet. Previously, the check could fail to start if there was a timeout while contacting the API server. This issue has now been resolved.
7.59.1
Prelude
Released on: 2024-12-02
Enhancement Notes
- Set up a temporary directory for JMXFetch to use when it runs. It is the same directory the Agent uses when running, which guarantees a location that JMXFetch can write to. This helps when JMXFetch sends metrics over Unix Domain Socket <https://docs.datadoghq.com/developers/dogstatsd/unix_socket/?tab=host>, as it needs access to a writable temp directory.
7.59.0
Agent
Prelude
Released on: 2024-11-07
- Please refer to the 7.59.0 tag on integrations-core for the list of changes on the Core Checks
Upgrade Notes
- Removed the deprecated config option otlp_config.debug.loglevel in favor of otlp_config.debug.verbosity:
  - loglevel: debug maps to verbosity: detailed
  - loglevel: info maps to verbosity: normal
  - loglevel: warn/error maps to verbosity: basic
  - loglevel: disabled maps to verbosity: none
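For anyone migrating configuration programmatically, the loglevel to verbosity mapping above can be expressed as a small lookup. This is only a sketch: the helper name migrate_otlp_debug is hypothetical, and it assumes the otlp_config.debug section has already been loaded into a plain dictionary.

```python
# Mapping copied from the upgrade note; "warn" and "error" both map to "basic".
LOGLEVEL_TO_VERBOSITY = {
    "debug": "detailed",
    "info": "normal",
    "warn": "basic",
    "error": "basic",
    "disabled": "none",
}


def migrate_otlp_debug(debug_section: dict) -> dict:
    """Return a copy of an otlp_config.debug mapping without the removed loglevel key.

    Hypothetical helper: an explicit verbosity value is kept as-is, and an
    unknown loglevel raises KeyError rather than being guessed.
    """
    migrated = dict(debug_section)
    loglevel = migrated.pop("loglevel", None)
    if loglevel is not None and "verbosity" not in migrated:
        migrated["verbosity"] = LOGLEVEL_TO_VERBOSITY[str(loglevel).lower()]
    return migrated
```

If both keys are present, the sketch keeps the existing verbosity value and simply drops loglevel, which is one reasonable choice among several.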
New Features
- Add ability to run process/container collection on the core Agent (Linux only). This is controlled by the process_config.run_in_core_agent.enabled option in datadog.yaml.
- DBM: Add configuration options to SQL obfuscator to customize the obfuscation of SQL statements:
  - KeepJSONPath - option to control whether JSON paths following JSON operators in SQL statements should be obfuscated. This option is only valid when ObfuscationMode is obfuscate_and_normalize.
- APM: Add new 'sqllexer' feature flag for the Trace Agent, which enables the sqllexer implementation of the SQL Obfuscator.
- Introduce new Kubernetes tag gpu_vendor for the GPU resource requested by a container.
Enhancement Notes
- Added additional Agent telemetry metrics for the log tailer code flow: logs.bytes_sent, logs.encoded_bytes_sent, and logs.bytes_missed.
- Datadog may collect environmental, performance, and feature usage information about the Datadog Agent. This may include diagnostic logs and crash dumps of the Datadog Agent with obfuscated stack traces to support and further improve the Datadog Agent. More details can be found in the docs.
- APM: Updates peer tags for peer.db.system.
- Agents are now built with Go 1.22.8.
- While using the AWS Lambda Extension, when a Lambda Function is invoked by a [properly instrumented](https://docs.datadoghq.com/serverless/step_functions/installation/?tab=custom) Step Function, the Lambda Function will create its Trace and Parent IDs deterministically based on the Step Function's execution context.
- Updates default .NET library used for auto-instrumentation from v2 to v3.
- The system-probe selinux policy is now installed on Oracle Linux.
- Increases the default input channel, processing channel, and context store sizes for network traffic paths.
- Adds support for file log collection from Podman rootless containers when logs_config.use_podman_logs is set to true and podman_db_path is configured.
- Allow Python integrations to emit Agent telemetry data.
Security Notes
- Update OpenSSL to 3.3.2 (on Linux & macOS) in order to mitigate CVE-2024-6119.
Bug Fixes
- Fixes the default configuration template to include the Cloud Security Management configuration options.
- Fixes a bug introduced in 7.55 where, in some specific scenarios, checks associated with a deleted container or pod would keep running until the Agent is restarted.
- Fix the forwarder health check so that it reports unhealthy when the API key is invalid.
- Fix the removal of 'non-core' integrations during Agent upgrades.
- Fix Process Agent argument scrubbing to allow scrubbing of quoted arguments.
- Fix Orchestrator argument scrubbing to allow scrubbing of quoted arguments.
- Fixes an issue where TCP traceroute latency was not being calculated correctly.
- Fixes the telemetry type for Oracle metrics.
- APM: Fix obfuscation of SQL queries containing non-numeric prepared statement variables.
Other Notes
- Adds Postgres integration metrics to cross-org telemetry whitelist.
- The Agent is now built with a custom toolchain that targets our minimally supported glibc version (2.17 on x86_64 and 2.23 on aarch64)
- On Windows, the TCP socket transport mechanism for system probe communications has been replaced with a named pipe. This deprecates the system_probe_config.sysprobe_socket configuration entry for Windows. The new fixed named pipe path is \\.\pipe\dd_system_probe.
7.58.2
Prelude
Released on: 2024-11-04
Bug Fixes
- The use of the cloud-provided hostname as the default when running the Agent in AKS, introduced in 7.56.0, is reverted due to cases where the returned hostname is non-unique. This feature will be fixed and added again in a future release.
7.58.1
Agent
Prelude
Released on: 2024-10-24
Enhancement Notes
- Removes a log statement which was causing a lot of noise in the Network Path logs.
Bug Fixes
- [CWS] Fixes an issue where the cws-instrumentation trace command could panic before launching the traced executable when running on AWS Fargate.
- [CWS] Fixes an issue where ECS Fargate tags would not be resolved correctly on CWS events.
- Fixes an error in system-probe triggered by packet capture in environments with multiple VLANs.
- Fix USM's GO-TLS support for Golang 1.23
7.58.0
Agent
Prelude
Released on: 2024-10-21
- Please refer to the 7.58.0 tag on integrations-core for the list of changes on the Core Checks
Upgrade Notes
- Changes behavior of the timeout for Network Path. Previously, the timeout signified the total time to wait for a full traceroute to complete. Now, the timeout signifies the time to wait for each hop in the traceroute. Additionally, the default timeout has been changed to 1000ms.
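To see what the per-hop timeout means in practice, the following sketch computes an upper bound on how long a single traceroute could wait. It assumes hops are probed one after another and that every hop times out; the function name and the hop counts used in the usage note are illustrative and are not Agent defaults.

```python
def worst_case_traceroute_ms(max_hops: int, per_hop_timeout_ms: int = 1000) -> int:
    """Upper bound on waiting time when every hop hits the per-hop timeout.

    Assumes sequential probing (an assumption, not confirmed by the release note).
    The 1000 ms default mirrors the new default timeout stated above.
    """
    if max_hops < 0 or per_hop_timeout_ms < 0:
        raise ValueError("max_hops and per_hop_timeout_ms must be non-negative")
    return max_hops * per_hop_timeout_ms
```

Under these assumptions, a path limited to 30 hops could wait up to worst_case_traceroute_ms(30) milliseconds, whereas the previous semantics capped the whole traceroute at the configured timeout.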
New Features
- Added capability to tag any Kubernetes resource based on labels and annotations. This feature can be configured with kubernetes_resources_annotations_as_tags and kubernetes_resources_labels_as_tags. These configurations associate group resources with an annotations-to-tags (or labels-to-tags) map. For example, pods can be associated with an annotations-to-tags map to configure annotations as tags for pods. Example: {`pods`: {`annotationKey1`: tag1, `annotationKey2`: tag2}}
- The Kubernetes State Metrics (KSM) check can now be configured to collect pods from the Kubelet in node agents instead of collecting them from the API Server in the Cluster Agent or the Cluster check runners. This is useful in clusters with a large number of pods where emitting pod metrics from a single check instance can cause performance issues due to the large number of metrics emitted.
- NPM - adds UDP "Packets Sent" and "Packets Received" to the network telemetry in Linux.
- [oracle] Add the active_session_history configuration parameter to optionally ingest Oracle active session history samples instead of query sampling.
- Added config option logs_config.tag_truncated_logs. When enabled, file logs will come with a tag truncated:true if they were truncated by the Agent.
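The annotations-to-tags map described for kubernetes_resources_annotations_as_tags in this list can be pictured with a short sketch. This is a conceptual illustration only, not the Agent's implementation: the function name tags_from_metadata is hypothetical, and it models plain key-to-tag lookups without any pattern matching the real feature may support.

```python
def tags_from_metadata(resource_type: str, metadata: dict, mapping: dict) -> list:
    """Turn a resource's annotations (or labels) into "tag:value" strings.

    mapping has the shape {resource_type: {metadata_key: tag_name}}, matching
    the example in the release note. Keys absent from metadata are skipped.
    """
    key_to_tag = mapping.get(resource_type, {})
    return sorted(
        f"{tag_name}:{metadata[key]}"
        for key, tag_name in key_to_tag.items()
        if key in metadata
    )


# Illustrative data: the annotation values are made up for the example.
mapping = {"pods": {"annotationKey1": "tag1", "annotationKey2": "tag2"}}
pod_annotations = {"annotationKey1": "blue", "unrelated": "ignored"}
pod_tags = tags_from_metadata("pods", pod_annotations, mapping)
```

Resource types that have no entry in the map produce no tags, which mirrors the idea that each group resource is configured separately.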
Enhancement Notes
- [DBM] Bump go-sqllexer to 0.0.14 to skip collecting CTE tables as SQL metadata.
- Agents are now built with Go 1.22.7.
- Add the ability to tag cisco-sdwan device and interface metrics with user-defined tags.
- Add support for setting a custom log source from resource attribute or log attribute datadog.log.source.
- The default UDP port for traceroute (port 33434) is now used for Network Traffic based paths, instead of the port detected by NPM.
- [oracle] Add oracle_client_lib_dir config parameter.
- [oracle] Increase tablespace check interval from 1 to 10 minutes.
- [oracle] Don't try to fetch execution plans where plan_hash_value is 0.
- The OTLP ingest endpoint now maps the new OTel semantic convention deployment.environment.name to env
- Prevents the use of the process_config.run_in_core_agent.enabled configuration option in unsupported environments.
- APM: Trace payloads are now compressed with zstd by default.
Security Notes
- Bump embedded Python version to 3.12.6 to address CVE-2024-4030 and CVE-2024-4741.
- Update cURL to 8.9.1.
- Update OpenSSL to 3.3.2 (on Linux & macOS) in order to mitigate CVE-2024-6119.
Bug Fixes
- Adds missing support for the logs config key to work with AD annotations V2.
- Fix agent jmx [command] subcommands for container environments with annotations-based configs.
- Fixed issue with openSUSE 15 RC 6 where the eBPF tracer wouldn't start due to a failed validation of the tcp_sendpage probe.
- Fixed a rare issue where short-lived containers could cause logs to be sent with the wrong container ID.
- Fix Windows Process Agent argument stripping to account for spaces in the executable path.
- Fixes issue with the kubelet corecheck where kubernetes.kubelet.volume.* metrics were not properly being reported if any matching namespace exclusion filter was present.
- OOM Kill Check now reports the cgroup name of the victim process rather than the triggering process.
- The process agent will no longer exit prematurely when language detection is enabled or when there is a misconfiguration stemming from process_config.run_in_core_agent.enabled's default enablement in Kubernetes.
- Change the datadog-security-agent Windows service display name from Datadog Security Service to Datadog Security Agent for consistency with other Agent services.
- Fix a bug preventing SNMP V3 reconnection.
Other Notes
- Add metric origins for the Kubeflow integration.
- Add functional tests to Oracle using a Docker service to host the database instance.
- Adds Agent telemetry for Oracle collector.
Datadog Cluster Agent
Prelude
Released on: 2024-10-21
Pinned to datadog-agent v7.58.0: CHANGELOG.
New Features
- Added capability to tag any Kubernetes resource based on labels and annotations. This feature can be configured with kubernetes_resources_annotations_as_tags and kubernetes_resources_labels_as_tags. These configurations associate group resources with an annotations-to-tags (or labels-to-tags) map. For example, deployments.apps can be associated with an annotations-to-tags map to configure annotations as tags for deployments. Example: {`deployments.apps`: {`annotationKey1`: tag1, `annotationKey2`: tag2}}
- The Kubernetes State Metrics (KSM) check can now be configured to collect pods from the Kubelet in node agents instead of collecting them from the API Server in the Cluster Agent or the Cluster check runners. This is useful in clusters with a large number of pods where emitting pod metrics from a single check instance can cause performance issues due to the large number of metrics emitted.
Enhancement Notes
- Added a new option for the Cluster Agent ("admission_controller.inject_config.type_socket_volumes") to specify that injected volumes should be of type "Socket". This option is disabled by default. When set to true, injected pods will not start until the Agent creates the DogstatsD and trace-agent sockets. This ensures no traces or DogstatsD metrics are lost, but it can cause the pod to wait if the Agent has issues creating the sockets.
Bug Fixes
- Fixed an issue that prevented the Kubernetes autoscaler from evicting pods injected by the Admission Controller.
7.57.2
Prelude
Released on: 2024-09-24
Enhancement Notes
- Agents are now built with Go 1.22.7.
Bug Fixes
- Fix OOM error with cluster agent auto instrumentation by increasing default memory request from 20Mi to 100Mi.
- Fixes a panic caused by running the Agent on readonly filesystems. The Agent returns integration launchers and handles memory gracefully.
7.57.1
Agent
7.57.1
Prelude
Released on: 2024-09-17
- Please refer to the 7.57.1 tag on integrations-core for the list of changes on the Core Checks
Bug Fixes
- APM: When the UDS listener cannot be created on the trace-agent, the process will log the error, instead of crashing.
- Fixes memory leak caused by container check.
Datadog Cluster Agent
7.57.1
Prelude
Released on: 2024-09-17
Pinned to datadog-agent v7.57.1: CHANGELOG.
7.57.0
Agent
7.57.0
Known bugs
- ECS Fargate deployments may cause increases in RAM and CPU usage. For more information see #27523.
Prelude
Released on: 2024-09-09
- Please refer to the 7.57.0 tag on integrations-core for the list of changes on the Core Checks
Upgrade Notes
- Update cURL to 8.9.1.
- Update OpenSSL from 3.0.14 to 3.3.1 (on Linux and macOS).
New Features
- The agent diagnose command now includes a --json option to output the results in JSON format.
- Add integration value for device metadata.
- APM: In order to allow automatic instrumentation to work in Kubernetes clusters that enforce a Restricted Pod Security Standard, which requires all containers to explicitly set a securityContext, an option to configure a securityContext to be used for all initContainers created by the auto instrumentation has been added. This can be done through the DD_ADMISSION_CONTROLLER_AUTO_INSTRUMENTATION_INIT_SECURITY_CONTEXT environment variable, or the admission_controller.auto_instrumentation.init_security_context configuration; in both cases a JSON string should be supplied.
- Adds a kube_runtime_class tag to metrics associated with Kubernetes pods and their containers.
- Expose the Agent's get host tags function to python checks using the new datadog_agent.get_host_tags method.
- Implement static allowlist of Kubernetes events to send by default. This feature is only enabled when filtering_enabled is set to true in the kubernetes_apiserver integration configuration.
- Adds a new launcher to handle incoming logs from integrations.
- Add optional reverse DNS enrichment of private IP addresses to NDM NetFlow.
- On Windows, the default value for the service inference feature is now enabled.
Enhancement Notes
- Turn on Orchestrator Explorer by default in the core agent
- Added new source_host tag to TCP/UDP logs to help users understand where their logs came from.
- Added support for handling UDP/TCP logs when running the containerized Agent.
- APM: Allow custom HTTP client to be provided when instantiating the trace-agent configuration. This feature is primarily intended for the OpenTelemetry exporter.
- APM: Add default UDS listeners for traces (trace-agent) and dogstatsd (core-agent) on /var/run/datadog/apm.socket and /var/run/datadog/dsd.socket, respectively. These are used in the Single Step APM Instrumentation, improving the onboarding experience and minimizing the agent configuration.
- For the [Inferred Service Dependencies beta](https://docs.datadoghq.com/tracing/guide/inferred-service-opt-in/?tab=java), add two new peer.hostname precursor attributes, out.host and dns.hostname. This will improve coverage of inferred services because some tracer integrations only place the peer hostname in one of those attributes.
- APM stats for internal service overrides are now aggregated by the _dd.base_service tag only, enhancing visibility into specific base services.
- Include spans with span.kind=consumer for aggregation of stats on peer tags.
- IP address quantization on all peer tags is done in the backend during ingestion. This change updates the Agent to apply the same IP address quantization. This reduces unnecessary aggregation that is currently done on raw IP addresses and therefore improves the aggregation performance of stats on peer tags.
- APM: Add new setting to disable the HTTP receiver in the trace-agent. This setting should almost never be disabled and is only a convenience parameter for OpenTelemetry extensions. Disabling the receiver is semantically equivalent to setting the receiver_port to 0 and receiver_socket to "".
- Agents are now built with Go 1.22.6.
- [NDM] Adds the option to collect BGP neighbors metrics from Cisco SD-WAN.
- [NDM] Add option to collect cloud application metrics from Cisco SD-WAN.
- [Cisco SD-WAN] Allow enabling/disabling metrics collection.
- Report the hostname of Kubernetes events based on the associated pod that the event relates to.
- Introduces a parser to extract tags from integration logs and attach them to outgoing logs.
- Implement External Data environment variable injection in the Admission Controller. Format for this new environment variable is it-INIT_CONTAINER,cn-CONTAINER_NAME,pu-POD_UID. This new variable is needed for the New Origin Detection spec. It is used for Origin Detection in case Local Data are unavailable, for example with Kata Containers and CGroups v2.
- Upgraded JMXFetch to 0.49.3 which adds support for jsr77 j2ee statistics and custom ConnectionFactory. See 0.49.3 for more details.
- Windows Agent Installer gives a better error message when a gMSA account is provided for ddagentuser that Windows does not recognize.
- Uninstalling the Windows Agent MSI Installer removes specific subdirectories of the install path to help prevent data loss when PROJECTLOCATION is misconfigured to an existing directory.
- Adds a default upper limit of 10000 to the number of network traffic paths that are captured at a single time. The user can increase or decrease this limit as needed.
- Language detection can run on the core Agent without needing a gRPC server.
- Add Hostname and ExtraTags to CollectorECSTask.
- Collect SystemInfo for Pods and ECS Tasks.
- Implement API that allows Python checks to send logs for eventual submission.
- Users can use DD_ORCHESTRATOR_EXPLORER_CUSTOM_SENSITIVE_ANNOTATIONS_LABELS to remove sensitive annotations and labels. For example: DD_ORCHESTRATOR_EXPLORER_CUSTOM_SENSITIVE_ANNOTATIONS_LABELS="sensitive-key-1 sensitive-key-2". Keys should be separated by spaces. The agent removes any annotations and labels matching these keys.
- Add the ability to tag interface metrics with user-defined tags.
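The External Data format mentioned in this list (it-INIT_CONTAINER,cn-CONTAINER_NAME,pu-POD_UID) is a comma-separated list of prefixed fields. A minimal parsing sketch follows; the names ExternalData and parse_external_data are hypothetical, the field values are kept as raw strings, and the sample value in the usage note is invented for illustration.

```python
from typing import NamedTuple


class ExternalData(NamedTuple):
    """Fields carried by the External Data value (raw strings, not validated)."""
    init_container: str
    container_name: str
    pod_uid: str


# Prefixes taken from the format given in the release note.
_PREFIXES = {"it-": "init_container", "cn-": "container_name", "pu-": "pod_uid"}


def parse_external_data(value: str) -> ExternalData:
    """Split "it-...,cn-...,pu-..." into its three fields.

    Unknown items are ignored; a missing field raises ValueError.
    """
    fields = {}
    for item in value.split(","):
        for prefix, name in _PREFIXES.items():
            if item.startswith(prefix):
                fields[name] = item[len(prefix):]
                break
    missing = set(_PREFIXES.values()) - set(fields)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return ExternalData(**fields)
```

For instance, parse_external_data("it-false,cn-nginx,pu-1234-abcd") yields the three parts as separate fields; only the prefix is stripped, so hyphens inside a value such as a pod UID are preserved.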
Security Notes
- Fix CVE-2024-41110.
Bug Fixes
- Fixes an issue where the results of agent config did not reflect the actual runtime config for the other services. Other Datadog Agent services (e.g. trace-agent) running as systemd services now read the same environment variables from the text file /etc/datadog-agent/environment as the core Agent process.
- [DBM] Bump go-sqllexer to 0.0.13 to fix a bug where the table name is incorrectly collected on PostgreSQL SELECT ONLY statement.
- [Cisco SD-WAN] Do not collect unspecified IP addresses.
- Fix container.net.* metrics accuracy on Linux. Currently container.net.* metrics are always emitted with high cardinality tags while the values may not represent actual container-level values but POD-level values (multiple containers in a pod) or host-level values (containers running in host network). With this bug fix, the container.net.* metrics aren't emitted for containers running in host network and a single timeseries is emitted by pods when running multiple containers. Finally, in non-Kubernetes environments, if multiple containers share the same network namespace, container.net.* metrics won't be emitted.
- Fix duplicate logging in Process Agent component's Enabled() method.
- Fixed bug in kubelet check when running in core agent that was causing kubernetes.kubelet.container.log_filesystem.used_bytes to be reported by the check for excluded/non-existing containers. The metric was being reported in this case without tags. This bug does not exist in the python integration version of the kubelet check.
- Fixes a bug on Windows in the driver installation custom actions that could prevent rollback from working properly if an installation failed or was canceled.
- Update pro-bing library to include fix for a Windows specific issue with large ICMP packets
- [oracle] Fix wrong durations for cloud databases.
- Stop chunking outputs in manual checks for container, process, and process_discovery checks to allow JSON unmarshaler to parse output.
- Remove the original pod annotation on consul
- Fix pod status for pods using native sidecars.
- Fix a regression where the Agent would fail to start on systems with SysVinit.
- APM: Fixes issue where the number of HTTP decoders was incorrectly set if setting GOMAXPROCS to milli-cpu values.
Other Notes
- Add metrics origins for vLLM integration.
- Add deprecation warnings when running process checks on the Process Agent in Linux. This change prepares for the deprecation of processes and container collection in the Process Agent, occurring in a future release.
- Add metric origin for the AWS Neuron integration
Datadog Cluster Agent
7.57.0
Prelude
Released on: 2024-09-09
Pinned to datadog-agent v7.57.0: CHANGELOG.
New Features
- The Cluster Agent now supports activating Continuous Profiling using Admission Controller.
- LimitRange and StorageClass resources are now collected by the orchestrator check.
Enhancement Notes
- The auto-ins...