**This spec is deprecated. Please see the new versioned specs.**
This is a work-in-progress specification for Autometrics.
It aims to describe the full feature set of the Autometrics libraries, but it may have important details missing. We will attempt to update this document to describe the expectations across all of the language implementations.
Libraries SHOULD expose a decorator, macro, wrapper function, or use another metaprogramming technique offered by the language to instrument functions and methods in the user's source code. Ideally, the function attribute should simply be called autometrics or Autometrics, but libraries MAY append a suffix to the name if necessary.
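As an illustration only, the following Python sketch shows what such a decorator could look like. It is not the API of any Autometrics library: the in-memory stores CALLS and DURATIONS are hypothetical stand-ins for a real metrics backend.

```python
import functools
import time
from collections import Counter

# Hypothetical in-memory stores standing in for a real metrics backend.
CALLS = Counter()
DURATIONS = []  # (labels, seconds) pairs


def autometrics(func):
    """Record a call count and a duration for every call to `func`."""
    labels = (func.__name__, func.__module__)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = "ok"
        try:
            return func(*args, **kwargs)
        except Exception:
            result = "error"
            raise
        finally:
            # Runs on both success and failure, so every call is counted.
            CALLS[labels + (result,)] += 1
            DURATIONS.append((labels, time.perf_counter() - start))

    return wrapper


@autometrics
def divide(a, b):
    return a / b
```

Calling divide(6, 3) records one "ok" call, and divide(1, 0) records one "error" call before re-raising the exception.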
Libraries MAY enable the decorator, macro, etc to apply to an entire class definition. If they do, they SHOULD provide an option for users to skip or ignore particular methods.
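A minimal sketch of class-level instrumentation with an opt-out, assuming a Python library. The names autometrics_class and skip, and the CALLS store, are hypothetical and chosen only for this example.

```python
import functools
import inspect
from collections import Counter

CALLS = Counter()  # hypothetical in-memory call counter


def skip(func):
    """Mark a method so the class-level decorator leaves it alone."""
    func._autometrics_skip = True
    return func


def _instrument(func, qualname):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        CALLS[qualname] += 1
        return func(*args, **kwargs)

    return wrapper


def autometrics_class(cls):
    """Wrap every plain method defined on `cls`, except those marked with @skip."""
    for name, member in list(vars(cls).items()):
        if inspect.isfunction(member) and not getattr(member, "_autometrics_skip", False):
            setattr(cls, name, _instrument(member, f"{cls.__name__}.{name}"))
    return cls


@autometrics_class
class Cart:
    def add(self, item):
        return [item]

    @skip
    def debug_dump(self):
        return "cart"
```

In this sketch, calls to Cart.add are counted while Cart.debug_dump is left uninstrumented.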
Libraries MAY require users to call an initialization function.
Libraries MAY expose additional functionality for exporting metrics to Prometheus and/or other metrics collection servers. This MAY include serializing the metrics to the Prometheus text format, OpenMetrics export format, the OpenTelemetry Protocol and/or exposing the metrics on a specific port and HTTP path to be scraped.
Note: there is an open discussion about whether libraries should export metrics on a default port and path. There is another open discussion about support for pushing metrics to a collector.
Libraries SHOULD expose functionality to create objectives within the source code. Objectives can be "attached" to functions by passing the objective to the Autometrics decorator, macro, etc for one or more functions.
Objectives can relate to functions' success rate and/or latencies.
Success rate objectives add the objective.name and objective.percentile labels to the function.calls metric.
Latency objectives add the objective.name, objective.percentile, and objective.latency_threshold labels to the function.calls.duration metric.
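The mapping from an objective to its extra labels can be sketched as follows. The Objective class and its field names are assumptions made for this example; only the label keys come from this document.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Objective:
    """Hypothetical objective definition; field names are illustrative."""
    name: str
    success_rate: Optional[str] = None        # e.g. "99.9"
    latency_threshold: Optional[str] = None   # seconds, e.g. "0.25"
    latency_percentile: Optional[str] = None  # e.g. "99"


def counter_labels(obj: Objective) -> dict:
    """Extra labels for function.calls when a success rate objective is set."""
    if obj.success_rate is None:
        return {}
    return {"objective.name": obj.name, "objective.percentile": obj.success_rate}


def histogram_labels(obj: Objective) -> dict:
    """Extra labels for function.calls.duration when a latency objective is set."""
    if obj.latency_threshold is None or obj.latency_percentile is None:
        return {}
    return {
        "objective.name": obj.name,
        "objective.percentile": obj.latency_percentile,
        "objective.latency_threshold": obj.latency_threshold,
    }


api_slo = Objective("api", success_rate="99.9",
                    latency_threshold="0.25", latency_percentile="99")
```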
Libraries MUST support producing metrics using an OpenTelemetry library. Libraries MAY also support Prometheus client libraries and allow users to configure which one should be used to produce metrics.
Libraries MUST support exporting metrics to Prometheus, or provide documentation for how users can export the metrics from the OpenTelemetry format to the Prometheus exposition format.
Autometrics libraries MAY support attaching exemplars to the metrics generated if the underlying metrics library or libraries they use support them. See Grafana's explainer, the OpenMetrics Spec, and the OpenTelemetry Spec for more details about exemplars.
Libraries that support exemplars SHOULD integrate with popular tracing libraries and/or the OpenTelemetry library to extract exemplar fields from the context or span a given function is called within.
Libraries SHOULD support extracting the trace_id field and attaching it as an exemplar label or attribute. Libraries MAY support extracting other fields automatically or provide the user functionality to customize which fields are used.
Autometrics uses the OpenTelemetry Metric Semantic Conventions for naming metrics, including using dots (.) as separators.
When the metrics are exported to Prometheus, all dot (.) separators are replaced by underscores (_). Suffixes are appended where required by Prometheus/OpenMetrics.
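The renaming rule can be sketched as a small helper. The function names and the kind and unit parameters are assumptions of this example, not part of any library API.

```python
def prometheus_name(otel_name: str, kind: str, unit: str = "") -> str:
    """Convert a dot-separated metric name into a Prometheus-style name.

    `kind` is "counter", "histogram", or "gauge"; `unit` is an optional unit
    suffix such as "seconds". Both parameters are assumptions of this sketch.
    """
    name = otel_name.replace(".", "_")
    # Append the unit unless the name already ends with it.
    if unit and not name.endswith("_" + unit):
        name += "_" + unit
    # Counters get the _total suffix unless it is already present.
    if kind == "counter" and not name.endswith("_total"):
        name += "_total"
    return name


def prometheus_label(key: str) -> str:
    """Convert a dot-separated label key into a Prometheus-style label key."""
    return key.replace(".", "_")
```

With these helpers, function.calls becomes function_calls_total, function.calls.duration with a unit of seconds becomes function_calls_duration_seconds, and the service.name label key becomes service_name.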
Prometheus Name: function_calls_total
Required Labels: function, module, service.name, result, caller
Additional Labels (if a success rate objective is attached to the given function): objective.name and objective.percentile
This metric is a 64-bit monotonic counter that tracks the number of times a given function was invoked.
When this metric is exported to Prometheus, its name SHOULD be function_calls_total, because Prometheus/OpenMetrics specifies that counters SHOULD have the _total suffix. Note that library authors may need to append the suffix because not all Prometheus client libraries or exporters will do so.
If possible, libraries SHOULD start this counter off at zero (by incrementing the counter by 0) in order to expose the names of instrumented functions to visualization tools that use the metrics. Libraries SHOULD use as many of the labels as possible for the initial call to increment by zero, including those related to objectives and setting result="ok".
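A minimal sketch of the zero-initialization step, assuming a hypothetical in-memory counter (function_calls) in place of a real metrics library. The register_function helper and the dictionary shape of the objective argument are illustrative only.

```python
from collections import Counter

function_calls = Counter()  # hypothetical stand-in for a real counter metric


def register_function(function, module, service_name, objective=None):
    """Create the series for an instrumented function by adding 0 to it."""
    labels = {
        "function": function,
        "module": module,
        "service.name": service_name,
        "result": "ok",
        "caller": "",
    }
    if objective is not None:
        labels["objective.name"] = objective["name"]
        labels["objective.percentile"] = objective["percentile"]
    key = tuple(sorted(labels.items()))
    function_calls[key] += 0  # the series now exists with the value 0
    return key
```

After registration the series is present with a value of 0, so tools that list series can discover the function before it is ever called.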
Prometheus Name: function_calls_duration_seconds
Required Labels: function, module, service.name
Additional Labels (if a latency objective is attached to the given function): objective.name, objective.percentile, objective.latency_threshold
This is a 64-bit floating point histogram that tracks the duration or latency of function calls.
It MUST track the duration in seconds (not milliseconds). Libraries using OpenTelemetry SHOULD set the units in the resource metadata.
Libraries SHOULD support the default OpenTelemetry histogram buckets as label values. Libraries MAY allow users to specify custom histogram buckets.
When this metric is exported to Prometheus, its name SHOULD be function_calls_duration_seconds, because Prometheus/OpenMetrics specifies that metrics SHOULD include their units. Note that library authors may need to append the unit suffix because not all Prometheus client libraries or exporters will do so.
Prometheus Name: build_info
Required Labels: version, commit, branch, service.name
This is a gauge or up/down counter.
It MUST always have the value of 1.0.
Prometheus Name: function_calls_concurrent
Required Labels: function, module, service.name
This metric is optional. Libraries MAY provide an option to the user for enabling this on a per-function basis.
This is a gauge or "up/down counter" used for tracking concurrent calls to the specific function. When the function is initially called, the gauge is incremented by 1 and when it finishes, the value is decremented by 1.
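The increment-on-entry, decrement-on-exit behavior can be sketched as below. The concurrent_calls store and the track_concurrency decorator are hypothetical, and this version is not thread-safe; a real library would rely on its metrics backend for that.

```python
import functools
from collections import Counter

concurrent_calls = Counter()  # hypothetical in-memory gauge, keyed by function name


def track_concurrency(func):
    """Increment the gauge on entry and decrement it on exit, even on error."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        concurrent_calls[func.__name__] += 1
        try:
            return func(*args, **kwargs)
        finally:
            concurrent_calls[func.__name__] -= 1

    return wrapper


@track_concurrency
def observe():
    # While the function body runs, the gauge reads 1.
    return concurrent_calls["observe"]
```

Using a finally block ensures the gauge returns to its previous value when the function raises.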
When the metrics are exported to Prometheus, all dot (.) separators in the label keys are replaced by underscores (_).
Label values MAY contain any Unicode characters.
See the metric descriptions above for the labels that are valid on each metric.
The branch label holds the Git branch of the user's project. If this information is not available, this label MAY be absent or empty ("").
Note: there is an ongoing discussion about whether the caller label should be replaced with multiple labels such as caller_function and caller_module.
The caller label holds the name of the function that invoked the given function. If the caller is not known, this label MAY be absent or empty ("").
This SHOULD refer to Autometrics-instrumented functions. Therefore, if Function A calls Function B, which calls Function C, and only Functions A and C are instrumented, the caller of Function C would be Function A.
Libraries MAY make this label optional (on an opt-out basis) if collecting this information has a non-negligible performance overhead.
The commit label holds the short (8-byte) Git commit hash of the user's project. If this information is not available, this label MAY be absent or empty ("").
The function label holds the name of the function or method, exactly as it appears in the source code.
The module label holds the fully-qualified module or file path of the function. The combination of the function and module labels MUST be sufficient to uniquely identify the function within the project's source code. The exact contents of this label value are assumed to be language-specific.
Note: There is an ongoing discussion about whether the class should be added to the module label or if there should be a separate class label.
If a function has an SLO attached, the objective.name label contains the user-specified name of the objective. If there is no SLO attached, this label MAY be absent or empty ("").
If a function has an SLO attached, the objective.percentile label specifies either the percentage of requests that should return result="ok" or the percentage of requests that should meet the specified objective.latency_threshold.
The value MUST be expressed as a percentage, so 99.9% would be "99.9" (without the % symbol).
If there is no SLO attached, this label MAY be absent or empty ("").
Libraries SHOULD support the following percentiles: "90", "95", "99", "99.9". Libraries MAY allow users to specify custom percentiles, but care should be taken to ensure that users generate separate Prometheus recording rules for the custom percentiles.
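The label format can be illustrated with a small normalizer. The percentile_label helper and the SUPPORTED_PERCENTILES constant are hypothetical names; the sketch uses Decimal to avoid floating-point artifacts in the label value.

```python
from decimal import Decimal

SUPPORTED_PERCENTILES = ("90", "95", "99", "99.9")


def percentile_label(value: str) -> str:
    """Normalize a user-supplied percentage such as "99.9%" to a label value."""
    text = value.strip().rstrip("%").strip()
    number = Decimal(text)
    if not (0 < number <= 100):
        raise ValueError(f"percentile must be in (0, 100], got {value!r}")
    # normalize() drops trailing zeros; the "f" format avoids exponent notation.
    return format(number.normalize(), "f")
```

A library that only supports the default percentiles could then check the result against SUPPORTED_PERCENTILES and reject or warn about anything else.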
If a function has an SLO attached, the objective.latency_threshold label specifies the maximum duration of function calls that are considered to meet the objective.
This MUST be specified in seconds (not milliseconds).
Libraries SHOULD support the default OpenTelemetry histogram buckets as label values. Libraries MAY allow users to specify custom latencies, but care should be taken to ensure that the value of this label matches one of the histogram buckets supported by the function.calls.duration metric.
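A check that a custom threshold lines up with a histogram bucket could look like the sketch below. The bucket boundaries in EXAMPLE_BUCKETS are hypothetical values chosen for illustration, not the defaults of any library.

```python
import math

# Hypothetical bucket boundaries in seconds; a real library would use the
# boundaries configured for its function.calls.duration histogram.
EXAMPLE_BUCKETS = (0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0)


def latency_threshold_label(threshold_seconds, buckets=EXAMPLE_BUCKETS):
    """Return the label value for a threshold, or raise if no bucket matches."""
    for boundary in buckets:
        if math.isclose(threshold_seconds, boundary):
            return format(boundary, "g")
    raise ValueError(
        f"latency threshold {threshold_seconds}s does not match any histogram bucket"
    )
```

Rejecting mismatched thresholds early matters because a threshold that falls between two buckets cannot be evaluated from the histogram data.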
The result label records whether the function executed successfully or errored. An error MAY either mean that the function returned an error or that it threw an exception.
The value of this label MUST either be "ok" or "error".
Libraries MAY offer users the ability to override the default behavior for determining whether the result label should be "ok" or "error", for example to allow users to treat client-side errors as "ok".
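Such an override can be sketched as a user-supplied predicate. The result_label function, its ok_if parameter, and the NotFound exception are all hypothetical names used only for this example.

```python
class NotFound(Exception):
    """Hypothetical client-side error used only for this example."""


def result_label(error=None, ok_if=None):
    """Map the outcome of a call to the result label value.

    `error` is the raised exception (or None on success). `ok_if` is an
    optional user-supplied predicate that marks some errors as "ok".
    """
    if error is None:
        return "ok"
    if ok_if is not None and ok_if(error):
        return "ok"
    return "error"
```

For example, passing ok_if=lambda e: isinstance(e, NotFound) would count NotFound as "ok" while every other exception still counts as "error".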
The service.name label holds the logical name of a service. This matches the OpenTelemetry Service specification.
All metrics produced by a library from a given instance SHOULD use a single service.name. All instances of a horizontally scaled service SHOULD also use the same service.name.
Libraries SHOULD support setting the service.name using environment variables (AUTOMETRICS_SERVICE_NAME and OTEL_SERVICE_NAME, with the first taking precedence if both are set). Libraries MAY also support configuring this value in an initialization function.
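The precedence rule can be sketched as follows. The resolve_service_name helper, its env parameter, and the default fallback are assumptions of this example; this document does not define what happens when neither variable is set.

```python
import os


def resolve_service_name(env=None, default=""):
    """Pick the service name, preferring AUTOMETRICS_SERVICE_NAME.

    `env` defaults to the process environment; passing a dict makes the
    function easy to test. The `default` fallback is an assumption.
    """
    env = os.environ if env is None else env
    return (
        env.get("AUTOMETRICS_SERVICE_NAME")
        or env.get("OTEL_SERVICE_NAME")
        or default
    )
```

This sketch treats an empty variable the same as an unset one, which is a design choice rather than a requirement stated here.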
The version label holds the version of the user's project, ideally using Semantic Versioning. It SHOULD only contain the version number and SHOULD NOT start with a v.
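A library that reads the version from a Git tag or build variable might normalize it along these lines. The normalize_version helper is hypothetical, and stripping only when a digit follows the prefix is a choice made for this sketch.

```python
def normalize_version(raw: str) -> str:
    """Strip a leading "v" or "V" when it directly precedes a digit."""
    version = raw.strip()
    if len(version) > 1 and version[0] in "vV" and version[1].isdigit():
        return version[1:]
    return version
```

With this helper, "v1.2.3" becomes "1.2.3", while values such as "1.2.3" or "vnext" are returned unchanged.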