Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

At least the metric apollo_router_session_count_total is violating Prometheus naming best-practices #5933

Open
frittentheke opened this issue Aug 30, 2024 · 2 comments
Labels
component/open-telemetry OTLP, Datadog, Prometheus, etc. and the integrations around it. potentially-breaking Requires an incompatible change raised by user

Comments

@frittentheke
Copy link

Describe the bug

The metric apollo_router_session_count_total is reporting

Number of currently connected clients
(https://www.apollographql.com/docs/router/configuration/telemetry/instrumentation/standard-instruments/#session)

so it's a gauge as it's increasing and decreasing with the number of connections.

According to the Prometheus metric naming best-practice (and thus consumer expectations) this should NOT have the total suffix as this is documented to be used for accumulating counts only:

Note that an accumulating count has total as a suffix, in addition to the unit if applicable.

@abernix
Copy link
Member

abernix commented Aug 30, 2024

Thanks for opening this. We agree. Unfortunately, this is one of our oldest metrics (perhaps one of the first!).

We admit we made a mistake with its naming, but changing it at this point would be a breaking change. Overall, we're trying to replace them with new better metrics that meet the requirements of Prometheus and OpenTelemetry naming conventions and best practices, but that's one at a time. We do believe that we have moved to a "good" place. We have this tracked internally as part of a different initiative, but this particular metric won't change until a major version release as we don't want to just emit twice the metrics unnecessarily either.

Hope this helps explain where we are on it! Let us know if you have any other thoughts.

@abernix abernix added component/open-telemetry OTLP, Datadog, Prometheus, etc. and the integrations around it. potentially-breaking Requires an incompatible change labels Aug 30, 2024
@abernix
Copy link
Member

abernix commented Aug 30, 2024

For what it's worth, in all the dashboards I've ever seen this on, I don't actually personally get a ton of value out of this metric versus just general request-based metrics that the router provides. That's particularly true since other infrastructure usually provides that insight in a more uniform way across a wider variety of products (e.g., Istio/Envoy/Linkerd/proxies usually cover this).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/open-telemetry OTLP, Datadog, Prometheus, etc. and the integrations around it. potentially-breaking Requires an incompatible change raised by user
Projects
None yet
Development

No branches or pull requests

2 participants