
Token usage metrics adding up strangely #1609

Open
habuma opened this issue Oct 28, 2024 · 1 comment
Labels
bug (Something isn't working), Observability

Comments

habuma (Member) commented Oct 28, 2024

This feels like a bug to me, but it may be expected behavior. Reporting it in case it is in fact a problem.

I was inspecting the /actuator/metrics/gen_ai.client.token.usage endpoint and found that the total number of tokens used was 5336.
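For context, that's the standard Spring Boot actuator metrics endpoint. The query looks roughly like this (a sketch; localhost:8080 is just my local setup, and the count is the one I observed):

```shell
# Overall gen_ai token usage, across all operations and token types -> 5336
curl http://localhost:8080/actuator/metrics/gen_ai.client.token.usage
```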

So then I pivoted on the "gen_ai.operation.name" tag to find out how many of those were for "chat" (as opposed to "embedding"). I got 5206.

Then I wondered how many of those were input tokens. So I applied the "gen_ai.token.type" tag along with "gen_ai.operation.name" and found that there were 2562 input tokens. Great! So should I assume that all of the other tokens are output tokens? That is, should I expect that there were 5206-2562=2644 output tokens?

No. When I asked for chat output tokens (by changing the "gen_ai.token.type" tag to "output"), I got back only 41.
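For reference, the drill-downs above use the actuator's tag filters, roughly like this (a sketch; the tag=key:value query parameter is standard Spring Boot actuator behavior, host/port as before):

```shell
# All chat tokens, token types mixed together -> 5206
curl 'http://localhost:8080/actuator/metrics/gen_ai.client.token.usage?tag=gen_ai.operation.name:chat'

# Chat input tokens -> 2562
curl 'http://localhost:8080/actuator/metrics/gen_ai.client.token.usage?tag=gen_ai.operation.name:chat&tag=gen_ai.token.type:input'

# Chat output tokens -> only 41
curl 'http://localhost:8080/actuator/metrics/gen_ai.client.token.usage?tag=gen_ai.operation.name:chat&tag=gen_ai.token.type:output'
```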

So I scratched my head a little before realizing that the original total of 5206 chat tokens is actually the sum of input + output + total tokens. In effect, the total number of tokens reported without using the "gen_ai.token.type" tag is double what it should be. The actual total should be 2603.

Again, this may be expected behavior. I get that if I ask for "gen_ai.token.type" with "total", I will get back 2603, which is the actual total. But it's a bit misleading that asking for "gen_ai.operation.name" of "chat" without also specifying "gen_ai.token.type" gives me double the actual count of tokens.
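Spelling out the arithmetic with the numbers above:

```text
input (2562) + output (41)                = 2603   <- the real chat total
input (2562) + output (41) + total (2603) = 5206   <- what the unfiltered "chat" query reports
```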

@asaikali added the bug and Observability labels Oct 30, 2024
@markpollack markpollack added this to the 1.0.0-M4 milestone Nov 5, 2024
tzolov (Contributor) commented Nov 6, 2024

Hi @habuma,

I get why it might feel strange, but the behavior you're seeing is expected - Spring AI's metrics handler simply reports what the model provides without additional processing.

The total token count comes directly from the model API response, even though it may seem redundant given that we already have the input and output counts. While it's unclear why models include an explicit total, we need to preserve that information in our metrics.

Regarding metrics organization: Unlike JMX's hierarchical structure, time-series databases like Prometheus use labels/tags for aggregation, which is why we keep all usage metrics grouped together rather than splitting them out.

Moving the total usage outside of the usage metric would feel strange for a TSDB.
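To illustrate the point with Prometheus (a sketch, assuming Micrometer's default Prometheus naming, where dots become underscores and counters get a _total suffix, so the metric would show up as gen_ai_client_token_usage_total): a consumer simply filters on the token-type label and never double counts:

```promql
# Actual chat token total: select only the "total" series...
sum(gen_ai_client_token_usage_total{gen_ai_operation_name="chat", gen_ai_token_type="total"})

# ...or exclude it and sum input + output, which should yield the same number
sum(gen_ai_client_token_usage_total{gen_ai_operation_name="chat", gen_ai_token_type!="total"})
```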

@tzolov tzolov removed this from the 1.0.0-M4 milestone Nov 12, 2024