Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Go metrics go_memstats_alloc_bytes and go_memstats_alloc_bytes_total collide in OpenMetrics and are still both exposed #829

Open
hairyhenderson opened this issue Dec 30, 2020 · 12 comments

Comments

@hairyhenderson
Copy link
Contributor

I received a bug report caddyserver/caddy#3940 which seems to be caused by 2 metrics exposed by the Go collector: go_memstats_alloc_bytes and go_memstats_alloc_bytes_total - in effect, these names seem to be ambiguous with respect to the OpenMetrics "magic" _total suffix.

This seems to be related to prometheus/common#253 (comment), though the conclusion in that issue seems to be "it's a bad idea, but won't break anything".

The metrics in question are defined here: https://github.com/prometheus/client_golang/blob/6007b2b5cae01203111de55f753e76d8dac1f529/prometheus/go_collector.go#L88..L104

I'm not super certain how to resolve this without removing or renaming one of the metrics, given that the python client seems to be behaving correctly with respect to the OpenMetrics spec (specifically https://github.com/OpenObservability/OpenMetrics/blob/master/specification/OpenMetrics.md#suffixes).

At the very least there seems to be an inconsistency between the Python and Go implementations...

LMK if there's anything I'm missing - on vacation so my brain isn't running in high-gear 😉

@brian-brazil
Copy link
Contributor

Your issue here is that you tried to parse Prometheus text format with the OpenMetrics parser, so there is no bug here.

However, one of these will need a rename when client_golang gains full OpenMetrics support in a future version.

@hairyhenderson
Copy link
Contributor Author

Your issue here is that you tried to parse Prometheus text format with the OpenMetrics parser, so there is no bug here.

🤔 My impression was that setting the Accept: application/openmetrics-text; version=0.0.1 header when the promhttp.HeaderOpt EnableOpenMetrics option is set to true would produce output in the OpenMetrics text format (the Content-Type response header certainly implies it).

Looks like the python client isn't relevant to this issue - seems the Go parser behaves the same way:

Here's some test code that demonstrates the issue:

package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
	"github.com/prometheus/common/expfmt"
)

func main() {
	h := promhttp.InstrumentMetricHandler(prometheus.DefaultRegisterer,
		promhttp.HandlerFor(prometheus.DefaultGatherer, promhttp.HandlerOpts{EnableOpenMetrics: true}),
	)
	srv := httptest.NewServer(h)
	defer srv.Close()

	req, _ := http.NewRequest(http.MethodGet, srv.URL, nil)
	req.Header.Set("Accept", "application/openmetrics-text; version=0.0.1")

	resp, _ := srv.Client().Do(req)

	fmt.Printf("%s %d\n", resp.Proto, resp.StatusCode)
	fmt.Printf("Header:\n")
	for k, v := range resp.Header {
		fmt.Printf("\t%s: %v\n", k, v)
	}

	parser := &expfmt.TextParser{}
	_, err := parser.TextToMetricFamilies(resp.Body)
	if err != nil {
		panic(err)
	}
}

When I run I see:

$ go run .
HTTP/1.1 200
Header:
        Content-Type: [application/openmetrics-text; version=0.0.1; charset=utf-8]
        Date: [Wed, 30 Dec 2020 16:26:08 GMT]
panic: text format parsing error in line 19: second HELP line for metric name "go_memstats_alloc_bytes"

goroutine 1 [running]:
main.main()
        /Users/hairyhenderson/go/src/github.com/hairyhenderson/temp/main.go:34 +0x5d1
exit status 2

However, one of these will need a rename when client_golang gains full OpenMetrics support in a future version.

My impression based on the above is that even the partial OpenMetrics support present now is incompatible with the output that the Go collector produces... I'm confused now about how this works when Prometheus scrapes this output - though perhaps it doesn't use common/expfmt for parsing?

@brian-brazil
Copy link
Contributor

That code example is using the Prometheus text format parser with OpenMetrics input, which can also run into issues. This still doesn't indicate a bug, as you're conflating the two formats.

@brian-brazil
Copy link
Contributor

While your examples don't directly demonstrate it, there is an issue there in the OpenMetrics output:

# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 647440.0
# HELP go_memstats_alloc_bytes Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes counter
go_memstats_alloc_bytes_total 647440.0

as there's a repeated metric family name.

@hairyhenderson
Copy link
Contributor Author

That code example is using the Prometheus text format parser

Ah - I think I follow... common/expfmt's TextParser.TextToMetricFamilies is parsing for Prometheus format explicitly? My impression was that it would parse according to the OpenMetrics rules given OpenMetrics negotiation.

Is there a Go equivalent for the python prometheus_client.openmetrics.parser? I presume that is an OpenMetrics-format parser?

Also, to answer my earlier confusion about how Promtheus itself works - it has its own parser which doesn't seem to care about duplicate names.

While your examples don't directly demonstrate it, there is an issue there in the OpenMetrics output:

Isn't that why the error text format parsing error in line 19: second HELP line for metric name "go_memstats_alloc_bytes" is given?

That seems to be caused by https://github.com/prometheus/common/blob/master/expfmt/openmetrics_create.go#L92, where the _total suffix is stripped for the HELP line.

@brian-brazil
Copy link
Contributor

Ah - I think I follow... common/expfmt's TextParser.TextToMetricFamilies is parsing for Prometheus format explicitly?

Yes. A full Go OpenMetrics parser has yet to be written.

Isn't that why the error text format parsing error in line 19: second HELP line for metric name "go_memstats_alloc_bytes" is given?

Kinda, trying to parse one format as another format isn't exactly wise and something else may break instead. The original report would have failed for other reasons for example, as typical Prometheus format output is not valid OpenMetrics output..

@beorn7
Copy link
Member

beorn7 commented Dec 31, 2020

Parser shenanigans aside, the actual issue here is that the standard metrics exposed by the Go collector are fine with the old Prometheus formats, but run into a known issue with OpenMetrics: Since OpenMetrics shaves off the _total from a counter's name when it comes to the metric family name, there is a collision if there is also a gauge with the same name as the counter's name without the _total. This was always well understood, but only just now I realize that we have an instance of this in the metrics almost every instrumented Go binary exposes. That changes the problem from an edge case to a highly relevant one.

For full OpenMetrics support, we need to change the metric name. But what to do right now? The current preliminary OpenMetrics implementation serves invalid OpenMetrics format. Interestingly, Prometheus is "fine" with it: It ingests both metrics. For the metadata API, it drops the gauge and keeps the counter (which I guess is just because the counter comes later in the exposition and overrides the gauge).

We could just leave things as is, accepting that the OpenMetrics format exposed is technically invalid (while Prometheus currently handles it in a way that at least doesn't break anyone when activating OpenMetrics).

Or we could fix the format by either not exposing one of the metrics or by renaming one of the metrics whenever OpenMetrics is negotiated. However, this would break everyone that relies on the metrics as they are now.

@beorn7 beorn7 changed the title Clashing names when parsing with OpenMetrics python client Go metrics go_memstats_alloc_bytes and go_memstats_alloc_bytes_total collide in OpenMetrics and are still both exposed Dec 31, 2020
@brian-brazil
Copy link
Contributor

We could for now always expose counters as unknown for OM, or being smarter about when we have them as counters rather than unknown.

@beorn7
Copy link
Member

beorn7 commented Jan 2, 2021

It made me quite happy that finally tools like Grafana start to make use of metadata in tooltips etc. Obfuscating counters as unknown is not ideal for them.

@brian-brazil
Copy link
Contributor

There is the option of only to switching to unknown if there is a collision.

@beorn7
Copy link
Member

beorn7 commented Jan 4, 2021

Yes, that's what I had in mind. It's still not ideal… 😞

@brian-brazil
Copy link
Contributor

Yeah, but I think it's the best we can do currently.

We might also want to decide on a new name for one of the metrics now and implement it as a duplicate, as that'll make the future transition easier. alloc->allocated maybe?

@beorn7 beorn7 added this to the OpenMetrics support in v1 milestone May 21, 2021
romain-dartigues added a commit to orange-cloudfoundry/promfetcher that referenced this issue Dec 15, 2022
Création d'une nouvelle version de l'API (`/v2`) qui change l'entête HTTP
`Accept` envoyée aux exporters des clients car la bibliothèque Go que nous utilisons
afin de parser et enrichir les exporters de nos clients
ne sait pas gérer le format OpenMetrics [1]
qui peut être retournée par des exporters suivant la spécification.

Les requêtes envoyées à nos clients contiennent désormais une entête HTTP
leur indiquant la version de l'API interrogée.

[1]: prometheus/client_golang#829 (comment)
[2]: prometheus/common#214

JIRA: PAASCFY-2126
romain-dartigues added a commit to orange-cloudfoundry/promfetcher that referenced this issue Dec 15, 2022
Create a new version of the API endpoint (`/v2`).
This new version change the `Accept` HTTP header sent to the exporters requested
(from `application/openmetrics-text` … to `text/plain`) because the Go library
we use does not seems to support all OpenMetrics specificities ^1.

The requests sent to the exporters contains a new HTTP header `X-Promfetcher-API-version` with the Promfetchers API version.

^1: Java actuator does respect the OpenMetrics specifications; the double quotes in the comments is not supported for example:
> # HELP process_cpu_usage The \"recent cpu usage\" for the Java Virtual Machine process

See also:
[1]: prometheus/client_golang#829 (comment)
[2]: prometheus/common#214
@dosubot dosubot bot added the stale label Aug 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants