metrics default error #32

Open
CarlosFdez77 opened this issue Oct 19, 2023 · 8 comments

Comments

@CarlosFdez77

Some of the standard metrics fail when the exporter scrapes an Autonomous Database.
The exporter is deployed in Kubernetes.

I see the same problem with custom metrics such as slow queries.

caller=collector.go:326 level=error Errorscrapingfor=resource _="unsupported value type" 5.687861ms=:
caller=collector.go:326 level=error Errorscrapingfor=wait_time _="unsupported value type" 5.207271ms=:

@markxnelson
Member

Hi, thanks for reporting this issue!

Most of the time that message seems to occur when the user does not have permission to run the query, or when the query is only suitable for a CDB and is being run in a PDB, as in the case with ADB.

If you could provide me your custom metrics file, and confirm you are using the normal "out-of-the-box" standard metrics, I can try to debug this for you. Please also confirm what version your ADB instance is, and what user you are using to connect to it?

Thanks

@CarlosFdez77
Author

CarlosFdez77 commented Oct 20, 2023

Hi Mark, thanks for your time.
I thought this exporter worked with ADB; in our case we use version 19.

Regarding the default metrics, I have a problem with two, which causes a lot of noise in the pod log:

ts=2023-10-20T05:56:42.466Z caller=collector.go:326 level=error Errorscrapingfor=wait_time _="unsupported value type" 4.801915ms=:
ts=2023-10-20T05:56:42.468Z caller=collector.go:326 level=error Errorscrapingfor=resource _="unsupported value type" 7.01722ms=:
ts=2023-10-20T05:57:02.467Z caller=collector.go:326 level=error Errorscrapingfor=wait_time _="unsupported value type" 5.563066ms=:
ts=2023-10-20T05:57:02.469Z caller=collector.go:326 level=error Errorscrapingfor=resource _="unsupported value type" 7.338398ms=:

First of all, let me say that I am not a database expert.
Regarding permissions, I have granted what the documentation indicates for the default metrics:

V_$LOG
V_$PROCESS
V_$SESSION
V_$SYSSTAT
V_$INSTANCE
V_$DATAFILE
V_$SYSTEM_WAIT_CLASS
V_$RESOURCE_LIMIT
V_$WAITCLASSMETRIC
V_$ASM_DISKGROUP_STAT
DBA_FREE_SPACE
DBA_DATA_FILES
DBA_TABLESPACES
DBA_TEMP_FILES
V_$TEMP_EXTENT_POOL
V_$TEMP_SPACE_HEADER
DBA_TABLESPACE_USAGE_METRICS

I get no results for the "wait_time" and "resource" metrics.
For "wait_time", I see that the v$waitclassmetric view is empty. Is this normal in ADB?

SELECT n.wait_class as WAIT_CLASS,
       round(m.time_waited/m.INTSIZE_CSEC,3) as VALUE
FROM v$waitclassmetric m, v$system_wait_class n
WHERE m.wait_class_id=n.wait_class_id
  AND n.wait_class != 'Idle'

For "resource", the v$resource_limit view does not return any data either.

SELECT resource_name,
       current_utilization,
       CASE WHEN TRIM(limit_value) LIKE 'UNLIMITED' THEN '-1' ELSE TRIM(limit_value) END as limit_value
FROM v$resource_limit

Thank you very much for your interest,
Carlos.
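
For readers following the two queries above, here is a minimal Python sketch of the value logic each one applies per row. It is only an illustration: the function names and the row layout are hypothetical, and it does not reproduce the exporter's own code. It also shows why an empty view produces no metric samples at all.

```python
from typing import Dict, Iterable, Tuple


def wait_time_value(time_waited: float, intsize_csec: float) -> float:
    """Mirror of round(m.time_waited / m.INTSIZE_CSEC, 3) in the wait_time query.

    Note: Python and Oracle can round exact ties differently, so treat this
    as an approximation of the SQL expression.
    """
    return round(time_waited / intsize_csec, 3)


def limit_value(raw: str) -> str:
    """Mirror of the CASE expression in the resource query.

    TRIM removes leading and trailing spaces; 'UNLIMITED' is mapped to '-1'.
    """
    trimmed = raw.strip(" ")
    return "-1" if trimmed == "UNLIMITED" else trimmed


def wait_time_samples(
    rows: Iterable[Tuple[str, float, float]]
) -> Dict[str, float]:
    """Build {wait_class: value} from hypothetical (wait_class, time_waited,
    intsize_csec) rows, skipping the 'Idle' class as the query does."""
    return {
        wait_class: wait_time_value(time_waited, intsize_csec)
        for wait_class, time_waited, intsize_csec in rows
        if wait_class != "Idle"
    }
```

If the view returns no rows, `wait_time_samples([])` is an empty dictionary, so there is nothing for the exporter to publish for that metric.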

@CarlosFdez77
Author

CarlosFdez77 commented Oct 20, 2023

My deployment is simple:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: oracle-metrics-exporter
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: oracle-metrics-exporter
  template:
    metadata:
      labels:
        app: oracle-metrics-exporter
    spec:
      containers:
      - name: oracle-metrics-exporter
        image: container-registry.oracle.com/database/observability-exporter:1.0.0
        imagePullPolicy: Always
        env:
          # uncomment and customize the next item if you want to provide custom metrics definitions
          - name: CUSTOM_METRICS
            value: /oracle/observability/custom-metrics.toml
          #- name: TNS_ADMIN
          #  value: "/oracle/tns_admin"
          - name: DB_USERNAME
            valueFrom:
              secretKeyRef:
                name: oracle-db-secret
                key: username
                optional: false
          - name: DB_PASSWORD
            valueFrom:
              secretKeyRef:
                name: oracle-db-secret
                key: password
                optional: false
          # update the connect string below for your database - can be simple format, or use a tns name as shown:
          - name: DB_CONNECT_STRING
            value: "(description=(retry_count=20)(retry_delay=3)(address=(protocol=tcps)(port=1521)(host=**********.adb.eu-frankfurt-1.oraclecloud.com))(connect_data=(service_name=***************.adb.oraclecloud.com))(security=(ssl_server_dn_match=no)))"
        volumeMounts:
          #- name: tns-admin
          #  mountPath: /oracle/tns_admin
          # uncomment and customize the next item if you want to provide custom metrics definitions
          - name: config-volume
            mountPath: /oracle/observability/custom-metrics.toml
            subPath: custom-metrics.toml
        resources:
          requests:
            memory: "64Mi"
            cpu: "100m"
          limits:
            memory: "128Mi"
            cpu: "500m"
        ports:
        #- containerPort: 8080
        - containerPort: 9161
      restartPolicy: Always
      volumes:
        #- name: tns-admin
        #  configMap:
        #    name: db-metrics-tns-admin
        # uncomment and customize the next item if you want to provide custom metrics definitions
        - name: config-volume
          configMap:
            name: db-metrics-txeventq-exporter-config

@markxnelson
Member

Thanks for the info. We are investigating this. Some initial comments from my DBA colleague (FYI):

v$waitclassmetric is capturing real-time events within the past 1-minute; its _history companion view will hold that data for up to an hour. Being empty is a sign that the database is not being hit hard. I don't see anything that says that view will not populate in an ADB-S environment (they are container aware so there's nothing that would be exposed from other PDBs in the same shared CDB which would warrant disabling).

Given ADBs are on Exa it will probably take a lot to make an entry in the view ... so "no rows" output can be expected. However, I did generate a massive amount of I/O to try and trigger a wait event in that view and nothing came of it; checking with ADB team if this is expected

v$resource_limit: It looks like it can now only be queried from the CDB, so for ADBs this will always return 0 rows; in non-ADB databases it only returns rows when queried from the CDB.

@markxnelson
Member

I am going to look at handling this better and suppressing the spurious messages.

@CarlosFdez77
Author

OK thanks.
I understand there will be a new version of the image at some point, correct?

@markxnelson
Member

Yes indeed, we are just doing some testing and then will put out an update.

@markxnelson
Member

Hi, I did put out a 1.1 release. It will still report when it cannot get a metric, but the output is more useful now, I hope. I was thinking maybe it should be a warning rather than an error, but when the query works and returns no rows, it is difficult to know whether that is a good thing or a bad thing. If the query fails, it's clearly an error. The messages look like this now:

ts=2023-10-27T16:56:19.884Z caller=collector.go:327 level=error msg="Error scraping metric" Context=ownership MetricsDesc="map[inst_id:Owner instance of the current queues.]" time=1.502612ms error="no metrics found while parsing, query returned no rows"
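
The trade-off described above (a failed query is clearly an error, while an empty result is ambiguous) can be sketched as follows. This is a hypothetical Python illustration, not the exporter's code: `ScrapeResult`, `log_level_for`, and the choice of WARNING for empty results are all assumptions made for the example.

```python
import logging
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ScrapeResult:
    """Outcome of running one metric query (hypothetical structure)."""
    metric: str
    rows: List[Tuple] = field(default_factory=list)
    error: Optional[Exception] = None


def log_level_for(result: ScrapeResult) -> Optional[int]:
    """Pick a log level for a scrape outcome, or None if nothing to report."""
    if result.error is not None:
        # The query itself failed: unambiguously an error.
        return logging.ERROR
    if not result.rows:
        # The query ran but returned no rows. This may be expected (for
        # example, an idle database), so this sketch only warns.
        return logging.WARNING
    return None
```

Whether an empty result deserves a warning, an error, or silence probably depends on the metric, so a per-metric setting may be a better fit than a single global rule.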

i am looking into how to throttle these messages - suppress duplicates. i don't think the logging library i am using supports that, so i need to figure out if i should swap to a different one, or build something in.
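
As a rough idea of what "building something in" could look like, here is a minimal sketch of duplicate suppression written with Python's standard `logging` module. It is not based on the exporter's logging library; the class name `DuplicateThrottle`, the default interval, and the injectable `clock` parameter are assumptions chosen for the example.

```python
import logging
import time
from typing import Callable, Dict


class DuplicateThrottle(logging.Filter):
    """Let a given message through at most once per `interval` seconds."""

    def __init__(
        self,
        interval: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        super().__init__()
        self.interval = interval
        self.clock = clock  # injectable so the behaviour can be tested
        self._last_seen: Dict[str, float] = {}

    def filter(self, record: logging.LogRecord) -> bool:
        # The key is the formatted message text. Fields that change on every
        # scrape (such as a duration) would need to be left out of the key,
        # otherwise no two messages would ever count as duplicates.
        key = record.getMessage()
        now = self.clock()
        last = self._last_seen.get(key)
        if last is not None and now - last < self.interval:
            return False  # duplicate inside the window: drop it
        self._last_seen[key] = now
        return True
```

A logger would use it via `logging.getLogger("exporter").addFilter(DuplicateThrottle(interval=60))`. One design caveat: the dictionary of seen messages grows with the number of distinct messages, so a long-running process may want to evict old entries.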

i also changed the wait_class query so it should work fine in both pdbs and cdbs now - but it will still return no rows unless the db is under stress.
