metrics default error #32

CarlosFdez77 · 2023-10-19T07:29:27Z

Some of the standard metrics fail when the exporter scrapes an Autonomous Database.
The exporter is deployed in Kubernetes.

I also have the same problem in custom checks such as slow queries.

caller=collector.go:326 level=error Errorscrapingfor=resource _="unsupported value type" 5.687861ms=:
caller=collector.go:326 level=error Errorscrapingfor=wait_time _="unsupported value type" 5.207271ms=:

markxnelson · 2023-10-19T12:16:31Z

Hi, thanks for reporting this issue!

Most of the time that message seems to occur when the user does not have permission to run the query, or when the query is only suitable for a CDB and is being run in a PDB, as in the case with ADB.

If you could provide me your custom metrics file, and confirm you are using the normal "out-of-the-box" standard metrics, I can try to debug this for you. Please also confirm what version your ADB instance is, and what user you are using to connect to it?

Thanks

CarlosFdez77 · 2023-10-20T07:18:32Z

Hi Mark, thanks for your time.
I thought this exporter worked with ADB; in our case we are using version 19.

Regarding the default metrics, I have a problem with two, which causes a lot of noise in the pod log:

ts=2023-10-20T05:56:42.466Z caller=collector.go:326 level=error Errorscrapingfor=wait_time _="unsupported value type" 4.801915ms=:
ts=2023-10-20T05:56:42.468Z caller=collector.go:326 level=error Errorscrapingfor=resource _="unsupported value type" 7.01722ms=:
ts=2023-10-20T05:57:02.467Z caller=collector.go:326 level=error Errorscrapingfor=wait_time _="unsupported value type" 5.563066ms=:
ts=2023-10-20T05:57:02.469Z caller=collector.go:326 level=error Errorscrapingfor=resource _="unsupported value type" 7.338398ms=:

First of all, let me say that I am not a database expert.
Regarding permissions, I have granted what the documentation lists for the default metrics:

V_$LOG
V_$PROCESS
V_$SESSION
V_$SYSSTAT
V_$INSTANCE
V_$DATAFILE
V_$SYSTEM_WAIT_CLASS
V_$RESOURCE_LIMIT
V_$WAITCLASSMETRIC
V_$ASM_DISKGROUP_STAT
DBA_FREE_SPACE
DBA_DATA_FILES
DBA_TABLESPACES
DBA_TEMP_FILES
V_$TEMP_EXTENT_POOL
V_$TEMP_SPACE_HEADER
DBA_TABLESPACE_USAGE_METRICS

I get no results for the "wait_time" and "resource" metrics.
For "wait_time", I see that the v$waitclassmetric view is empty. Is this normal in ADB?
SELECT n.wait_class as WAIT_CLASS, round(m.time_waited/m.INTSIZE_CSEC,3) as VALUE FROM v$waitclassmetric m, v$system_wait_class n WHERE m.wait_class_id=n.wait_class_id AND n.wait_class != 'Idle'

Likewise, for "resource", the v$resource_limit view does not return any data.

SELECT resource_name,current_utilization,CASE WHEN TRIM(limit_value) LIKE 'UNLIMITED' THEN '-1' ELSE TRIM(limit_value) END as limit_value FROM v$resource_limit

Thank you very much for your interest,
Carlos.

CarlosFdez77 · 2023-10-20T07:25:02Z

My deploy is simple:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: oracle-metrics-exporter
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: oracle-metrics-exporter
  template:
    metadata:
      labels:
        app: oracle-metrics-exporter
    spec:
      containers:
        - name: oracle-metrics-exporter
          image: container-registry.oracle.com/database/observability-exporter:1.0.0
          imagePullPolicy: Always
          env:
            # uncomment and customize the next item if you want to provide custom metrics definitions
            - name: CUSTOM_METRICS
              value: /oracle/observability/custom-metrics.toml
            #- name: TNS_ADMIN
            #  value: "/oracle/tns_admin"
            - name: DB_USERNAME
              valueFrom:
                secretKeyRef:
                  name: oracle-db-secret
                  key: username
                  optional: false
            - name: DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: oracle-db-secret
                  key: password
                  optional: false
            # update the connect string below for your database - can be simple format, or use a tns name as shown:
            - name: DB_CONNECT_STRING
              value: "(description=(retry_count=20)(retry_delay=3)(address=(protocol=tcps)(port=1521)(host=**********.adb.eu-frankfurt-1.oraclecloud.com))(connect_data=(service_name=***************.adb.oraclecloud.com))(security=(ssl_server_dn_match=no)))"
          volumeMounts:
            #- name: tns-admin
            #  mountPath: /oracle/tns_admin
            # uncomment and customize the next item if you want to provide custom metrics definitions
            - name: config-volume
              mountPath: /oracle/observability/custom-metrics.toml
              subPath: custom-metrics.toml
          resources:
            requests:
              memory: "64Mi"
              cpu: "100m"
            limits:
              memory: "128Mi"
              cpu: "500m"
          ports:
            #- containerPort: 8080
            - containerPort: 9161
      restartPolicy: Always
      volumes:
        #- name: tns-admin
        #  configMap:
        #    name: db-metrics-tns-admin
        # uncomment and customize the next item if you want to provide custom metrics definitions
        - name: config-volume
          configMap:
            name: db-metrics-txeventq-exporter-config

markxnelson · 2023-10-25T13:27:27Z

Thanks for the info. We are investigating this. Some initial comments from my DBA colleague (FYI):

v$waitclassmetric is capturing real-time events within the past 1-minute; its _history companion view will hold that data for up to an hour. Being empty is a sign that the database is not being hit hard. I don't see anything that says that view will not populate in an ADB-S environment (they are container aware so there's nothing that would be exposed from other PDBs in the same shared CDB which would warrant disabling).

Given ADBs are on Exa it will probably take a lot to make an entry in the view ... so "no rows" output can be expected. However, I did generate a massive amount of I/O to try and trigger a wait event in that view and nothing came of it; checking with ADB team if this is expected

v$resource_limit: It looks like it can now only be queried from the CDB, so on ADB it will always return 0 rows; on non-ADB databases it will return rows only when queried from the CDB.

markxnelson · 2023-10-25T13:28:27Z

I am going to look at handling this better and suppressing the spurious messages.

CarlosFdez77 · 2023-10-26T06:53:02Z

OK thanks.
I understand there will be a new version of the image at some point, correct?

markxnelson · 2023-10-26T12:08:38Z

Yes indeed, we are just doing some testing and then will put out an update.

markxnelson · 2023-10-27T16:57:53Z

Hi, I did put out a 1.1 release. It will still report when it cannot get a metric, but the output is more useful now, I hope. I was thinking maybe it should be a warning rather than an error, but when the query works and returns no rows it is difficult to know whether that is a good thing or a bad thing. If the query fails, it is clearly an error. The messages look like this now:

ts=2023-10-27T16:56:19.884Z caller=collector.go:327 level=error msg="Error scraping metric" Context=ownership MetricsDesc="map[inst_id:Owner instance of the current queues.]" time=1.502612ms error="no metrics found while parsing, query returned no rows"

I am looking into how to throttle these messages, that is, suppress duplicates. I don't think the logging library I am using supports that, so I need to figure out whether I should swap to a different one or build something in.

I also changed the wait_class query so it should work fine in both PDBs and CDBs now, but it will still return no rows unless the database is under stress.

markxnelson mentioned this issue Oct 25, 2023

update wait_class query; add vault; cleanup logs; update some deps #34

Merged
