Expose query duration via histogram #121

Merged (3 commits) on Apr 5, 2024

wilfriedroset
Contributor

This PR introduces a histogram for query duration, which makes it possible to use sql_exporter for synthetic monitoring.
A user could use the following configuration:

misc:
  histogram_buckets:
    - .25
    - .5
    - 1
    - 2
    - 4
jobs:
  - connections:
      - postgres://localhost:5432/canary?sslmode=require
    interval: 1s
    name: mydb
    queries:
      - allow_zero_rows: true
        help: ""
        name: delete_from_canary
        query: delete from canary_check;
        values: []
      - allow_zero_rows: true
        help: ""
        name: insert_into_canary
        query: insert into canary_check values(now()) on conflict do nothing;
        values: []
      - allow_zero_rows: true
        help: ""
        name: select_from_canary
        query: select 1 as up from canary_check;
        values:
          - up
    startup_sql:
      - SET lock_timeout = 1000
      - SET idle_in_transaction_session_timeout = 100

The resulting histogram looks like this:

sql_exporter_queries_total{query="delete_from_canary",sql_job="mydb"} 550
sql_exporter_queries_total{query="insert_into_canary",sql_job="mydb"} 550
sql_exporter_queries_total{query="select_from_canary",sql_job="mydb"} 550
sql_exporter_query_duration_seconds_bucket{query="delete_from_canary",sql_job="mydb",le="0.25"} 550
sql_exporter_query_duration_seconds_bucket{query="delete_from_canary",sql_job="mydb",le="0.5"} 550
sql_exporter_query_duration_seconds_bucket{query="delete_from_canary",sql_job="mydb",le="1"} 550
sql_exporter_query_duration_seconds_bucket{query="delete_from_canary",sql_job="mydb",le="2"} 550
sql_exporter_query_duration_seconds_bucket{query="delete_from_canary",sql_job="mydb",le="4"} 550
sql_exporter_query_duration_seconds_bucket{query="delete_from_canary",sql_job="mydb",le="+Inf"} 550
sql_exporter_query_duration_seconds_sum{query="delete_from_canary",sql_job="mydb"} 5.9034834720000005
sql_exporter_query_duration_seconds_count{query="delete_from_canary",sql_job="mydb"} 550
sql_exporter_query_duration_seconds_bucket{query="insert_into_canary",sql_job="mydb",le="0.25"} 550
sql_exporter_query_duration_seconds_bucket{query="insert_into_canary",sql_job="mydb",le="0.5"} 550
sql_exporter_query_duration_seconds_bucket{query="insert_into_canary",sql_job="mydb",le="1"} 550
sql_exporter_query_duration_seconds_bucket{query="insert_into_canary",sql_job="mydb",le="2"} 550
sql_exporter_query_duration_seconds_bucket{query="insert_into_canary",sql_job="mydb",le="4"} 550
sql_exporter_query_duration_seconds_bucket{query="insert_into_canary",sql_job="mydb",le="+Inf"} 550
sql_exporter_query_duration_seconds_sum{query="insert_into_canary",sql_job="mydb"} 1.3082169589999986
sql_exporter_query_duration_seconds_count{query="insert_into_canary",sql_job="mydb"} 550
sql_exporter_query_duration_seconds_bucket{query="select_from_canary",sql_job="mydb",le="0.25"} 550
sql_exporter_query_duration_seconds_bucket{query="select_from_canary",sql_job="mydb",le="0.5"} 550
sql_exporter_query_duration_seconds_bucket{query="select_from_canary",sql_job="mydb",le="1"} 550
sql_exporter_query_duration_seconds_bucket{query="select_from_canary",sql_job="mydb",le="2"} 550
sql_exporter_query_duration_seconds_bucket{query="select_from_canary",sql_job="mydb",le="4"} 550
sql_exporter_query_duration_seconds_bucket{query="select_from_canary",sql_job="mydb",le="+Inf"} 550
sql_exporter_query_duration_seconds_sum{query="select_from_canary",sql_job="mydb"} 0.7355683239999996
sql_exporter_query_duration_seconds_count{query="select_from_canary",sql_job="mydb"} 550
sql_select_from_canary{col="up",database="canary_postgresql",driver="postgres",host="localhost:5432",sql_job="mydb",user="someuser"} 1
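
Every bucket above shows 550 because Prometheus histogram buckets are cumulative: each `le` bucket counts all observations less than or equal to its bound, and all 550 queries finished in under 0.25s. The following is a minimal, standard-library-only sketch of that counting rule. The function name `cumulativeBuckets` and the sample durations are hypothetical and are not part of sql_exporter or the Prometheus client library.

```go
package main

import (
	"fmt"
	"sort"
)

// cumulativeBuckets is a hypothetical helper (not sql_exporter code).
// For each upper bound it returns how many observations are less than or
// equal to that bound, followed by a final +Inf bucket holding the total.
// bounds must be sorted in ascending order.
func cumulativeBuckets(bounds []float64, observations []float64) []uint64 {
	counts := make([]uint64, len(bounds)+1)
	for _, v := range observations {
		// Index of the first bound >= v; len(bounds) means only +Inf matches.
		i := sort.SearchFloat64s(bounds, v)
		counts[i]++
	}
	// Turn per-bucket counts into cumulative counts.
	for i := 1; i < len(counts); i++ {
		counts[i] += counts[i-1]
	}
	return counts
}

func main() {
	// Bounds taken from the histogram_buckets configuration above.
	bounds := []float64{0.25, 0.5, 1, 2, 4}
	// Made-up durations in seconds, for illustration only.
	observations := []float64{0.01, 0.3, 0.5, 3, 10}

	counts := cumulativeBuckets(bounds, observations)
	for i, b := range bounds {
		fmt.Printf("le=%v %d\n", b, counts[i])
	}
	fmt.Printf("le=+Inf %d\n", counts[len(bounds)])
}
```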

This is a great help for monitoring latency from the client's point of view and for building SLOs around it.
See also: https://prometheus.io/docs/practices/histograms/
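
As a rough illustration of how an SLO could be derived from these series, the sketch below computes the share of queries at or below a bucket bound and the mean duration from `_sum` and `_count`. The helper names are hypothetical, and the numbers are copied from the `delete_from_canary` series above. It works on lifetime counter values only; in practice this would typically be computed over a time window in PromQL using `rate()`.

```go
package main

import "fmt"

// fractionWithin is a hypothetical helper: it returns the share of
// observations at or below a bucket bound, given that bucket's cumulative
// count and the histogram's total _count.
func fractionWithin(bucketCount, total uint64) float64 {
	if total == 0 {
		return 0 // no observations yet, avoid dividing by zero
	}
	return float64(bucketCount) / float64(total)
}

// meanDuration is a hypothetical helper: it returns _sum / _count in seconds.
func meanDuration(sum float64, count uint64) float64 {
	if count == 0 {
		return 0
	}
	return sum / float64(count)
}

func main() {
	// Values copied from the delete_from_canary series in the output above.
	const (
		bucketLe025 uint64  = 550                // ..._bucket{le="0.25"}
		count       uint64  = 550                // ..._count
		sum         float64 = 5.9034834720000005 // ..._sum
	)

	fmt.Printf("share of queries <= 0.25s: %.4f\n", fractionWithin(bucketLe025, count))
	fmt.Printf("mean duration: %.6fs\n", meanDuration(sum, count))
}
```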

Signed-off-by: Wilfried Roset <[email protected]>
Member

@dewey dewey left a comment

Very cool, thanks for the contribution! I left two small comments but otherwise it looks very nice.

config.go Outdated
@@ -100,11 +114,16 @@ type CloudSQLConfig struct {

// File is a collection of jobs
type File struct {
Misc Misc `yaml:"misc,omitempty"`
Member

Two questions:

  • What do you think about calling it Configuration, or something similar, to make it more explicit that it's a new place to put advanced configuration options?
  • Would it make sense to make this job-specific rather than globally available? I could imagine there are jobs that take longer, where you'd want different buckets than for a quick select.

Contributor Author

What do you think about calling it Configuration, or something similar, to make it more explicit that it's a new place to put advanced configuration options?

LGTM

Would it make sense to make this job-specific rather than globally available? I could imagine there are jobs that take longer, where you'd want different buckets than for a quick select.

If my understanding is correct, the buckets are fixed when the histogram is defined. If we want bucket definitions at the job level, we need to define multiple histograms, which can severely impact cardinality. I would recommend waiting for native histograms, which are designed to address this issue.
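
To make the cardinality concern concrete, here is a back-of-the-envelope sketch of how many time series a classic (non-native) histogram exposes. Each label set produces one series per configured bucket, one `+Inf` bucket, plus `_sum` and `_count`. The function names are hypothetical, and the sketch only counts series; it does not model how per-job histograms would actually be registered.

```go
package main

import "fmt"

// seriesPerLabelSet is a hypothetical helper: it returns how many series one
// classic histogram exposes for a single label combination, namely one per
// configured bucket, one +Inf bucket, plus _sum and _count.
func seriesPerLabelSet(numBuckets int) int {
	return numBuckets + 1 + 2
}

// totalSeries is a hypothetical helper: it multiplies the per-label-set
// count by the number of distinct (sql_job, query) combinations.
func totalSeries(numBuckets, numLabelSets int) int {
	return seriesPerLabelSet(numBuckets) * numLabelSets
}

func main() {
	// 5 buckets and 3 queries, as in the example configuration in this PR.
	fmt.Println("series per query:", seriesPerLabelSet(5))
	fmt.Println("total histogram series:", totalSeries(5, 3))

	// Made-up scenario: finer-grained buckets across many queries.
	fmt.Println("20 buckets, 100 queries:", totalSeries(20, 100))
}
```

With the configuration from this PR, that gives 8 series per query, which matches the 8 `sql_exporter_query_duration_seconds_*` lines per query in the sample output.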

@wilfriedroset
Contributor Author

Thank you for your feedback, which I have taken into account.

…stalling the correct go version based on go.mod

Signed-off-by: Wilfried Roset <[email protected]>
@dewey dewey merged commit 9fc7266 into justwatchcom:master Apr 5, 2024
1 check passed