
Integrate DIAMetrics principles into a benchmarking suite for brackit #39

AlvinKuruvilla opened this issue May 24, 2022 · 0 comments
Background on DIAMetrics

DIAMetrics is an end-to-end benchmarking and performance framework for query engines developed by Google.

Components

Note that there are more details than mentioned here; this is only an overview. If we need to add details about more parts, we can do that further down the line.

Workload Extractor:

According to the paper, this component extracts a "representative workload" from a live production workload. "DIAMetrics employs a workload extractor and summarizer, which is a feature-based way to ‘mine’ the query logs of a customer and extract a subset of queries that adequately represent the workload of the customer."
For our current purposes, the best way to utilize a component like this is probably to pinpoint a set of heavy workloads, keep a list of them, and run just those for the time being. To this end, I am working on a PR that will hopefully bring in more XQuery files for us to run against from this repository. I will update this issue with the PR number so that we can keep track of everything.
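As a starting point, the feature-based "mining" idea can be sketched very simply: map each logged query to the set of language features it exercises, then greedily pick the smallest subset of queries that covers every feature seen in the log. This is only a sketch under my own assumptions; the feature list and the `extract_representative` helper below are hypothetical, not anything from the DIAMetrics paper or from brackit.

```python
# Hypothetical feature set: which XQuery constructs a query uses.
# A real extractor would use a much richer feature space (the paper
# mentions feature-based summarization but we pick features ourselves).
FEATURES = ("for", "let", "where", "order by", "group by", "//")

def features(query: str) -> frozenset:
    """Map a query to the set of features it exercises."""
    return frozenset(f for f in FEATURES if f in query)

def extract_representative(log: list[str]) -> list[str]:
    """Greedy set cover: pick a few queries whose combined feature
    sets cover every feature observed in the whole query log."""
    if not log:
        return []
    uncovered = set().union(*(features(q) for q in log))
    chosen = []
    while uncovered:
        best = max(log, key=lambda q: len(features(q) & uncovered))
        gain = features(best) & uncovered
        if not gain:
            break
        chosen.append(best)
        uncovered -= gain
    return chosen
```

The greedy choice is just the classic set-cover heuristic; for our purposes it gives a small, feature-diverse list of queries we can rerun repeatedly.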

Data and Query Scrambler

This component aims to help protect sensitive data and create variations of the representative sets to prevent sensitive data leakage. The paper lists off a few ways that they achieve this, but for the time being, we can put less emphasis on this part since we will use this internally for the moment.
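Even though we are deprioritizing this part, the core idea is easy to illustrate: rewrite the constants in a query so its shape (and therefore its plan and cost profile) is preserved while the possibly sensitive literal values are not. The `scramble_literals` helper below is a hypothetical sketch of mine, not the paper's mechanism; a real scrambler would also preserve types and value distributions.

```python
import re

def scramble_literals(query: str, seed: int = 0) -> str:
    """Replace string and numeric literals with generated placeholders.
    The query's structure is kept intact so it still exercises the same
    operators, but its (possibly sensitive) constants are removed."""
    counter = iter(range(seed, seed + 10_000))
    # Replace double-quoted string literals first...
    out = re.sub(r'"[^"]*"', lambda m: f'"v{next(counter)}"', query)
    # ...then bare integer literals.
    out = re.sub(r'\b\d+\b', lambda m: str(next(counter)), out)
    return out
```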

Workload Runner

According to the paper, this component "allows users to specify various combinations of workloads and systems to be benchmarked. For instance, we may want to run TPC-H on various query engines over various storage formats to see which storage format is the best option for which engine." The runner can either schedule runs on specific engines or spin up and manage (including cleanup and shutdown) entire engine instances for the runs.
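The "various combinations" part is essentially a cross product of workloads, engines, and storage formats, with timings collected per cell. A minimal sketch, assuming a caller-supplied `execute` callback that actually runs a workload on a given engine and format (that callback, and the `run_matrix` name, are my own placeholders, not anything from the paper or brackit):

```python
import itertools
import time

def run_matrix(workloads, engines, formats, execute):
    """Run every (workload, engine, format) combination and record
    wall-clock time. `execute(workload, engine, fmt)` is assumed to
    perform one benchmark run; setup/teardown of engine instances
    would live inside it in a fuller implementation."""
    results = {}
    for w, e, f in itertools.product(workloads, engines, formats):
        start = time.perf_counter()
        execute(w, e, f)
        results[(w, e, f)] = time.perf_counter() - start
    return results
```

Keeping the execution callback separate from the scheduling loop mirrors the paper's split between specifying combinations and managing engine instances.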

Monitoring

There are two parts to this:

  1. Visualization Framework - which brings up dashboards
  2. Alerting Framework - which compares workload performance to historical data and alerts when there are concerns
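For the alerting side, one simple way to compare a run against historical data is a mean-plus-k-standard-deviations threshold. The paper does not spell out a thresholding policy, so the rule below is an assumption of mine, just to show the shape of the check:

```python
import statistics

def check_regression(history, latest, k=3.0):
    """Flag `latest` (e.g. a query's latency) as a regression if it
    exceeds the historical mean by more than k standard deviations.
    The k-sigma rule is an assumed policy, not the paper's."""
    if len(history) < 2:
        return False  # not enough data to judge
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    return latest > mean + k * stdev
```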

TODO (more to come as we get further along)

  • Merge in more XQuery files from xquerl
  • Figure out workloads that do not perform well and add them to brackit
  • Extract representative workloads somehow