As we approach our v1 release, it is important that we develop and maintain methodologies for testing the performance of various parts of the system and occasionally checking for regressions. This epic holds the high-level tasks for managing that work.
Event delivery
We currently have several ways of delivering xAPI events to ClickHouse. For each of these we should document a methodology and reference configuration for testing how much event throughput to ClickHouse the backend can sustain before issues emerge (queues filling up, task sizes growing, delivery time lagging, etc.).
We should be able to emulate traffic by replaying tracking logs from very large files with a batch size of 1, adjusting the sleep setting down to the lowest value the backend can handle. If the backend can sustain a 0-sleep loop, we should run additional processes until it breaks.
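For illustration, a minimal stand-in for that replay loop might look like the sketch below. This is not the actual replay tool: the xAPI endpoint URL, credentials, log file path, and sleep/process settings are all placeholder assumptions to be adjusted for the backend under test.

```python
# Hypothetical sketch of the replay loop: read xAPI events from a large
# file and POST them one at a time (batch size 1), with a configurable
# sleep between sends and N parallel sender processes. The endpoint,
# credentials, and file path are placeholders, not real values.
import json
import time
from multiprocessing import Process

import requests

XAPI_ENDPOINT = "https://lrs.example.com/xAPI/statements"  # placeholder
AUTH = ("lrs_user", "lrs_password")                        # placeholder
EVENT_FILE = "large_tracking_log.jsonl"                    # placeholder
SLEEP_SECONDS = 0.0   # raise this until the backend keeps up
NUM_PROCESSES = 4     # add processes once a 0-sleep loop is sustained

def replay(worker_id: int) -> None:
    """Send events one per request, pausing SLEEP_SECONDS between sends."""
    print(f"worker {worker_id} starting")
    with open(EVENT_FILE) as f:
        for line in f:
            event = json.loads(line)
            resp = requests.post(XAPI_ENDPOINT, json=event, auth=AUTH, timeout=30)
            resp.raise_for_status()  # fail loudly when the backend breaks
            if SLEEP_SECONDS:
                time.sleep(SLEEP_SECONDS)

if __name__ == "__main__":
    workers = [Process(target=replay, args=(i,)) for i in range(NUM_PROCESSES)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```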
We should be careful to constrain the configurations to be roughly equivalent in resources / cost, to emulate a production environment for a mid-sized system, and to use the same version of Aspects for all tests.
Template for reporting results:
Test system configuration:
- Tutor version
- Aspects version
- Environment specifications (local / k8s, CPU / Memory / Disk resources allocated)
Load generation specifications:
- Tool
- Exact script
- Any custom settings for things like sleep time and # of processes
Data captured for results:
- Length of run
- Sleep time / batch size
- We should capture values for these every 10 seconds (a polling sketch follows the template below):
  - Latency of events in ClickHouse (now minus the timestamp of the most recent event)
  - Queue size (if applicable), e.g. pending tasks in Celery, pending stream size in Redis, etc.
  - Total events in ClickHouse
- Query times for 2-3 ClickHouse reporting queries (as taken from Superset)
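A sampler for those 10-second metrics could look something like this sketch, assuming the clickhouse-connect and redis-py clients and a Redis-stream delivery backend; the table, column, and stream names are assumptions that would need to match the actual deployment.

```python
# Hypothetical 10-second sampler for the metrics above. The table/column/
# stream names (xapi.xapi_events_all, emission_time, the Redis stream key)
# are assumptions; substitute whatever the deployment actually uses.
import time

import clickhouse_connect
import redis

ch = clickhouse_connect.get_client(host="localhost")  # placeholder connection
r = redis.Redis(host="localhost")                     # placeholder connection

while True:
    # Latency: now minus the timestamp of the most recent event in ClickHouse.
    latency = ch.query(
        "SELECT now() - max(emission_time) FROM xapi.xapi_events_all"
    ).result_rows[0][0]
    # Total events landed so far.
    total = ch.query(
        "SELECT count() FROM xapi.xapi_events_all"
    ).result_rows[0][0]
    # Queue size, here the pending Redis stream length (swap in Celery queue
    # depth instead if that is the delivery backend under test).
    queue_size = r.xlen("event-bus-xapi")
    print(f"{time.strftime('%H:%M:%S')} latency={latency}s total={total} queue={queue_size}")
    time.sleep(10)
```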
Query performance
On a load test dataset, check every reporting query we have (as captured from "show SQL" in Superset), with and without any applicable filters, to see how they perform. We should run each query 5 times and capture the response times and number of rows returned. It should also be possible to capture the queries by browsing each chart with different filters applied, then pulling the SQL from the ClickHouse query log (see the sketch after the template below).
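For the 5x runs, a harness along these lines could capture the durations and row counts, again assuming clickhouse-connect; the query texts themselves are placeholders to be pasted in from Superset's "show SQL".

```python
# Hypothetical benchmark runner: execute each captured reporting query 5
# times and record the duration and row count of every run. The queries
# dict is a placeholder to be filled from Superset's "show SQL" output.
import time

import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost")  # placeholder

queries = {
    "enrollments, no filter": "SELECT ...",               # paste from Superset
    "enrollments, enrollment type filter": "SELECT ...",  # paste from Superset
}

for name, sql in queries.items():
    for run in range(5):
        start = time.monotonic()
        result = client.query(sql)
        duration = time.monotonic() - start
        print(f"{name} run={run + 1} duration={duration:.3f}s rows={len(result.result_rows)}")
```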
We should be careful to capture the xapi-db-load configuration used to generate the data so we can regenerate it as necessary.
Template for reporting results:
Test ClickHouse configuration:
- Local / k8s / ClickHouse Cloud / Altinity, etc.
- Hardware or configuration specs
- Total rows in ClickHouse
For each query:
- Query short name (e.g. "enrollments, no filter", "enrollments, enrollment type filter")
- Raw query
- Duration
- Rows returned
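For the alternative capture path mentioned above (browsing charts, then pulling SQL from the logs), recently executed queries can be read back from ClickHouse's built-in system.query_log table, roughly as sketched here; the 'superset' user name is an assumption about how Superset connects to ClickHouse.

```python
# Hypothetical pull of recently executed Superset queries from ClickHouse's
# built-in query log. The 'superset' user name is an assumption; adjust the
# filter to match how Superset actually connects in the deployment.
import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost")  # placeholder

rows = client.query(
    """
    SELECT query, query_duration_ms, result_rows
    FROM system.query_log
    WHERE type = 'QueryFinish'
      AND user = 'superset'
      AND event_time > now() - INTERVAL 1 HOUR
    ORDER BY event_time DESC
    """
).result_rows

for query, duration_ms, result_count in rows:
    print(f"{duration_ms} ms, {result_count} rows: {query[:120]}")
```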
What we've found in #202 and earlier tests is that insert performance exceeds our ability to generate events up to that ~55/sec line. Once we have more production information from partners, we can determine whether we should test to a higher threshold, but for now I'm closing these tasks out.