Performance reporting #73
Assuming the request latency has a normal distribution, we could show the median and standard deviation and compare those; the difference between branches could then be expressed in units of the standard deviation. We can experiment with including or excluding the first few requests. One way to work around the variance without dismissing the first request (or first two requests) is to show the cumulative latency over all the requests and compare that to the other branch or request path. We can do the same for database time, query count, CPU time, etc. To show all measurements we could use a scatter plot (x = request index, y = latency) with test cases in different colors.
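To make that concrete, here is a minimal Ruby sketch of the median/stddev comparison and the cumulative-latency fallback (the numbers and helper names are made up for illustration, they are not PerfCheck code):

```ruby
# Summarise two sets of request latencies and express the shift between
# branches in units of the reference branch's standard deviation.
def median(values)
  sorted = values.sort
  mid = sorted.length / 2
  sorted.length.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

def stddev(values)
  mean = values.sum / values.length.to_f
  Math.sqrt(values.sum { |v| (v - mean)**2 } / values.length)
end

reference  = [820.0, 790.0, 805.0, 812.0, 798.0] # latencies in ms, branch A
experiment = [905.0, 930.0, 912.0, 898.0, 921.0] # latencies in ms, branch B

# The shift expressed relative to the reference branch's spread.
shift_in_sigma = (median(experiment) - median(reference)) / stddev(reference)
puts format('median %.1fms -> %.1fms (%+.1f sigma)',
            median(reference), median(experiment), shift_in_sigma)

# Cumulative latency sidesteps the question of dropping the first requests.
puts format('cumulative %.1fms -> %.1fms', reference.sum, experiment.sum)
```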
Appreciate all the thought here! We should definitely start by delivering what has been working in recent history, which is a simple relative "change factor" like "1.2x slower" or "about the same." This was the end result of showing different types of data with different thresholds, and it was relatively stable and matched user expectations.
I believe the daemon actually had 20% as the threshold (10% was not enough), so we should use that as the starting place in CI: https://github.com/rubytune/perf_check_daemon/blob/master/config/daemon.yml.example#L20-L23. And we dropped the first two requests; let's keep doing that for now, as that was driven by usage and experience (see the sketch below).

The daemon (and PerfCheck CI) is intended to be configurable per app deployment as to what "meaningful" difference is needed to trigger a confident result, because that depends completely on what the datasets look like. On the existing target app we are dealing with endpoints taking 2-10+ seconds and issuing hundreds of queries, so the variance seen on an empty, fast app is background noise, not signal. In other words, we need to be able to configure CI appropriately so that it highlights meaningful signal and encourages people to focus on the broad strokes.

This doesn't mean we shouldn't improve accuracy. But it means further solutions and tweaks should be driven by a working user feedback loop on the target app; in other words, we need to hear from users that they tested two equivalent branches and saw differing numbers, and we need some data to play with, before we invest in further calibration/optimization.

stddev is a great idea, especially to display in an "advanced" interpretation of the results down the line. That said, 80% of people are going to be confused by the added information if we lead with it (people don't understand basic stats), so we should be careful about how we present it. If it's just numerical, we will need to include some educational clarification of what it means. And there are real benefits to keeping the "headline result" simple, even at the cost of it sometimes being less detailed/accurate.
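For reference, a minimal sketch of that headline computation under those assumptions (20% threshold, first two requests dropped; all names and numbers here are hypothetical, not the daemon's actual code):

```ruby
WARMUP_REQUESTS = 2  # drop the first two requests, per daemon experience
THRESHOLD = 0.20     # 20% relative change required before calling a winner

def headline(reference_ms, experiment_ms)
  ref = reference_ms.drop(WARMUP_REQUESTS)
  exp = experiment_ms.drop(WARMUP_REQUESTS)
  factor = (exp.sum.to_f / exp.length) / (ref.sum.to_f / ref.length)

  if factor > 1.0 + THRESHOLD
    format('%.1fx slower', factor)
  elsif factor < 1.0 - THRESHOLD
    format('%.1fx faster', 1.0 / factor)
  else
    'about the same'
  end
end

headline([5000, 4100, 4000, 4050, 3950], [5200, 5100, 5150, 5050, 5100])
# => "1.3x slower"
```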
In the end, we are saying the same things. This is important stuff, and we should experiment with potential improvements.
Currently there is a difference between PerfCheck and PerfCheck CI reporting.
PerfCheck requires at least a 10% change in latency before reporting something as faster or slower; PerfCheck CI reports faster or slower on any difference.
I believe it's important to have a solid computation and identical output between the tools. We want users to be able to trust the summary in the report at all times.
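One way to guarantee identical output would be for both tools to call a single shared method; a rough sketch, with hypothetical module and method names:

```ruby
# Hypothetical shared verdict logic; PerfCheck and PerfCheck CI would both
# call this instead of each applying their own threshold.
module PerfCheck
  module Verdict
    def self.call(reference_ms, experiment_ms, threshold: 0.20)
      factor = experiment_ms / reference_ms
      return 'about the same' if (factor - 1.0).abs <= threshold

      if factor > 1.0
        format('%.1fx slower', factor)
      else
        format('%.1fx faster', 1.0 / factor)
      end
    end
  end
end

PerfCheck::Verdict.call(4000.0, 4100.0) # => "about the same"
```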
This issue doesn't contain actionable changes; it's more a summary of observations, insights, and ideas.
Let's start with some details about measuring requests in our current situation.
Users want to be able to interpret outcomes when comparing performance between branches or paths.
When comparing branches or paths, a user will want to see a number of sections in the report.