Implement report that is functionally equivalent to PerfCheck daemon #86

Manfred opened this issue Nov 28, 2019 · 6 comments

Manfred commented Nov 28, 2019

  • As a user, I want to know whether the performance measured in the job is better than, about the same as, or worse than the reference measurement.

Manfred commented Nov 29, 2019

I want to propose implementing the following observations if relevant:

  • ✅/❌ Performance is Nx faster/slower/about the same as master (2ms vs 3ms)
  • ❌ At least one request took longer than 10ms
  • ❌ At least one request performed more than 20 queries
  • ❌ Requests perform a lot more queries on average than master (30.2 queries vs 22.0 queries)

The limits for these observations are configurable on the application level.
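A factor-based check like the first observation could be sketched as follows. This is a minimal illustration only; the method name, return shape, and 1.2 default are assumptions, not Perf Check's actual API:

```ruby
# Hypothetical sketch: classify a branch measurement against master using a
# configurable change factor. Returns a status symbol plus the ratio to report.
def compare_to_master(branch_ms, master_ms, factor: 1.2)
  ratio = branch_ms.to_f / master_ms
  if ratio > factor
    [:slower, ratio]            # e.g. "2.5x slower than master"
  elsif ratio < 1.0 / factor
    [:faster, 1.0 / ratio]      # e.g. "1.5x faster than master"
  else
    [:about_the_same, ratio]    # within the configured tolerance band
  end
end
```

For example, `compare_to_master(3, 2)` reports the branch as slower at 1.5x, while `compare_to_master(2.2, 2)` falls inside the 1.2x band and reads as about the same.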

Perf Check used to have a link to details, so I guess we implement that too. The details will show a table with some summary statistics: average, standard deviation, and difference between averages for what you are comparing.

Branch   Latency               Queries
master   606.5ms (σ = 13.5)    29
slower   2333.2ms (σ = 34.5)   215
diff     +1726.7ms             +186

And then the full table with all measurements.

#   Latency   Query count   Server memory   Response code
0   758.2ms   29            306.6           200
1   699.8ms   29            324.1           200


sudara commented Nov 29, 2019

I want to propose implementing the following observations if relevant

Mimicking the old daemon is a great Step 1 here!

I'm assuming the configurable tolerance is going to start as a "factor of change" like 1.2x as we discussed and then we will move to a more sophisticated/dynamic measure.

If/when we move to a sophisticated/dynamic measure, I'm currently trying to think through what our responsibility is for communicating that measure, and if/how it will impact understanding of the results.

Example: people ask "when does it go green", and instead of "when your average is within the 1.2x change factor" we need to supply a clear answer that will be understood and accepted. After we play with that algorithm and determine how well alternative detections work on the target app, we can decide how we calculate it, optimizing not only for accuracy but for clarity/simplicity of communication.

✅/❌ Performance is Nx faster/slower/about the same as master (2ms vs 3ms)

Let's roll with this!

From a superficial developer point of view, we should understand that we are communicating green="permission to move on" which isn't ideal for the long term. When a branch reports as 1.15x worse, we should eventually communicate that incremental drifting = still bad, or that we lack confidence. I think that's part of why we were experimenting with "gray" as the "about the same" color. We might decide to choose a yellow state if we can ascertain that "there's definitely a change, albeit minor," etc.

At least one request...

In my mind, referencing individual requests, or even acknowledging that individual requests exist, is a separate topic. The "absolute threshold" feature is intended to communicate the performance state of the URL, à la "you improved the action, but it's still fundamentally problematic," which is why it's historically calculated on the aggregate (average). I'd like to stick with the same phrasing and meaning the daemon had here for Step 1 (this phrasing was iterated on to arrive at something that seems direct and clear to most):

❌ 2.5x slower than master (1481ms vs 601ms)
❌ Increased AR queries from 29 to 215!

In the future, we can add some logic to detect if there was variance (some requests had differing numbers of AR queries) and then call that out as well.

Perf Check used to have a link to details, so I guess we implement that too.

The intention was to replace the gist with the job details page on Perf Check CI, so this is perfect. I love the idea of an html summary/detail view of the individual requests, though given the fact we are displaying logs, we can also defer this as an additional feature.


sudara commented Nov 29, 2019

Just since my brain is on the topic: I've always had a problem with the performance comparison becoming "more tolerant" as we scale up, due to using a change factor. Meaning, 10-second requests will "allow" you to scale up to 11.999 seconds without Perf Check complaining.

A certain amount of this makes sense, because usually 10 second actions have a ridiculous amount of work happening within them, and so have higher variance depending on db/cpu/network state.
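The scaling concern above can be made concrete with a small sketch. The helper name and numbers are illustrative only; the point is that a fixed factor grants linearly more absolute slack as the baseline grows:

```ruby
# With a fixed change factor, the absolute regression a branch can introduce
# without being flagged grows with the baseline latency.
def allowed_latency_ms(baseline_ms, factor = 1.2)
  baseline_ms * factor
end

[100, 1_000, 10_000].each do |baseline|
  limit = allowed_latency_ms(baseline).round
  puts "baseline #{baseline}ms -> passes anywhere below #{limit}ms"
end
```

A 100ms action can only drift 20ms before being flagged, but a 10-second action can drift nearly 2 full seconds, which is the "10 seconds up to 11.999 seconds" case described above.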


Manfred commented Nov 29, 2019

Increased AR queries from 29 to 215!

Do you mind if I rename 'AR queries' to 'database queries'? I really don't like abbreviations, and connection.execute also adds to the query count.


sudara commented Nov 29, 2019 via email

Manfred assigned Manfred and unassigned sudara Nov 29, 2019

Manfred commented Nov 29, 2019

I'm going with almost identical text to what the Perf Check daemon generated, and leaving out the details view for now because people can use the log to see the raw data. We'll configure production with 1.2x as the performance threshold, 4 seconds for latency, and 75 database queries.
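Those production values could be expressed as a small configuration hash. The constant and key names here are purely illustrative assumptions, not the actual settings API:

```ruby
# Hypothetical configuration sketch for the production limits mentioned above.
PERF_CHECK_LIMITS = {
  change_factor: 1.2,         # slower/faster beyond this factor flags the branch
  max_latency_ms: 4_000,      # average latency above 4 seconds fails outright
  max_database_queries: 75    # more than 75 database queries fails outright
}.freeze
```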
