Implement report that is functionally equivalent to PerfCheck daemon #86

Manfred opened this issue Nov 28, 2019 · 6 comments

Manfred commented Nov 28, 2019

  • As a user, I want to know whether the performance measured in the job is better than, about the same as, or worse than the reference measurement.

Manfred commented Nov 29, 2019

I want to propose implementing the following observations if relevant:

  • ✅/❌ Performance is Nx faster/slower/about the same as master (2ms vs 3ms)
  • ❌ At least one request took longer than 10ms
  • ❌ At least one request performed more than 20 queries
  • ❌ Requests perform a lot more queries on average than master (30.2 queries vs 22.0 queries)

The limits for these observations are configurable on the application level.
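A factor-based check like the first observation could be sketched as follows. This is a minimal illustration only; the method name, return shape, and 1.2 default are assumptions, not Perf Check's actual API:

```ruby
# Hypothetical sketch: classify a branch measurement against master using a
# configurable change factor. Returns a status symbol plus the ratio to report.
def compare_to_master(branch_ms, master_ms, factor: 1.2)
  ratio = branch_ms.to_f / master_ms
  if ratio > factor
    [:slower, ratio]            # e.g. "2.5x slower than master"
  elsif ratio < 1.0 / factor
    [:faster, 1.0 / ratio]      # e.g. "1.5x faster than master"
  else
    [:about_the_same, ratio]    # within the configured tolerance band
  end
end
```

For example, `compare_to_master(3, 2)` reports the branch as slower at 1.5x, while `compare_to_master(2.2, 2)` falls inside the 1.2x band and reads as about the same.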

Perf Check used to have a link to details, so I guess we implement that too. The details will show a table with some summary statistics: average, standard deviation, and difference between averages for what you are comparing.

Branch   Latency               Queries
master   606.5ms (σ = 13.5)    29
slower   2333.2ms (σ = 34.5)   215
diff     +1726.7ms             +186

And then the full table with all measurements.

#   Latency   Query count   Server memory   Response code
0   758.2ms   29            306.6           200
1   699.8ms   29            324.1           200


sudara commented Nov 29, 2019

I want to propose implementing the following observations if relevant

Mimicking the old daemon is a great Step 1 here!

I'm assuming the configurable tolerance is going to start as a "factor of change" like 1.2x as we discussed and then we will move to a more sophisticated/dynamic measure.

If/when we move to a sophisticated/dynamic measure, I'm currently trying to think through what our responsibility is for communicating that measure, and if/how it will impact understanding of the results.

Example: people ask "when does it go green", and instead of "when your average is within the 1.2x change factor" we need to supply a clear answer that will be understood and accepted. After we play with that algorithm and determine how well alternative detections work on the target app, we can decide how we calculate it, optimizing not only for accuracy but for clarity/simplicity of communication.

✅/❌ Performance is Nx faster/slower/about the same as master (2ms vs 3ms)

Let's roll with this!

From a superficial developer point of view, we should understand that we are communicating green="permission to move on" which isn't ideal for the long term. When a branch reports as 1.15x worse, we should eventually communicate that incremental drifting = still bad, or that we lack confidence. I think that's part of why we were experimenting with "gray" as the "about the same" color. We might decide to choose a yellow state if we can ascertain that "there's definitely a change, albeit minor," etc.

At least one request...

In my mind, referencing individual requests, or even acknowledging that individual requests exist, is a separate topic. The "absolute threshold" feature is intended to communicate the performance state of the URL, à la "you improved the action, but it's still fundamentally problematic," which is why it's historically calculated on the aggregate (average). I'd like to stick with the same phrasing and meaning the daemon had here for Step 1 (this phrasing was iterated on to arrive at something that seems direct and clear to most):

❌ 2.5x slower than master (1481ms vs 601ms)
❌ Increased AR queries from 29 to 215!

In the future, we can add some logic to detect if there was variance (some requests had differing numbers of AR queries) and then call that out as well.

Perf Check used to have a link to details, so I guess we implement that too.

The intention was to replace the gist with the job details page on Perf Check CI, so this is perfect. I love the idea of an html summary/detail view of the individual requests, though given the fact we are displaying logs, we can also defer this as an additional feature.


sudara commented Nov 29, 2019

Just since my brain is on the topic: I've always had a problem with the performance comparison becoming "more tolerant" as we scale up, due to using a change factor. Meaning, 10-second requests will "allow" you to scale up to 11.999 seconds without Perf Check complaining.

A certain amount of this makes sense, because usually 10 second actions have a ridiculous amount of work happening within them, and so have higher variance depending on db/cpu/network state.
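The scaling concern above can be made concrete with a small sketch. The helper name and numbers are illustrative only; the point is that a fixed factor grants linearly more absolute slack as the baseline grows:

```ruby
# With a fixed change factor, the absolute regression a branch can introduce
# without being flagged grows with the baseline latency.
def allowed_latency_ms(baseline_ms, factor = 1.2)
  baseline_ms * factor
end

[100, 1_000, 10_000].each do |baseline|
  limit = allowed_latency_ms(baseline).round
  puts "baseline #{baseline}ms -> passes anywhere below #{limit}ms"
end
```

A 100ms action can only drift 20ms before being flagged, but a 10-second action can drift nearly 2 full seconds, which is the "10 seconds up to 11.999 seconds" case described above.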


Manfred commented Nov 29, 2019

Increased AR queries from 29 to 215!

Do you mind if I rename 'AR queries' to 'database queries'? I really don't like abbreviations, and connection.execute also adds to the query count.


sudara commented Nov 29, 2019 via email

Manfred assigned Manfred and unassigned sudara Nov 29, 2019

Manfred commented Nov 29, 2019

I'm going with almost identical text to what the Perf Check daemon generated, and leaving out the details view for now because people can use the log to see the raw data. We'll configure production with 1.2x as the performance threshold, 4 seconds for latency, and 75 database queries.
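Those production values could be expressed as a small configuration hash. The constant and key names here are purely illustrative assumptions, not the actual settings API:

```ruby
# Hypothetical configuration sketch for the production limits mentioned above.
PERF_CHECK_LIMITS = {
  change_factor: 1.2,         # slower/faster beyond this factor flags the branch
  max_latency_ms: 4_000,      # average latency above 4 seconds fails outright
  max_database_queries: 75    # more than 75 database queries fails outright
}.freeze
```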
