Measure SLOs across whole suite, not single tests #779

oxddr · 2019-09-06T12:45:08Z

Currently we measure all SLOs per test. We are considering measuring them across the whole test suite (density + load).

Measuring SLOs across the whole performance suite will increase the number of windows (as defined in the SLO description), so a single bad request will have a smaller chance of sinking the whole test. Currently we see test flakiness caused by a single request (e.g. kubernetes/kubernetes#82377). This would also bring us closer to the intention behind the two-level SLO, which is defined per cluster-day.
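
To make the window argument concrete, here is a minimal Go sketch. It assumes the second level of the SLO is evaluated as a 99th percentile over per-window latencies; that aggregation, the nearest-rank percentile method, the 1s threshold, and all function names are illustrative assumptions, not taken from clusterloader2 or the SLO definitions.

```go
package main

import (
	"fmt"
	"sort"
	"math"
)

// percentileNearestRank returns the p-th percentile (0 < p <= 100) of values
// using the nearest-rank method. It returns 0 for an empty slice.
func percentileNearestRank(values []float64, p float64) float64 {
	if len(values) == 0 {
		return 0
	}
	sorted := append([]float64(nil), values...) // copy so the input is not mutated
	sort.Float64s(sorted)
	rank := int(math.Ceil(p * float64(len(sorted)) / 100))
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

// windowsWithOneOutlier builds n hypothetical per-window latencies (seconds):
// n-1 healthy windows and a single bad one.
func windowsWithOneOutlier(n int, healthy, bad float64) []float64 {
	w := make([]float64, n)
	for i := range w {
		w[i] = healthy
	}
	w[n-1] = bad
	return w
}

func main() {
	const threshold = 1.0 // hypothetical SLO threshold in seconds
	// 10 windows stands in for a single short test, 200 for a whole suite.
	for _, n := range []int{10, 200} {
		p99 := percentileNearestRank(windowsWithOneOutlier(n, 0.2, 5.0), 99)
		fmt.Printf("windows=%d p99=%.1fs withinSLO=%v\n", n, p99, p99 <= threshold)
	}
}
```

With only 10 windows, the one bad window is the 99th percentile and the check fails; with 200 windows the same bad window falls above the 99th percentile and the check passes.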

Implementation-wise, this would involve merging the density and load tests into a single test and moving some measurements to the very end of it.

/area slo

wojtek-t · 2019-09-06T13:50:44Z

> Implementation-wise, this would involve merging the density and load tests into a single test and moving some measurements to the very end of it.

Alternatively (I'm actually leaning towards saying that it would be better, but I'm happy to hear arguments against) we could:

- add a concept of measurement that is run before the first test case and then after the last [conceptually BeforeSuite and AfterSuite from ginkgo]
- start and gather all SLOs there
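
The two points above can be sketched roughly as follows. This is an illustration only: `Measurement`, `TestCase`, `Suite`, and `callCounter` are hypothetical names invented for this example and do not reflect clusterloader2's actual interfaces.

```go
package main

import "fmt"

// Measurement is a hypothetical suite-level measurement (not clusterloader2's API).
type Measurement interface {
	Name() string
	Start() error
	Gather() (string, error)
}

// TestCase is a hypothetical stand-in for a single test such as density or load.
type TestCase struct {
	Name string
	Run  func() error
}

// Suite holds measurements that span every test case.
type Suite struct {
	Measurements []Measurement
	Tests        []TestCase
}

// Run starts all measurements before the first test case and gathers them
// after the last one. Gathering still happens if a test case fails.
func (s *Suite) Run() ([]string, error) {
	for _, m := range s.Measurements {
		if err := m.Start(); err != nil {
			return nil, fmt.Errorf("starting %s: %w", m.Name(), err)
		}
	}
	var firstErr error
	for _, tc := range s.Tests {
		if err := tc.Run(); err != nil {
			firstErr = fmt.Errorf("test %s: %w", tc.Name, err)
			break
		}
	}
	var summaries []string
	for _, m := range s.Measurements {
		summary, err := m.Gather()
		if err != nil {
			if firstErr == nil {
				firstErr = fmt.Errorf("gathering %s: %w", m.Name(), err)
			}
			continue
		}
		summaries = append(summaries, summary)
	}
	return summaries, firstErr
}

// callCounter is a toy measurement that counts events between Start and Gather.
type callCounter struct {
	started bool
	count   int
}

func (c *callCounter) Name() string { return "callCounter" }
func (c *callCounter) Start() error { c.started, c.count = true, 0; return nil }
func (c *callCounter) Observe() {
	if c.started {
		c.count++
	}
}
func (c *callCounter) Gather() (string, error) {
	c.started = false
	return fmt.Sprintf("%s: %d calls", c.Name(), c.count), nil
}

func main() {
	counter := &callCounter{}
	suite := Suite{
		Measurements: []Measurement{counter},
		Tests: []TestCase{
			{Name: "density", Run: func() error { counter.Observe(); return nil }},
			{Name: "load", Run: func() error { counter.Observe(); counter.Observe(); return nil }},
		},
	}
	summaries, err := suite.Run()
	fmt.Println(summaries, err)
}
```

The point of the sketch is that the counter sees events from both test cases, so its summary covers the whole suite rather than one test.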

wojtek-t · 2019-09-06T13:51:32Z

I think that we can even hardcode the set of things that we start there - it would also visibly simplify the configs...

wojtek-t · 2019-09-06T13:51:42Z

@mm4tt - FYI

oxddr · 2019-09-06T14:12:53Z

I like the new per-suite measurements you propose. This is indeed a better long-term solution. Mine was supposed to be a rather hacky prototype to validate the concept.

However, I'd rather avoid hard-coding any set of measurements, as this would decrease the flexibility clusterloader2 gives us. But I'm open to debate.

mm4tt · 2019-09-06T14:16:24Z

Could it become part of the TestSuite - https://github.com/kubernetes/perf-tests/blob/master/clusterloader2/docs/test_suite_design.md ?

wojtek-t · 2019-09-06T14:17:00Z

> However, I'd rather avoid hard-coding any set of measurements, as this would decrease the flexibility clusterloader2 gives us. But I'm open to debate.

I'm not going to push hard for this. My main goal is just to simplify the configs.
So the next iteration of that proposal is:

- wrap the SLO-based measurements (api-call-latencies, network-programming, ...) into a single opaque "SLO" measurement, similar to what we did for TestMetrics [we can't yet add pod-startup-time, but hopefully we will be able to in the future]
- make that a default measurement for Before/After suite
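
One way to picture a single "SLO" measurement that hides its parts is a composite that forwards Start and Gather to its children. The sketch below is an assumption about the shape of such a wrapper; `Measurement`, `sloMeasurement`, and `stub` are hypothetical names, and the child measurements are stubs rather than real latency collectors.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Measurement is a hypothetical interface (not clusterloader2's API).
type Measurement interface {
	Start() error
	Gather() (string, error)
}

// sloMeasurement bundles several measurements behind one Start/Gather pair,
// so a config would only need to reference a single "SLO" entry.
type sloMeasurement struct {
	children []Measurement
}

// Start starts every child and returns the first error encountered.
func (s *sloMeasurement) Start() error {
	for _, c := range s.children {
		if err := c.Start(); err != nil {
			return err
		}
	}
	return nil
}

// Gather collects every child's summary into one combined string.
func (s *sloMeasurement) Gather() (string, error) {
	parts := make([]string, 0, len(s.children))
	for _, c := range s.children {
		summary, err := c.Gather()
		if err != nil {
			return "", err
		}
		parts = append(parts, summary)
	}
	return strings.Join(parts, "; "), nil
}

// stub is a placeholder child that only tracks whether it was started.
type stub struct {
	name    string
	started bool
}

func (m *stub) Start() error { m.started = true; return nil }
func (m *stub) Gather() (string, error) {
	if !m.started {
		return "", errors.New(m.name + ": gather called before start")
	}
	return m.name + ": ok", nil
}

func main() {
	slo := &sloMeasurement{children: []Measurement{
		&stub{name: "api-call-latencies"},
		&stub{name: "network-programming"},
	}}
	if err := slo.Start(); err != nil {
		fmt.Println("start failed:", err)
		return
	}
	fmt.Println(slo.Gather())
}
```

Adding pod-startup-time later would then mean appending one more child rather than touching every test config.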

wojtek-t · 2019-09-06T14:17:28Z

> Could it become part of the TestSuite - https://github.com/kubernetes/perf-tests/blob/master/clusterloader2/docs/test_suite_design.md ?

yes - it perfectly fits there

fejta-bot · 2019-12-05T14:56:18Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

wojtek-t · 2019-12-05T15:10:44Z

/remove-lifecycle stale

fejta-bot · 2020-03-04T16:08:28Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

mm4tt · 2020-03-05T08:52:46Z

Closing this one in favor of #1007.
While #1007 does not address exactly the same problem, and the reasoning behind it is different, it will make this FR obsolete.

/close

k8s-ci-robot · 2020-03-05T08:52:48Z

@mm4tt: Closing this issue.

oxddr · 2020-03-05T09:31:42Z

/reopen

While we may not want to do this in the short term, I'd like to keep this one open. The issue you've mentioned means we avoid the problem rather than solve it. The problem is a lack of functionality in clusterloader. We recently added a test for tokens, which is measured separately, and we may add more in the future. We should be able to take measurements across the suite.

k8s-ci-robot · 2020-03-05T09:31:44Z

@oxddr: Reopened this issue.

fejta-bot · 2020-04-04T10:05:30Z

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

wojtek-t · 2020-04-04T12:23:43Z

/remove-lifecycle rotten
/lifecycle frozen

k8s-ci-robot added the area/slo label Sep 6, 2019

oxddr mentioned this issue Sep 6, 2019

[Failing Test] gce-master-scale-performance - slow POST nodes after kubelet restarted in the middle of the test kubernetes/kubernetes#82377

Closed

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 5, 2019

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 5, 2019

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 4, 2020

k8s-ci-robot closed this as completed Mar 5, 2020

k8s-ci-robot reopened this Mar 5, 2020

k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 4, 2020

k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels Apr 4, 2020
