
proper benchmarking #92

Merged: 19 commits into main from hw/benchmarking, Aug 4, 2021

Conversation

@hexaeder (Member) commented Jul 3, 2021

This is a proposal on how to do proper benchmarking for future versions using PkgBenchmark.jl and BenchmarkTools.jl.

I've added a README to the benchmark folder explaining how to run them.

In short

$ cd NetworkDynamics/benchmarks
$ ./run_benchmarks.jl v0.5.0    # this will take a few minutes

will create three markdown files with the results of the benchmarks:

  • the target version, i.e. the current state of the repo
  • the baseline version (in this case v0.5.0) for the comparison
  • a comparison with relative differences for each of the timings.

You can see those markdown files in this gist.
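
For orientation, here is a minimal sketch of the PkgBenchmark.jl calls a script like this can be built on; the file names and the way the baseline ref is passed are placeholders for illustration, not necessarily what run_benchmarks.jl does internally.

```julia
# Minimal sketch (assumed workflow, not the exact contents of run_benchmarks.jl):
# benchmark the current repo state and a baseline tag, then export three reports.
using PkgBenchmark

target   = benchmarkpkg("NetworkDynamics")            # current state of the repo
baseline = benchmarkpkg("NetworkDynamics", "v0.5.0")  # baseline ref for comparison

export_markdown("target.md", target)
export_markdown("baseline.md", baseline)
export_markdown("judgement.md", judge(target, baseline))
```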

The most important part is probably this section, where you can see the nice speedup from #80:

Results

A ratio greater than 1.0 denotes a possible regression (marked with ❌), while a ratio less
than 1.0 denotes a possible improvement (marked with ✅). Only significant results - results
that indicate possible regressions or improvements - are shown below (thus, an empty table means that all
benchmark results remained invariant between builds).

| ID | time ratio | memory ratio |
|----|------------|--------------|
| ["diffusion", "ode_edge", "assemble", "10"] | 0.97 (5%) | 0.87 (1%) ✅ |
| ["diffusion", "ode_edge", "assemble", "100"] | 0.85 (5%) ✅ | 0.41 (1%) ✅ |
| ["diffusion", "ode_edge", "assemble", "1000"] | 0.32 (5%) ✅ | 0.07 (1%) ✅ |
| ["diffusion", "ode_edge", "assemble", "10000"] | 0.06 (5%) ✅ | 0.01 (1%) ✅ |
| ["diffusion", "ode_edge", "call_mt", "10"] | 0.93 (5%) ✅ | 1.01 (1%) |
| ["diffusion", "ode_edge", "call_mt", "100"] | 0.64 (5%) ✅ | 1.00 (1%) |
| ["diffusion", "static_edge", "assemble", "10"] | 0.95 (5%) ✅ | 0.86 (1%) ✅ |
| ["diffusion", "static_edge", "assemble", "100"] | 0.84 (5%) ✅ | 0.40 (1%) ✅ |
| ["diffusion", "static_edge", "assemble", "1000"] | 0.31 (5%) ✅ | 0.07 (1%) ✅ |
| ["diffusion", "static_edge", "assemble", "10000"] | 0.06 (5%) ✅ | 0.01 (1%) ✅ |
| ["diffusion", "static_edge", "call", "10"] | 1.51 (5%) ❌ | 1.00 (1%) |
| ["diffusion", "static_edge", "call", "1000"] | 0.94 (5%) ✅ | 1.00 (1%) |
| ["diffusion", "static_edge", "call_mt", "10"] | 0.82 (5%) ✅ | 1.00 (1%) |
| ["diffusion", "static_edge", "call_mt", "100"] | 0.44 (5%) ✅ | 0.99 (1%) |
| ["diffusion", "static_edge", "call_mt", "1000"] | 0.53 (5%) ✅ | 1.00 (1%) |
| ["diffusion", "static_edge", "call_mt", "10000"] | 0.88 (5%) ✅ | 0.99 (1%) |
| ["kuramoto", "heterogeneous", "assemble", "10"] | 0.96 (5%) | 0.86 (1%) ✅ |
| ["kuramoto", "heterogeneous", "assemble", "100"] | 0.84 (5%) ✅ | 0.40 (1%) ✅ |
| ["kuramoto", "heterogeneous", "assemble", "1000"] | 0.32 (5%) ✅ | 0.07 (1%) ✅ |
| ["kuramoto", "heterogeneous", "assemble", "10000"] | 0.03 (5%) ✅ | 0.01 (1%) ✅ |
| ["kuramoto", "heterogeneous", "call", "10"] | 0.88 (5%) ✅ | 1.00 (1%) |
| ["kuramoto", "heterogeneous", "call", "100"] | 0.85 (5%) ✅ | 1.00 (1%) |
| ["kuramoto", "heterogeneous", "call", "1000"] | 0.85 (5%) ✅ | 1.00 (1%) |
| ["kuramoto", "heterogeneous", "call", "10000"] | 0.83 (5%) ✅ | 1.00 (1%) |
| ["kuramoto", "heterogeneous", "call_mt", "10"] | 1.09 (5%) ❌ | 1.00 (1%) |
| ["kuramoto", "heterogeneous", "call_mt", "100"] | 0.88 (5%) ✅ | 1.00 (1%) |
| ["kuramoto", "heterogeneous", "call_mt", "1000"] | 0.85 (5%) ✅ | 1.00 (1%) |
| ["kuramoto", "heterogeneous", "call_mt", "10000"] | 0.87 (5%) ✅ | 1.00 (1%) |
| ["kuramoto", "homogeneous", "assemble", "10"] | 0.97 (5%) | 0.87 (1%) ✅ |
| ["kuramoto", "homogeneous", "assemble", "100"] | 0.85 (5%) ✅ | 0.43 (1%) ✅ |
| ["kuramoto", "homogeneous", "assemble", "1000"] | 0.34 (5%) ✅ | 0.08 (1%) ✅ |
| ["kuramoto", "homogeneous", "assemble", "10000"] | 0.03 (5%) ✅ | 0.01 (1%) ✅ |
| ["kuramoto", "homogeneous", "call", "10"] | 0.75 (5%) ✅ | 1.00 (1%) |
| ["kuramoto", "homogeneous", "call", "1000"] | 0.90 (5%) ✅ | 1.00 (1%) |
| ["kuramoto", "homogeneous", "call_mt", "100"] | 1.22 (5%) ❌ | 1.00 (1%) |
| ["kuramoto", "homogeneous", "call_mt", "1000"] | 0.66 (5%) ✅ | 1.00 (1%) |
| ["kuramoto", "homogeneous", "call_mt", "10000"] | 0.97 (5%) | 0.99 (1%) ✅ |

@lindnemi (Collaborator) left a comment


This is absolutely great and could be merged right away. I was wondering about a few more things though

  • It would be nice to benchmark an inhomogeneous system with algebraic constraints in mass matrix form. Thus we would also benchmark stiff solvers, which should improve with the upcoming Jacobian-vector product. (A sketch of what such a benchmark could look like follows after this list.)

  • Would it be possible to add these benchmarks to CI? Or should we run them on our machines manually each time we release a new version?

  • Just for clarification: if I get this right, we could also benchmark against different Julia versions in order to catch potential regressions in Base?
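
As a rough sketch of the constrained-system benchmark mentioned in the first point above (illustration only: the toy right-hand side, the mass matrix, and the group names are made up and not part of the existing suite):

```julia
# Hypothetical sketch: a stiff solve of a system with an algebraic constraint
# expressed via a singular mass matrix. All names here are placeholders.
using BenchmarkTools, OrdinaryDiffEq, LinearAlgebra

# toy system in mass-matrix form: x' = -x + y, 0 = x - y
function constrained_rhs!(du, u, p, t)
    du[1] = -u[1] + u[2]
    du[2] =  u[1] - u[2]
end

M = Diagonal([1.0, 0.0])   # zero row makes the second equation algebraic
f = ODEFunction(constrained_rhs!; mass_matrix = M)
prob = ODEProblem(f, [1.0, 1.0], (0.0, 10.0))

SUITE = BenchmarkGroup()
SUITE["constrained"] = BenchmarkGroup()
SUITE["constrained"]["solve_rodas4"] = @benchmarkable solve($prob, Rodas4())
```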

Project.toml Outdated
@@ -4,6 +4,7 @@ authors = ["Frank Hellmann <[email protected]>, Michael Lindner <michaelli
version = "0.5.4"

[deps]
BenchmarkTools = "6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf"
Collaborator commented:

Would it be possible to add this dependency to a local environment in the benchmarking folder instead?

@hexaeder (Member, Author) replied:

Unfortunately, this is not that trivial. Normally, PkgBenchmark will take the benchmark files from the specific commit you are benchmarking, which I think is not what we want here. While benchmarking a different commit, the script will

  • copy the NetworkDynamics repo to a temp location
  • copy the benchmark folder to a different temp location
  • check out the commit we want to benchmark in the temp NetworkDynamics folder

At this point, it is not clear how to reach the temp ND folder from an environment living inside the benchmark folder; therefore it is much simpler to run everything from the ND environment.

@hexaeder (Member, Author) added:

On second thought, this works with [email protected], where there was no BenchmarkTools in the deps. We might get away without the explicit dependency, as I do right now with PkgBenchmark (which you have to have in your global environment, so it doesn't show up in the ND deps).

@hexaeder (Member, Author) commented Jul 5, 2021

* It would be nice to benchmark an inhomogeneous system with algebraic constraints in mass matrix form. Thus we would also benchmark stiff solvers, which should improve with the upcoming Jacobian-vector product.

Well, that's the idea: having a common place to add as many benchmarks as we like. Right now they are also a bit redundant; I'm not sure whether we should test everything for so many values of N.

* Would it be possible to add these benchmarks to CI? Or should we run them on our machines manually each time we release a new version?

In theory, yeah, totally possible. We can test this out at some point. For now, I think running it locally will be sufficient. The default config will always compare to master. I think once you are happy with your PR, run ./run_benchmarks.jl on your PR branch and put the judgement file in the comments.

* Just for clarification: if I get this right, we could also benchmark against different Julia versions in order to catch potential regressions in Base?

Yes. Right now I am adding more parameters to the benchmark script so you can do something like

$ ./run_benchmarks.jl --target directory --baseline directory --tcommand julia-nightly --bcommand julia

to benchmark the current directory on julia-nightly vs julia.
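
Presumably this maps onto PkgBenchmark's BenchmarkConfig roughly as follows (a sketch under that assumption; the exact option handling inside run_benchmarks.jl may differ):

```julia
# Sketch: comparing the current directory state under two Julia binaries.
# The actual wiring in run_benchmarks.jl may differ from this.
using PkgBenchmark

target   = BenchmarkConfig(id = nothing, juliacmd = `julia-nightly`)  # id = nothing: current state
baseline = BenchmarkConfig(id = nothing, juliacmd = `julia`)

judgement = judge("NetworkDynamics", target, baseline)
export_markdown("judgement.md", judgement)
```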

@lindnemi (Collaborator) commented:

Unfortunately I can't run the benchmarks on my machine. I get:
ERROR: LoadError: AssertionError: Captured sysimg looks weird? using Revise...
-J/home/micha/julia/julia-1.6.2/lib/julia/sys.so

@lindnemi (Collaborator) commented Aug 4, 2021

OK, great. The benchmarks make a lot of sense like that: diffusion benchmarks a star graph and kuramoto a sparse graph.

At some point we should benchmark different solver calls, especially when it comes to evaluating the performance of JVPs. We could add that with the JVP PR #95.

@lindnemi merged commit 643a662 into main on Aug 4, 2021
@lindnemi deleted the hw/benchmarking branch on August 4, 2021 11:19