Add micro-benchmarks #19
Conversation
@@ -16,5 +16,6 @@ parallelExecution in Test := false

 libraryDependencies ++= Seq(
   "org.scala-lang.modules" %% "scala-java8-compat" % "0.8.0",
-  "com.novocode" % "junit-interface" % "0.11" % "test"
+  "com.novocode" % "junit-interface" % "0.11" % "test",
+  "com.storm-enroute" %% "scalameter" % "0.8.2"
Have you considered JMH (via sbt-jmh)? That's what we used for other Scala library benchmarking so far.
Yes, I actually started with JMH but didn't find a way to make it run the tests correctly. The problem is that JMH has to guess how to build your test suite, and the only way you can tell it how to do so is via annotations with very poor composition capabilities. As a result, I was unable to define a configuration that generates test data based on the test @Params (maybe that's possible, but I couldn't figure out how to do so…).
With scalameter, in contrast, the control is not inverted: you can build your test suite as you want and then ask scalameter to benchmark it.
Are there known issues in using scalameter?
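For illustration, a minimal ScalaMeter-style sketch of that non-inverted setup (the object name, generator sizes, and measured operation below are made up for the example, not taken from this PR):

import org.scalameter.api._

object ListForeachBenchmark extends Bench.LocalTime {
  // Test data is built as ordinary Scala values, no annotations required.
  val sizes: Gen[Int] = Gen.range("size")(100000, 500000, 200000)
  val lists: Gen[List[Int]] = sizes.map(n => List.range(0, n))

  // The suite is then handed to ScalaMeter to benchmark.
  performance of "List" in {
    measure method "foreach" in {
      using(lists) in { xs =>
        var sum = 0
        xs.foreach(sum += _)
        sum
      }
    }
  }
}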
BTW, there is another team which is responsible for writing a more polished test suite. In the meantime, I think this one can give us a reasonable estimate of the differences between the various collection implementations.
There are no known issues in using scalameter that I am aware of, and I use it directly in my projects. I am not actively maintaining it, since I am satisfied with the current feature set, which is roughly measuring running time, measuring running time with outlier elimination, measuring running time with GC elimination, measuring memory footprint, measuring method invocation counts, measuring boxing counts, and measuring GC counts.
There is an HTML reporter that outputs a webpage with curves, and there is also a MongoDB reporter that flushes the results into a MongoDB instance.
I personally switched from using the Bench frontend to the JBench frontend, which is based on annotations, but both work fine.
I'm not aware of any issues. I don't know why we use sbt-jmh instead. Maybe for historical reasons.
Based on my experience, scalameter is really good if you want to store the history of runs and maybe run regression testing. It's also fine as long as the value you're measuring isn't less than a millisecond.
Also, I've found JMH to be a better tool for answering the "why?" question, as it can produce both a heatmap and the actual generated assembly.
I think that for the kind of benchmarking done here, both would do fine.
I remember when I started using JMH I found some workloads on which JMH gave useful results and Scalameter didn't. I don't remember any longer what this was, though, just that I specifically went looking for it based on the JMH documentation, and found it. Thyme also got it wrong but it usually got head-to-head comparisons qualitatively right due to explicitly testing them head-to-head. (But Thyme is vulnerable to peculiarities of JVM history, since it runs entirely within the VM you've already got.)
Anyway, for now, I think either Scalameter or JMH is fine. Scalameter is prettier.
@@ -0,0 +1,239 @@
+package bench
These benchmarks shouldn't be in the main project. In fact, the more sbt projects I create, the more I am convinced that a single-project build in the root directory (like I created it here) is a bad idea. Eventually you'll have to refactor it, which means moving all existing sources (or using an ugly, non-standard sbt setup, or complicating things with configurations instead of projects).
> These benchmarks shouldn't be in the main project
Can you elaborate why?
If you want to use configurations, here is an example of how you can create a separate SBT configuration for benchmarks:
https://github.com/scalameter/scalameter-examples/blob/master/basic-with-separate-config/build.sbt
If you do this, your dependency should have an extra % "bench" at the end.
Otherwise, you could use a separate project.
I suppose it's not really important here because we don't expect to publish any artifacts. In general you don't want to have test or benchmarking code be part of the main project / artifact to avoid cluttering the source file structure and introducing accidental dependencies.
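As a rough sketch of the separate-project option (the project names and wiring below are illustrative assumptions, not what this PR currently does):

// build.sbt
lazy val collections = project.in(file("."))

lazy val bench = project.in(file("bench"))
  .dependsOn(collections)
  .settings(
    libraryDependencies += "com.storm-enroute" %% "scalameter" % "0.8.2",
    testFrameworks += new TestFramework("org.scalameter.ScalaMeterFramework"),
    // keep benchmark runs sequential and the code out of the published artifact
    parallelExecution in Test := false,
    publishArtifact := false
  )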
benchmark("scala.List", scalaLists) { xs =>
  var x: Object = null
  xs.foreach(x = _)
This can potentially be optimized away by a sufficiently aggressive JIT. You need something that leaves some sort of impact that depends on every element. For instance,
var n = 0
xs.foreach{ x => if (x eq null) n += 1 }
n
Agreed.
If you really want to be sure that this won't be completely eliminated, you would need to assign the value n to a volatile variable at the end of the benchmark. The problem is that aggressive JIT optimizations could also compile the benchmark function (or its caller) and aggressively inline everything, including the foreach call here.
Doesn't ScalaMeter do a BlackHole-like thing to consume the results of a benchmark? It doesn't matter if the JIT compiler inlines the foreach because it will do that in production code, too, if it can. But it does matter if it can throw away the answer and thus throw away all the work.
It does so for memory footprint measurement, but so far it did not do this for time measurements. From now on it does so for running time too:
About the foreach comment - I meant inlining both the caller and foreach in this test, which would be a prerequisite for completely throwing the result away.
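For comparison, this is roughly how JMH sidesteps the dead-code problem with its Blackhole (a hypothetical benchmark, not part of this PR; the fixture is made up):

import org.openjdk.jmh.annotations._
import org.openjdk.jmh.infra.Blackhole

@State(Scope.Benchmark)
class ForeachBench {
  var xs: List[AnyRef] = _

  @Setup
  def prepare(): Unit = xs = List.fill(1000)(new Object)

  @Benchmark
  def foreachConsumed(bh: Blackhole): Unit =
    // Feeding each element to the Blackhole prevents the JIT from proving
    // the loop body dead and eliding the whole traversal.
    xs.foreach(x => bh.consume(x))
}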
}

benchmark("scala.List", scalaLists)(_.tail)
benchmark("List", lists)(_.tail)
Does a single tail take enough time to be measurable? I rarely get good results on single ns-length operations, even with JMH, which arguably tries the hardest to make this work. You have to be careful with ArrayBuffer, which will be O(n), but I think this may need a little more thought.
+1 - it looks like tail should be a super-fast operation here, and you will not get any good numbers. It would be better to use the tail operation in a while loop.
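Something along these lines, as a sketch (the helper name is made up): drive tail in a while loop so one measured invocation traverses the whole list instead of doing a single sub-nanosecond call.

def drainWithTail(xs: List[AnyRef]): Int = {
  var rest = xs
  var n = 0
  while (rest.nonEmpty) {
    rest = rest.tail
    n += 1
  }
  n // return something derived from the work so it cannot be elided
}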
Indeed, I think I first did it in a loop but that was way too long with Arrays. I had the same issue with apply.
@@ -39,7 +39,7 @@ class ArrayBuffer[A] private (initElems: Array[AnyRef], initLength: Int)
     start = 0
   }
   else {
-    val newelems = new Array[AnyRef](end * 2)
+    val newelems = new Array[AnyRef](if (end == 0) 16 else (end * 2))
I think this algorithm could be improved in three aspects. First, allocating size 1 then 2 then 4 then 8 then 16 is pretty wasteful; I'd go for at least 4, maybe 8, on the first realloc. Second, there is no point doing an Array.copy if end == 0. Third, you max out at (1 << 30) elements this way when you could have ((1 << 31) - 1) if you handled the last buffer increase separately. I know you didn't write the entire algorithm, but if we're improving this, let's do it all the way.
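A sketch of a growth policy along those lines (not the PR's code; the MaxArraySize constant is an assumed conservative JVM limit):

def grownCapacity(current: Int, needed: Int): Int = {
  val MaxArraySize = Int.MaxValue - 8                // assumption: conservative JVM array limit
  if (current == 0) math.max(16, needed)             // no tiny 1/2/4/8 steps (and no Array.copy needed)
  else if (current > MaxArraySize / 2) MaxArraySize  // handle the last increase separately, no overflow
  else math.max(current * 2, needed)
}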
I just created #20 to keep track of this.
Another problem of this implementation is integer overflow. We should always look at the current collections library for implementation details where these corner cases have already been discovered and fixed.
The integer overflow isn't a terrible problem because you get an exception when you try to allocate -(1 << 31) bytes. It's not the ideal exception to have, but you were going to get an exception anyway, and practically nobody checks whether something is a NegativeArraySizeException or whatever we'd throw on a too-full array. But in general I fully agree that we need to keep careful watch on out-of-bounds errors and other such corner cases.
The overflow cuts the usable size for arrays in half. Once you exceed MaxInt/2 size any attempt to grow the buffer fails even though you should be able to grow up to MaxInt size.
@szeiger - Yes, I already pointed out the "max out at (1 << 30)" problem. I thought you meant that there was an overflow problem beyond the lack of capacity that I mentioned the first time around.
Aside from using ScalaMeter instead of JMH, this looks reasonable to me. I haven't used ScalaMeter very much, so I can't vouch for its accuracy, except to say that when I did head-to-head testing of ScalaMeter against Thyme (which I wrote), I didn't find any cases where Thyme seemed to get things right but ScalaMeter got things wrong. Picking up the benchmarking framework from the main library might be better, though. I'm not sure. I don't think it's terribly important at this point, as long as the tests are telling us something reasonably meaningful and we're not duplicating huge amounts of work.
Not sure if everybody even knows about the existence of the (JMH-based) stuff in https://github.com/scala/scala/blob/2.12.x/test/benchmarks/README.md. That's what we normally use for benchmarking stdlib stuff, so if using the same setup is workable here, we wouldn't have to wonder whether we're comparing apples and oranges when comparing results from two different tools.
benchmark("scala.List", scalaLists)(_.tail)
benchmark("List", lists)(_.tail)
benchmark("LazyList", lazyLists)(_.tail)

Same here.
}

trait Cons extends Bench.ForkedTime with Generators {
Note - the ForkedTime predefined template ignores GC. This is more stable (hence useful if you care about regression testing), but depending on what you want to measure, it might make more sense to include GC pauses in the measurement. You can do this by overriding the measurer method of the test to return a Measurer.Default instance.
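A minimal sketch of that override, assuming ScalaMeter 0.8's API (the exact Measurer constructor is an assumption and may differ between versions):

import org.scalameter.api._

trait ConsIncludingGc extends Bench.ForkedTime with Generators {
  // Swap the GC-ignoring measurer from ForkedTime for the default one,
  // so GC pauses are included in the measured running time.
  override def measurer = new Measurer.Default
}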
I would say that for micro-benchmarks like these it is better to ignore GCs (though for a more complex benchmark, like a complete sorting algorithm, maybe it would be better not to ignore GCs). What do you think?
Sgtm. As I said, it mostly depends on whether you want to get a stable number, or an accurate average time.
If I wanted to track GC pressure, I would normally create a separate test for GC counts.
We also discussed this internally in the Scala team and the consensus is that JMH is the preferred tool. The current benchmarking infrastructure is for the compiler only but could eventually be extended to cover the library, and it is based on JMH. The biggest argument against ScalaMeter, though, is binary compatibility. If there are any impediments that prevent us from getting useful numbers with JMH now, we can start with ScalaMeter, but all benchmarks that we want to keep around and move into the main Scala project together with the library will have to be ported to JMH eventually.
@szeiger Alright, I will migrate to JMH.
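For reference, the sbt-jmh wiring is small; a sketch, with the plugin version and project layout below being assumptions rather than what ended up in the PR:

// project/plugins.sbt
addSbtPlugin("pl.project13.scala" % "sbt-jmh" % "0.2.21")

// build.sbt
lazy val collections = project.in(file("."))

lazy val benchmarks = project.in(file("benchmarks"))
  .dependsOn(collections)
  .enablePlugins(JmhPlugin)

Benchmark sources then live under benchmarks/src/main/scala and run with something like benchmarks/jmh:run.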
What about the JMH-based stuff in https://github.com/scala/scala/tree/2.12.x/test/benchmarks ?
Note that #14 has a better growing strategy for the backing array. It was
lifted from the current `ResizableArray`.
I just migrated to JMH. I found no equivalent of the “memory footprint” benchmark of ScalaMeter, so I added a small sub-project with some ad-hoc code doing that. This is not something I’m used to doing, so please let me know if you think we should use ScalaMeter (or anything else) for that.
I'd suggest using http://openjdk.java.net/projects/code-tools/jol/. Example code in http://igstan.ro/posts/2014-09-23-calculating-an-object-graphs-size-on-the-jvm.html
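A small sketch of what JOL gives you for the memory-footprint case (assuming a dependency on "org.openjdk.jol" % "jol-core"; the sample data is made up):

import org.openjdk.jol.info.GraphLayout

object FootprintSketch {
  def main(args: Array[String]): Unit = {
    val xs = List.fill(1000)(new Object)
    // GraphLayout walks the reachable object graph and sums instance sizes.
    val layout = GraphLayout.parseInstance(xs)
    println(s"${layout.totalSize()} bytes")
    println(layout.toFootprint) // per-class breakdown
  }
}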
Thanks @retronym. I just gave it a try and got the following exception:
I also get the following error from time to time:
This might be related to the nature of the bytecode generated by the Scala compiler, or maybe there is some installation step that is missing… I pushed the code so you can have a look. I would be really happy to use the tools that you recommend, but I must confess that I am super tired of working with Java-based tools that break your programming model.
Yeah, JOL and JMH are somewhat magic, but part of that is necessary complexity from the jobs they are doing. But there are real downsides to using Scala frameworks for testing the Scala distribution itself; for instance, I've spent close to a week working through problems with Scalacheck/SBT in our build. Hopefully #22 helps some.
These are very cool @julienrf! Would you consider contributing the charts code back to sbt-jmh? refs sbt/sbt-jmh#103
@julienrf, would you consider making the …
@ktoso The charts are specific to our benchmarks because they expect them to be parameterized by a …
The dependency is not a problem for the plugin; it could be an extra plugin. If you'd be willing to help out, let's chat in the other ticket; that would be very cool IMO.
@DarkDimius Here is what we would get, for instance (linear-scale and log-scale charts). IMHO the problem with the linear scale is that the high values on the right make it difficult to read the small values on the left. I think log scales are good to visualize the behavior of one collection over the number of its elements. If we want to compare the factors between different collection implementations, I would normalize the values so that for size …
I strongly support log scales for comprehensibility. If you are just doing head-to-head comparisons, you can set one collection as the baseline and plot everything else as a log ratio against that collection. This has the effect of spreading the Y-axis out enough so everything is visible. It takes a few seconds longer to grok, but often worth it if you otherwise can't see the differences.
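A tiny sketch of that normalization (names are illustrative): take one collection's timings as the baseline and report every other series as a log2 ratio against it.

def log2Ratios(baseline: Seq[Double], other: Seq[Double]): Seq[Double] =
  // 0 means same speed as the baseline, +1 means twice as slow, -1 twice as fast
  other.zip(baseline).map { case (t, b) => math.log(t / b) / math.log(2) }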
@DarkDimius We could have something like the following to figure out factors between implementations (see the chart). For instance, we see that to build a collection of 8 elements, … What do you think?
Looks good! We should get this merged so we can use the infrastructure for benchmarking other changes.
Looks good to me. @julienrf please merge at your convenience.
This PR adds a few micro-benchmarks that may help us to get some numbers when tuning our implementations.
I was not able to produce nice charts; the output is text-only (/cc @axel22).