Parameterize summations over RealFloats #64

Open · wants to merge 4 commits into master
Conversation

@414owen commented Sep 25, 2020

I'm trying to gauge interest in upstreaming some more generalized compensated floating-point summations.
This is useful to me for use with the ad library, where grad takes a function of type (Traversable f, Num a) => f (Reverse s a) -> Reverse s a. I use the RealFloat instance of Reverse s a to implement a loss function, which includes compensated floating-point arithmetic.

Currently all tests pass apart from GHCJS 8.4. I'll look into it if there's enough interest in merging this.

I've run some preliminary benchmarks, which seem very promising, and which I think establish that any potential data boxing resulting from this generalization won't do too much harm to users.

edit: <removed obsolete benchmarks>

The instance is useless, as one can use Prelude.sum for the same effect.

The test will have the same result as its type parameter would.
@414owen changed the title from "Generalized summations" to "Parameterize summations over RealFloats" on Sep 25, 2020
@Shimuuar (Collaborator)

The main problem is, of course, the performance impact. We go from 3-words/KBN to 5-words/KBN + pointer chasing. And then there's the question of what the actual impact of this change is. Benchmark results (with -O2) are exactly the same, so I suppose GHC is smart enough to just unbox everything in the main loop and not allocate Kahan/KBN/KB2 objects at all.

Do benchmarks with -O1 (this could be changed in the cabal file) continue to show no difference?

P.S. Out of curiosity. How exactly do you use compensated summation with ad?
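
For reference, a minimal sketch of the representation change being discussed; the type and field names are illustrative, not the library's exact definitions:

```haskell
-- Specialized accumulator (the "3 words" case): strict, monomorphic
-- Double fields can be unpacked by GHC, so a value is a single heap
-- object holding two raw Double# fields.
data KBNSum = KBNSum {-# UNPACK #-} !Double {-# UNPACK #-} !Double

-- Parameterized accumulator (the "5 words + pointer chasing" case):
-- polymorphic fields cannot be unpacked, so the constructor holds
-- pointers to separately allocated boxed values.
data KBNSumP a = KBNSumP !a !a
```

Whether this matters in practice depends on whether GHC can unbox the accumulator in the inner loop, which is exactly what the benchmarks below probe.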

@414owen (Author) commented Sep 28, 2020

Okay, here are some more complete benchmarks.

I also dumped the assembly for some minimal examples (old, new).

The new ASM doesn't look great, but I'm not really in a position to evaluate it.
The benchmarks don't seem to care too much...

@414owen (Author) commented Sep 28, 2020

I'm passing to grad, and optimizing, a function of type (Traversable f, RealFloat b) => f b -> b, where b ~ Reverse s a in grad's type.
Elsewhere, I use the same function with b ~ Double.
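
A hypothetical sketch of this usage pattern: a loss function polymorphic over RealFloat can be differentiated with ad's grad (instantiated at b ~ Reverse s Double) and also evaluated directly at b ~ Double. The body of `loss` here is a placeholder; in the PR's setting it would call the generalized compensated summation internally.

```haskell
import Numeric.AD (grad)

-- Placeholder loss: sum of squares. RealFloat is the constraint that
-- lets the body use compensated floating-point arithmetic.
loss :: (Traversable f, RealFloat b) => f b -> b
loss = sum . fmap (\x -> x * x)

-- Differentiated via ad, at b ~ Reverse s Double.
lossGrad :: [Double] -> [Double]
lossGrad = grad loss

-- The same function evaluated directly, at b ~ Double.
lossValue :: [Double] -> Double
lossValue = loss
```

This is why the accumulator types need to be parameterized: at Reverse s Double the fields cannot be plain unboxed Doubles.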

@414owen (Author) commented Oct 4, 2020

Here are some more easily comparable benchmarks: https://gist.github.com/414owen/ea366fc110a4e416ae9ceea035689a03

The new generalized version with -O2 is slightly faster than the current version in every test.
I don't know why this is the case.

@Shimuuar (Collaborator) commented Oct 4, 2020

I looked into the Core, and yes, GHC unboxes the KBN/etc. accumulators, so the inner loop has type Double# -> Double# -> Int# -> Double. This means it doesn't matter whether the accumulator type is boxed or not. On one hand, this is pretty much the ideal case for the optimizer: a simple loop where everything is inlined. On the other, it's more or less how the accumulator is intended to be used.

The question turns out to be: how easy is it to break the optimization above? I can't invent a counterexample on the spot, so I need to think about it a bit.
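
For readers following along, a minimal sketch of the Kahan–Babuška–Neumaier step that this inner loop implements (not the library's exact code; a plain tuple stands in for the KBNSum type):

```haskell
import Data.List (foldl')

-- One KBN step: add x into the running sum s, accumulating the
-- rounding error of s + x into the compensation term c. The branch
-- picks the order that recovers the low-order bits lost in s + x.
kbnAdd :: (Double, Double) -> Double -> (Double, Double)
kbnAdd (s, c) x
  | abs s >= abs x = (s', c + ((s - s') + x))
  | otherwise      = (s', c + ((x - s') + s))
  where s' = s + x

-- Final result: running sum plus accumulated compensation.
kbnSum :: [Double] -> Double
kbnSum = uncurry (+) . foldl' kbnAdd (0, 0)
```

For example, kbnSum [1, 1e100, 1, -1e100] yields 2.0, where naive summation loses both 1s and returns 0. When everything inlines, GHC reduces this to the Double# -> Double# -> Int# -> Double loop described above.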

> The new ASM doesn't look great, but I'm not really in a position to evaluate it.

AFAIK the NCG was never among the best, so it may well generate suboptimal assembly.

> I'm passing grad, and optimize a function ...

And how do KBN & friends enter the picture?

@Shimuuar (Collaborator)

I've added a summation benchmark which inhibits inlining and requires the compiler to actually allocate KBN objects on the heap:

kbnStep :: Sum.KBNSum Double -> Double -> Sum.KBNSum Double
kbnStep = Sum.add
{-# NOINLINE kbnStep #-}

...
      , bench "kbn.Noinline" $ whnf (Sum.kbn . U.foldl' kbnStep Sum.zero) v
...

The results are, to say the least, surprising:

Sum/kbn                     3.704 ms
Sum/kbn.Noinline [unboxed]  8.467 ms
Sum/kbn.Noinline [boxed]    5.871 ms (!)

For some reason the boxed version outperforms the unboxed one. I haven't tried to find out why. It looks like making the accumulator types boxed doesn't result in a large performance penalty, and for good performance everything must be inlined anyway.

Another, possibly related, puzzle is kb2 outperforming everything else, including naive summation, despite kb2 doing much more work.

@Shimuuar (Collaborator)

This PR is turning into a dig into benchmarking weirdness, but it's difficult to make performance-sensitive choices when the benchmarks lie to you. It turns out that the results depend on the order in which the benchmarks are run:

Sum/naive                                mean 3.632 ms  ( +- 30.21 μs  )
Sum/kahan                                mean 5.437 ms  ( +- 19.12 μs  )
Sum/kb2                                  mean 2.310 ms  ( +- 17.17 μs  )
Sum/kbn                                  mean 1.566 ms  ( +- 8.775 μs  )

Sum/naive                                mean 3.621 ms  ( +- 24.39 μs  )
Sum/kahan                                mean 5.476 ms  ( +- 20.11 μs  )
Sum/kbn                                  mean 3.712 ms  ( +- 28.19 μs  )
Sum/kb2                                  mean 2.284 ms  ( +- 19.84 μs  )

When kbn is run last, its run time goes from 3.7 ms to 1.6 ms, a more than 2× speedup! Something weird is going on here.

I also attempted to measure the run time of kbn/kb2 summation using perf tools (gist here). The results are very boring and in line with expectations: kbn shows a 1.9× slowdown and kb2 a 2.7× slowdown.
