IntMap should offer lookup variant for likely successful lookup #794

Closed
Boarders opened this issue Aug 16, 2021 · 19 comments
@Boarders
Contributor

Boarders commented Aug 16, 2021

Currently lookup is defined to fail early by checking if the prefix mask agrees with the key:

lookup :: Key -> IntMap a -> Maybe a
lookup !k = go
  where
    go (Bin p m l r) | nomatch k p m = Nothing
                     | zero k m  = go l
                     | otherwise = go r
    go (Tip kx x) | k == kx   = Just x
                  | otherwise = Nothing
    go Nil = Nothing

In the original paper the authors note that if we expect our lookups to be successful, then we should instead test only the branching bit of the key at each interior node and detect failure only when we reach a leaf. If I understand correctly, this would just be the following code:

lookupSucc :: Key -> IntMap a -> Maybe a
lookupSucc !k = go
  where
    go (Bin _p m l r) | zero k m  = go l
                      | otherwise = go r
    go (Tip kx x) | k == kx   = Just x
                  | otherwise = Nothing
    go Nil = Nothing

Is that correct? It passes the tests, in any case. This can certainly lead to large speed-ups when we know we are doing many successful lookups. For example, on the lookup benchmark we get the following (this is a best-case scenario, since the benchmark contains no misses — which itself seems like a flaw in the benchmark):

benchmarked lookup
time                 172.2 μs   (170.3 μs .. 174.1 μs)
                    0.998 R²   (0.994 R² .. 1.000 R²)
mean                 176.6 μs   (175.3 μs .. 181.2 μs)
std dev              7.148 μs   (2.786 μs .. 15.37 μs)

benchmarked lookupSucc
time                 153.0 μs   (151.6 μs .. 154.0 μs)
                    1.000 R²   (1.000 R² .. 1.000 R²)
mean                 152.2 μs   (151.7 μs .. 152.7 μs)
std dev              1.783 μs   (1.306 μs .. 2.362 μs)

Even if this is not quite the right implementation, containers should probably offer this sort of function so that users can tune their code to their needs, as both variants are worthwhile.
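Both variants can be exercised end-to-end with a self-contained toy big-endian Patricia trie that mirrors the shape of Data.IntMap. This is only a sketch: it uses Word64 keys to sidestep the sign-bit handling the real library does for negative Ints, and the helper definitions (zero, mask, highestBitMask, link, insert) are simplified versions of containers internals, not the library's actual code.

```haskell
import Data.Bits (xor, (.&.), shiftL, countLeadingZeros, finiteBitSize)
import Data.Word (Word64)

type Key    = Word64
type Prefix = Word64
type Mask   = Word64

data IntMap a = Bin !Prefix !Mask !(IntMap a) !(IntMap a)
              | Tip !Key a
              | Nil

-- is the branching bit of the key clear? (clear = go left)
zero :: Key -> Mask -> Bool
zero k m = k .&. m == 0

-- keep only the bits of the key strictly above the branching bit
mask :: Key -> Mask -> Prefix
mask k m = k .&. (negate m `xor` m)

nomatch :: Key -> Prefix -> Mask -> Bool
nomatch k p m = mask k m /= p

-- highest set bit of a nonzero word
highestBitMask :: Word64 -> Word64
highestBitMask w = 1 `shiftL` (finiteBitSize w - 1 - countLeadingZeros w)

-- join two tries whose prefixes disagree
link :: Prefix -> IntMap a -> Prefix -> IntMap a -> IntMap a
link p1 t1 p2 t2
  | zero p1 m = Bin p m t1 t2
  | otherwise = Bin p m t2 t1
  where
    m = highestBitMask (p1 `xor` p2)
    p = mask p1 m

insert :: Key -> a -> IntMap a -> IntMap a
insert k x t = case t of
  Bin p m l r
    | nomatch k p m -> link k (Tip k x) p t
    | zero k m      -> Bin p m (insert k x l) r
    | otherwise     -> Bin p m l (insert k x r)
  Tip ky _
    | k == ky   -> Tip k x
    | otherwise -> link k (Tip k x) ky t
  Nil -> Tip k x

-- fast-failing lookup: prefix check at every Bin node
lookupFF :: Key -> IntMap a -> Maybe a
lookupFF k = go
  where
    go (Bin p m l r) | nomatch k p m = Nothing
                     | zero k m      = go l
                     | otherwise     = go r
    go (Tip kx x) | k == kx   = Just x
                  | otherwise = Nothing
    go Nil = Nothing

-- success-optimised lookup: key is checked only at the leaf
lookupSucc :: Key -> IntMap a -> Maybe a
lookupSucc k = go
  where
    go (Bin _ m l r) | zero k m  = go l
                     | otherwise = go r
    go (Tip kx x) | k == kx   = Just x
                  | otherwise = Nothing
    go Nil = Nothing

main :: IO ()
main = do
  let t    = foldr (\k acc -> insert k (k * 2) acc) Nil [0, 3, 7, 12, 100, 4096]
      keys = [0 .. 5000]
  -- the two variants must agree on every key, hit or miss
  print (all (\k -> lookupFF k t == lookupSucc k t) keys)
  print (lookupSucc 12 t)
```

The two lookups can only diverge in how fast they reject a miss, never in the answer: on a miss, lookupSucc still descends to a leaf whose key fails the final equality test.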

@treeowl
Contributor

treeowl commented Aug 16, 2021 via email

@Boarders
Contributor Author

Here is how it looks on the original benchmark, the same benchmark with 50% hits, with 90% hits and with all failures:

original fast-fail version:
benchmarked lookup
time                 183.5 μs   (182.2 μs .. 185.0 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 182.8 μs   (182.3 μs .. 183.3 μs)
std dev              1.673 μs   (1.292 μs .. 2.405 μs)

benchmarked lookup_half
time                 101.4 μs   (100.4 μs .. 102.9 μs)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 100.9 μs   (100.6 μs .. 101.3 μs)
std dev              1.195 μs   (828.7 ns .. 1.764 μs)

benchmarked lookup_most
time                 169.2 μs   (168.3 μs .. 170.2 μs)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 168.4 μs   (168.0 μs .. 168.9 μs)
std dev              1.558 μs   (1.255 μs .. 2.051 μs)

benchmarked lookup_fails
time                 23.82 μs   (23.70 μs .. 23.97 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 23.95 μs   (23.86 μs .. 24.19 μs)
std dev              447.1 ns   (176.3 ns .. 882.7 ns)

version without fast-fail:
benchmarked lookup
time                 152.1 μs   (149.6 μs .. 156.0 μs)
                     0.996 R²   (0.992 R² .. 1.000 R²)
mean                 154.3 μs   (153.0 μs .. 156.3 μs)
std dev              5.418 μs   (3.935 μs .. 7.282 μs)
variance introduced by outliers: 17% (moderately inflated)

benchmarked lookup_half
time                 151.2 μs   (150.2 μs .. 152.6 μs)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 151.7 μs   (151.2 μs .. 152.4 μs)
std dev              1.902 μs   (1.390 μs .. 2.776 μs)

benchmarked lookup_most
time                 143.9 μs   (143.4 μs .. 144.5 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 145.2 μs   (144.8 μs .. 145.9 μs)
std dev              1.829 μs   (1.350 μs .. 2.582 μs)

benchmarked lookup_fails
time                 150.2 μs   (149.0 μs .. 151.7 μs)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 150.5 μs   (149.9 μs .. 151.5 μs)
std dev              2.544 μs   (1.804 μs .. 3.565 μs)

I can definitely ask about naming on the libraries list tomorrow, yes!

@Boarders
Contributor Author

I can't see any good reason why find shouldn't be this variant, since it throws an error on a missing key anyway.

@treeowl
Contributor

treeowl commented Aug 16, 2021 via email

@andreasabel
Member

andreasabel commented Aug 18, 2021

A name that does not suggest success is query. This could be a fast-failing alternative to lookup (which still suggests success, albeit not as strongly as find).

P.S.: In Agda, we do not use find, but lookup which gives us control over the error, e.g.: https://github.com/agda/agda/blob/c9c20bf3683f9eaf9987b80257369be2a45d452b/src/full/Agda/Utils/Graph/AdjacencyMap/Unidirectional.hs#L564
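In the same spirit, a lookup that gives the caller control over the error can be a thin wrapper over the existing API. A minimal sketch using Data.IntMap; lookupOrError is a hypothetical name for illustration, not part of any library:

```haskell
import qualified Data.IntMap.Strict as IM

-- hypothetical helper: the caller supplies the error value,
-- instead of relying on find's hard-wired one
lookupOrError :: e -> IM.Key -> IM.IntMap a -> Either e a
lookupOrError e k m = maybe (Left e) Right (IM.lookup k m)

main :: IO ()
main = do
  let m = IM.fromList [(1, "one"), (2, "two")]
  print (lookupOrError "missing key" 2 m)  -- hit
  print (lookupOrError "missing key" 9 m)  -- miss, caller's error
```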

@nomeata
Contributor

nomeata commented Aug 18, 2021

findMaybe?

@treeowl
Contributor

treeowl commented Aug 18, 2021

Could y'all also please comment on whether you think lookup should continue to be fast-failing or not?

@nomeata
Contributor

nomeata commented Aug 18, 2021

Maybe lookup should get a HasCallStack constraint to distinguish call sites, keep stats on whether failure is frequent, persist that data transparently in /var/run/… and pick the right implementation based on that.

@andreasabel
Member

Could y'all also please comment on whether you think lookup should continue to be fast-failing or not?

The usual dilemma: (1) some apps out there might have been tested to perform well with the present fast-failing lookup, (2) other apps might rather use lookup with success expectation and benefit from a switch to non-fast-failing. Without a field study, we wouldn't know.

In the absence of real-world data, I can only recommend the conservative path: leave lookup as it is, but fork it into two new variants (e.g. findMaybe and query, or lookupShouldSucceed and lookupMayFail) that bring the issue to the user's attention. We could possibly deprecate lookup, but then it would be deprecated only for IntMap, even though we strive for a uniform interface across all the collections, where lookup does not split into variants... (complications, sigh!).

@nomeata
Contributor

nomeata commented Aug 18, 2021

We might be overthinking this. The performance difference isn't so great that this change would hurt some users more than other likely optimizations that were simply introduced in past versions. So making a change should be fine, I believe. That doesn't help us decide whether to make the change, though. Gut feeling says yes: it makes the run time of the function more uniform in general.

@nomeata
Contributor

nomeata commented Aug 23, 2021

Here is how it looks on the original benchmark, the same benchmark with 50% hits, with 90% hits and with all failures:

How did you set up the “all failures” test?

I expect that an “all failures (but uniform distribution of keys)” test will behave very differently from an “all failures (but existing keys are far away from the queries)” test. In fact, I expect that for usages where the keys of the failed lookups are “within” the range of the existing values, the new code is actually faster for the failed lookups too.

That makes me wonder if it is justifiable to just use new code always?
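The two miss distributions being contrasted can be sketched concretely with Data.IntMap itself. The sizes and the even/odd layout below are illustrative choices, not taken from the thread: near misses interleave with the stored keys and share long prefixes with them, while far misses lie entirely outside the stored range, so the fast-failing lookup can reject them with a prefix check close to the root.

```haskell
import qualified Data.IntMap.Strict as IM
import Data.Maybe (isNothing)

main :: IO ()
main = do
  let n = 2 ^ (12 :: Int)
      m = IM.fromList [(2 * k, ()) | k <- [1 .. n]]  -- even keys only
      nearMisses = [2 * k + 1 | k <- [1 .. n]]       -- interleaved with stored keys
      farMisses  = [4 * n + k | k <- [1 .. n]]       -- beyond the whole key range
  print (all (\k -> isNothing (IM.lookup k m)) nearMisses)
  print (all (\k -> isNothing (IM.lookup k m)) farMisses)
```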

@Boarders
Contributor Author

@nomeata I did not do anything sensible. The current benchmark has keys 12..2^12, so I just shifted them all by 2^12. In general, the IntMap benchmarks could do with some attention; I'll see if I can add something a bit more reasonable as part of this change.
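The shifted-key construction described above can be sketched as follows. This is an illustration, not the actual benchmark code, and it assumes a map of the keys [1..2^12] for simplicity; shifting every query by 2^12 turns every lookup into a miss that lands entirely above the stored range.

```haskell
import qualified Data.IntMap.Strict as IM
import Data.Maybe (isJust, isNothing)

main :: IO ()
main = do
  let n = 2 ^ (12 :: Int)
      m = IM.fromList [(k, k) | k <- [1 .. n]]
      hits   = [IM.lookup k m       | k <- [1 .. n]]  -- the original all-hits workload
      misses = [IM.lookup (k + n) m | k <- [1 .. n]]  -- every query shifted past the range
  print (all isJust hits)
  print (all isNothing misses)
```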

@Boarders
Contributor Author

Boarders commented Sep 9, 2021

@nomeata Your suspicion was correct, if we use lookup on lots of near-misses then the new version is faster.

@Boarders
Contributor Author

Boarders commented Sep 9, 2021

@sjakobi When making these changes it would be nice to know how it impacts GHC as a real-world consumer of IntMap, do you have any advice on how to do that?

@nomeata
Contributor

nomeata commented Sep 16, 2021

JFTR, conversation has continued at #800

nomeata added a commit to nomeata/hs-to-coq that referenced this issue Sep 16, 2021
In issue haskell/containers#794 and PR
haskell/containers#800 an alternative
implementation of `lookup` is proposed which checks if the key is as
expected only when reaching the leaf. This means less branching in the
success case or when the lookup key shares long prefixes with existing
values, but may do more work when the lookup key is far away from
existing keys.

Not saying that it’s particularly doubtful that the changed code is
correct, but since we have the machinery, why not verify it? So here we
go, and yes, it goes through.

I don’t think we need to merge the PR like this, and can probably wait
for the code to reach a containers release, and then include it there.

Also, this PR currently contains a bunch of local hacks that I used to
get going again (now that I use NixOS), which need to be removed, merged
separately, or made obsolete by better changes to the setup.

@sjakobi
Member

sjakobi commented Sep 16, 2021

@sjakobi When making these changes it would be nice to know how it impacts GHC as a real-world consumer of IntMap, do you have any advice on how to do that?

Yes, that would be very good to check! Unfortunately I'm not aware of any good documentation on doing this – although it should definitely exist, either in the GHC wiki or even in this repo.

The basic process is that you update the libraries/containers submodule in GHC to use your branch and then build some sample modules, comparing allocations and timings against a build with an unmodified GHC. Cabal is a popular sample package to test, because it's directly available in libraries/Cabal and doesn't have additional dependencies.

I've previously done a tiny bit of testing for #340: See #340 (comment)

@doyougnu has done more testing and may have better advice: #340 (comment)

@nomeata
Contributor

nomeata commented Sep 16, 2021

comparing allocations and timings

Allocations will be useless here, I expect: laziness aside, both functions should allocate the same (the returned Just/Nothing). Also make sure to change find as well; I assume GHC uses that a lot too. Timing is always quite noisy, though. For this one, I'd consider a comparison of instruction counts a good first indication.

@Boarders
Contributor Author

Ben Gamari gave me some ideas for what to do in order to measure a cabal build, just waiting on GHC to re-build.

@doyougnu

You can read through the commits in this branch to get a step-by-step guide on altering the boot libs (although I'm adding two boot libs in that branch). Instead of adding a boot lib you would just be changing its ref in $GHC_ROOT/.gitmodules and updating the submodule.

You'll want to mimic this commit: https://gitlab.haskell.org/doyougnu/ghc/-/commit/297fe09edbfab7c5a358b0485090a4ed8121f427

And these docs will be useful: https://gitlab.haskell.org/ghc/ghc/-/wikis/working-conventions/git/submodules

If you run into trouble I can test it out sometime next week or continue to offer advice here.

I'm +1 for this change and would be interested in observing how ghc reacts to the increased cost of lookup misses.
