Faster Eq and Ord #1016
Using the natural structure would probably be a really fun little project.
I believe @jwaldmann is looking into it.
It did turn out to be a little faster. I also realized

Missed

Hmm...
I tried a few options for compare:

```haskell
compare_toList :: Ord a => Seq a -> Seq a -> Ordering
compare_toList xs ys = cmpList (toList xs) (toList ys)
{-# INLINABLE compare_toList #-}

cmpList :: Ord a => [a] -> [a] -> Ordering
cmpList [] [] = EQ
cmpList [] (_:_) = LT
cmpList (_:_) [] = GT
cmpList (x:xs) (y:ys) = compare x y <> cmpList xs ys
{-# INLINABLE cmpList #-}

compare_leftFused :: Ord a => Seq a -> Seq a -> Ordering
compare_leftFused xs ys0 = foldr f z xs (toList ys0)
  where
    -- btw, oneShot doesn't help
    f x k ys = case ys of
      [] -> GT
      y:ys' -> compare x y <> k ys'
    z [] = EQ
    z (_:_) = LT
{-# INLINABLE compare_leftFused #-}

compare_foldMapOrdM :: Ord a => Seq a -> Seq a -> Ordering
compare_foldMapOrdM xs ys0 = case runOrdM (foldMap f xs) (toList ys0) of
  o :*: ys -> o <> if F.null ys then EQ else LT
  where
    f x = OrdM $ \ys -> case ys of
      [] -> GT :*: []
      y:ys' -> compare x y :*: ys'
{-# INLINABLE compare_foldMapOrdM #-}
```

Here are the benchmark results (GHC 9.6.3).
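As a quick standalone sanity check (mine, not part of the benchmark above), `cmpList` agrees with the lexicographic ordering that `Ord` on lists provides:

```haskell
-- cmpList repeated here so the sketch compiles on its own.
cmpList :: Ord a => [a] -> [a] -> Ordering
cmpList [] [] = EQ
cmpList [] (_:_) = LT
cmpList (_:_) [] = GT
cmpList (x:xs) (y:ys) = compare x y <> cmpList xs ys

main :: IO ()
main = print [ cmpList [1,2,3] [1,2,3 :: Int]  -- EQ
             , cmpList [1,2] [1,3]             -- LT: differs at the second element
             , cmpList [1,2,3] [1,2]           -- GT: longer prefix-equal list is greater
             ]
-- prints [EQ,LT,GT]
```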
The Core output is also too massive to read and reason about. Other notes:

Reran the benchmarks with -O (benchmarks are all set to use -O2).
I've come to the realization that the non-regular structure of Seq is terrible for the performance of folds. All the nesting and unnesting of closures, required only to make it typecheck, increases allocations and disables optimizations like arity analysis. Currently the folds are unrolled one level (thanks to #510), but the more we unroll the better it performs, with diminishing returns. If we can switch to a regular structure,

```diff
  data FingerTree a
    = EmptyT
-   | Single a
-   | Deep {-# UNPACK #-} !Int !(Digit a) (FingerTree (Node a)) !(Digit a)
+   | Single (Tree a)
+   | Deep {-# UNPACK #-} !Int !(Digit (Tree a)) (FingerTree a) !(Digit (Tree a))

+ data Tree a
+   = Leaf a
+   | Node2 !Int a a
+   | Node3 !Int a a a
```

with the invariants maintained manually, we won't have this problem. I don't want to be diving into this right now, so I will go with the
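To illustrate the fold argument with a standalone sketch of my own (not containers code; I am assuming the `Node2`/`Node3` fields would hold `Tree a` children so that depth is unbounded), a fold over a regular structure is plain first-order recursion, with no polymorphic recursion and no closure nesting for GHC to fight with:

```haskell
-- Simplified stand-in for a regular (non-nested) 2-3 tree.
-- Children of Node2/Node3 are Trees themselves (an assumption here);
-- the balance invariant would be maintained manually, not by types.
data Tree a
  = Leaf a
  | Node2 !Int (Tree a) (Tree a)
  | Node3 !Int (Tree a) (Tree a) (Tree a)

-- Ordinary recursion: GHC can inline, unroll, and arity-analyze
-- this without the closure plumbing the nested type forces.
foldrTree :: (a -> b -> b) -> b -> Tree a -> b
foldrTree f z (Leaf a) = f a z
foldrTree f z (Node2 _ l r) = foldrTree f (foldrTree f z r) l
foldrTree f z (Node3 _ l m r) = foldrTree f (foldrTree f (foldrTree f z r) m) l

main :: IO ()
main = print (foldrTree (:) [] (Node2 3 (Leaf 1) (Node2 2 (Leaf 2) (Leaf 3))))
-- prints [1,2,3]
```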
What you gain in fold performance gets lost elsewhere, I believe. A technique I used somewhere (priority queues, I think) is to start by making a GADT to represent depth in the tree while also supplying type evidence. Then that GADT can be turned into a fake GADT to improve efficiency, using:

```haskell
data Foo a where
  This :: Foo (Elem a)
  That :: !(Foo a) -> Foo (Node a)
```
Yes, that's possible. I will still want to try it out at some point.

I was thinking about using GADTs too, but what is a fake GADT?

It's when you play make-believe with the type checker. Here's the code I mean.
Let me expand on the GADT approach a bit. Using a real GADT, we could do this:

```haskell
data Where a n where
  Here :: Where a (Elem a)
  There :: !(Where a n) -> Where a (Node n)

foldMapBlob :: Monoid m => Where a t -> (a -> m) -> t -> m
foldMapBlob Here f (Elem a) = f a
foldMapBlob (There w) f (Node2 _ x y) = foldMapBlob w f x <> foldMapBlob w f y
foldMapBlob (There w) f (Node3 _ x y z) =
  foldMapBlob w f x <> foldMapBlob w f y <> foldMapBlob w f z

foldMapFT :: Monoid m => Where a t -> (a -> m) -> FingerTree t -> m
foldMapFT !_ _ EmptyT = mempty
foldMapFT w f (Single t) = foldMapBlob w f t
foldMapFT w f (Deep _ pr m sf) =
  foldMap (foldMapBlob w f) pr
    <> foldMapFT (There w) f m
    <> foldMap (foldMapBlob w f) sf

foldMapSeq :: Monoid m => (a -> m) -> Seq a -> m
foldMapSeq f (Seq t) = foldMapFT Here f t
```

We could use a fake GADT, specialized here to Elem and Node:

```haskell
data Where el no a n where
  Here :: Where el no a (el a)
  There :: !(Where el no a n) -> Where el no a (no n)

foldMapBlob :: Monoid m => Where Elem Node a t -> (a -> m) -> t -> m
foldMapBlob Here f (Elem a) = f a
foldMapBlob (There w) f (Node2 _ x y) = foldMapBlob w f x <> foldMapBlob w f y
foldMapBlob (There w) f (Node3 _ x y z) =
  foldMapBlob w f x <> foldMapBlob w f y <> foldMapBlob w f z

foldMapFT :: Monoid m => Where Elem Node a t -> (a -> m) -> FingerTree t -> m
foldMapFT !_ _ EmptyT = mempty
foldMapFT w f (Single t) = foldMapBlob w f t
foldMapFT w f (Deep _ pr m sf) =
  foldMap (foldMapBlob w f) pr
    <> foldMapFT (There w) f m
    <> foldMap (foldMapBlob w f) sf

foldMapSeq :: Monoid m => (a -> m) -> Seq a -> m
foldMapSeq f (Seq t) = foldMapFT Here f t
```

The main efficiency advantage of the nested type approach is in how the leaves are represented; a regular structure would need dedicated leaf-level nodes, something like

```haskell
data Leaf a = Leaf2 a a | Leaf3 a a a
```

This possibility has been discussed in the issue tracker before, somewhere. It's totally plausible, but
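For concreteness, here is a self-contained, runnable toy of the real-GADT version above. The types mirror Data.Sequence's internals but are redefined (and heavily simplified, e.g. a two-constructor `Digit`) so the sketch compiles on its own; it is an illustration, not the library's code:

```haskell
{-# LANGUAGE GADTs #-}
{-# LANGUAGE BangPatterns #-}

-- Simplified stand-ins for Data.Sequence's internal types.
newtype Elem a = Elem a
data Node a = Node2 !Int a a | Node3 !Int a a a
data Digit a = One a | Two a a
data FingerTree a
  = EmptyT
  | Single a
  | Deep !Int !(Digit a) (FingerTree (Node a)) !(Digit a)

-- Depth evidence: at 'Here' the payload is Elem a; each 'There'
-- wraps it in one more Node layer.
data Where a n where
  Here  :: Where a (Elem a)
  There :: !(Where a n) -> Where a (Node n)

foldMapBlob :: Monoid m => Where a t -> (a -> m) -> t -> m
foldMapBlob Here f (Elem a) = f a
foldMapBlob (There w) f (Node2 _ x y) = foldMapBlob w f x <> foldMapBlob w f y
foldMapBlob (There w) f (Node3 _ x y z) =
  foldMapBlob w f x <> foldMapBlob w f y <> foldMapBlob w f z

foldMapDigit :: Monoid m => Where a t -> (a -> m) -> Digit t -> m
foldMapDigit w f (One x) = foldMapBlob w f x
foldMapDigit w f (Two x y) = foldMapBlob w f x <> foldMapBlob w f y

foldMapFT :: Monoid m => Where a t -> (a -> m) -> FingerTree t -> m
foldMapFT !_ _ EmptyT = mempty
foldMapFT w f (Single t) = foldMapBlob w f t
foldMapFT w f (Deep _ pr m sf) =
  foldMapDigit w f pr <> foldMapFT (There w) f m <> foldMapDigit w f sf

main :: IO ()
main = print (foldMapFT Here (: []) example)
  where
    -- A four-element tree: digits hold Elems, the middle holds Nodes.
    example :: FingerTree (Elem Int)
    example = Deep 4 (One (Elem 1))
                     (Single (Node2 2 (Elem 2) (Elem 3)))
                     (One (Elem 4))
-- prints [1,2,3,4]
```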
(Please don't actually use
(half jokingly) then fix GHC, instead of uglifying the code ... what are the real-life applications of Data.Sequence that suffer from this inefficiency?

My main use case for Data.Sequence is: to show the source code to students, as an example where typing enforces balance ... I use Data.Sequence in real code from time to time but it's not critical for performance (in the cases that I profiled). Perhaps this is because I do not use Eq and Ord of containers much at all.

These instances are certainly nice to have, e.g., to quickly use a Set as a key for a Map, but I found this is too expensive (e.g., for the powerset construction of an automaton), and it needs to be replaced with some form of hashing, so the key of the hot Map is Int. This is also why I am a bit slow with #787 (it is still on my list but other stuff is more urgent right now, sorry)
Nice! Why don't we do this? I expect this would be more efficient than what we have, even without the unsafe trick.

Yes, I can see this being a disadvantage of changing the representation.

Hashing is a fold after all, so you stand to benefit from improved fold functions ;)
We've never used GADTs in containers. I think the unsafe trick is totally fine, as long as its limitations are respected to avoid the risk of nasal demons. That mostly means being very careful not to overflow the counter. Because sequences are weird, it's probably safest to do an overflow check for increment even on 64-bit systems, though there are fold contexts where that's overkill.
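A checked increment is cheap. Here is a sketch of my own (illustrative names, not containers code) of what the guard on the fake-GADT depth counter could look like:

```haskell
-- Illustrative sketch: the fake-GADT depth evidence is really an
-- Int counter, so incrementing it checks for overflow to keep the
-- coercion's safety argument from being silently violated.
newtype Depth = Depth Int
  deriving (Eq, Show)

deeper :: Depth -> Depth
deeper (Depth d)
  | d == maxBound = error "depth counter overflow"
  | otherwise     = Depth (d + 1)

main :: IO ()
main = print (iterate deeper (Depth 0) !! 3)
-- prints Depth 3
```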
(Sorry, just passing by)

@Bodigrim, are people using MicroHs? If so, we should be sure to make
The counter only goes up to the depth of the tree, which is O(log n), so I don't think we need to worry about overflow.

We could have a separate definition for GHC under some CPP. Or, if we don't want to maintain two definitions, instead of applying
That particular bit of reasoning has a false premise: that tree depth is certain never to exceed
Directly coercing the thing in question leads to quite a large "trusted base": random unsafe coercions scattered around already intricate code. Let's not go there. CPP would suck: quite a lot of code is involved once this trick is used everywhere that would benefit from it. That includes pretty much everything that reaches deep into trees: maps, folds, traversals, zips, indexing, and I imagine even the mind-warping

Hold on, I missed a logarithm, didn't I? Let me think about that again; you're probably right.
No, you're right, I can see this being a problem. We'd need O(2^maxBound) elements, which could be produced by repeatedly appending a sequence to itself maxBound times. Feasible for 32-bit systems? So we will need a bounds check.

Understandable... it's not an appealing option.

Repeated appending risks multiplication by doubling, which could definitely be an issue on 32-bit systems. But repeated
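The doubling construction can be demonstrated with today's Data.Sequence: k self-appends produce a sequence of length 2^k using only k cheap appends, because `><` shares subtrees (the sketch below is mine and keeps the length well within a 64-bit Int):

```haskell
import qualified Data.Sequence as Seq

-- Each self-append doubles the length but costs only O(log n) time
-- and space thanks to subtree sharing, so 32 appends give 2^32
-- elements: far more than could ever be materialized one by one,
-- and depth grows with every append.
doubled :: Int -> Seq.Seq ()
doubled k = iterate (\s -> s Seq.>< s) (Seq.singleton ()) !! k

main :: IO ()
main = print (Seq.length (doubled 32))
-- prints 4294967296
```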
I like the project a lot (Haskell on microcontrollers!), but I think it will take some time before compatibility with MicroHs becomes a practical question. It's in its early days.
Regarding the alternate representation,
Found it: #91. I also realized something neat about these two representations. It comes down to the question: how do we know when we reach a leaf with an element?
So, this is a good old space-vs-time trade-off. Paying the space to simplify/speed up a lot of other things seems like a good deal to me personally. But it remains to be seen how well it would work in practice.

Space typically has a huge impact on time. I suspect you'll find that trading compactness for speed will get you neither. But you're welcome to try.
The following operations are implemented by converting to lists and comparing:

- `==` (improved in "More efficient Eq, Ord for Set, Map" #1017)
- `compare` (same as above)
- `==` (same as above)
- `compare` (same as above)
- `compare`
- `compare`
- `==` ("More efficient Eq, Ord for Seq" #1035)
- `compare` (same as above)

This is not the best we can do for a couple of reasons:

- `Eq` and `Ord` for lists don't specialize (but there are a few manually specified exceptions). See the list instances of `Eq` and `Ord`.
I tried a couple of alternatives and saw decent improvements.

For now, converting the second set to a list is not avoided.