-
Notifications
You must be signed in to change notification settings - Fork 178
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New convenience function for Data.Map #784
Comments
I'm not convinced by the motivation yet. If you compare
and
the |
If the concern is to avoid creating a singleton only to append it immediately, there is |
My motivation was first performance (less allocation) and second convenience. Yes, I am not suggesting getting rid of anything -- just adding this variant of the function. By
But here I don't get to supply a value so I have to capture my |
Using insertCons :: (Ord k) => k -> a -> Map k [a] -> Map k [a]
insertCons k v = alter f k
where
f Nothing = Just [v]
f (Just vs) = Just (v : vs) This avoids traversing the map twice, but still does some boxing (though I wouldn't be surprised if that gets optimized away). I'm not exactly sure why capturing a value would necessarily allocate though. (Not trying to be condescending, but if you want to have as few allocations as possible, I'm not sure Haskell is the right language for you anyway.) The function you're proposing is too specialized IMO (btw, I think it would make more sense to propose using Considering the scenario, what you're actually doing seems to be simulating a multimap. For that, using |
I covered your first point in my previous message (building a closure will allocate unless by magic everything gets inlined the right way -- which is no way to live :). As to your second point and your suggestion of using In fact function As to the Haskell-is-not-the-language-for-you-if-you-don't-want-to-allocate-too-much argument I don't buy it. True, Haskell allocates a lot, but a programming language is a tool to build all kinds of programs -- for some allocation is no big deal but for others it is. I don't see why I have to compromise. What I have proposed is an improvement both in terms of allocation and (I believe) in terms of API. As to the last suggestion of using Set instead of [a], this could work when what you were collecting was a |
That's only true when you want to write a function that's generic over insertWith' :: (Ord k) => (a -> b -> b) -> (a -> b) -> k -> a -> Map k b -> Map k b
insertWith' f g key x = insertWith (const (f x)) key (g x)
insertCons :: (Ord k) => k -> a -> Map k [a] -> Map k [a]
insertCons key value = insertWith' (:) pure key value
As you can see, it can be implemented using |
Thank you for that insight. I rewrote it slightly to understand what you did:
and the point is you never use As for the generality issue, I think you have misunderstood what I meant probably because I didn't explain it well. It is more general in the sense that it can do what the existing functions can do without allocating anything. With your (admittedly more general) API above (mapping to I looked at the implementation and it looks easy to implement my version. If you decide to add it let me know and I can send a patch. |
It appears to me (please correct me if I am wrong) that there is another performance issue with your implementation here:
If the key is already in the map we end up calling both |
I have implemented the above proposal with an alternative interface as suggested by @konsumlamm . I have tentatively named the function insertWithFun (betters names are welcome). The new interface now looks like:
The pull request is here: #785 If you decide you like the new function, please review carefully! |
I'm not really a fan of these closure-avoidance functions. I'd much rather
see if we can get good performance from friendlier APIs.
…On Sun, Jun 27, 2021, 6:05 PM Neo ***@***.***> wrote:
I have implemented the above proposal with an alternative interface as
suggested by @konsumlamm <https://github.com/konsumlamm> . I have
tentatively named the function insertWithFun (betters names are welcome).
The new interface now looks like:
insertWithFun :: Ord k => (a -> b -> b) -> (a -> b) -> k -> a -> Map k b -> Map k b
The pull request is here: #785
<#785>
If you decide you like the new function, please review carefully!
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#784 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAOOF7KWHDPFJ7WPPBHTOVTTU6OBXANCNFSM47LP63AA>
.
|
@treeowl But how? It's not really possible with the existing apis (alter, insertWith). Boxes and closures are your only choice. And the case I am describing happens often enough in real code -- building a Map from some Key to a Set or a List, or a Vector, or another Map etc. I have not been using Haskell long but it happened to me often enough to annoy me enough to propose this addition. And honestly I wouldn't call the API of this function unfriendly -- it's just one more argument than the existing functions. It's of course up to you guys if you want to include it but I hope you do. |
What I have wanted for ages is something more like
insyboggle :: Ord k => v -> (v -> v) -> k -> Map k v -> Map k v
which is equivalent to
gadzooks :: Ord k => (Maybe v -> v) -> k -> Map k v -> Map k v
The only problem is the pesky closure creation. Is there a way to
manipulate arity and inlining to make that problem go away?
…On Sun, Jun 27, 2021, 6:24 PM Neo ***@***.***> wrote:
@treeowl <https://github.com/treeowl> But how? It's not really possible
with the existing apis (alter, insertWith). Boxes and closures are your
only choice. And the case I am describing happens often enough in real code
-- building a Map from some Key to a Set or a List, or a Vector, or another
Map etc. I have not been using Haskell long but it happened to me often
enough to annoy me enough to propose this addition. And honestly I wouldn't
call the API of this function unfriendly -- it's just one more argument
than the existing functions.
It's of course up to you guys if you want to include it but I hope you do.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#784 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAOOF7IGLKJSZIHVSF77XHLTU6QKNANCNFSM47LP63AA>
.
|
Let's have a closer look. Let for example v = Set w. Calling Calling I think there is no hope for The function being proposed here does no allocation as the expense of one extra argument. I think it's small price to pay. |
That seems overly pessimistic about what GHC can optimize. In theory, simply inlining the singleton and the closure will avoid their creation. In fact I think this is doable by tweaking only At the moment
Here's the modified function: alter :: Ord k => (Maybe a -> Maybe a) -> k -> Map k a -> Map k a
alter f = go
where
-- go :: Ord k => (Maybe a -> Maybe a) -> k -> Map k a -> Map k a
go !k Tip = case f Nothing of
Nothing -> Tip
Just x -> singleton k x
go k (Bin sx kx x l r) = case compare k kx of
LT -> balance kx x (go k l) r
GT -> balance kx x l (go k r)
EQ -> case f (Just x) of
Just x' -> x' `seq` Bin sx kx x' l r
Nothing -> glue l r
-- #if __GLASGOW_HASKELL__
-- {-# INLINABLE alter #-}
-- #else
{-# INLINE alter #-}
-- #endif Here's module CC (insertWith') where
import Data.Map (Map)
import qualified Data.Map.Strict as Map
insertWith' :: Ord k => (a -> b -> b) -> (a -> b) -> k -> a -> Map k b -> Map k b
insertWith' cons singleton k x = Map.alter (\oy -> Just $ case oy of
Just y -> cons x y
Nothing -> singleton x) k Optimized Core (using insertWith'
= \ @ k_a1y2
@ a_a1y3
@ b_a1y4
$dOrd_a1y6
cons_a1cW
singleton_a1cX
k1_a1cY
x_a1cZ
eta_B1 ->
letrec {
go7_s1zP
= \ k2_a1yJ ds_a1yK ->
case k2_a1yJ of k3_a1yL { __DEFAULT ->
case ds_a1yK of {
Bin ipv_a1yT ipv1_a1yU ipv2_a1yV ipv3_a1yW ipv4_a1yX ->
case compare $dOrd_a1y6 k3_a1yL ipv1_a1yU of {
LT ->
balance ipv1_a1yU ipv2_a1yV (go7_s1zP k3_a1yL ipv3_a1yW) ipv4_a1yX;
EQ ->
case cons_a1cW x_a1cZ ipv2_a1yV of x'1_a1za { __DEFAULT ->
Bin ipv_a1yT ipv1_a1yU x'1_a1za ipv3_a1yW ipv4_a1yX
};
GT ->
balance ipv1_a1yU ipv2_a1yV ipv3_a1yW (go7_s1zP k3_a1yL ipv4_a1yX)
};
Tip ->
case singleton_a1cX x_a1cZ of x1_a1zg { __DEFAULT ->
Bin 1# k3_a1yL x1_a1zg Tip Tip
}
}
}; } in
go7_s1zP k1_a1cY eta_B1 I did have to use |
@Lysxia I have tried your code (with ghc 8.10.4, options -O2 -fno-full-laziness) but I was not able to get ghc to inline I am not sure what you are suggesting though. Are you suggesting to add the new function but instead implement it with |
Whether you use an arity of one or two, you're sometimes going to change
the arity of what the user passes/wants to pass. How do you decide which is
more important?
…On Mon, Jun 28, 2021, 9:37 AM Neo ***@***.***> wrote:
I have tried your code (with ghc 8.10.4, options -O2 -fno-full-laziness)
but I was not able to get ghc to inline alter. I didn't change INLINABLE
to INLINE so that much be the reason. ghc may not inline an INLINABLE
function for any number of reasons (some of them good) -- and those reasons
change from version to version.
I am not sure what you are suggestiong though. Are you suggesting to add
the new function but instead implement it with alter (instead of directly
as I did), and then find the right set on incantations when compiling
containers to get ghc to get rid of the boxes/closures? Or are you saying
that we don't need the new function altogether because we have alter and
maybe we can get it to inline in user code? If it's the first, then I
honestly don't see the point -- the implementation is so small it's not
worth bothering with the refactoring. If it's the second, then first we
have to find the right set of incantations to get ghc to inline and second
we have to convince people to use those compiler options (be it
no-full-laziness or whatever else) in their projects -- a losing game -- no
one will do it (in fact specifying the options you want to get alter to
inline may cause the user project to run slower -- after all
*full-laziness* is the default for a reason).
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#784 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAOOF7KSXEXRPO2266URL2TTVB3JNANCNFSM47LP63AA>
.
|
@treeowl I am sorry but I don't understand your comment regarding 'arity' at all. Please elaborate. |
I am indeed suggesting the latter option, that we somehow fix
The current definition of
Or we implement Maybe someone more knowledgeable about the internals of GHC can tell us
|
@Lysxia ghc may decide not to inline a function for all sorts of reasons. The INLINE directive forces the inline but is that the right thing in all cases? I believe the solution I have proposed is a reasonable compromise and it does make the Map interface better. |
For now I'm more in favor of improving I'm wondering though how much performance we stand to gain from avoiding the overhead of closure allocation. All of the functions discussed end up allocating a new Regarding the functions that @treeowl suggests in #784 (comment), see some previous discussion in #538, particularly #538 (comment). |
insyboggle :: Ord k => v -> (v -> v) -> k -> Map k v -> Map k v`
I fail to see the issue here since The real question to me is why |
Consider inserting (k, v) elements into a map (Data.Map) in a
scenario where if the element exists you want to combine the new
element (v) with the old existing element. For this we have function:
which does the trick.
But consider this scenario:
and when we insert we want to add another value to the list if the key
is there.
We now have to do:
So we need to build a new singleton list and then call append from
every insert. The suggestion is to add a new function in which we
parametrize more generally over the value of the map as below.
The behavior we want is the following but without the double tree
traversal (2*log(n)) cost.
With such a function for the example above we could then do:
and so avoid creating the singleton lists. Of course this is not specific to lists... it would help with any container value in the position of map value.
The text was updated successfully, but these errors were encountered: