Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Complexity of IntMap-IntMap operations #1011

Open
meooow25 opened this issue Jun 20, 2024 · 0 comments
Open

Complexity of IntMap-IntMap operations #1011

meooow25 opened this issue Jun 20, 2024 · 0 comments

Comments

@meooow25
Copy link
Contributor

meooow25 commented Jun 20, 2024

Recently, something made me notice the documented complexity of IntMap-IntMap operations, and I realized that I have never really thought about it.

Let's take IntMap union, for instance.

-- | \(O(n+m)\). The (left-biased) union of two maps.

The claim is $O(n + m)$, which is not hard to verify. We match on some (in a bad case all) of the constructors of the two maps, of which there are a linear number, and whenever we do so we perform a constant amount of work. Hence $O(n + m)$.

But we can try to find a tighter bound.

Consider that $m \le n$, and we think of the smaller map. In the worst case, we might visit all constructors of the map, which makes $m$ a lower bound. The work we do in addition to this, is whenever we reach forks in the bigger map which are not in the smaller map. How many such forks can there be? The worst case is that for every leaf in the smaller map, the path to the root from a leaf hits every possible fork (i.e. forks at every power of 2). So, we need the number of nodes of the subset of the complete binary tree of all ints which forms paths from the root to the leaf keys of the map. For a map of size $m$ and the complete binary tree having height $W$, this is $O(m(W - \log m))$. Since we still have the $n + m$ upper bound as before, the overall complexity for $m \le n$ is

$$O(\min(n, m(W - \log m)))$$

I can note two interesting things about it:

  1. If we consider $m$ as a small constant, the complexity is $O(\min(n,W))$, which checks out to be the complexity we have for insert.
  2. The second term, $m (W - \log m) = m \log(\frac{2^W}{m})$, is not unlike the complexity for Map union, $O(m \log ( \frac{n}{m} ))$.

If this looks correct, I would like to update the documentation to reflect it.

Why does it matter? After all, the current bound is not incorrect.
For accuracy, and also it would help better estimate complexity in derived scenarios. For instance, consider foldl' union empty intmaps on $r$ IntMaps of size $m$ each. With the current bound, we would calculate the complexity to be $O(r^2 m)$. With the new bound, we get $O(rm(W - \log m))$. We went from quadratic in $r$ to linear, which is a very useful distinction.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant