Skip to content
This repository has been archived by the owner on Mar 4, 2021. It is now read-only.

Bloom filter crate is buggy and not maintained #17

Open
mfil opened this issue Jul 24, 2019 · 3 comments
Open

Bloom filter crate is buggy and not maintained #17

mfil opened this issue Jul 24, 2019 · 3 comments

Comments

@mfil
Copy link

mfil commented Jul 24, 2019

The bloom crate has a bug which was reported in 2017. There were no actions in the repo for three years.

Briefly, the bug is as follows: one needs to compute k hash values for an element added to the filter. In order to cut down the number of hash function calls, the crate computes two independent hashes and uses them as a seed for a RNG and then samples k values from it. I think that the basic idea is sound, but they botched the implementation of the RNG. See the next method for HashIter in src/hashing.rs. All RNG outputs after the first are a multiple of h2 modulo 2^64.

I'm sorry to say that I don't know a good alternative for the crate. I've been looking at several other bloom filter implementations in Rust for work and none of them inspired much confidence, so we ultimately decided to roll our own.

@aep
Copy link
Owner

aep commented Jul 25, 2019

thanks for the hint. luckily elfkit only uses bloom filters in one place, that might as well be a hashset.

Could you point me at your implementation? Happy to check it out

@mfil
Copy link
Author

mfil commented Jul 25, 2019

We're just getting started on our implementation and I don't know if our employer will let us open-source it. Sorry!

@mfil
Copy link
Author

mfil commented Jul 29, 2019

If you don't need bloom filters specifically, you could also check out cuckoofilters: https://crates.io/crates/cuckoofilter

They're supposed to have better performance.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants