Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scalable Bloom Filter panics after ~31 million insertions #29

Open
jameshageman-stripe opened this issue Mar 21, 2019 · 5 comments
Open

Comments

@jameshageman-stripe
Copy link

Hey @tylertreat! I noticed that NewDefaultScalableBloomFilter(0.01) panics after exactly 31,536,466 insertions. Here is a test case:

func TestScalableBloomLarge(t *testing.T) {
	total := uint32(40000000) // 40 million
	f := NewDefaultScalableBloomFilter(0.01)

	i := uint32(0)

	defer func() {
		if err := recover(); err != nil {
			fmt.Printf("panicked after %d iterations: %s\n", i, err)
			panic(err)
		}
	}()

	for ; i < total; i++ {
		if i%1000000 == 0 {
			fmt.Printf("  i: %d\n", i)
		}
		key := make([]byte, 8)
		binary.BigEndian.PutUint32(key, i)
		f.Add(key)
	}
}

and here is the test ouput:

=== RUN   TestScalableBloomLarge
  i: 0
  i: 1000000
  i: 2000000
# ...
  i: 29000000
  i: 30000000
  i: 31000000
panicked after 31536466 iterations: runtime error: makeslice: len out of range
--- FAIL: TestScalableBloomLarge (215.81s)
panic: runtime error: makeslice: len out of range [recovered]
	panic: runtime error: makeslice: len out of range [recovered]
	panic: runtime error: makeslice: len out of range

goroutine 132 [running]:
testing.tRunner.func1(0xc000100f00)
	/usr/local/Cellar/go/1.12.1/libexec/src/testing/testing.go:830 +0x392
panic(0x1174160, 0x11d3170)
	/usr/local/Cellar/go/1.12.1/libexec/src/runtime/panic.go:522 +0x1b5
github.com/tylertreat/BoomFilters.TestScalableBloomLarge.func1(0xc00007cf7c)
	/Users/jameshageman/stripe/BoomFilters/scalable_test.go:179 +0x118
panic(0x1174160, 0x11d3170)
	/usr/local/Cellar/go/1.12.1/libexec/src/runtime/panic.go:522 +0x1b5
github.com/tylertreat/BoomFilters.NewPartitionedBloomFilter(0x2710, 0x357fb461c091b, 0x54e5e2762f38eb)
	/Users/jameshageman/stripe/BoomFilters/partitioned.go:52 +0xb1
github.com/tylertreat/BoomFilters.(*ScalableBloomFilter).addFilter(0xc00028a000)
	/Users/jameshageman/stripe/BoomFilters/scalable.go:148 +0x6a
github.com/tylertreat/BoomFilters.(*ScalableBloomFilter).Add(0xc00028a000, 0xc0d18e4858, 0x8, 0x8, 0x11d8120, 0xc00028a000)
	/Users/jameshageman/stripe/BoomFilters/scalable.go:120 +0x112
github.com/tylertreat/BoomFilters.TestScalableBloomLarge(0xc000100f00)
	/Users/jameshageman/stripe/BoomFilters/scalable_test.go:189 +0xc6
testing.tRunner(0xc000100f00, 0x11ad908)
	/usr/local/Cellar/go/1.12.1/libexec/src/testing/testing.go:865 +0xc0
created by testing.(*T).Run
	/usr/local/Cellar/go/1.12.1/libexec/src/testing/testing.go:916 +0x35a
FAIL	github.com/tylertreat/BoomFilters	218.857s

The culprit appears to be this line:

partitions = make([]*Buckets, k)

which confuses me because k should be a totally reasonable size, and is only a function of fpr and not n.

@tylertreat
Copy link
Owner

Fascinating discovery and sorry you hit this bug! I will take a look and see if I can figure out what's going on.

@tylertreat
Copy link
Owner

So unfortunately the issue is k ends up being too large to allocate a slice of that length. Here's what ends up getting computed in NewPartitionedBloomFilter on the 31536466th iteration:

n 10000
fpRate +4.649956e-309
k 9223372036854775808

This shows what's happening: https://play.golang.org/p/fJD2OaXK2Bc

I'm not sure what exactly to do here since it is the nature of the scalable bloom filter to add continuously larger filters in order to enforce a tight upper bound on false positives.

@jameshageman-stripe
Copy link
Author

@tylertreat I see, I assume it would take longer to hit this limit if you set a higher hint? It would seem intuitive as 31 million is so much larger than the default hint of 10000.

@tylertreat
Copy link
Owner

tylertreat commented Mar 25, 2019

In this case the limit is hit irrespective of hint since the slice length is k, which is computed based on the target fpRate alone:

k = OptimalK(fpRate)

partitions = make([]*Buckets, k)

@darcyaf
Copy link

darcyaf commented Oct 27, 2022

test the limition of the k.
where len(filters) to large, the fpRate will become smaller and smaller, and result of 1/fpRate became +Inf, so that make make func panic

playground: test filters limit

package main

import (
	"fmt"
	"math"
)

func main() {
	fp := 0.01
	r := 0.8
	for i := 100; i < 100000000; i = i + 400 {

		fpRate := fp * math.Pow(r, float64(i))
		k := math.Ceil(math.Log2(1 / fpRate))
		fmt.Printf("%+v,%+v, %+v\n", i, fpRate, k)
		_ = make([]*Buckets, uint(k))
	}
}

type Buckets struct {
	data       []byte
	bucketSize uint8
	max        uint8
	count      uint
}
100,2.037035976334508e-12, 39
500,3.507466211043593e-51, 168
900,6.039323489882386e-90, 297
1300,1.0398796744101223e-128, 426
1700,1.790514681094456e-167, 554
2100,3.08299402527833e-206, 683
2500,5.3084469288417205e-245, 812
2900,9.140338438957912e-284, 941
3300,1.6e-322, +Inf
panic: runtime error: makeslice: len out of range

goroutine 1 [running]:
main.main()
	/tmp/sandbox2242876909/prog.go:16 +0x3f

Program exited.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants