Benchmarks

Meng Zhao edited this page Dec 11, 2018 · 2 revisions

Whoosh is quite fast!

As Whoosh is written in pure Python, there is of course the common suspicion that it must be significantly slower than non-pure-Python search solutions (for example, search code in C/C++/Java plus a Python wrapper).

The benchmarks below are perhaps not very scientific and do not cover all use cases, but they suggest that one needs to be careful with such suspicions.

The benchmark code is available here (results produced with different versions of the benchmark code are NOT comparable): https://bitbucket.org/thomaswaldmann/python-search-benchmark/

If you have more test code or adaptations for other Python search libraries, please contribute!

How the benchmark works

N documents are generated. Each document contains a search word (a random word, 10 characters long) plus 10 extra fields with 100 characters of random content each (just to pump up the size of the document).
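The document layout described above can be sketched roughly as follows. This is a minimal illustration, not the actual benchmark code: the constant names are taken from the parameter listing in the results, while the field names (`word`, `extra0` ...), the lowercase ASCII alphabet, and the helper functions are assumptions made for the example.

```python
import random
import string

# Parameter names as printed in the benchmark output.
DOC_COUNT = 3000
WORD_LEN = 10
EXTRA_FIELD_COUNT = 10
EXTRA_FIELD_LEN = 100


def random_text(length, rng):
    """Return `length` random lowercase letters (the alphabet is an assumption)."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def generate_docs(count=DOC_COUNT, rng=None):
    """Build `count` documents: one search word plus the padding fields."""
    rng = rng or random.Random()
    docs = []
    for _ in range(count):
        # Hypothetical field names; the real benchmark may name them differently.
        doc = {"word": random_text(WORD_LEN, rng)}
        for i in range(EXTRA_FIELD_COUNT):
            doc["extra%d" % i] = random_text(EXTRA_FIELD_LEN, rng)
        docs.append(doc)
    return docs
```

With these parameters each document carries about 1 KB of raw text (10 + 10 * 100 characters).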

For indexing, all fields are indexed and stored.

For searching, all words are searched in random order and all stored fields are retrieved.
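The search phase could be timed along these lines. This is only a sketch under stated assumptions: `time_searches` and its `search_fn` callback are hypothetical names, the callback stands in for whichever library is being measured, and the code targets Python 3 (`time.perf_counter`), whereas the results on this page were produced with Python 2.7.

```python
import random
import time


def time_searches(words, search_fn, rng=None):
    """Search every word once, in random order, and report (seconds, searches/s).

    `search_fn(word)` is a hypothetical callback that runs one query and
    retrieves the stored fields of the matching document.
    """
    rng = rng or random.Random()
    order = list(words)
    rng.shuffle(order)  # random search order, as described above

    start = time.perf_counter()
    for word in order:
        search_fn(word)
    elapsed = time.perf_counter() - start

    rate = len(order) / elapsed if elapsed > 0 else float("inf")
    return elapsed, rate
```

The "(N/s)" figures in the results correspond to the second return value: the number of operations divided by the elapsed time.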

For Whoosh, we used the multiprocessing writer to build the index. This explains why it indexes faster than xappy: it used all 4 cores, not just 1.

For searching, xappy/xapian is faster (no parallel processing was used for searching).

Still, the speed difference between xappy and Whoosh may be smaller than you expected: in the run below, xappy searches about 1.8 times faster.

Index Size about 12MB

# Phenom II X4 840, 8GB RAM, HDD
# Python 2.7.2+ (default, Oct  4 2011, 20:06:09) 
# [GCC 4.6.1] on linux2

Params:
DOC_COUNT: 3000 WORD_LEN: 10
EXTRA_FIELD_COUNT: 10 EXTRA_FIELD_LEN: 100

Benchmarking: xappy 0.5 / xapian 1.2.5
Indexing takes 2.8s (1068.9/s)
Searching takes 0.5s (6635.8/s)

Benchmarking: whoosh 2.3.2
Indexing takes 0.8s (3575.6/s)
Searching takes 0.8s (3714.8/s)
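To put the figures above side by side, the relative speeds can be worked out from the reported rates. The snippet below only restates the numbers printed in the results; the dictionary layout and the `speedup` helper are made up for the example.

```python
# Rates (operations per second) copied from the results above.
results = {
    "xappy 0.5 / xapian 1.2.5": {"index": 1068.9, "search": 6635.8},
    "whoosh 2.3.2": {"index": 3575.6, "search": 3714.8},
}


def speedup(rate_a, rate_b):
    """How many times faster rate_a is than rate_b."""
    return rate_a / rate_b


xappy = results["xappy 0.5 / xapian 1.2.5"]
whoosh = results["whoosh 2.3.2"]

index_ratio = speedup(whoosh["index"], xappy["index"])
search_ratio = speedup(xappy["search"], whoosh["search"])

print("whoosh indexes %.1fx faster than xappy" % index_ratio)   # 3.3x
print("xappy searches %.1fx faster than whoosh" % search_ratio)  # 1.8x
```

Keep in mind that the indexing figure compares 4 cores (Whoosh) against 1 (xappy), so it is not a per-core comparison.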