suffix-array-li2016

An implemenetation of (part of) the suffix array construction algorithm developed by Zhize Li et al. (2016). This algorithm is claimed by the authors to be in-place (i.e. O(1) space complexity) and runs in linear time (O(n) where n is the number of characters in the string). However in practice, the time complexity does not seem to be linear, as evident in the experimental results in their own paper and in experiments with my implementations.

Performance

I provide both C++ and Rust implementations, which have similar performance. Li's algorithm is not faster than the naive algorithm (naive sorting of all suffixes, which is also in-place) with a 32-bit unsigned interger alphabet until n is greater than about 800,000. Here I am referring to the Rust implementation of the algorithm in section 3, which allows a mutable string and has a stronger constraint, i.e. n >= alphabet_size. The algorithm in section 4 (the main contribution of this paper, which operates on immutable strings and allows alphabet_size > n) is expected to be slower.

// test tests::bench_li_500_1000      ... bench:     168,711 ns/iter (+/- 27,549)
// test tests::bench_li_500_10000     ... bench:   1,356,838 ns/iter (+/- 79,133)       x 8.0  x   8.0
// test tests::bench_li_500_100000    ... bench:  14,359,321 ns/iter (+/- 1,307,696)    x10.6  x  85.1
// test tests::bench_li_500_1000000   ... bench: 180,594,903 ns/iter (+/- 13,631,112)   x12.6  x1070.4
// test tests::bench_naive_500_1000    ... bench:      90,554 ns/iter (+/- 5,723)
// test tests::bench_naive_500_10000   ... bench:   1,262,911 ns/iter (+/- 58,702)      x13.9  x  10.3
// test tests::bench_naive_500_100000  ... bench:  16,251,237 ns/iter (+/- 661,166)     x12.9  x 179.5
// test tests::bench_naive_500_1000000 ... bench: 220,851,474 ns/iter (+/- 6,532,465)   x13.6  x2438.9

Curiously, when testing with 10,000,000 or more characters, Li's algorithm becomes slower than the naive algorithm (the C++ implementation has similar performance). What could be the culprit?

test tests::bench_li_500_10000000    ... bench: 4,796,887,136 ns/iter (+/- 20,770,245)  x26.56  x28433
test tests::bench_naive_500_10000000 ... bench: 3,082,987,837 ns/iter (+/- 63,887,953)  x13.95  x34045

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
cpp		cpp
rust		rust
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

suffix-array-li2016

Performance

About

Releases

Packages

Languages

License

TianyiShi2001/suffix-array-li2016

Folders and files

Latest commit

History

Repository files navigation

suffix-array-li2016

Performance

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages