Skip to content

Fast and highly tuned bit vector implementation including space efficient rank and select support having only 3.51% space overhead.

License

Notifications You must be signed in to change notification settings

pasta-toolbox/bit_vector

Repository files navigation

pasta::bit_vector

License: GPL v3 DOI pasta::bit_vector CI codecov

This header-only library contains a highly tuned (uncompressed) bit vector implementation with multiple space efficient rank and select support data structures. Our fastest rank and select support has a space overhead of only ~3.51% and makes use of data level parallelism via SIMD instructions.

If you use this code in a scientific context, please cite our paper.

@inproceedings{Kurpicz2022CompactRankSelect,
  author    = {Florian Kurpicz},
  title     = {Engineering Compact Data Structures for Rank and Select Queries on Bit Vectors},
  booktitle = {{SPIRE}},
  series    = {Lecture Notes in Computer Science},
  volume    = {13617},
  pages     = {257--272},
  publisher = {Springer},
  year      = {2022},
  doi       = {10.1007/978-3-031-20643-6\_19}
}

Contents

This repository contains the following algorithms and data structures. Our documentation contains in-depth on the usage of all these algorithms and data structures including easy to follow examples. You can find the example in the screenshot below as text, too.

Screenshot Documentation

Bit Vectors

Bit vectors play an important role in many compressed text indices, e.g., the FM-index. This repository contains the following bit vector implementations:

Dong Zhou and David G. Andersen and Michael Kaminsky, Space-Efficient, High-Performance Rank and Select Structures on Uncompressed Bit Sequences, SEA 2013.

  • improved rank and select support requiring the same amount of memory but providing faster rank (up to 8% speedup) and select (up to 16.5% speedup) queries, and
  • a very fast rank support that can also answer select queries.

Easy to Use

Since this is a header-only library, you have to simply add it to your projects include path to use it. A small example can be found below. We refer to the documentation for more information.

#include <pasta/bit_vector/bit_vector.hpp>
#include <pasta/bit_vector/support/flat_rank_select.hpp>

// Create a bit vector of size 1000 containing only zeros and flip every other bit.
pasta::BitVector bv(1000, 0);
for (size_t i = 0; i < bv.size(); ++i) {
  if (i % 2 == 0) {  bv[i] = 1; }
}
// Print the bit vector to see that everything worked ;-)
for (auto it = bv.begin(); it != bv.end(); ++it) {
  std::cout << ((*it == true) ? '1' : '0');
}
std::cout << std::endl;

// Create rank and select support and print the result of some queries.
pasta::FlatRankSelect rs(bv);
std::cout << rs.rank0(250) << ", " << rs.rank1(250)
          << ", "
          << rs.select0(250) << ", " << rs.rank1(250)
          << std::endl;

Benchmarks and Tests

There exist an easy to use benchmark, which helps to compare the implementations in this repository. To build the benchmark, run the CMake command with -DPASTA_BIT_VECTOR_BUILD_BENCHMARKS=On. Our tests are contained in the folder tests. To build the tests, run the CMake command with -DPASTA_BIT_VECTOR_BUILD_TESTS=On.

We also conducted an extensive experimental evaluation. To this end, we use our rank and select benchmark where we compare our implementations with many other compact rank and select data structures.

We refer to our paper for a full description of the results, i.e., hardware, inputs, and competitors. Below, you can find some of the figures we present in the paper.

Screenshot Documentation

Screenshot Documentation

Screenshot Documentation

Screenshot Documentation

How to Get This

Below, we list all commands that are required to build the code in this repository. To this end, we provide three CMake presets (debug, release, and release with debug information).

  • The debug preset creates a debug folder and uses the compiler flags -DDEBUG -O0 -g -ggdb -fsanitize=address.
  • The release preset creates a build folder and uses the compiler flags -DNDEBUG -march=native -O3.
  • The release with debug information preset creates a build_with_debug_info folder and uses the compiler flags -DDEBUG -g -march=native -O3.

Per default, we use the following compiler flags: -Wall -Wextra -Wpedantic -fdiagnostics-color=always.

Requirements

pasta::bit_vector is written in C++20 and requires a compiler that supports it. We use Ninja as build system. For more information on how to use this library, please refer to our documentation.

tl;dr

To just clone the source code, use the following.

git clone [email protected]:pasta-toolbox/bit_vector
cd bit_vector
git submodule update --init --recursive

If you also want to build the test, please continue with the following commands.

cmake --preset=[debug|build|relwithdeb]-DPASTA_BIT_VECTOR_BUILD_TESTS=On
cmake --build --preset=[debug|release|relwithdeb]
ctest --test-dir [debug|build|relwithdeb]