
[External] Adding ankerl unordered_dense #12861

Open · wants to merge 5 commits into master

Conversation

loumalouomega
Member

📝 Description

Introduction

Adding ankerl unordered_dense, which provides top-performance hashed containers: https://martin.ankerl.com/2022/08/27/hashmap-bench-01/. It is MIT-licensed and header-only.

Initial testing was done during the efforts to modernize data_value_container. In the end, the current brute-force solution is still faster for a small number of variables, but among the hash containers I tried, ankerl was the fastest in my tests.

The idea is to use them to replace our current containers (they are not used anywhere yet).

Key Features

  1. Performance:

    • Optimized for fast lookups and insertions.
    • Minimizes memory usage by densely packing the data.
  2. Robin Hood Hashing:

    • Ensures even distribution of elements, reducing clustering.
    • Backward shift deletion minimizes gaps in the table when elements are removed.
  3. Template Customization:

    • Supports custom hash functions, key equality checks, allocators, and bucket types.
    • Provides both map (key-value pairs) and set (keys only) interfaces.
  4. Hashing Algorithm:

    • Based on wyhash, a fast and high-quality hashing algorithm.
    • Provides built-in and extensible hash function templates.
  5. API Compatibility:

    • Follows the conventions of the standard library's std::unordered_map and std::unordered_set.
    • Offers additional non-standard features like extract for moving data and replace for bulk updates.
  6. Exception Safety:

    • Designed with robust exception handling in mind.
    • Ensures no memory leaks or corruption even during exceptions.
  7. Modular and Extensible:

    • Offers segmentation options for memory management.
    • Integrates with polymorphic memory resources (PMR) for custom allocation strategies.
  8. C++17 and Higher:

    • Requires C++17 or newer due to the use of features like std::optional, std::tuple, and advanced template metaprogramming.
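
For reference, a minimal usage sketch of the interface described above; it assumes the upstream single-header layout (`#include <ankerl/unordered_dense.h>`) and is only an illustration, not Kratos code:

```cpp
#include <ankerl/unordered_dense.h>

#include <cstdint>
#include <iostream>
#include <string>

int main() {
    // Same interface as std::unordered_map, but the values live densely in a
    // contiguous container, which is what makes iteration fast.
    ankerl::unordered_dense::map<std::string, std::uint64_t> counters;

    counters.try_emplace("DISPLACEMENT", 3);
    counters["PRESSURE"] += 1;

    if (auto it = counters.find("PRESSURE"); it != counters.end()) {
        std::cout << it->first << " -> " << it->second << '\n';
    }

    // Iteration walks the contiguous value container (insertion order as long
    // as nothing has been erased).
    for (const auto& [name, count] : counters) {
        std::cout << name << ": " << count << '\n';
    }
    return 0;
}
```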

Main Components

  1. Hashing:

    • Customizable via the hash template, supporting standard types, strings, and custom objects.
    • Uses a combination of mixing and bit manipulation for uniform distribution.
  2. Buckets:

    • The Bucket structure stores metadata (distance and fingerprint) and an index into the value container.
  3. Data Storage:

    • Utilizes a segmented_vector or std::vector to store data contiguously.
    • The segmentation option (segmented_map or segmented_set) improves memory management for large datasets.
  4. Load Factor:

    • Maintains a default maximum load factor of 0.8, adjustable by the user.
    • Automatically grows the table to maintain performance.
  5. Transparent Lookup:

    • Supports heterogeneous lookups (e.g., std::string_view for std::string keys).
  6. Iterators:

    • Provides standard iterators for traversal.
    • Iterator invalidation rules are similar to std::unordered_map.
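
As a sketch of the transparent-lookup point above, following the pattern documented in the upstream README (the `string_hash` helper below is illustrative, not part of the library itself):

```cpp
#include <ankerl/unordered_dense.h>

#include <cstdint>
#include <functional>
#include <string>
#include <string_view>

// Illustrative transparent hash: the is_transparent/is_avalanching tags allow
// lookups with std::string_view without constructing a temporary std::string.
struct string_hash {
    using is_transparent = void;
    using is_avalanching = void;

    auto operator()(std::string_view str) const noexcept -> std::uint64_t {
        return ankerl::unordered_dense::hash<std::string_view>{}(str);
    }
};

using string_map =
    ankerl::unordered_dense::map<std::string, int, string_hash, std::equal_to<>>;

int main() {
    string_map map;
    map.try_emplace("DISPLACEMENT", 1);

    std::string_view key = "DISPLACEMENT";  // no std::string allocated here
    return map.contains(key) ? 0 : 1;
}
```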

🆕 Changelog

@matekelemen left a comment
Contributor


I like the idea of finally using a hash table with better memory layout, but sorry in advance: I'm going to be a picky ass here. This is a very basic data structure and there's a mountain of options to choose from so I don't want us to make a poor decision.

I'd be wary of any advertising based on benchmarks someone does on their own libraries. I skimmed through the comparison you linked, and am missing some things:

  • I didn't find the source code of the tests
  • He didn't provide any scaling studies
  • lack of hardware diversity

I ask you to write benchmarks using Google's benchmark library for a few hash map implementations and run them on some different hardware.

What to compare

Specifically I'd like the following implementations compared:

What to benchmark

As for what to benchmark, we're almost exclusively inserting/searching and practically never erasing anything from existing tables, so I'd like to see

  • an insertion benchmark
  • a search benchmark
    • with integer keys
    • with std::string keys. Specifically, longer ones that don't benefit from the short-string optimization (make sure that
      sizeof(std::string) < key.size())
    • concurrent search with all physical cores participating

What's important is that you run this with different sizes so we can get an idea of how these operations scale.
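
For concreteness, a sketch of what the string-key search case could look like with Google Benchmark; the key prefix, size range, and `BM_FindString` name are placeholders, and the same template would be instantiated once per candidate implementation:

```cpp
#include <benchmark/benchmark.h>

#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Build keys long enough to defeat the short-string optimization,
// i.e. key.size() > sizeof(std::string).
static std::string MakeKey(std::size_t i) {
    return "a_sufficiently_long_key_prefix_to_defeat_sso_" + std::to_string(i);
}

template <class Map>
static void BM_FindString(benchmark::State& state) {
    const auto size = static_cast<std::size_t>(state.range(0));
    Map map;
    std::vector<std::string> queries;
    queries.reserve(size);
    for (std::size_t i = 0; i < size; ++i) {
        map.emplace(MakeKey(i), i);
        queries.push_back(MakeKey(i));
    }
    std::size_t i = 0;
    for (auto _ : state) {
        // Only the lookup is measured; key construction happens above.
        benchmark::DoNotOptimize(map.find(queries[i]));
        i = (i + 1) % size;
    }
}

using StdStringMap = std::unordered_map<std::string, std::size_t>;

// Different sizes give the scaling study; adding ->Threads(N) with one thread
// per physical core would cover the concurrent-search case.
BENCHMARK_TEMPLATE(BM_FindString, StdStringMap)
    ->RangeMultiplier(8)
    ->Range(1 << 10, 1 << 22);

BENCHMARK_MAIN();
```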

Hardware

I'm interested in benchmarks running on

  • a decent desktop with an x86-based CPU
  • a NUMA cluster if you have access to one (if you don't I can run it on one)
  • some shitty laptop or a raspberry pi (optional, not super important but good to know because we have a lot of student users)
  • a Mac with an M-chip (optional, I'm just curious. I can run it on my machine if you don't have access to one)

I know this is a lot of work, but I think it's absolutely necessary for such a basic data structure.

If you are not familiar with google's benchmark framework, I can shoot you an example with std::unordered_map and you can build on top of that for the other implementations.
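
A minimal sketch of what such a starter example might look like (not the offered example itself; the integer keys and size range are arbitrary assumptions):

```cpp
#include <benchmark/benchmark.h>

#include <cstdint>
#include <unordered_map>

// Insertion benchmark: build a fresh map with state.range(0) integer keys on
// every iteration; swapping the map type compares other implementations.
static void BM_InsertInt(benchmark::State& state) {
    const auto size = static_cast<std::int64_t>(state.range(0));
    for (auto _ : state) {
        std::unordered_map<std::int64_t, std::int64_t> map;
        for (std::int64_t i = 0; i < size; ++i) {
            map.emplace(i, i);
        }
        benchmark::DoNotOptimize(map);
    }
    state.SetItemsProcessed(state.iterations() * size);
}

BENCHMARK(BM_InsertInt)->RangeMultiplier(8)->Range(1 << 10, 1 << 22);

BENCHMARK_MAIN();
```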

@loumalouomega
Member Author

> * [`tsl::robin_map`](https://github.com/Tessil/robin-map)

This one is super slow at least for moderate sizes in my own tests.

@loumalouomega
Member Author

> If you are not familiar with google's benchmark framework, I can shoot you an example with std::unordered_map and you can build on top of that for the other implementations.

I would like to have something standardized in Kratos instead of just hand-made code each time. Is it not possible to reuse our GTest infrastructure?

@matekelemen
Contributor

matekelemen commented Nov 21, 2024

> * [`tsl::robin_map`](https://github.com/Tessil/robin-map)
>
> This one is super slow at least for moderate sizes in my own tests.

Maybe, but that just highlights the dangers of taking devs' benchmarks of their own libs at face value. The author of tsl::robin_map also did a benchmark that painted his own implementation in a rather flattering light.

@matekelemen
Contributor

> If you are not familiar with google's benchmark framework, I can shoot you an example with std::unordered_map and you can build on top of that for the other implementations.
>
> I would like to have something standardized in Kratos instead of just hand-made code each time. Is it not possible to reuse our GTest infrastructure?

I'm all for something standardized, but GTest is definitely not a benchmarking library. Google's framework is pretty simple and very popular, but I'm open to other suggestions.

@loumalouomega
Member Author

> * [`tsl::robin_map`](https://github.com/Tessil/robin-map)
>
> This one is super slow at least for moderate sizes in my own tests.
>
> Maybe, but that just highlights the dangers of taking devs' benchmarks of their own libs at face value. The author of tsl::robin_map also did a benchmark that painted his own implementation in a rather flattering light.

Yes, in fact this was the first one I tried, and it was the slowest of all the ones I tried.

@loumalouomega
Member Author

> If you are not familiar with google's benchmark framework, I can shoot you an example with std::unordered_map and you can build on top of that for the other implementations.
>
> I would like to have something standardized in Kratos instead of just hand-made code each time. Is it not possible to reuse our GTest infrastructure?
>
> I'm all for something standardized, but GTest is definitely not a benchmarking library. Google's framework is pretty simple and very popular, but I'm open to other suggestions.

Maybe we can at least add a cmake loop to compile the benchmarks ...

@matekelemen
Contributor

> Maybe we can at least add a cmake loop to compile the benchmarks ...

I'd put the benchmarks in a different repo, similar to how we deal with examples.

@loumalouomega
Member Author

> Maybe we can at least add a cmake loop to compile the benchmarks ...
>
> I'd put the benchmarks in a different repo, similar to how we deal with examples.

Usually benchmark code is not very different from test code. Examples are huge in comparison.

@RiccardoRossi
Member

To add my two cents to Mate's comments:

  • memory occupation, particularly for the database, is of paramount importance. One simple way to improve efficiency is to decrease the load factor, but we cannot afford that in the database
  • portability
  • maintenance: is there a team behind the lib?

@loumalouomega
Member Author

Merging master after #12867, I will write a benchmark...

Projects
Status: 👀 Next meeting TODO