add `occupied(n)` for unordered set and map #427

tanzby · 2024-08-08T16:13:58Z

Add a bool occupied(index_t n) function to unordered_set and unordered_map, so that we can write:

struct AllocateNewBlocks {
  AllocateNewBlocks(stdgpu::unordered_map<xx> block_map,
                    BlockBuffer buffer_buffer,
                    stdgpu::unordered_set<xx> not_exist_block_indices)
      : buffer_buffer(buffer_buffer),
        block_map(block_map),
        not_exist_block_indices(not_exist_block_indices) {}

  __device__ void operator()(const stdgpu::index_t index) {
    // Skip slots of the set that do not hold a value.
    if (!not_exist_block_indices.occupied(index)) { // Used here.
      return;
    }
    const BlockIndex block_index = *(not_exist_block_indices.begin() + index);
    // Allocate a block only if this thread actually inserted the key.
    if (const auto& [iter, is_inserted] = block_map.emplace(block_index, 0); is_inserted) {
      iter->second = buffer_buffer.AllocateBlock();
    }
  }

  BlockBuffer buffer_buffer;
  stdgpu::unordered_map<xx> block_map;
  stdgpu::unordered_set<xx> not_exist_block_indices;
};


stdgpu::for_each_index(thrust::cuda::par.on(stream()),
                       not_exist_block_indices.max_size(),
                       AllocateNewBlocks(block_map, block_buffer, not_exist_block_indices));

This removes the need to obtain a device_range() first.

tanzby · 2024-08-08T16:15:34Z

@stotko, could you please review this PR? Thanks.

codecov · 2024-08-08T16:31:06Z

Codecov Report

Attention: Patch coverage is 0% with 4 lines in your changes missing coverage. Please review.

Project coverage is 97.19%. Comparing base (3b7d712) to head (87fe7d7).
Report is 22 commits behind head on master.

Files with missing lines                    Patch %    Lines
src/stdgpu/impl/unordered_map_detail.cuh    0.00%      2 Missing ⚠️
src/stdgpu/impl/unordered_set_detail.cuh    0.00%      2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #427      +/-   ##
==========================================
- Coverage   97.34%   97.19%   -0.16%     
==========================================
  Files          32       32              
  Lines        2524     2528       +4     
==========================================
  Hits         2457     2457              
- Misses         67       71       +4

stotko · 2024-08-12T13:18:24Z

Thanks for working on this. However, I believe that exposing the occupied function is not the right way forward, since this function is really meant as an implementation detail of the base container. Even exposing begin(), for symmetry with end() (which is required for find()), already gives more access to the internals than is typically needed.

While the use case you mentioned is fine, it may perform poorly: the load factor of unordered_map/unordered_set is typically low, so the container is only sparsely filled. This leads to low thread utilization in your kernel, because many threads of a warp return immediately at the if statement. That is the reason for having device_range(), which densely packs all occupied values.
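
To make the utilization argument concrete, here is a rough host-side sketch in plain C++. It is not stdgpu code: the helper names (launched_threads, compact_occupied, utilization) are hypothetical, the warp size is passed in as a parameter, and the model only counts how many launched threads do useful work. It ignores memory access patterns and the cost of building the dense range.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of threads launched when `work_items` are processed one per thread,
// rounded up to whole warps. (Hypothetical helper, not part of stdgpu.)
std::size_t launched_threads(std::size_t work_items, std::size_t warp_size)
{
    assert(warp_size > 0);
    return (work_items + warp_size - 1) / warp_size * warp_size;
}

// Host-side stand-in for a dense range: the indices of all occupied slots,
// packed contiguously. (Hypothetical helper, not part of stdgpu.)
std::vector<std::size_t> compact_occupied(const std::vector<bool>& occupied)
{
    std::vector<std::size_t> dense;
    for (std::size_t i = 0; i < occupied.size(); ++i) {
        if (occupied[i]) {
            dense.push_back(i);
        }
    }
    return dense;
}

// Fraction of launched threads that do useful work when `useful` of the
// `work_items` threads find something to process.
double utilization(std::size_t useful, std::size_t work_items, std::size_t warp_size)
{
    const std::size_t launched = launched_threads(work_items, warp_size);
    return launched == 0 ? 0.0
                         : static_cast<double>(useful) / static_cast<double>(launched);
}
```

For a table with 1024 slots of which 128 are occupied, utilization(128, 1024, 32) models one thread per slot with an early return on empty slots, while utilization(128, 128, 32) models one thread per entry of the packed range. In this simplified model the first launch keeps one eighth of its threads busy and the second keeps all of them busy.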

As mentioned in #423, adding (host-only) overloads with an additional stream argument for the load() and store() functions of atomic would be better, as this addresses the actual underlying problem. More concretely, I see two options here:

1. Also implement a stream-aware host-to-device memcpy function: clean, but not straightforward to do, as the stream is a template class and the internal memory management system is intentionally strongly decoupled from the rest of the library.
2. Simulate the memcpy with a "no-op" transform_reduce_index to let thrust do the work for us: more of a workaround, and possibly inefficient.

stotko · 2024-11-20T10:55:35Z

The underlying issue has been fixed in #450.

add occupied(n) for unordered set and map

87fe7d7

stotko mentioned this pull request Oct 30, 2024

Expose occupied() in unordered map #436

Closed

stotko closed this Nov 20, 2024
