Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unify and simplify batch functionality: Multivector #1651

Merged
merged 6 commits into from
Aug 21, 2024
Merged

Conversation

pratikvn
Copy link
Member

@pratikvn pratikvn commented Jul 23, 2024

This PR is the first part of the batched kernel refactoring. It removes the .inc style files, moves to .hpp, making the functions available in a common namespace. It also uses a single source for both HIP and CUDA.

TODO

  • Check if there are any issues with solver includes.
  • Remove existing redundant .inc files
  • Differences in __launch_bounds__ between CUDA and HIP needs to be resolved

@pratikvn pratikvn added type:batched-functionality This is related to the batched functionality in Ginkgo 1:ST:need-feedback The PR is somewhat ready but feedback on a blocking topic is required before a proper review. labels Jul 23, 2024
@pratikvn pratikvn self-assigned this Jul 23, 2024
@pratikvn pratikvn marked this pull request as draft July 23, 2024 11:41
@ginkgo-bot ginkgo-bot added reg:build This is related to the build system. mod:cuda This is related to the CUDA module. mod:hip This is related to the HIP module. labels Jul 23, 2024
@MarcelKoch
Copy link
Member

If you haven't noted that already, there is a naming inconsistency of the single-batch matrix-apply kernels for the reference and cuda/hip backend.
In the reference backend those are called simple|advanced_apply_kernel, while for cuda/hip they are called simple|advanced_apply.
I think that should also be unified.

@pratikvn pratikvn marked this pull request as ready for review August 19, 2024 09:47
@pratikvn pratikvn changed the title WIP: Unify and simplify batch functionality Unify and simplify batch functionality: Multivector Aug 19, 2024
@pratikvn pratikvn added 1:ST:ready-for-review This PR is ready for review and removed 1:ST:need-feedback The PR is somewhat ready but feedback on a blocking topic is required before a proper review. labels Aug 19, 2024
@pratikvn pratikvn requested review from a team August 19, 2024 09:48
Copy link
Member

@upsj upsj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I'm wondering if we can get rid of some of the GKO_DEVICE_NAMESPACE:: annotations though - is there an ambiguity issue when removing them?

@pratikvn
Copy link
Member Author

@MarcelKoch , yes, I plan to unify those when refactoring the matrix format kernels.

@@ -20,8 +58,7 @@ __device__ __forceinline__ void scale(


template <typename ValueType, typename Mapping>
__global__
__launch_bounds__(default_block_size, sm_oversubscription) void scale_kernel(
__global__ __launch_bounds__(default_block_size) void scale_kernel(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you have used 4 for sm_oversubscription on both cuda/hip.
I assume the cuda is the correct and hip just uses it.
if you want to compute in more accurate mapping, hip should use (min_blocks_multiprocessor (4) * max_threads_per_block (256) )/32 = 32 for hip.
you will need to distinguish it by macro

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I havent fully benchmarked that yet. I agree that this will have to be a macro specialized for CUDA and HIP. But will done in a future PR. It has already been noted in #1376

@pratikvn pratikvn added 1:ST:ready-to-merge This PR is ready to merge. and removed 1:ST:ready-for-review This PR is ready for review labels Aug 20, 2024
@pratikvn
Copy link
Member Author

format!

@pratikvn pratikvn added the 1:ST:no-changelog-entry Skip the wiki check for changelog update label Aug 20, 2024
@pratikvn pratikvn merged commit 83a577c into develop Aug 21, 2024
12 of 14 checks passed
@pratikvn pratikvn deleted the batch-unif branch August 21, 2024 11:01
MarcelKoch pushed a commit to MarcelKoch/ginkgo that referenced this pull request Dec 2, 2024
Unify and simplify batch functionality: Multivector

Related PR: ginkgo-project#1651
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
1:ST:no-changelog-entry Skip the wiki check for changelog update 1:ST:ready-to-merge This PR is ready to merge. mod:cuda This is related to the CUDA module. mod:hip This is related to the HIP module. reg:build This is related to the build system. type:batched-functionality This is related to the batched functionality in Ginkgo
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants