[SYCL][NVPTX] Emit reqd_work_group_size attributes as NVVM annotations #14502

frasercrmck · 2024-07-09T16:47:20Z

Only emit the provided values as annotations in the LLVM IR. The NVPTX backend will pad missing values with 1s. This suits the fact that the attribute must provide as many values as the dimensionality of the work-group, and we can assume that the work-group size of unused dimensions is 1.
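
For illustration, the behaviour described above can be modelled in a small standalone C++ sketch. This is not the code in this PR and does not use the LLVM API. It assumes the NVVM annotation names `reqntidx`, `reqntidy` and `reqntidz`, and that the last attribute value maps to x, mirroring the `max_work_group_size` example that appears later in this thread. The helper names `emitReqdAnnotations` and `padWithOnes` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// One NVVM-style annotation: a name such as "reqntidx" plus its value.
using Annotation = std::pair<std::string, unsigned>;

// Emit one annotation per *provided* dimension only.
// Assumption: dims are in SYCL order, so the last value maps to x.
std::vector<Annotation> emitReqdAnnotations(const std::vector<unsigned> &dims) {
  static const char *const Names[] = {"reqntidx", "reqntidy", "reqntidz"};
  assert(!dims.empty() && dims.size() <= 3 && "expected 1 to 3 dimensions");
  std::vector<Annotation> out;
  for (std::size_t i = 0; i < dims.size(); ++i)
    out.emplace_back(Names[i], dims[dims.size() - 1 - i]);
  return out;
}

// Model of the consumer side: any dimension without an annotation is 1.
std::array<unsigned, 3> padWithOnes(const std::vector<Annotation> &annos) {
  std::array<unsigned, 3> xyz = {1, 1, 1};
  for (const auto &a : annos) {
    // The last character of the name ('x', 'y' or 'z') selects the slot.
    xyz[a.first.back() - 'x'] = a.second;
  }
  return xyz;
}
```

Under these assumptions, a two-dimensional attribute with values (8, 16) yields only `reqntidx = 16` and `reqntidy = 8`, and the padded size is 16 x 8 x 1.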

elizabethandrews

The code looks ok to me but this attribute is handled differently for different backends now and I do not know if that is ok (#13600). @premanandrao @steffenlarsen please weigh in here.

frasercrmck · 2024-07-11T10:04:01Z

The code looks ok to me but this attribute is handled differently for different backends now and I do not know if that is ok (#13600). @premanandrao @steffenlarsen please weigh in here.

It's not ideal, no. I do think NVPTX.cpp is a good place to add NVPTX-specific metadata, and CodeGenFunction.cpp is a good place to add target-independent metadata.

We do need this NVPTX-specific lowering, as the NVVM annotations are the only way the NVPTX backend is going to make use of the attributes - they're just ignored otherwise. I think the most regrettable outcome is that we have the same information repeated in multiple different ways: see also how we are already handling the [[intel::max_work_group_size]] attribute this way. For example, intel::max_work_group_size(4, 8, 16) gives us:

define void @foo() !max_work_group_size !28 {
   ...
}

!nvvm.annotations = !{!19, !20, !21}

!19 = !{ptr @_ZTSZZ4mainENKUlRN4sycl3_V17handlerEE_clES2_E1C, !"maxntidx", i32 16}
!20 = !{ptr @_ZTSZZ4mainENKUlRN4sycl3_V17handlerEE_clES2_E1C, !"maxntidy", i32 8}
!21 = !{ptr @_ZTSZZ4mainENKUlRN4sycl3_V17handlerEE_clES2_E1C, !"maxntidz", i32 4}

!28 = !{i32 16, i32 8, i32 4}

Perhaps one way of making it nicer would be to lower from Attribute -> Function Metadata in CodeGenFunction.cpp, then lower the Function Metadata to NVVM annotations in NVPTX.cpp. That may give a clearer lowering path, separating out responsibilities.
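
A rough sketch of that two-stage split, written as standalone C++ rather than against the clang or LLVM APIs. The function names `attrToFunctionMetadata` and `metadataToNvvmAnnotations` are hypothetical, and the x-first operand order is taken from the `!28 = !{i32 16, i32 8, i32 4}` example above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using Annotation = std::pair<std::string, unsigned>;

// Stage 1 (target-independent, the CodeGenFunction.cpp role):
// attribute arguments -> function metadata operands, stored x-first.
std::vector<unsigned>
attrToFunctionMetadata(const std::vector<unsigned> &attrArgs) {
  return std::vector<unsigned>(attrArgs.rbegin(), attrArgs.rend());
}

// Stage 2 (NVPTX-specific, the NVPTX.cpp role):
// function metadata operands -> NVVM annotations.
// `prefix` is "maxntid" or "reqntid".
std::vector<Annotation>
metadataToNvvmAnnotations(const std::vector<unsigned> &md,
                          const std::string &prefix) {
  static const char Suffix[] = {'x', 'y', 'z'};
  assert(md.size() <= 3 && "at most three dimensions");
  std::vector<Annotation> out;
  for (std::size_t i = 0; i < md.size(); ++i)
    out.emplace_back(prefix + Suffix[i], md[i]);
  return out;
}
```

With this split, only the second stage knows about NVVM annotation names; the first stage produces the same metadata for every target.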

frasercrmck · 2024-07-11T10:15:15Z

Sorry, accidentally deleted the branch

steffenlarsen

Great! I attempted to do this a long time ago (#3755), but there were problems due to us padding the dimensions of the attribute at the time. Lovely to see it finally happening. 😄

I am not too concerned about it taking a new path for NVPTX and generating special NVVM metadata. From what I remember, the current behavior for the attribute is heavily based on the SPIR-V translator consuming the metadata that the OpenCL-related attribute set a precedent for.

An alternative could be to have a late-stage LLVM transformation pass for consuming the "reqd_work_group_size" metadata and generating these NVVM attributes. A benefit of this is that we could use this for handling "sycl-work-group-size" produced by the SYCL compile-time property alternative for this attribute.
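
As a toy model of why a late-stage pass could serve both sources, the sketch below normalizes either input into one list of sizes before any NVVM lowering happens. It is standalone C++, not an LLVM pass. It assumes that "sycl-work-group-size" carries a comma-separated string of decimal values and that both sources use the same dimension order; neither assumption is confirmed in this thread. `KernelInfo`, `parseSizes` and `requiredWorkGroupSize` are hypothetical names.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Toy stand-in for what a pass could read off one kernel function.
struct KernelInfo {
  std::vector<unsigned> reqdWorkGroupSizeMD; // "reqd_work_group_size" operands, empty if absent
  std::string syclWorkGroupSizeAttr;         // "sycl-work-group-size" value, empty if absent
};

// Parse "a,b,c" into {a, b, c}. Assumes well-formed decimal input.
std::vector<unsigned> parseSizes(const std::string &text) {
  std::vector<unsigned> sizes;
  std::stringstream ss(text);
  std::string item;
  while (std::getline(ss, item, ','))
    sizes.push_back(static_cast<unsigned>(std::stoul(item)));
  return sizes;
}

// Single entry point: whichever source is present yields the same list,
// so one lowering to NVVM annotations can follow for both.
std::vector<unsigned> requiredWorkGroupSize(const KernelInfo &k) {
  if (!k.reqdWorkGroupSizeMD.empty())
    return k.reqdWorkGroupSizeMD;
  return parseSizes(k.syclWorkGroupSizeAttr);
}
```

The point of the sketch is only that the annotation-emitting step would run once, after both the attribute and the property have been reduced to the same representation.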

clang/lib/CodeGen/Targets/NVPTX.cpp

premanandrao · 2024-07-12T18:15:34Z

The code looks ok to me but this attribute is handled differently for different backends now and I do not know if that is ok (#13600). @premanandrao @steffenlarsen please weigh in here.

It's not ideal, no. I do think NVPTX.cpp is a good place to add NVPTX-specific metadata, and CodeGenFunction.cpp is a good place to add target-independent metadata.

We do need this NVPTX-specific lowering, as the NVVM annotations are the only way the NVPTX backend is going to make use of the attributes - they're just ignored otherwise. I think the most regrettable outcome is that we have the same information repeated in multiple different ways: see also how we are already handling the [[intel::max_work_group_size]] attribute this way. For example, intel::max_work_group_size(4, 8, 16) gives us:

define void @foo() !max_work_group_size !28 {
   ...
}

!nvvm.annotations = !{!19, !20, !21}

!19 = !{ptr @_ZTSZZ4mainENKUlRN4sycl3_V17handlerEE_clES2_E1C, !"maxntidx", i32 16}
!20 = !{ptr @_ZTSZZ4mainENKUlRN4sycl3_V17handlerEE_clES2_E1C, !"maxntidy", i32 8}
!21 = !{ptr @_ZTSZZ4mainENKUlRN4sycl3_V17handlerEE_clES2_E1C, !"maxntidz", i32 4}

!28 = !{i32 16, i32 8, i32 4}

Perhaps one way of making it nicer would be to lower from Attribute -> Function Metadata in CodeGenFunction.cpp, then lower the Function Metadata to NVVM annotations in NVPTX.cpp. That may give a clearer lowering path, separating out responsibilities.

Thanks for the explanation. I don't have a strong opinion on changing what you have, so I will approve this. @elizabethandrews might, though; she is out until at least Monday.

frasercrmck · 2024-07-15T09:14:04Z

Great! I attempted to do this a long time ago (#3755), but there were problems due to us padding the dimensions of the attribute at the time. Lovely to see it finally happening. 😄

I am not too concerned about it taking a new path for NVPTX and generating special NVVM metadata. From what I remember, the current behavior for the attribute is heavily based on the SPIR-V translator consuming the metadata that the OpenCL-related attribute set a precedent for.

An alternative could be to have a late-stage LLVM transformation pass for consuming the "reqd_work_group_size" metadata and generating these NVVM attributes. A benefit of this is that we could use this for handling "sycl-work-group-size" produced by the SYCL compile-time property alternative for this attribute.

No problem! I wasn't aware you'd already tried it, sorry.

Yeah I agree the current lowering for NVPTX (and I'm guessing AMDGPU) leaves something to be desired. I think I'll look into this in a subsequent patch, if that's alright?

steffenlarsen · 2024-07-15T09:46:56Z

No problem! I wasn't aware you'd already tried it, sorry.

Don't be sorry! It was just to say that I think this is the right direction and it's great to see it actually happening. 😄

Yeah I agree the current lowering for NVPTX (and I'm guessing AMDGPU) leaves something to be desired. I think I'll look into this in a subsequent patch, if that's alright?

You could, but I figure it will involve moving the transformation done here to a later stage so that the property and the attribute are handled the same way. I don't see a problem with that, though; I'll let @elizabethandrews and @premanandrao have the final say.

elizabethandrews

Sorry for the delay. I have been battling covid and working reduced hours. This PR LGTM.

My uneasiness has more to do with these attributes in general and not your implementation.

smanna12

LGTM. Thanks

frasercrmck · 2024-07-16T06:36:47Z

This PR looks ready to merge, @intel/llvm-gatekeepers - thanks!

frasercrmck requested a review from a team as a code owner July 9, 2024 16:47

frasercrmck added 2 commits July 10, 2024 12:33

fix excessively large work-group sizes

22ff4c8

Merge remote-tracking branch 'origin/sycl' into sycl-nvptx-reqd-wg-size

ea4e4c7

elizabethandrews reviewed Jul 10, 2024

frasercrmck closed this Jul 11, 2024

frasercrmck deleted the sycl-nvptx-reqd-wg-size branch July 11, 2024 10:11

frasercrmck restored the sycl-nvptx-reqd-wg-size branch July 11, 2024 10:14

frasercrmck reopened this Jul 11, 2024

steffenlarsen reviewed Jul 11, 2024

premanandrao approved these changes Jul 12, 2024

frasercrmck mentioned this pull request Jul 15, 2024

[SYCL] Add max work-group size kernel properties #14518

Open

elizabethandrews approved these changes Jul 15, 2024

smanna12 approved these changes Jul 15, 2024

sommerlukas merged commit fe18590 into intel:sycl Jul 16, 2024
27 checks passed

frasercrmck deleted the sycl-nvptx-reqd-wg-size branch July 16, 2024 06:43

frasercrmck mentioned this pull request Jul 17, 2024

[SYCL] Add kernel properties for three function attributes #14448

Open
