Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL] Use L0 queries for L0 devices #2245

Draft
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

mgouicem
Copy link
Contributor

@mgouicem mgouicem commented Dec 9, 2024

This PR replaces OCL queries with matching L0 queries for L0 devices on Intel devices.
Few notable things:

  • Updated L0 headers to access some recent queries
  • Added headers for Intel specific L0 extensions. These headers come from compute-runtime. We asked about forward compatibility status of these symbols and are waiting for clarification (hence why this PR is in draft)
  • switched ngen to dynamically load L0

@github-actions github-actions bot added the platform:gpu-intel Codeowner: @oneapi-src/onednn-gpu-intel label Dec 9, 2024
@mgouicem mgouicem requested a review from a team December 9, 2024 10:48
@mgouicem
Copy link
Contributor Author

mgouicem commented Dec 9, 2024

make test
disable device_cpu
enable device_gpu
enable thr_sycl
enable thr_ocl

@mgouicem mgouicem force-pushed the mgouicem/main/l0_queries branch 2 times, most recently from e0dc26f to 2c29101 Compare December 9, 2024 12:35
@mgouicem mgouicem force-pushed the mgouicem/main/l0_queries branch from 2c29101 to 6ef7b93 Compare December 10, 2024 09:39
using zeInit_decl_t = ze_result_t (*)(ze_init_flags_t flags);
const ze_init_flags_t default_ze_flags = 0;
#if defined(__linux__)
static const ze_result_t ze_result = reinterpret_cast<zeInit_decl_t>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can actually omit the calls to zeInit as the caller must invoke L0 prior to entering nGEN (to locate the device, create a context, etc.).

src/gpu/intel/sycl/l0/utils.cpp Outdated Show resolved Hide resolved
src/gpu/intel/sycl/l0/utils.cpp Outdated Show resolved Hide resolved
src/gpu/intel/sycl/l0/utils.cpp Outdated Show resolved Hide resolved
src/gpu/intel/sycl/l0/utils.cpp Outdated Show resolved Hide resolved
src/gpu/intel/sycl/l0/utils.cpp Outdated Show resolved Hide resolved
src/gpu/intel/sycl/l0/utils.cpp Outdated Show resolved Hide resolved
src/gpu/intel/sycl/l0/utils.cpp Outdated Show resolved Hide resolved
@mgouicem mgouicem force-pushed the mgouicem/main/l0_queries branch from 6ef7b93 to 916aff8 Compare December 11, 2024 12:41
@github-actions github-actions bot added the devops Github automation label Dec 11, 2024
@mgouicem mgouicem force-pushed the mgouicem/main/l0_queries branch from 6114957 to aa79d8d Compare December 11, 2024 13:01
@mgouicem mgouicem force-pushed the mgouicem/main/l0_queries branch from aa79d8d to 9c03e06 Compare December 11, 2024 13:26
= {ZE_STRUCTURE_INTEL_DEVICE_MODULE_DP_EXP_PROPERTIES};
deviceModProps.pNext = &deviceModPropsExt;

CHECK(func_zeDeviceGetModuleProperties(device, &deviceModProps));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a driver support matrix? Users may have pretty old drivers (esp. on Linux), so I'm wondering if we need a fallback path here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For backward compatibility, we validate only last driver available at time of release (see README). In this particular case, this query appeared on Oct 31st 2023 (99abb40a)

For forward compatibility, I am working on getting clarification (and this PR will remain draft in the meantime).

Comment on lines +329 to +330
= ZE_DEVICE_FP_ATOMIC_EXT_FLAG_GLOBAL_ADD
| ZE_DEVICE_FP_ATOMIC_EXT_FLAG_LOCAL_ADD;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No GPUs currently have floating point atomic adds in local memory, so we just need to know about global memory FP atomic support:

Suggested change
= ZE_DEVICE_FP_ATOMIC_EXT_FLAG_GLOBAL_ADD
| ZE_DEVICE_FP_ATOMIC_EXT_FLAG_LOCAL_ADD;
= ZE_DEVICE_FP_ATOMIC_EXT_FLAG_GLOBAL_ADD;

I think a similar change makes sense for load/store and min/max.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I aligned with the native_extension definition that was used so far from OCL queries here (flag is set only if both local and global are available).

I am fine changing the semantic of native_extensions (it seems broken anyway as you mention). But that might have wider implications.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Then I think it makes sense to keep L0 and OCL aligned and change both in a future PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
devops Github automation platform:gpu-intel Codeowner: @oneapi-src/onednn-gpu-intel
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants