From 86ea895c6e680d03a59283b40ae614f16a1a10ae Mon Sep 17 00:00:00 2001
From: Zhuoran Zhao
Date: Thu, 8 Feb 2024 02:01:36 -0800
Subject: [PATCH] Fix BF16 group_index_select_2d on AMD GPU (#2321)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/2321

as title

```
[zhuoran@devgpu003.snc8 /data/users/zhuoran/fbsource/fbcode (7932bb4ab|remote/fbsource/stable...)]$ HIP_VISIBLE_DEVICES=7 numactl --cpunodebind=1 --membind=1 buck2 run mode/{opt,amd-gpu} -c fbcode.triton_backend=amd -c fbcode.enable_gpu_sections=true //hammer/modules/sequential/encoders/tests:hstu_bench -- --enable-multi-stream=true --enable_profiler=true --num-streams=3 --num-workers=3
Watchman fresh instance: new mergebase, cleared graph state, cleared dep files
⚠ Python 3.8 is EOL, and is going away by the end of H1 2024. Upgrade //caffe2/tools/setup_helpers:gen_version_header to Python 3.10 now to avoid breakages. https://fburl.com/py38-sunsetting
⚠ Python 3.8 is EOL, and is going away by the end of H1 2024. Upgrade //caffe2:substitute to Python 3.10 now to avoid breakages. https://fburl.com/py38-sunsetting
⚠ Python 3.8 is EOL, and is going away by the end of H1 2024. Upgrade //caffe2/tools/amd_build:build_amd to Python 3.10 now to avoid breakages. https://fburl.com/py38-sunsetting
⚠ Python 3.8 is EOL, and is going away by the end of H1 2024. Upgrade //caffe2/torchgen:gen to Python 3.10 now to avoid breakages. https://fburl.com/py38-sunsetting
⚠ Python 3.8 is EOL, and is going away by the end of H1 2024. Upgrade //caffe2/tools/setup_helpers:generate_code to Python 3.10 now to avoid breakages. https://fburl.com/py38-sunsetting
Action failed: fbcode//deeplearning/fbgemm/fbgemm_gpu:sparse_ops_hip (hip_compile src/sparse_ops/sparse_group_index.hip (pic))
Remote command returned non-zero exit code 1
Reproduce locally: `frecli cas download-action f0569d85851723e287f08ed03c0bc831587c0a05f94c911fe0b204ddd7670d24:145`
stdout:
stderr:
buck-out/v2/gen/fbcode/2ab98e452e15a67d/deeplearning/fbgemm/fbgemm_gpu/__sparse_ops_hip_hipify_gen__/out/src/sparse_ops/sparse_group_index.hip:11:10: fatal error: 'cuda_bf16.h' file not found
#include <cuda_bf16.h>
         ^~~~~~~~~~~~~
1 error generated when compiling for gfx90a.
```

Reviewed By: nrsatish, sryap, htyu

Differential Revision: D53549323

fbshipit-source-id: 73753c91cbb4c327ff6952bfa7d889ef02b8a31f
---
 fbgemm_gpu/src/sparse_ops/sparse_group_index.cu | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/fbgemm_gpu/src/sparse_ops/sparse_group_index.cu b/fbgemm_gpu/src/sparse_ops/sparse_group_index.cu
index 353fa1eab..b8dd05529 100644
--- a/fbgemm_gpu/src/sparse_ops/sparse_group_index.cu
+++ b/fbgemm_gpu/src/sparse_ops/sparse_group_index.cu
@@ -6,12 +6,13 @@
  * LICENSE file in the root directory of this source tree.
  */
 
-#ifdef USE_ROCM
-#include
-#else
+#if (defined(USE_ROCM))
+#include
+#elif ( \
+    (defined(CUDA_VERSION) && CUDA_VERSION < 11000) || \
+    (defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 800)))
 #include
-#endif // USE_ROCM
-
+#endif
 #include "common.cuh"
 
 using Tensor = at::Tensor;
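For readers checking the new guard, here is a minimal sketch (not part of the patch) that re-expresses the `#if`/`#elif` chain from the hunk as a C++17 `constexpr` function, so each branch can be spot-checked at compile time. The names `Bf16Header` and `select_bf16_header` are hypothetical, and modelling "macro not defined" with `std::optional<int>` is an assumption made purely for illustration; the sketch only mirrors which branch is taken, not which header each branch includes.

```cpp
#include <optional>

// Hypothetical labels for the three possible outcomes of the guard.
enum class Bf16Header { kRocm, kCuda, kNone };

// Mirrors the preprocessor chain added by the patch:
//   #if (defined(USE_ROCM))             -> ROCm branch
//   #elif (CUDA_VERSION < 11000 ||
//          __CUDA_ARCH__ < 800)         -> CUDA branch
//   #endif                              -> nothing included by this block
// std::nullopt models "macro not defined"; as in the #elif, each
// comparison only counts when its macro is defined.
constexpr Bf16Header select_bf16_header(bool use_rocm,
                                        std::optional<int> cuda_version,
                                        std::optional<int> cuda_arch) {
  if (use_rocm) {
    // Checked first, so the CUDA macros never matter on ROCm builds.
    return Bf16Header::kRocm;
  }
  const bool old_toolkit = cuda_version.has_value() && *cuda_version < 11000;
  const bool old_arch = cuda_arch.has_value() && *cuda_arch < 800;
  return (old_toolkit || old_arch) ? Bf16Header::kCuda : Bf16Header::kNone;
}

// Compile-time spot checks, one per branch.
// ROCm build, such as the gfx90a compile in the log above.
static_assert(select_bf16_header(true, std::nullopt, std::nullopt) ==
              Bf16Header::kRocm);
// CUDA 10.2 toolkit (CUDA_VERSION 10020), host pass without __CUDA_ARCH__.
static_assert(select_bf16_header(false, 10020, std::nullopt) ==
              Bf16Header::kCuda);
// CUDA 12.0 toolkit, device pass for sm_90.
static_assert(select_bf16_header(false, 12000, 900) == Bf16Header::kNone);
```

Because nvcc defines `__CUDA_ARCH__` only while compiling device code, in the host pass of the same file only the `CUDA_VERSION` clause can select the CUDA branch.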