'ptxas' died due to signal 11 (Invalid memory reference) #322

Open
Semihal opened this issue Jul 3, 2024 · 3 comments

@Semihal commented Jul 3, 2024

System Info

Version: v1.4.0
Cargo version: cargo 1.79.0 (ffa9cf99a 2024-06-03)
GCC version: 11.4.1
GPU: compiled with CUDA_COMPUTE_CAP=86 on a machine without a GPU (but with CUDA 12.1 installed).
I plan to use this container with an A40 (compute capability 8.6), but I don't have a GPU on the build machine.

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

I run this script:

export CUDA_COMPUTE_CAP=86
export CUDA_HOME=/usr/local/cuda-12.1
export PATH=${PATH}:/usr/local/cuda-12.1/bin
# Limit parallelism
export CARGO_BUILD_JOBS=1
export RAYON_NUM_THREADS=1
export CARGO_BUILD_INCREMENTAL=true


cd /usr/src/text-embeddings-inference || true

nvprune \
  --generate-code code=sm_80 \
  --generate-code code=sm_${CUDA_COMPUTE_CAP} \
  /usr/local/cuda/lib64/libcublas_static.a -o /usr/local/cuda/lib64/libcublas_static.a

cargo chef cook --release \
  --features candle-cuda \
  --features static-linking \
  --no-default-features \
  --recipe-path recipe.json && \
   sccache -s

I get this error:

[18:29:50] :	 [Step 1/2] error: failed to run custom build command for `candle-flash-attn v0.5.0 (https://github.com/OlivierDehaene/candle?rev=33b7ecf9ed82bb7c20f1a94555218fabfbaa2fe3#33b7ecf9)`
[18:29:50] :	 [Step 1/2] 
[18:29:50] :	 [Step 1/2] Caused by:
[18:29:50] :	 [Step 1/2]   process didn't exit successfully: `/usr/src/text-embeddings-inference/target/release/build/candle-flash-attn-67bc68aa050514c7/build-script-build` (exit status: 101)
[18:29:50] :	 [Step 1/2]   --- stdout
[18:29:50] :	 [Step 1/2]   cargo:rerun-if-changed=build.rs
[18:29:50] :	 [Step 1/2]   cargo:rerun-if-changed=kernels/flash_api.cu
[18:29:50] :	 [Step 1/2]   cargo:rerun-if-changed=kernels/flash_fwd_hdim128_fp16_sm80.cu
[18:29:50] :	 [Step 1/2]   cargo:rerun-if-changed=kernels/flash_fwd_hdim160_fp16_sm80.cu
[18:29:50] :	 [Step 1/2]   cargo:rerun-if-changed=kernels/flash_fwd_hdim192_fp16_sm80.cu
[18:29:50] :	 [Step 1/2]   cargo:rerun-if-changed=kernels/flash_fwd_hdim224_fp16_sm80.cu
[18:29:50] :	 [Step 1/2]   cargo:rerun-if-changed=kernels/flash_fwd_hdim256_fp16_sm80.cu
[18:29:50] :	 [Step 1/2]   cargo:rerun-if-changed=kernels/flash_fwd_hdim32_fp16_sm80.cu
[18:29:50] :	 [Step 1/2]   cargo:rerun-if-changed=kernels/flash_fwd_hdim64_fp16_sm80.cu
[18:29:50] :	 [Step 1/2]   cargo:rerun-if-changed=kernels/flash_fwd_hdim96_fp16_sm80.cu
[18:29:50] :	 [Step 1/2]   cargo:rerun-if-changed=kernels/flash_fwd_hdim128_bf16_sm80.cu
[18:29:50] :	 [Step 1/2]   cargo:rerun-if-changed=kernels/flash_fwd_hdim160_bf16_sm80.cu
[18:29:50] :	 [Step 1/2]   cargo:rerun-if-changed=kernels/flash_fwd_hdim192_bf16_sm80.cu
[18:29:50] :	 [Step 1/2]   cargo:rerun-if-changed=kernels/flash_fwd_hdim224_bf16_sm80.cu
[18:29:50] :	 [Step 1/2]   cargo:rerun-if-changed=kernels/flash_fwd_hdim256_bf16_sm80.cu
[18:29:50] :	 [Step 1/2]   cargo:rerun-if-changed=kernels/flash_fwd_hdim32_bf16_sm80.cu
[18:29:50] :	 [Step 1/2]   cargo:rerun-if-changed=kernels/flash_fwd_hdim64_bf16_sm80.cu
[18:29:50] :	 [Step 1/2]   cargo:rerun-if-changed=kernels/flash_fwd_hdim96_bf16_sm80.cu
[18:29:50] :	 [Step 1/2]   cargo:rerun-if-changed=kernels/flash_fwd_kernel.h
[18:29:50] :	 [Step 1/2]   cargo:rerun-if-changed=kernels/flash_fwd_launch_template.h
[18:29:50] :	 [Step 1/2]   cargo:rerun-if-changed=kernels/flash.h
[18:29:50] :	 [Step 1/2]   cargo:rerun-if-changed=kernels/philox.cuh
[18:29:50] :	 [Step 1/2]   cargo:rerun-if-changed=kernels/softmax.h
[18:29:50] :	 [Step 1/2]   cargo:rerun-if-changed=kernels/utils.h
[18:29:50] :	 [Step 1/2]   cargo:rerun-if-changed=kernels/kernel_traits.h
[18:29:50] :	 [Step 1/2]   cargo:rerun-if-changed=kernels/block_info.h
[18:29:50] :	 [Step 1/2]   cargo:rerun-if-changed=kernels/static_switch.h
[18:29:50] :	 [Step 1/2]   cargo:info=["/usr", "/usr/local/cuda", "/opt/cuda", "/usr/lib/cuda", "C:/Program Files/NVIDIA GPU Computing Toolkit", "C:/CUDA"]
[18:29:50] :	 [Step 1/2]   cargo:rerun-if-env-changed=CUDA_COMPUTE_CAP
[18:29:50] :	 [Step 1/2]   cargo:rustc-env=CUDA_COMPUTE_CAP=86
[18:29:50] :	 [Step 1/2] 
[18:29:50] :	 [Step 1/2]   --- stderr

[....]

[18:29:50] :	 [Step 1/2]   #$ CUDAFE_FLAGS=
[18:29:50] :	 [Step 1/2]   #$ PTXAS_FLAGS=
[18:29:50] :	 [Step 1/2]   #$ gcc -std=c++17 -D__CUDA_ARCH_LIST__=860 -E -x c++ -D__CUDACC__ -D__NVCC__ -D__CUDACC_EXTENDED_LAMBDA__ -D__CUDACC_RELAXED_CONSTEXPR__  -O3 -I"cutlass/include" "-I/usr/local/cuda-12.1/bin/../targets/x86_64-linux/include"    -U "__CUDA_NO_HALF_OPERATORS__" -U "__CUDA_NO_HALF_CONVERSIONS__" -U "__CUDA_NO_HALF2_OPERATORS__" -U "__CUDA_NO_BFLOAT16_CONVERSIONS__" -D__CUDACC_VER_MAJOR__=12 -D__CUDACC_VER_MINOR__=1 -D__CUDACC_VER_BUILD__=105 -D__CUDA_API_VER_MAJOR__=12 -D__CUDA_API_VER_MINOR__=1 -DCUDA_API_PER_THREAD_DEFAULT_STREAM=1 -D__NVCC_DIAG_PRAGMA_SUPPORT__=1 -include "cuda_runtime.h" -m64 "kernels/flash_fwd_hdim32_bf16_sm80.cu" -o "/tmp/tmpxft_000017c2_00000000-5_flash_fwd_hdim32_bf16_sm80.cpp4.ii" 
[18:29:50] :	 [Step 1/2]   #$ cudafe++ --c++17 --gnu_version=110401 --display_error_number --orig_src_file_name "kernels/flash_fwd_hdim32_bf16_sm80.cu" --orig_src_path_name "/root/.cargo/git/checkouts/candle-2c6db576e0f06e81/33b7ecf/candle-flash-attn/kernels/flash_fwd_hdim32_bf16_sm80.cu" --allow_managed --extended-lambda --relaxed_constexpr  --m64 --parse_templates --gen_c_file_name "/tmp/tmpxft_000017c2_00000000-6_flash_fwd_hdim32_bf16_sm80.cudafe1.cpp" --stub_file_name "tmpxft_000017c2_00000000-6_flash_fwd_hdim32_bf16_sm80.cudafe1.stub.c" --gen_module_id_file --module_id_file_name "/tmp/tmpxft_000017c2_00000000-4_flash_fwd_hdim32_bf16_sm80.module_id" "/tmp/tmpxft_000017c2_00000000-5_flash_fwd_hdim32_bf16_sm80.cpp4.ii" 
[18:29:50] :	 [Step 1/2]   #$ gcc -std=c++17 -D__CUDA_ARCH__=860 -D__CUDA_ARCH_LIST__=860 -E -x c++  -DCUDA_DOUBLE_MATH_FUNCTIONS -D__CUDACC__ -D__NVCC__ -D__CUDACC_EXTENDED_LAMBDA__ -D__CUDACC_RELAXED_CONSTEXPR__  -O3 -I"cutlass/include" "-I/usr/local/cuda-12.1/bin/../targets/x86_64-linux/include"    -U "__CUDA_NO_HALF_OPERATORS__" -U "__CUDA_NO_HALF_CONVERSIONS__" -U "__CUDA_NO_HALF2_OPERATORS__" -U "__CUDA_NO_BFLOAT16_CONVERSIONS__" -D__CUDACC_VER_MAJOR__=12 -D__CUDACC_VER_MINOR__=1 -D__CUDACC_VER_BUILD__=105 -D__CUDA_API_VER_MAJOR__=12 -D__CUDA_API_VER_MINOR__=1 -DCUDA_API_PER_THREAD_DEFAULT_STREAM=1 -D__NVCC_DIAG_PRAGMA_SUPPORT__=1 -include "cuda_runtime.h" -m64 "kernels/flash_fwd_hdim32_bf16_sm80.cu" -o "/tmp/tmpxft_000017c2_00000000-7_flash_fwd_hdim32_bf16_sm80.cpp1.ii" 
[18:29:50] :	 [Step 1/2]   #$ cicc --c++17 --gnu_version=110401 --display_error_number --orig_src_file_name "kernels/flash_fwd_hdim32_bf16_sm80.cu" --orig_src_path_name "/root/.cargo/git/checkouts/candle-2c6db576e0f06e81/33b7ecf/candle-flash-attn/kernels/flash_fwd_hdim32_bf16_sm80.cu" --allow_managed --extended-lambda --relaxed_constexpr   -arch compute_86 -m64 --no-version-ident -ftz=1 -prec_div=0 -prec_sqrt=0 -fmad=1 -fast-math --gen_div_approx_ftz --include_file_name "tmpxft_000017c2_00000000-3_flash_fwd_hdim32_bf16_sm80.fatbin.c" -tused --module_id_file_name "/tmp/tmpxft_000017c2_00000000-4_flash_fwd_hdim32_bf16_sm80.module_id" --gen_c_file_name "/tmp/tmpxft_000017c2_00000000-6_flash_fwd_hdim32_bf16_sm80.cudafe1.c" --stub_file_name "/tmp/tmpxft_000017c2_00000000-6_flash_fwd_hdim32_bf16_sm80.cudafe1.stub.c" --gen_device_file_name "/tmp/tmpxft_000017c2_00000000-6_flash_fwd_hdim32_bf16_sm80.cudafe1.gpu"  "/tmp/tmpxft_000017c2_00000000-7_flash_fwd_hdim32_bf16_sm80.cpp1.ii" -o "/tmp/tmpxft_000017c2_00000000-6_flash_fwd_hdim32_bf16_sm80.ptx"
[18:29:50] :	 [Step 1/2]   #$ ptxas -arch=sm_86 -m64  "/tmp/tmpxft_000017c2_00000000-6_flash_fwd_hdim32_bf16_sm80.ptx"  -o "/tmp/tmpxft_000017c2_00000000-8_flash_fwd_hdim32_bf16_sm80.sm_86.cubin" 
[18:29:50] :	 [Step 1/2]   nvcc error   : 'ptxas' died due to signal 11 (Invalid memory reference)
[18:29:50] :	 [Step 1/2]   nvcc error   : 'ptxas' core dumped
[18:29:50] :	 [Step 1/2]   # --error 0x8b --
[18:29:50] :	 [Step 1/2]   thread '<unnamed>' panicked at /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/bindgen_cuda-0.1.5/src/lib.rs:262:21:
[18:29:50] :	 [Step 1/2]   nvcc error while executing compiling: "nvcc" "--gpu-architecture=sm_86" "-c" "-o" "/usr/src/text-embeddings-inference/target/release/build/candle-flash-attn-6656f6d321f9dddf/out/flash_fwd_hdim32_bf16_sm80-aca7d8fdce93ef53.o" "--default-stream" "per-thread" "-std=c++17" "-O3" "-U__CUDA_NO_HALF_OPERATORS__" "-U__CUDA_NO_HALF_CONVERSIONS__" "-U__CUDA_NO_HALF2_OPERATORS__" "-U__CUDA_NO_BFLOAT16_CONVERSIONS__" "-Icutlass/include" "--expt-relaxed-constexpr" "--expt-extended-lambda" "--use_fast_math" "--verbose" "kernels/flash_fwd_hdim32_bf16_sm80.cu"
[18:29:50] :	 [Step 1/2] 
[18:29:50] :	 [Step 1/2]   # stdout
[18:29:50] :	 [Step 1/2] 
[18:29:50] :	 [Step 1/2] 
[18:29:50] :	 [Step 1/2]   # stderr
[18:29:50] :	 [Step 1/2] 
[18:29:50] :	 [Step 1/2]   note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
[18:29:51] :	 [Step 1/2] thread 'main' panicked at /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cargo-chef-0.1.67/src/recipe.rs:218:27:
[18:29:51] :	 [Step 1/2] Exited with status code: 101
[18:29:51] :	 [Step 1/2] note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
[18:29:59]W:	 [Step 1/2] The command '/bin/sh -c docker/build' returned a non-zero code: 101
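
To check whether the crash reproduces outside of cargo, the failing nvcc invocation can be re-run by hand with --keep so the intermediate PTX survives. This is a sketch: the checkout path and flags below are copied from the log above, and the output paths are arbitrary.

cd /root/.cargo/git/checkouts/candle-2c6db576e0f06e81/33b7ecf/candle-flash-attn

# Same invocation bindgen_cuda issued, plus --keep to retain intermediates
nvcc --gpu-architecture=sm_86 -c -o /tmp/flash_fwd_hdim32_bf16_sm80.o \
  --default-stream per-thread -std=c++17 -O3 \
  -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ \
  -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ \
  -Icutlass/include --expt-relaxed-constexpr --expt-extended-lambda \
  --use_fast_math --verbose --keep \
  kernels/flash_fwd_hdim32_bf16_sm80.cu

# If it dies the same way, feed the kept PTX to ptxas directly
# (mirroring the ptxas step from the log) to confirm the crash is in ptxas itself:
ptxas -arch=sm_86 -m64 flash_fwd_hdim32_bf16_sm80.ptx -o /tmp/out.cubin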

Expected behavior

TEI compiles successfully.

@OlivierDehaene (Member) commented

I plan to use this container

I'm confused, do you want a container or a binary?
If you want a container, why not use the official one or the official command?
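
For reference, a minimal sketch of running the published image directly. The model id is an arbitrary example, and the 86-1.4 tag for compute capability 8.6 cards such as the A40 is an assumption based on the TEI README's tag scheme:

model=BAAI/bge-large-en-v1.5   # example model id, substitute your own
volume=$PWD/data               # persist downloaded weights between runs

docker run --gpus all -p 8080:80 -v $volume:/data \
  ghcr.io/huggingface/text-embeddings-inference:86-1.4 \
  --model-id $model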

@Semihal (Author) commented Jul 3, 2024

I'm confused, do you want a container or a binary?

I want to install TEI in a container image for future use.

If you want a container why not use the official one or the official command?

These are the build instructions from the official Dockerfile.

@Semihal (Author) commented Jul 4, 2024

For clarity: the code actually executed looks exactly like this (taken from the official Dockerfile):

export CUDA_COMPUTE_CAP=86
export CUDA_HOME=/usr/local/cuda-12.1
export PATH=${PATH}:/usr/local/cuda-12.1/bin
# Limit parallelism
export CARGO_BUILD_JOBS=1
export RAYON_NUM_THREADS=1
export CARGO_BUILD_INCREMENTAL=true

if [ ${CUDA_COMPUTE_CAP} -ge 75 -a ${CUDA_COMPUTE_CAP} -lt 80 ];
then
    nvprune \
      --generate-code code=sm_${CUDA_COMPUTE_CAP} \
      /usr/local/cuda/lib64/libcublas_static.a -o /usr/local/cuda/lib64/libcublas_static.a;
elif [ ${CUDA_COMPUTE_CAP} -ge 80 -a ${CUDA_COMPUTE_CAP} -lt 90 ];
then
    nvprune \
      --generate-code code=sm_80 \
      --generate-code code=sm_${CUDA_COMPUTE_CAP} \
      /usr/local/cuda/lib64/libcublas_static.a -o /usr/local/cuda/lib64/libcublas_static.a;
elif [ ${CUDA_COMPUTE_CAP} -eq 90 ];
then
    nvprune \
      --generate-code code=sm_90 \
      /usr/local/cuda/lib64/libcublas_static.a -o /usr/local/cuda/lib64/libcublas_static.a;
else
    echo "cuda compute cap ${CUDA_COMPUTE_CAP} is not supported"; exit 1;
fi;

if [ ${CUDA_COMPUTE_CAP} -ge 75 -a ${CUDA_COMPUTE_CAP} -lt 80 ];
then
    cargo chef cook --release \
      --features candle-cuda-turing \
      --features static-linking \
      --no-default-features \
      --recipe-path recipe.json && \
      sccache -s;
else
    cargo chef cook --release \
      --features candle-cuda \
      --features static-linking \
      --no-default-features \
      --recipe-path recipe.json && \
      sccache -s;
fi;
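
Not part of the original script, but a few sanity checks may be worth running on the build machine before retrying, since a ptxas SIGSEGV frequently indicates a bug in that toolkit version (often fixed in a newer CUDA 12.x point release) rather than a problem in the kernels themselves:

# Confirm which toolchain is actually being picked up
which nvcc ptxas
nvcc --version
ptxas --version

# Rule out resource pressure: the flash-attn kernels are very large translation units
free -h
ulimit -a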
