
[Kaldi/Triton] cudaError_t 701 : "too many resources requested for launch" returned from 'cudaGetLastError()' #779

Open · basicasicmatrix opened this issue Dec 9, 2020 · 3 comments
Labels
bug Something isn't working

Comments

@basicasicmatrix

Related to the Kaldi example (LibriSpeech model).
Core dump using the default configuration of 20.03 Kaldi and 20.03 Triton, as outlined here.

I have tried this on two separate systems, once with a GTX 1080 and again with a P100. I have also tried many variations of config.pbtxt, with no change in behavior.
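For context, config.pbtxt variations of the kind tried above would adjust fields like the following (a hypothetical excerpt using standard Triton model-configuration fields; the values shown are illustrative, not the repository defaults):

```
max_batch_size: 64
instance_group [
  {
    count: 1
    kind: KIND_GPU
  }
]
```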

**ERROR ([5.5]:splice_features_batched():feature-online-batched-ivector-cuda-kernels.cu:223) cudaError_t 701 : "too many resources requested for launch" returned from 'cudaGetLastError()'**

[ Stack-Trace: ]
/opt/kaldi/src/lib/libkaldi-base.so(kaldi::MessageLogger::LogMessage() const+0xb42) [0x7fe9008ea652]
/workspace/model-repo/kaldi_online/1/libkaldi-trtisbackend.so(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x2e) [0x7fe90c644952]
/opt/kaldi/src/lib/libkaldi-cudafeat.so(kaldi::splice_features_batched(int, int, int, int, float const*, int, int, float const*, int, int, float*, int, int, kaldi::LaneDesc const*, int)+0x1fb) [0x7fe8ea51f53c]
/opt/kaldi/src/lib/libkaldi-cudafeat.so(kaldi::BatchedIvectorExtractorCuda::SpliceFeats(kaldi::CuMatrixBase<float> const&, kaldi::CuMatrix<float> const&, kaldi::CuMatrix<float>*, kaldi::LaneDesc const*, int)+0x62) [0x7fe8ea51bea0]
/opt/kaldi/src/lib/libkaldi-cudafeat.so(kaldi::BatchedIvectorExtractorCuda::GetIvectors(kaldi::CuMatrixBase<float> const&, kaldi::CuVectorBase<float>*, kaldi::LaneDesc const*, int)+0x72) [0x7fe8ea51c996]
/opt/kaldi/src/lib/libkaldi-cudafeat.so(kaldi::OnlineBatchedFeaturePipelineCuda::ComputeFeaturesBatched(int, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, std::vector<bool, std::allocator<bool> > const&, std::vector<bool, std::allocator<bool> > const&, float, kaldi::CuMatrixBase<float> const&, kaldi::CuMatrix<float>*, kaldi::CuVector<float>*, std::vector<int, std::allocator<int> >*)+0x3c1) [0x7fe8ea521167]
/opt/kaldi/src/lib/libkaldi-cudadecoder.so(kaldi::cuda_decoder::BatchedThreadedNnet3CudaOnlinePipeline::ComputeGPUFeatureExtraction(std::vector<int, std::allocator<int> > const&, std::vector<kaldi::SubVector<float>, std::allocator<kaldi::SubVector<float> > > const&, std::vector<bool, std::allocator<bool> > const&, std::vector<bool, std::allocator<bool> > const&)+0x1ba) [0x7fe90c0ab31c]
/opt/kaldi/src/lib/libkaldi-cudadecoder.so(kaldi::cuda_decoder::BatchedThreadedNnet3CudaOnlinePipeline::DecodeBatch(std::vector<unsigned long, std::allocator<unsigned long> > const&, std::vector<kaldi::SubVector<float>, std::allocator<kaldi::SubVector<float> > > const&, std::vector<bool, std::allocator<bool> > const&, std::vector<bool, std::allocator<bool> > const&)+0xca) [0x7fe90c0ac8e0]
/workspace/model-repo/kaldi_online/1/libkaldi-trtisbackend.so(nvidia::inferenceserver::custom::kaldi_cbe::Context::FlushBatch()+0x74) [0x7fe90c642c00]
/workspace/model-repo/kaldi_online/1/libkaldi-trtisbackend.so(nvidia::inferenceserver::custom::kaldi_cbe::Context::Execute(unsigned int, custom_payload_struct*, bool (*)(void*, char const*, void const**, unsigned long*), bool (*)(void*, char const*, unsigned long, long*, unsigned long, void**))+0x3f0) [0x7fe90c642b22]
/workspace/model-repo/kaldi_online/1/libkaldi-trtisbackend.so(CustomExecute+0x4f) [0x7fe90c643db2]
/opt/tensorrtserver/bin/../lib/libtrtserver.so(+0x2ada7c) [0x7fea04c85a7c]
/opt/tensorrtserver/bin/../lib/libtrtserver.so(+0x94617) [0x7fea04a6c617]
/opt/tensorrtserver/bin/../lib/libtrtserver.so(+0x2a99f2) [0x7fea04c819f2]
/opt/tensorrtserver/bin/../lib/libtrtserver.so(+0xae071) [0x7fea04a86071]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xbd66f) [0x7fea03ec466f]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7fea047c06db]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7fea0358188f]

terminate called after throwing an instance of 'kaldi::KaldiFatalError'
  what():  kaldi::KaldiFatalError
/opt/trtis-kaldi/nvidia_kaldi_trtis_entrypoint.sh: line 22:    18 Aborted                 (core dumped) /opt/tensorrtserver/nvidia_entrypoint.sh $@

To Reproduce
Steps to reproduce the behavior:

  1. Install the latest CUDA drivers (450.80.02)
  2. Follow the instructions line by line: https://developer.nvidia.com/blog/integrating-nvidia-triton-inference-server-with-kaldi-asr/
  3. Core dump upon attempted inference with the included client test (even one iteration, with many GBs of free memory on the GPU)

Expected behavior

Inference results. No core dump.

Environment

  • Container version: 20.03
  • GPUs in the system: Tesla P100 16GB
  • CUDA driver version: 450.80.02
@basicasicmatrix added the bug label on Dec 9, 2020
@basicasicmatrix (Author)

@nv-kkudrynski

@basicasicmatrix changed the title from [Kaldi] cudaError_t 701 : "too many resources requested for launch" returned from 'cudaGetLastError()' to [Kaldi/Triton] cudaError_t 701 : "too many resources requested for launch" returned from 'cudaGetLastError()' on Dec 9, 2020
@basicasicmatrix (Author)

Commit a2281e3 works as intended (nvcr.io/nvidia/kaldi:19.12-online-beta).

@gavinljj

I have some issues
