why use darknet train command still cause 'CUDA status Error' #7

ericosmic · 2021-04-23T03:04:30Z

I am using the gpu-cv-cc80 tag as my YOLO training image.
I first tried the train command:

but it seems to only display the model summary and does not start training.

Secondly, I tried to compile darknet with the make command, but it threw an error:

Finally, I also tried the detect command. Again, it threw an error:

So I think this is due to missing CUDA files, but shouldn't the image be a complete darknet environment? Could you give me some ideas for solving this problem?

daisukekobayashi · 2021-04-23T12:30:14Z

What GPU are you using?

The images in this repository are built with a multi-stage build to reduce image size, so they contain only the runtime, not the build environment. I think your second error is therefore expected behavior.

In your third image, CUDA prints a message about compatibility. The cc80 tag expects a GPU that supports compute capability 8.0. Does your GPU satisfy that? You can check your GPU's compute capability on this page.
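The relationship between compute capability and image tag described above can be sketched in a few lines of Python. This is only an illustration: the helper name `tag_for_compute_capability` is hypothetical, the `gpu-cv` prefix is taken from the tag quoted in this thread, and the sketch does not check whether a tag for a given capability is actually published.

```python
def tag_for_compute_capability(capability: str, prefix: str = "gpu-cv") -> str:
    """Build an image tag such as 'gpu-cv-cc80' from a capability like '8.0'.

    Hypothetical helper: it only formats the string and does not verify
    that the resulting tag exists in the image registry.
    """
    major, _, minor = capability.strip().partition(".")
    if not (major.isdigit() and minor.isdigit()):
        raise ValueError(f"expected a value like '8.0', got {capability!r}")
    return f"{prefix}-cc{int(major)}{int(minor)}"


if __name__ == "__main__":
    # Compute capability 8.0 corresponds to the cc80 tag discussed above.
    print(tag_for_compute_capability("8.0"))
```

The capability value itself has to come from NVIDIA's compute capability table for the installed GPU. The sketch does not query the hardware.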

ericosmic · 2021-04-24T01:36:36Z

Thanks for your reply. I'm using an RTX 2080 Ti. nvidia-smi shows CUDA version 11, but there is no nvcc. Do you mean these images are only for runtime and not for training?

daisukekobayashi · 2021-04-24T04:50:40Z

The RTX 2080 Ti's compute capability is 7.5, as listed in the GeForce and TITAN Products section on this page. Could you try gpu-cv-cc75?
To clarify, the Docker images in this repository contain only the runtime needed to execute darknet, not a development environment. You can do everything darknet supports, such as detection and training, but you can't build the darknet executable inside the image.
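One rough way to tell a runtime-only environment from a development one is to look for the build tools on PATH. The sketch below is an assumption-laden illustration: the function name `missing_tools` and the default tool list (`nvcc`, `make`, `g++`) are my own choices, not something this repository defines.

```python
import shutil


def missing_tools(tools=("nvcc", "make", "g++")):
    """Return the names from `tools` that are not found on PATH.

    The default tool list is an assumption about what a darknet build
    typically needs; adjust it for your setup.
    """
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
        print("This looks like a runtime-only environment.")
    else:
        print("All listed build tools were found on PATH.")
```

Run inside the container, a non-empty result would be consistent with the missing nvcc and the failing make reported earlier in the thread. It says nothing about whether training itself can run.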

ericosmic · 2021-04-24T07:10:57Z

I used the darknet_yolo_v3-gpu-cv-cc75 image, but why does it display 'gpu not used' and not start the training process when I train?

ericosmic · 2021-04-24T08:35:35Z

Now I can use darknet to test images in this image, but training still does not work. It reports a segmentation fault. I tried changing the batch size and max_batch size, but it still doesn't work.

daisukekobayashi · 2021-04-24T10:37:25Z

I can't tell what the problem is from your image.
Could you try darknet_yolo_v4_pre-gpu-cv-cc75? The darknet_yolo_v3 image was released about 2 years ago, before YOLO v4 was released.
I also suspect this is a darknet problem rather than a Docker problem, so I recommend checking the AlexeyAB/darknet issues too.

Nordes · 2022-05-09T21:03:42Z

It might be something stupid, but who knows, it might be a reason.

I actually tried to use libdarknet.so instead of the darknet executable. To do so, I copied libdarknet.so to /usr/local/lib and include/darknet.h to /usr/local/include. When I tried to use libdarknet.so, it gave me an error about the CUDA lib.

See below:

#0 41.72 /usr/bin/ld: warning: libcuda.so.1, needed by /usr/local/lib/libdarknet.so, not found (try using -rpath or -rpath-link)
#0 41.72 /usr/bin/ld: /usr/local/lib/libdarknet.so: undefined reference to `cuCtxGetCurrent'
#0 41.72 collect2: error: ld returned 1 exit status

@daisukekobayashi: I was also on gpu cc75. I used your repo as a base, but modified the build and runtime stages to copy libdarknet.so and darknet.h. I also added a runtime stage where Go could build and run (just for the sake of testing basic code).

Maybe when the CUDA libraries are installed, the libs are not added to the /usr/local folders.

daisukekobayashi · 2022-05-11T09:49:26Z

@Nordes
This repository uses a two-stage build to reduce the Docker image size. It first builds the darknet binary on nvidia/cuda:11.2.2-cudnn8-devel-ubuntu20.04 and then packages darknet on nvidia/cuda:11.2.2-cudnn8-runtime-ubuntu20.04. So there are no CUDA development libraries in this Docker image.

I guess you want to build a program using libdarknet and CUDA. In that case, I think it's easier to use an nvidia/cuda devel image instead of my image. If you reuse my Dockerfile, try removing stage two from it. You can then use libdarknet.so on top of the cuda-devel based image.
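To see which shared libraries are visible in a given environment, a small Python check along these lines may help. It is a sketch under assumptions: `can_load` is a hypothetical helper, and it tests whether the dynamic loader can open a library at run time, which is related to but not the same as the link-time lookup that failed in the log above. As far as I know, libcuda.so.1 is normally provided by the NVIDIA driver rather than by the CUDA toolkit, so it may be absent while an image is being built even if it is present when a container runs with GPU access.

```python
import ctypes


def can_load(library_name: str) -> bool:
    """Return True if the dynamic loader can open `library_name`.

    This is a run-time check; it does not reproduce the linker's search
    rules, so treat the result as a hint only.
    """
    try:
        ctypes.CDLL(library_name)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    # Library names taken from the linker messages quoted in this thread.
    for name in ("libcuda.so.1", "libdarknet.so"):
        status = "loadable" if can_load(name) else "NOT loadable"
        print(f"{name}: {status}")
```

Running it both during the image build and inside a started container should show whether the two environments differ.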
