why use darknet train command still cause 'CUDA status Error' #7

Open
ericosmic opened this issue Apr 23, 2021 · 8 comments

Comments

@ericosmic

I am using the gpu-cv-cc80 tag as my YOLO training image.
I first tried the train command:
image
but it seems to just display the model summary and does not start training.
image
Secondly, I tried to compile darknet with the make command, but it threw an error:

image

Finally, I also tried the detect command, and it threw an error as well:

image

So I think this is due to a missing CUDA file, but shouldn't this image be a complete darknet environment? Could you give me some ideas for solving this problem?

@daisukekobayashi
Owner

What GPU are you using?

The images in this repository are built with a multi-stage build to reduce image size, so they contain only the runtime, not the build environment. So I think your second error is expected behavior.

In your third image, CUDA outputs a message about compatibility. The cc80 tag expects your GPU to support compute capability 8.0. Does your GPU satisfy that? You can check your GPU's compute capability on this page.
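As a rough illustration of matching a GPU to a tag, here is a minimal Python sketch. It assumes the tag suffix follows the `gpu-cv-cc<NN>` pattern seen in this thread (cc80, cc75), and that the installed driver is recent enough for `nvidia-smi` to support the `compute_cap` query field. The helper names `tag_suffix` and `query_compute_caps` are hypothetical and not part of this repository.

```python
import subprocess


def tag_suffix(compute_cap):
    """Turn a compute capability such as '7.5' into a suffix such as 'gpu-cv-cc75'.

    The 'gpu-cv-cc<NN>' pattern is an assumption inferred from the tags
    mentioned in this thread.
    """
    major, minor = compute_cap.strip().split(".")
    return "gpu-cv-cc{}{}".format(int(major), int(minor))


def query_compute_caps():
    """Ask nvidia-smi for the compute capability of each visible GPU.

    Assumes a driver whose nvidia-smi supports the 'compute_cap' field.
    """
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]


if __name__ == "__main__":
    try:
        for cap in query_compute_caps():
            print(cap, "->", tag_suffix(cap))
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        # nvidia-smi is missing or does not understand the query field.
        print("could not query nvidia-smi:", exc)
```

If `nvidia-smi` cannot be queried this way, the compute capability can still be looked up by GPU model and passed to `tag_suffix` by hand.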

@ericosmic
Author

Thanks for your reply. I'm using an RTX 2080 Ti. nvidia-smi shows CUDA version 11, but there is no nvcc. Do you mean these images are only for runtime and not for training?

@daisukekobayashi
Owner

The RTX 2080 Ti's compute capability is 7.5, as listed in the GeForce and TITAN Products section of this page. So could you try gpu-cv-cc75?
I mean that the docker images in this repository contain only the runtime for executing darknet, not a development environment. So you can do everything darknet supports, like detection and training, but you can't build the darknet executable inside the docker image.
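To see the runtime-only point from inside a container, a small Python sketch like the following reports which tools are on `PATH`. The tool list is only an assumption about what a darknet build would typically need, and `toolchain_report` is a hypothetical helper. In a runtime-only image one would expect `nvcc` and `make` to be reported as missing.

```python
import shutil


def toolchain_report(tools=("nvcc", "make", "gcc", "nvidia-smi")):
    """Return a dict mapping each tool name to True if it is found on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}


if __name__ == "__main__":
    for tool, found in toolchain_report().items():
        print("{:<12} {}".format(tool, "found" if found else "missing"))
```

A missing `nvcc` here only says the image cannot compile CUDA code; it says nothing about whether the darknet binary itself can use the GPU.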

@ericosmic
Author

ericosmic commented Apr 24, 2021

I used the darknet_yolo_v3-gpu-cv-cc75 image, but why does it display 'gpu not used' and not start the training process when I train?
image

@ericosmic
Author

ericosmic commented Apr 24, 2021

Now I can use darknet to test images in this image, but it still does not allow training. It reports a segmentation fault. I tried changing the batch size and max_batch size, but it still doesn't work.
image

@daisukekobayashi
Owner

I can't tell what the problem is from your image.
Could you try darknet_yolo_v4_pre-gpu-cv-cc75? darknet_yolo_v3 was released about 2 years ago, before YOLO v4 came out.
And I guess your problem isn't a docker problem but a darknet problem, so I recommend that you check AlexeyAB/darknet:Issues too.

@Nordes

Nordes commented May 9, 2022

It might be something stupid, but, who knows, it might be a reason.

I actually tried to use libdarknet.so instead of the darknet executable. To do so, I copied libdarknet.so to /usr/local/lib and include/darknet.h to /usr/local/include. When I tried to link against libdarknet.so, it gave me an error about the CUDA lib.

See below:

#0 41.72 /usr/bin/ld: warning: libcuda.so.1, needed by /usr/local/lib/libdarknet.so, not found (try using -rpath or -rpath-link)
#0 41.72 /usr/bin/ld: /usr/local/lib/libdarknet.so: undefined reference to `cuCtxGetCurrent'
#0 41.72 collect2: error: ld returned 1 exit status

@daisukekobayashi: I was also on the gpu cc75 image. I used your repo as a base, but modified the build and runtime stages to copy libdarknet.so and darknet.h. I also added a runtime where Go could build and run (just for the sake of testing basic code).

Maybe when the CUDA libraries are installed, the libs are not added to the /usr/local folders.

@daisukekobayashi
Owner

@Nordes
This repository uses a two-stage build to reduce docker image size. It first builds the darknet binary on nvidia/cuda:11.2.2-cudnn8-devel-ubuntu20.04 and then packages darknet on nvidia/cuda:11.2.2-cudnn8-runtime-ubuntu20.04. So there are no CUDA development libraries in this docker image.

I guess you want to build some program using libdarknet and CUDA, so I think it's easier to use the nvidia cuda-devel image instead of my image. If you reuse my Dockerfile, you should try removing stage two from it. You can then use libdarknet.so on a cuda-devel based image.
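As a quick diagnostic for the `libcuda.so.1 ... not found` message quoted above, the following Python sketch checks whether the dynamic loader can resolve a shared library in the current environment. It does not reproduce the `ld` link step itself; it only shows whether the library is visible at all. The path `/usr/local/lib/libdarknet.so` is taken from the log above, and `can_load` is a hypothetical helper. As far as I know, `libcuda.so.1` ships with the NVIDIA driver rather than with the CUDA toolkit, so it is typically absent during `docker build` and only appears when the container is started with GPU access.

```python
import ctypes


def can_load(soname):
    """Return True if the dynamic loader can open the given shared library."""
    try:
        ctypes.CDLL(soname)
        return True
    except OSError:
        # Library not found, or one of its own dependencies is missing.
        return False


if __name__ == "__main__":
    for lib in ("libcuda.so.1", "/usr/local/lib/libdarknet.so"):
        print("{:<32} {}".format(lib, "loadable" if can_load(lib) else "NOT loadable"))
```

Running this once during the image build and once in a started container should show whether the library only becomes available at run time.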
