Replies: 1 comment
-
Note that this repo is no longer supported. Development and maintenance of Darknet/YOLO is now done at https://github.com/hank-ai/darknet. While the build steps currently remain the same, work is underway in one of the development branches to significantly change how Darknet is built. I will see what we can do about toggling cuDNN half-width floats.
-
Hi all,
Over the last few weeks I tried to speed up training by building with the CUDNN_HALF compile flag, on a GeForce RTX 2080 SUPER and a GeForce RTX 3080 Ti. I did not observe any speed-up. I may have missed a difference of a few percent, but the approximate time left shown in the window did not change significantly.
I was also hoping to increase the mini-batch size thanks to the memory saved, but the dedicated GPU memory usage shown in Windows Task Manager did not improve either.
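To cross-check the Task Manager reading, memory use could also be sampled with `nvidia-smi` while training runs, once for each build. The following is only a minimal sketch, assuming `nvidia-smi` is on the PATH of the training PC; the helper names `parse_memory_csv` and `query_gpu_memory` are made up for illustration and are not part of Darknet.

```python
import subprocess


def parse_memory_csv(text):
    """Parse 'used, total' lines (values in MiB), as printed by nvidia-smi
    with --format=csv,noheader,nounits, into a list of (used, total) tuples."""
    rows = []
    for line in text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def query_gpu_memory():
    """Return one (used_mib, total_mib) tuple per GPU.

    Assumption: nvidia-smi is installed and reachable on the PATH.
    """
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_memory_csv(result.stdout)
```

Calling `query_gpu_memory()` a few minutes into training with each build, using the same cfg and batch settings, gives two numbers that can be compared directly.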
I checked the start-up message to see whether "half" is in use and whether it is compiled into the exe.
It was surprisingly difficult to use the build.ps1 script to build without CUDNN_HALF, because there seems to be no option for that. The script does not complain about the option "-EnableCUDNN_HALF", but it seems to be silently ignored, and it only worked by accident because I had also tried changing the target architectures. Note that, for various reasons, I compile on a PC with a rather old/weak GPU, not on the PC the training runs on. After a lot of trial and error and puzzling over the build logic, it seems that CUDNN_HALF is turned on or off based on the target CUDA architectures, and that the CUDA architectures are derived from the current GPU. To change the architectures to something portable I edited this line:
[string]$AdditionalBuildSetup = "-DCMAKE_CUDA_ARCHITECTURES=all-major"
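For orientation, the behaviour I suspect can be sketched as below. This is an assumption about the build logic, not a copy of the project's CMake code: half precision is taken to be enabled when at least one target architecture has compute capability 7.0 or higher, which is the first generation (Volta) with Tensor Cores. The name `half_enabled_for` is hypothetical. For reference, the RTX 2080 SUPER is compute capability 7.5 and the RTX 3080 Ti is 8.6.

```python
# Assumption (not taken from the Darknet sources): CUDNN_HALF is switched on
# when at least one target architecture is 70 (compute capability 7.0) or newer.
TENSOR_CORE_MIN_ARCH = 70


def half_enabled_for(architectures):
    """Guess whether half precision would be enabled for a list of
    CMAKE_CUDA_ARCHITECTURES-style values such as "75" or "86-real".

    Special values like "all-major" are rejected, because which
    architectures they expand to depends on the installed CUDA toolkit.
    """
    for arch in architectures:
        number = str(arch).split("-")[0]
        if not number.isdigit():
            raise ValueError(f"cannot resolve architecture value: {arch!r}")
        if int(number) >= TENSOR_CORE_MIN_ARCH:
            return True
    return False
```

Under that assumption, a build on a PC whose GPU is older than Volta would produce an exe without half support unless the architectures are overridden, which would match what I saw.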
So I would like to ask whether this is just me, and what I might have done wrong. If CUDNN_HALF only reduces numeric accuracy without any speed-up, I feel it should be turned off by default and applied only when it is explicitly and specifically requested for experimental use.
Let me finish with a BIG thank you to all developers of this project for all the good work on this repo. THANKS :-)