FER not working with GPU #30
Comments
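Thanks for reporting the issue. I suspect a compatibility problem between the OS and the CUDA/cuDNN versions. I ran example.py under this environment: OS: Windows 10, Python: 3.6, TF: 2.4, CUDA: 10.2 with 11.0 DLL, cuDNN: 8.2. The script ran without issues, as the log quoted in the reply below shows.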
May I know your
Please try upgrading to cuDNN 8.2 and let us know if the error persists.
It ran without issues as you said but does not appear to be using GPUs,
which is the issue reported:
Could not load dynamic library 'cudnn64_8.dll';
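One way to confirm whether TensorFlow actually registered a GPU (rather than silently falling back to the CPU, as the reply above describes) is to ask it for its physical devices. The sketch below assumes TensorFlow 2.1 or newer, where `tf.config.list_physical_devices` is available; the helper name `visible_gpus` is made up for this illustration and is not part of FER.

```python
def visible_gpus():
    """Return the names of the GPUs TensorFlow registered.

    Returns None when TensorFlow cannot be imported in this environment.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # An empty list means TensorFlow skipped GPU registration and will use the CPU.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("TensorFlow is not installed in this environment")
    elif gpus:
        print("TensorFlow registered these GPUs:", gpus)
    else:
        print("No GPU registered; TensorFlow will run on the CPU")
```

An empty list together with "Skipping registering GPU devices..." in the log would be consistent with a script that finishes without errors but never touches the GPU.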
…On Sat 28. Aug 2021 at 14:36 Saranraj Nambusubramaniyan < ***@***.***> wrote:
Thanks for reporting the issue.
I suspect possible compatibility issues between the OS and CUDA/cudnn
versions:
I ran the example.py under the env:
OS: Windows 10
Python : 3.6
TF:2.4
CUDA:10.2 with 11.0 dll
Cudnn:8.2
The script ran without issues as seen below;
2021-08-28 14:07:48.781767: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
WARNING:tensorflow:From C:\Users\saran\Anaconda3\envs\tfgpu\lib\site-packages\tensorflow\python\compat\v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version. Instructions for updating: non-resource variables are not supported in the long term
2021-08-28 14:08:42.383958: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library nvcuda.dll
2021-08-28 14:08:47.564227: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: pciBusID: 0000:01:00.0 name: GeForce GTX 1060 with Max-Q Design computeCapability: 6.1 coreClock: 1.3415GHz coreCount: 10 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 178.99GiB/s
2021-08-28 14:08:47.602994: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
2021-08-28 14:08:47.743832: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cublas64_11.dll'; dlerror: cublas64_11.dll not found
2021-08-28 14:08:47.801707: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cublasLt64_11.dll'; dlerror: cublasLt64_11.dll not found
2021-08-28 14:08:48.108700: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cufft64_10.dll
2021-08-28 14:08:48.163044: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library curand64_10.dll
2021-08-28 14:08:48.172109: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cusolver64_11.dll'; dlerror: cusolver64_11.dll not found
2021-08-28 14:08:48.180278: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cusparse64_11.dll'; dlerror: cusparse64_11.dll not found
2021-08-28 14:08:48.189189: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudnn64_8.dll'; dlerror: cudnn64_8.dll not found
2021-08-28 14:08:48.195744: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1766] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform. Skipping registering GPU devices...
2021-08-28 14:08:48.347281: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2 To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-08-28 14:08:48.421196: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-08-28 14:08:48.435452: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]
2021-08-28 14:08:51.170786: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: pciBusID: 0000:01:00.0 name: GeForce GTX 1060 with Max-Q Design computeCapability: 6.1 coreClock: 1.3415GHz coreCount: 10 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 178.99GiB/s
2021-08-28 14:08:51.183624: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1766] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform. Skipping registering GPU devices...
2021-08-28 14:08:53.758128: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-08-28 14:08:53.765005: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0
2021-08-28 14:08:53.769173: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N
WARNING:tensorflow:From C:\Users\saran\Anaconda3\envs\tfgpu\lib\site-packages\tensorflow\python\keras\layers\normalization.py:534: _colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version. Instructions for updating: Colocations handled automatically by placer.
28-08-2021:14:08:54,419 WARNING [deprecation.py:336] From C:\Users\saran\Anaconda3\envs\tfgpu\lib\site-packages\tensorflow\python\keras\layers\normalization.py:534: _colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version. Instructions for updating: Colocations handled automatically by placer.
C:\Users\saran\Anaconda3\envs\tfgpu\lib\site-packages\tensorflow\python\keras\engine\training.py:2426: UserWarning: Model.state_updates will be removed in a future version. This property should not be used in TensorFlow 2.0, as updates are applied automatically. warnings.warn('Model.state_updates will be removed in a future version. '
[{'box': (83, 83, 200, 200), 'emotions': {'angry': 0.0, 'disgust': 0.0, 'fear': 0.0, 'happy': 0.97, 'sad': 0.0, 'surprise': 0.0, 'neutral': 0.03}}]
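The log above names each library TensorFlow failed to open before it skipped GPU registration. As a rough diagnostic, the sketch below pulls those names out of a captured log and checks whether a file with that name exists in any directory on `PATH`. It assumes the DLLs are meant to be found through `PATH`, which is how the TensorFlow GPU install guide sets up CUDA and cuDNN on Windows; the helper names `missing_libraries` and `find_on_path` are invented for this example.

```python
import os
import re

# Matches TensorFlow's dso_loader warning, even when an email client wrapped the line.
MISSING_LIB = re.compile(r"Could not\s+load dynamic library '([^']+)'")


def missing_libraries(log_text):
    """Return the library names reported as not loadable, in order, without repeats."""
    seen = []
    for name in MISSING_LIB.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


def find_on_path(filename, path=None):
    """Return the directories of a PATH-style string that contain `filename`."""
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        if directory and os.path.isfile(os.path.join(directory, filename)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    # Two warning lines copied from the log in this thread.
    sample = (
        "W dso_loader.cc:64] Could not load dynamic library 'cublas64_11.dll'; "
        "dlerror: cublas64_11.dll not found\n"
        "W dso_loader.cc:64] Could not load dynamic library 'cudnn64_8.dll'; "
        "dlerror: cudnn64_8.dll not found\n"
    )
    for lib in missing_libraries(sample):
        locations = find_on_path(lib)
        print(lib, "->", locations if locations else "not found on PATH")
```

A library that shows up as "not found on PATH" here is a candidate for a missing or mismatched CUDA/cuDNN install; a library that is found but still fails to load would point elsewhere, for example at a version mismatch.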
Dear @Saran-nns and @JustinShenk, my results were similar, so I think the question is still open. Best regards,
Hi @kormoczi. Thanks for the update. I updated CUDA and cuDNN and found the Logs:
From your log, it is clear that CUDA couldn't reach cuDNN. Please make sure that,
Hope this helps.
Hi @Saran-nns, thanks for your suggestions, I will check them...
Best regards
Hi @Saran-nns, I checked the project again and, to my great surprise, after I rebuilt the Docker image (without any modification), the example now works without any problems! Best regards
@kormoczi
Thanks again for raising the issue, and I hope you enjoy ferr'ing :)
Hi,
I was able to use FER on CPU, but cannot make it work on GPU.
Based on this link (https://www.tensorflow.org/install/source#tested_build_configurations) I have checked my configuration, and it looks fine and supported:
tensorflow 2.4.0 / python 3.8.10 / cuda 11.0.3 / cudnn 8.0.5
(I have tried other setups as well, but the results were even worse...)
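To compare an installed setup like the one listed above against what the TensorFlow binary itself expects, TensorFlow can report the CUDA and cuDNN versions it was built against. This is a minimal sketch, assuming TensorFlow 2.3 or newer (where `tf.sysconfig.get_build_info` exists); the wrapper name `tensorflow_build_info` is hypothetical. It reports build-time versions only, not what is actually installed on the machine or in the Docker image.

```python
def tensorflow_build_info():
    """Return the versions TensorFlow was built against.

    Returns None when TensorFlow cannot be imported in this environment.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # get_build_info() is available from TensorFlow 2.3; CPU-only builds may
    # omit the CUDA keys, hence .get() instead of direct indexing.
    info = tf.sysconfig.get_build_info()
    return {
        "tensorflow": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
        "cuda_version": info.get("cuda_version"),
        "cudnn_version": info.get("cudnn_version"),
    }


if __name__ == "__main__":
    report = tensorflow_build_info()
    if report is None:
        print("TensorFlow is not installed in this environment")
    else:
        for key, value in report.items():
            print(f"{key}: {value}")
```

If the reported `cuda_version` or `cudnn_version` differs from the libraries installed on the system, that mismatch is worth ruling out before looking at FER itself.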
When I try to run example.py, the GPU device is detected and the CUDA libraries are opened successfully, but after that I get the following errors:
2021-08-27 11:12:16.722491: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2021-08-27 11:12:16.722689: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at conv_ops.cc:1106 : Not found: No algorithm worked!
Traceback (most recent call last):
  File "test.py", line 11, in <module>
    result = detector.detect_emotions(image)
  File "/usr/local/lib/python3.8/dist-packages/fer/fer.py", line 225, in detect_emotions
    face_rectangles = self.find_faces(img, bgr=True)
  File "/usr/local/lib/python3.8/dist-packages/fer/fer.py", line 182, in find_faces
    results = self._mtcnn.detect_faces(img)
  File "/usr/local/lib/python3.8/dist-packages/mtcnn/mtcnn.py", line 300, in detect_faces
    result = stage(img, result[0], result[1])
  File "/usr/local/lib/python3.8/dist-packages/mtcnn/mtcnn.py", line 342, in __stage1
    out = self._pnet.predict(img_y)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/engine/training_v1.py", line 982, in predict
    return func.predict(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/engine/training_arrays_v1.py", line 706, in predict
    return predict_loop(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/engine/training_arrays_v1.py", line 384, in model_iteration
    batch_outs = f(ins_batch)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/backend.py", line 3956, in __call__
    fetched = self._callable_fn(*array_vals,
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1480, in __call__
    ret = tf_session.TF_SessionRunCallable(self._session._session,
tensorflow.python.framework.errors_impl.NotFoundError: 2 root error(s) found.
  (0) Not found: No algorithm worked!
     [[{{node conv2d/Conv2D}}]]
  (1) Not found: No algorithm worked!
     [[{{node conv2d/Conv2D}}]]
     [[conv2d_4/BiasAdd/_783]]
0 successful operations.
0 derived errors ignored.
Any suggestions on what I should do?
(By the way, it makes no difference whether I install tensorflow==2.4.0 or tensorflow-gpu==2.4.0...)
Thanks!