FER not working with GPU #30

Open
kormoczi opened this issue Aug 27, 2021 · 9 comments

@kormoczi

Hi,
I was able to use FER on CPU, but cannot make it work on GPU.
Based on this link (https://www.tensorflow.org/install/source#tested_build_configurations) I have checked my configuration, and it looks fine and supported:
tensorflow 2.4.0 / python 3.8.10 / cuda 11.0.3 / cudnn 8.0.5
(I have tried other setups as well, but the results were even worse...)
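
To confirm which CUDA and cuDNN versions the installed TensorFlow wheel was built for, the values can be read from TensorFlow itself. This is a minimal sketch, assuming TensorFlow 2.3 or later (where `tf.sysconfig.get_build_info()` is available). The helper name `describe_tf_build` is made up for illustration.

```python
def describe_tf_build():
    """Return TensorFlow's version, the CUDA/cuDNN it was built for, and visible GPUs.

    Returns None when TensorFlow cannot be imported in this environment.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None

    # Dict-like build metadata; the CUDA keys are absent on CPU-only builds.
    build = tf.sysconfig.get_build_info()
    return {
        "tensorflow": tf.__version__,
        "built_for_cuda": build.get("cuda_version"),
        "built_for_cudnn": build.get("cudnn_version"),
        "visible_gpus": [d.name for d in tf.config.list_physical_devices("GPU")],
    }


if __name__ == "__main__":
    print(describe_tf_build())
```

Comparing `built_for_cuda` and `built_for_cudnn` with the versions installed in the container shows whether the two sides really match.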

When I run example.py, the GPU device is detected and the CUDA libraries are opened successfully, but after that I get the following errors:
2021-08-27 11:12:16.722491: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2021-08-27 11:12:16.722689: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at conv_ops.cc:1106 : Not found: No algorithm worked!
Traceback (most recent call last):
  File "test.py", line 11, in <module>
    result = detector.detect_emotions(image)
  File "/usr/local/lib/python3.8/dist-packages/fer/fer.py", line 225, in detect_emotions
    face_rectangles = self.find_faces(img, bgr=True)
  File "/usr/local/lib/python3.8/dist-packages/fer/fer.py", line 182, in find_faces
    results = self._mtcnn.detect_faces(img)
  File "/usr/local/lib/python3.8/dist-packages/mtcnn/mtcnn.py", line 300, in detect_faces
    result = stage(img, result[0], result[1])
  File "/usr/local/lib/python3.8/dist-packages/mtcnn/mtcnn.py", line 342, in __stage1
    out = self._pnet.predict(img_y)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/engine/training_v1.py", line 982, in predict
    return func.predict(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/engine/training_arrays_v1.py", line 706, in predict
    return predict_loop(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/engine/training_arrays_v1.py", line 384, in model_iteration
    batch_outs = f(ins_batch)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/backend.py", line 3956, in __call__
    fetched = self._callable_fn(*array_vals,
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1480, in __call__
    ret = tf_session.TF_SessionRunCallable(self._session._session,
tensorflow.python.framework.errors_impl.NotFoundError: 2 root error(s) found.
  (0) Not found: No algorithm worked!
     [[{{node conv2d/Conv2D}}]]
  (1) Not found: No algorithm worked!
     [[{{node conv2d/Conv2D}}]]
     [[conv2d_4/BiasAdd/_783]]
0 successful operations.
0 derived errors ignored.

Any suggestions on what I should do?
(By the way, there is no difference if I install tensorflow==2.4.0 or tensorflow-gpu==2.4.0...)

Thanks!
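
The thread never pins down the root cause of `CUBLAS_STATUS_NOT_INITIALIZED`. One possibility that is often reported for this error, and only an assumption here, is that TensorFlow has already reserved almost all GPU memory by the time cuBLAS tries to create its handle. A minimal sketch for testing that hypothesis uses the `TF_FORCE_GPU_ALLOW_GROWTH` environment variable described in the TensorFlow GPU guide. The helper name is hypothetical, and the commented-out FER usage follows the `test.py` calls shown in the traceback.

```python
import os


def enable_gpu_memory_growth():
    """Ask TensorFlow to allocate GPU memory on demand instead of all at once.

    The variable must be set before TensorFlow initializes the GPU, so the
    simplest approach is to call this before importing tensorflow (or fer,
    which imports it).
    """
    os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"
    return os.environ["TF_FORCE_GPU_ALLOW_GROWTH"]


enable_gpu_memory_growth()

# Only after the variable is set (assumed usage, mirroring test.py above):
# from fer import FER
# detector = FER(mtcnn=True)
# result = detector.detect_emotions(image)
```

If the error disappears with this setting, memory pressure at initialization is the likely cause. If it persists, the library versions and search paths remain the main suspects.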

@Saran-nns

Saran-nns commented Aug 28, 2021

Thanks for reporting the issue.

I suspect possible compatibility issues between the OS and CUDA/cudnn versions:

I ran example.py in the following environment:

OS: Windows 10
Python: 3.6
TF: 2.4
CUDA: 10.2 with 11.0 dll
cuDNN: 8.2

The script ran without issues, as seen below:

2021-08-28 14:07:48.781767: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
WARNING:tensorflow:From C:\Users\saran\Anaconda3\envs\tfgpu\lib\site-packages\tensorflow\python\compat\v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version. Instructions for updating: non-resource variables are not supported in the long term
2021-08-28 14:08:42.383958: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library nvcuda.dll
2021-08-28 14:08:47.564227: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: pciBusID: 0000:01:00.0 name: GeForce GTX 1060 with Max-Q Design computeCapability: 6.1 coreClock: 1.3415GHz coreCount: 10 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 178.99GiB/s
2021-08-28 14:08:47.602994: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
2021-08-28 14:08:47.743832: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cublas64_11.dll'; dlerror: cublas64_11.dll not found
2021-08-28 14:08:47.801707: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cublasLt64_11.dll'; dlerror: cublasLt64_11.dll not found
2021-08-28 14:08:48.108700: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cufft64_10.dll
2021-08-28 14:08:48.163044: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library curand64_10.dll
2021-08-28 14:08:48.172109: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cusolver64_11.dll'; dlerror: cusolver64_11.dll not found
2021-08-28 14:08:48.180278: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cusparse64_11.dll'; dlerror: cusparse64_11.dll not found
2021-08-28 14:08:48.189189: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudnn64_8.dll'; dlerror: cudnn64_8.dll not found
2021-08-28 14:08:48.195744: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1766] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform. Skipping registering GPU devices...
2021-08-28 14:08:48.347281: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2 To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-08-28 14:08:48.421196: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-08-28 14:08:48.435452: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]
2021-08-28 14:08:51.170786: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: pciBusID: 0000:01:00.0 name: GeForce GTX 1060 with Max-Q Design computeCapability: 6.1 coreClock: 1.3415GHz coreCount: 10 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 178.99GiB/s
2021-08-28 14:08:51.183624: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1766] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform. Skipping registering GPU devices...
2021-08-28 14:08:53.758128: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-08-28 14:08:53.765005: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0
2021-08-28 14:08:53.769173: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N
WARNING:tensorflow:From C:\Users\saran\Anaconda3\envs\tfgpu\lib\site-packages\tensorflow\python\keras\layers\normalization.py:534: _colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version. Instructions for updating: Colocations handled automatically by placer.
28-08-2021:14:08:54,419 WARNING [deprecation.py:336] From C:\Users\saran\Anaconda3\envs\tfgpu\lib\site-packages\tensorflow\python\keras\layers\normalization.py:534: _colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version. Instructions for updating: Colocations handled automatically by placer.
C:\Users\saran\Anaconda3\envs\tfgpu\lib\site-packages\tensorflow\python\keras\engine\training.py:2426: UserWarning: Model.state_updates will be removed in a future version. This property should not be used in TensorFlow 2.0, as updates are applied automatically. warnings.warn('Model.state_updates will be removed in a future version. '
[{'box': (83, 83, 200, 200), 'emotions': {'angry': 0.0, 'disgust': 0.0, 'fear': 0.0, 'happy': 0.97, 'sad': 0.0, 'surprise': 0.0, 'neutral': 0.03}}]

May I know:

  1. Your OS
  2. Whether you have multiple versions of CUDA installed

Please try upgrading to cuDNN 8.2 and let us know if the error persists.
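
The log in this comment contains several `Could not load dynamic library` warnings followed by `Skipping registering GPU devices...`, which means that run did not use the GPU. A minimal sketch that pulls those signals out of a captured log follows. The function name `summarize_tf_log` is hypothetical, and the sample lines are shortened copies of the log above.

```python
import re

# Matches the dso_loader warning and captures the library name in quotes.
MISSING_LIB = re.compile(r"Could not load dynamic library '([^']+)'")


def summarize_tf_log(log_text):
    """Report which libraries failed to load and whether GPU registration was skipped."""
    missing = sorted(set(MISSING_LIB.findall(log_text)))
    gpu_skipped = "Skipping registering GPU devices" in log_text
    return {"missing_libraries": missing, "gpu_skipped": gpu_skipped}


sample = (
    "W tensorflow/stream_executor/platform/default/dso_loader.cc:64] "
    "Could not load dynamic library 'cublas64_11.dll'; dlerror: cublas64_11.dll not found\n"
    "W tensorflow/stream_executor/platform/default/dso_loader.cc:64] "
    "Could not load dynamic library 'cudnn64_8.dll'; dlerror: cudnn64_8.dll not found\n"
    "W tensorflow/core/common_runtime/gpu/gpu_device.cc:1766] "
    "Cannot dlopen some GPU libraries. Skipping registering GPU devices...\n"
)

print(summarize_tf_log(sample))
# {'missing_libraries': ['cublas64_11.dll', 'cudnn64_8.dll'], 'gpu_skipped': True}
```

Running the same function over the original reporter's log would return an empty `missing_libraries` list, since that log shows the libraries opening successfully before the cuBLAS error.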

@JustinShenk
Owner

JustinShenk commented Aug 28, 2021 via email

@kormoczi
Author

Dear @Saran-nns and @JustinShenk,

My results were similar...
If there is any problem with the GPU initialization (as in your example, @Saran-nns), the system falls back to the CPU, so everything works (or at least appears to).
But if the GPU initialization succeeds, the errors appear afterwards.

So I think the question is still pending...
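
Because the CPU fallback is silent, a test script can be made to fail loudly when no GPU is registered. A minimal sketch follows; `require_gpu` and its `list_devices` parameter are hypothetical names, and the parameter exists only so the check can be exercised without a GPU. By default it uses `tf.config.list_physical_devices`.

```python
def require_gpu(list_devices=None):
    """Return the registered GPUs, or raise if TensorFlow would run on the CPU.

    list_devices is a callable taking a device type string. When omitted,
    TensorFlow's own tf.config.list_physical_devices is used.
    """
    if list_devices is None:
        import tensorflow as tf  # imported lazily so the helper has no hard dependency

        list_devices = tf.config.list_physical_devices

    gpus = list_devices("GPU")
    if not gpus:
        raise RuntimeError("TensorFlow registered no GPU; inference would fall back to the CPU")
    return gpus


# Assumed usage at the top of test.py, before creating the detector:
# require_gpu()
```

With this guard in place, a run that "works" is known to have had a GPU registered, which separates the fallback case from the cuBLAS failure described in the first comment.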

Best regards,
Csaba

@Saran-nns

Saran-nns commented Sep 2, 2021

Hi @kormoczi. Thanks for the update.

I updated CUDA and cuDNN, and example.py then ran successfully on the GPU.

Logs:

(tfgpu) N:\fer>python example.py
2021-09-02 17:16:47.723348: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
WARNING:tensorflow:From C:\Users\saran\Anaconda3\envs\tfgpu\lib\site-packages\tensorflow\python\compat\v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version. Instructions for updating: non-resource variables are not supported in the long term
2021-09-02 17:18:06.955330: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library nvcuda.dll
2021-09-02 17:18:13.564614: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: pciBusID: 0000:01:00.0 name: GeForce GTX 1060 with Max-Q Design computeCapability: 6.1 coreClock: 1.3415GHz coreCount: 10 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 178.99GiB/s
2021-09-02 17:18:13.576812: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
2021-09-02 17:18:16.750779: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublas64_11.dll
2021-09-02 17:18:16.756477: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublasLt64_11.dll
2021-09-02 17:18:17.022092: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cufft64_10.dll
2021-09-02 17:18:17.790750: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library curand64_10.dll
2021-09-02 17:18:18.722597: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusolver64_11.dll
2021-09-02 17:18:19.191502: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusparse64_11.dll
2021-09-02 17:18:21.192585: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudnn64_8.dll
2021-09-02 17:18:22.231133: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-09-02 17:18:23.184233: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2 To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-09-02 17:18:23.578463: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: pciBusID: 0000:01:00.0 name: GeForce GTX 1060 with Max-Q Design computeCapability: 6.1 coreClock: 1.3415GHz coreCount: 10 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 178.99GiB/s
2021-09-02 17:18:23.592416: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-09-02 17:18:47.042383: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-09-02 17:18:47.071966: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0
2021-09-02 17:18:47.076344: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N
2021-09-02 17:18:47.330953: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4484 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 with Max-Q Design, pci bus id: 0000:01:00.0, compute capability: 6.1)
2021-09-02 17:18:49.850172: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: pciBusID: 0000:01:00.0 name: GeForce GTX 1060 with Max-Q Design computeCapability: 6.1 coreClock: 1.3415GHz coreCount: 10 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 178.99GiB/s
2021-09-02 17:18:49.861020: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-09-02 17:18:49.865012: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-09-02 17:18:49.870792: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0
2021-09-02 17:18:49.873996: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N
2021-09-02 17:18:49.877882: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4484 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 with Max-Q Design, pci bus id: 0000:01:00.0, compute capability: 6.1)
WARNING:tensorflow:From C:\Users\saran\Anaconda3\envs\tfgpu\lib\site-packages\tensorflow\python\keras\layers\normalization.py:534: _colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version. Instructions for updating: Colocations handled automatically by placer.
02-09-2021:17:18:50,945 WARNING [deprecation.py:336] From C:\Users\saran\Anaconda3\envs\tfgpu\lib\site-packages\tensorflow\python\keras\layers\normalization.py:534: _colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version. Instructions for updating: Colocations handled automatically by placer.
C:\Users\saran\Anaconda3\envs\tfgpu\lib\site-packages\tensorflow\python\keras\engine\training.py:2426: UserWarning: Model.state_updates will be removed in a future version. This property should not be used in TensorFlow 2.0, as updates are applied automatically. warnings.warn('Model.state_updates will be removed in a future version. '
2021-09-02 17:18:56.204265: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudnn64_8.dll
2021-09-02 17:19:15.136577: I tensorflow/stream_executor/cuda/cuda_dnn.cc:359] Loaded cuDNN version 8202
2021-09-02 17:19:48.728937: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublas64_11.dll
2021-09-02 17:19:57.370451: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublasLt64_11.dll
2021-09-02 17:20:06.030969: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:689] Error in PredictCost() for the op: op: "Softmax" attr { key: "T" value { type: DT_FLOAT } } inputs { dtype: DT_FLOAT shape { unknown_rank: true } } device { type: "GPU" vendor: "NVIDIA" model: "GeForce GTX 1060 with Max-Q Design" frequency: 1341 num_cores: 10 environment { key: "architecture" value: "6.1" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 1572864 shared_memory_size_per_multiprocessor: 98304 memory_size: 4702352179 bandwidth: 192192000 } outputs { dtype: DT_FLOAT shape { unknown_rank: true } }
[{'box': (83, 83, 200, 200), 'emotions': {'angry': 0.0, 'disgust': 0.0, 'fear': 0.0, 'happy': 0.97, 'sad': 0.0, 'surprise': 0.0, 'neutral': 0.03}}]

From your log, it is clear that the CUDA couldn't reach the cudnn .dll files.

Please make sure that:

  1. You have the right version of cuDNN. I suggest CUDA 11.x with cuDNN 8.x.
  2. You have added cuDNN to your system path.
  3. You have copied the DLL files from the cuDNN bin directory to the CUDA bin directory, as described in the user guide.
  4. You have removed any old versions of CUDA and cuDNN from your system paths to avoid path conflicts.
  5. You have restarted your system.

Hope this helps
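
The path checks in the list above can be partly automated. The sketch below assumes a Windows setup like the one in this comment, where the CUDA and cuDNN DLLs are found through `PATH`. It reports, for each DLL name, every distinct directory on the search path that contains it, so both missing libraries and conflicting duplicate installs show up. The function name `find_on_path` is hypothetical, and the DLL names are taken from the log lines quoted in this thread.

```python
import os


def find_on_path(filenames, search_path=None):
    """Map each file name to the distinct search-path directories that contain it."""
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    hits = {name: [] for name in filenames}
    seen = set()
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue
        # Skip directories that are listed more than once.
        key = os.path.normcase(os.path.abspath(directory))
        if key in seen:
            continue
        seen.add(key)
        for name in filenames:
            if os.path.isfile(os.path.join(directory, name)):
                hits[name].append(directory)
    return hits


if __name__ == "__main__":
    wanted = ["cudart64_110.dll", "cublas64_11.dll", "cublasLt64_11.dll", "cudnn64_8.dll"]
    for name, dirs in find_on_path(wanted).items():
        status = "MISSING" if not dirs else ("DUPLICATE" if len(dirs) > 1 else "ok")
        print(f"{name}: {status} {dirs}")
```

On Linux the same function could be pointed at `LD_LIBRARY_PATH` with the corresponding `.so` names, although libraries registered through the system linker cache would not appear in that variable.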

@kormoczi
Author

kormoczi commented Sep 6, 2021

Hi @Saran-nns,

Thanks for your suggestions, I will check them...
But I have two questions:

  1. You wrote: "From your log, it is clear that the CUDA couldn't reach the cudnn .dll files."
    Which part of my log shows this?
  2. You suggested using cuda 11.x, cudnn 8.x, but as I stated at the beginning, I am already using cuda 11.0.3 / cudnn 8.0.5,
    so this should be OK, no?

Best regards

@kormoczi
Author

kormoczi commented Sep 6, 2021

Hi @Saran-nns,

I checked the project again and, to my great surprise, after I rebuilt the Docker image (without any modification), the example now works without any problem!
I can't tell yet what has changed, but most probably neither the FER library nor CUDA/cuDNN...
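
An unchanged Dockerfile can still produce a different image when package versions are not pinned, which is one plausible explanation here, though not one confirmed in the thread. A minimal sketch for recording what an image actually contains, so that two builds can be compared, follows. The function name and the default package list are assumptions chosen for illustration, and `importlib.metadata` requires Python 3.8 or later.

```python
import json
import platform
from importlib import metadata

# Assumed list of packages worth tracking for a FER setup; adjust as needed.
DEFAULT_PACKAGES = ("tensorflow", "tensorflow-gpu", "fer", "mtcnn", "keras", "numpy")


def snapshot_environment(packages=DEFAULT_PACKAGES):
    """Return the Python version and the installed version of each package (None if absent)."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return {"python": platform.python_version(), "packages": versions}


if __name__ == "__main__":
    print(json.dumps(snapshot_environment(), indent=2, sort_keys=True))
```

Saving this output at build time and diffing it between a failing and a working image would show whether any Python dependency changed between the two builds.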

Best regards

@Saran-nns

@kormoczi
Great that it works through Docker.
CUDA 11.x is not packaged with cuBLAS. cuDNN provides this functionality for ML frameworks such as TensorFlow, PyTorch, or Keras to generate (initialize) any cuBLAS handles. Even if you have the right versions installed, the error can still be thrown if the CUDA and cuDNN paths (including those of the Python environment) are not well defined.
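
Whether the CUDA libraries are reachable from the Python process can be checked directly by trying to load them. The sketch below targets a Linux environment such as the Docker image discussed above, where `ctypes.CDLL` goes through the same dynamic loader that TensorFlow uses. The function name `can_load` is hypothetical, and the library names are assumed to be the sonames a TensorFlow 2.4 build looks for; they should be adjusted to whatever names the TensorFlow log reports.

```python
import ctypes


def can_load(library_names):
    """Try to load each shared library by name and report success per library."""
    results = {}
    for name in library_names:
        try:
            ctypes.CDLL(name)
            results[name] = True
        except OSError:
            # Raised when the loader cannot find or open the library.
            results[name] = False
    return results


if __name__ == "__main__":
    # Assumed sonames for a CUDA 11.0 / cuDNN 8 setup on Linux.
    print(can_load(["libcudart.so.11.0", "libcublas.so.11", "libcudnn.so.8"]))
```

A `False` entry points at a search-path problem rather than a version mismatch, which is the distinction the comment above is drawing.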

@Saran-nns

Thanks for the issue again and hope you enjoy ferr'ing :)
