
question on training on large gpu #13

Open
johndpope opened this issue Nov 11, 2021 · 1 comment

@johndpope:
(torch) ➜ gitWorkspace nvidia-htop.py
Fri Nov 12 08:32:56 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:21:00.0  On |                  N/A |
| 41%   67C    P2   213W / 370W |   5398MiB / 24234MiB |     36%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
|  GPU    PID   USER    GPU MEM   %CPU  %MEM      TIME  COMMAND               |
|    0   2637   root     725MiB    2.5   0.3  04:35:08  /usr/lib/xorg/Xorg v  |
|    0   3700   jp       133MiB    5.7   2.3  04:35:05  /usr/bin/gnome-shell  |
|    0   5077   jp       198MiB    7.9   0.9  04:34:57  /opt/google/chrome/c  |
|    0  11864   jp        64MiB    1.1   0.5  04:16:50  /snap/code/80/usr/sh  |
|    0  26251   jp       287MiB    0.1   4.6  01:34:40  python generate.py    |
|    0  58358   jp      3985MiB   99.8  15.7     47:52  python scripts/train  |
+-----------------------------------------------------------------------------+

Currently I'm seeing utilization at around 16% and about 5GB of memory used on a 24GB card. Is there some low-hanging fruit to get the code to use more of the available resources?

I did take a look at https://towardsdatascience.com/7-tips-for-squeezing-maximum-performance-from-pytorch-ca4a40951259 (the only thing that stood out was creating tensors directly on the GPU instead of building them on the CPU and then calling cuda() on them).
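For reference, my understanding of that tip is roughly this (a minimal sketch with made-up shapes, not code from this repo):

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Slower: the tensor is allocated on the CPU first, then copied to the GPU.
x = torch.zeros(64, 3, 256, 256).to(device)

# Faster: allocate directly on the GPU, skipping the host-to-device copy.
y = torch.zeros(64, 3, 256, 256, device=device)
```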

Were there any efforts made to reduce RAM requirements (that I could remove)?

@ciaua (Owner) commented May 25, 2022:

A simple thing to try is to increase the number of workers in the dataloader.
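Something along these lines, where the dataset and the numbers are just placeholders rather than the actual ones used by the training script:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy dataset purely to make the snippet self-contained; in practice this
# would be whatever dataset the training script builds.
dataset = TensorDataset(torch.randn(1024, 3, 64, 64), torch.randint(0, 10, (1024,)))

loader = DataLoader(
    dataset,
    batch_size=16,
    shuffle=True,
    num_workers=8,    # more worker processes keep the GPU fed with batches
    pin_memory=True,  # speeds up host-to-GPU transfer of each batch
)
```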
