Allow passing in cr/cl bounds and other settings #6

Open · wants to merge 3 commits into base: main
Conversation

winston-zillow
Fix execution on CPU and GPU. Fix model loading.

@@ -22,7 +22,7 @@ We need to put the data sets in the `dataset` folder. You can specify one data s

```diff
 # trained on the tic-tac-toe data set with one GPU.
-python3 experiment.py -d tic-tac-toe -bs 32 -s 1@16 -e401 -lrde 200 -lr 0.002 -ki 0 -mp 12481 -i 0 -wd 1e-6 &
+python3 experiment.py -d tic-tac-toe -bs 32 -s 1@16 -e401 -lrde 200 -lr 0.002 -ki 0 -mp 12481 -i cuda:0 -wd 1e-6 &
```
Note: see review comment on args.py changes

```diff
@@ -51,7 +52,8 @@
     rrl_args.plot_file = os.path.join(rrl_args.folder_path, 'plot_file.pdf')
     rrl_args.log = os.path.join(rrl_args.folder_path, 'log.txt')
     rrl_args.test_res = os.path.join(rrl_args.folder_path, 'test_res.txt')
-    rrl_args.device_ids = list(map(int, rrl_args.device_ids.strip().split('@')))
+    rrl_args.device_ids = list(map(lambda id: torch.device(id), rrl_args.device_ids.strip().split('@'))) \
+        if rrl_args.device_ids else [None]
```
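The new parsing can be sketched as a standalone helper (the name `parse_device_ids` is illustrative, not the PR's exact code):

```python
import torch

def parse_device_ids(spec):
    """Parse an '@'-separated device spec (the -i argument), e.g. "cuda:0@cuda:1".

    An empty spec falls back to [None], i.e. a CPU-only run.
    """
    if not spec or not spec.strip():
        return [None]
    return [torch.device(d) for d in spec.strip().split('@')]
```

With this shape, downstream code can test `device.type == 'cuda'` instead of comparing raw integer IDs.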
Note: I found that passing in an integer device ID pins the tensors to GPU memory, but GPU compute utilization stays at 0%, as shown by nvidia-smi. After changing the device ID to the one returned by torch.device("cuda:0"), the GPU is fully utilized. I do not know why that is the case, since a simple test using a Python loop does drive GPU utilization.

Example run passing in an integer device ID:

```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.142.00   Driver Version: 450.142.00   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 00000000:00:1E.0 Off |                    0 |
| N/A   47C    P0    70W / 149W |    322MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     27173      C   ...vs/pytorch_p37/bin/python      319MiB |
+-----------------------------------------------------------------------------+
```

Example run passing in `cuda:*`:

```
Sat Dec  4 01:31:31 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.142.00   Driver Version: 450.142.00   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 00000000:00:1E.0 Off |                    0 |
| N/A   52C    P0   138W / 149W |   1739MiB / 11441MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     27346      C   ...vs/pytorch_p37/bin/python     1736MiB |
+-----------------------------------------------------------------------------+
```

```python
    # lower_bound: [continuous cols]
    # upper_bound: [continuous cols]
}
return settings
```

Note: I added this new settings file so that the user can pass in CR/CL bounds as well as control normalization, one-hot encoding, etc. (those are currently hard-coded).
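A minimal sketch of what such a per-dataset settings entry might look like (the function and key names are illustrative; the PR's actual schema may differ):

```python
def get_dataset_settings(name):
    # Hypothetical per-dataset settings. Bounds are per continuous
    # column; None means keep the default random initialization.
    all_settings = {
        'tic-tac-toe': {
            'one_hot_encode_features': True,   # dataset is categorical
            'impute_continuous': False,        # no missing values
            'lower_bound': None,
            'upper_bound': None,
        },
    }
    return all_settings.get(name, {})
```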

```python
if self.left is not None and self.right is not None:
    if cl is not None and cr is not None:  # bounds are specified
        cl = torch.tensor(cl).type(torch.float).t()
        cr = torch.tensor(cr).type(torch.float).t()
```
Note: here we can pass in the cl/cr bounds directly.

```python
        cl = self.left + torch.rand(self.n, self.input_dim[1]) * (self.right - self.left)
        cr = self.left + torch.rand(self.n, self.input_dim[1]) * (self.right - self.left)
    else:
        cl = 3. * (2. * torch.rand(self.n, self.input_dim[1]) - 1.)
        cr = 3. * (2. * torch.rand(self.n, self.input_dim[1]) - 1.)
assert torch.Size([self.n, self.input_dim[1]]) == cl.size()
assert torch.Size([self.n, self.input_dim[1]]) == cr.size()
```
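The bounded initialization and shape check can be exercised in isolation; the sizes below are made up for illustration:

```python
import torch

n, d = 4, 3                # hypothetical: 4 nodes, 3 continuous columns
left = torch.zeros(d)      # per-column lower bounds
right = torch.ones(d)      # per-column upper bounds

# torch.rand is uniform in [0, 1), so cl lies in [left, right) column-wise
cl = left + torch.rand(n, d) * (right - left)

assert torch.Size([n, d]) == cl.size()
assert bool(((cl >= left) & (cl < right)).all())
```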
Note: and verify the shapes are correct.


```python
if self.device_id and self.device_id.type == 'cuda':
    self.net.cuda(self.device_id)
```
Note: the condition allows the program to run in CPU mode as well.
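The guard can be sketched as a small helper (the name `move_to_device` is hypothetical); `device_id` is either `None` for a CPU-only run or a `torch.device`:

```python
import torch

def move_to_device(net, device_id):
    # Only move the network when a CUDA device was actually requested;
    # a None or CPU device leaves the model on the CPU.
    if device_id and device_id.type == 'cuda':
        net.cuda(device_id)
    return net
```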

```diff
-        self.feature_enc = preprocessing.OneHotEncoder(categories='auto', drop=drop)
-        self.imp = SimpleImputer(missing_values=np.nan, strategy='mean')
+        self.feature_enc = preprocessing.OneHotEncoder(categories='auto', drop=drop) if one_hot_encode_features else None
+        self.imp = SimpleImputer(missing_values=np.nan, strategy='mean') if impute_continuous else None
```
Note: for datasets that do not require one-hot encoding or imputation (or already have them), these steps can now be skipped.
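The conditional construction can be wrapped as a helper (the function name is hypothetical; the flag names follow the diff above):

```python
import numpy as np
from sklearn import preprocessing
from sklearn.impute import SimpleImputer

def build_preprocessors(one_hot_encode_features=True, impute_continuous=True, drop=None):
    # Either transform may be None when the dataset is already
    # one-hot encoded or has no missing continuous values.
    feature_enc = preprocessing.OneHotEncoder(categories='auto', drop=drop) if one_hot_encode_features else None
    imp = SimpleImputer(missing_values=np.nan, strategy='mean') if impute_continuous else None
    return feature_enc, imp
```

Callers then just skip any transform that is `None` instead of applying it unconditionally.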

@12wang3
Owner

12wang3 commented Dec 7, 2021

Thank you very much for the PR. I am busy on other stuff now and will check the code after Dec 9.

@ASan1527

ASan1527 commented Nov 8, 2022

[screenshot]
I can't get the device_ids, and I only have a single GPU; I don't know how to change the code. Could you please tell me how to solve it? Thank you!

@12wang3
Owner

12wang3 commented Nov 8, 2022

> [screenshot] I can't get the device_ids, and I only have a single GPU; I don't know how to change the code. Could you please tell me how to solve it? Thank you!

Could you please show the command you used? Have you set the "-i" argument? It seems you did not set device_ids, since your device_ids was None. If you only have a single GPU, you can use "-i 0" to set the device_ids. By the way, questions like this are better raised as an issue rather than in a PR.
