Training the model with varying the cell size #5

Open
athier0 opened this issue Jun 26, 2019 · 6 comments


athier0 commented Jun 26, 2019

Hello boathit, my name is Atheir.
I'm trying to train the model with cell sizes 25, 40, and 150,

and I'm getting this error:
Traceback (most recent call last):
  File "t2vec.py", line 115, in <module>
    train(args)
  File "/home//t2vec/train.py", line 225, in train
    loss = batchloss(output, target, m1, lossF, args.generator_batch)
  File "/home/jahdalay/t2vec/train.py", line 89, in batchloss
    loss += lossF(o, t)
  File "/home//t2vec/train.py", line 168, in <lambda>
    lossF = lambda o, t: KLDIVloss(o, t, criterion, V, D)
  File "/home//t2vec/train.py", line 42, in KLDIVloss
    return criterion(outputk, targetk)
  File "/home//pytorch-0.1.12/lib/python3.6/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home//pytorch-0.1.12/lib/python3.6/site-packages/torch/nn/modules/loss.py", line 36, in forward
    return backend_fn(self.size_average, weight=self.weight)(input, target)
  File "/home//pytorch-0.1.12/lib/python3.6/site-packages/torch/nn/_functions/thnn/auto.py", line 41, in forward
    output, *self.additional_args)
RuntimeError: cudaEventCreateWithFlags in future ctor: device-side assert triggered

Could you please help me?
Thank you very much.
Best,

Owner

boathit commented Jun 26, 2019

Hi @athier0, did you change the cellsize in this line to the correct value and rerun the preprocessing script? If yes, how large is your GPU memory? In my experiments, I used a GPU with 12 GB of memory.
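
For anyone hitting the same error: a device-side assert raised inside a loss computation is often caused by an index that falls outside the expected range, for example a token id that is not smaller than the vocabulary size the model was built with. Whether that is what happened here is an assumption, not something confirmed in this thread, but it would be consistent with data preprocessed for one cell size being fed to a model configured for another. A minimal sketch of such a check on the CPU, where `parse_token_lines`, `find_out_of_range_tokens`, and the whitespace-separated token format are all hypothetical and not part of t2vec:

```python
def parse_token_lines(lines):
    """Parse lines of whitespace-separated integer token ids (assumed format)."""
    return [[int(tok) for tok in line.split()] for line in lines if line.strip()]


def find_out_of_range_tokens(sequences, vocab_size):
    """Return (sequence_index, position, token) for ids outside [0, vocab_size)."""
    bad = []
    for i, seq in enumerate(sequences):
        for j, tok in enumerate(seq):
            if not 0 <= tok < vocab_size:
                bad.append((i, j, tok))
    return bad


if __name__ == "__main__":
    # Hypothetical data: two short token sequences and a vocabulary of 10 ids.
    sequences = parse_token_lines(["4 5 6", "7 12 3"])
    for seq_idx, pos, tok in find_out_of_range_tokens(sequences, vocab_size=10):
        print(f"sequence {seq_idx}, position {pos}: token {tok} is out of range")
```

If a check like this reports any tokens, the training files and the vocabulary size passed to the model probably come from different preprocessing runs. Running the failing step on the CPU usually gives a clearer error message as well.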

Author

athier0 commented Jun 26, 2019

Thank you very much for the fast reply.
I have moved my work to another device with more GPU memory.
I hope it works now.

I do have another question, though:
could you please share the code for the other baselines (vRNN and CMS)?

Thank you.

Owner

boathit commented Jun 27, 2019

The implementations of vRNN and CMS are quite straightforward; you can find the RNN example here.

Author

athier0 commented Jul 11, 2019

To compare t2vec with the three other models, I need to reproduce results that are close to those shown in the paper.
With the EDwP model I get results similar to the paper, but my own implementations of vRNN and CMS give quite different results.
If you could share your vRNN and CMS code, I could reproduce the same results and build on them.

Thank you very much.

Owner

boathit commented Jul 11, 2019

Hi athier0, my original code repository was stored on our school's GPU server, and they took back my account this February. I have checked my local machine and could only find the t2vec code.

If you got different results, I think you should just report what you got.

Author

athier0 commented Jul 11, 2019

OK, I'll do that, thank you.

In the doc you mention that you terminated training of the cell size 100 model after 14 h,
but how long does it take when the cell size is 25?
I have been training the model for the past 7 days and it is still going (cell size 25).
The checkpoints are updated, but the model is not updated.
I have plenty of GPU memory, almost 44 GB.

----
I tried changing the number of iterations to a fixed number rather than the number of trajectories (dataset),
but the step that encodes trajectories into vectors gave me this error:

  File "t2vec.py", line 113, in <module>
    t2vec(args)
  File "/home/jahdalay/t2vec/evaluate.py", line 83, in t2vec
    m0.load_state_dict(checkpoint["m0"])
  File "/home/jahdalay/pytorch-0.1.12/lib/python3.6/site-packages/torch/nn/modules/module.py", line 335, in load_state_dict
    own_state[name].copy_(param)
RuntimeError: sizes do not match at /b/wheel/pytorch-src/torch/lib/THC/generic/THCTensorCopy.c:95

Thank you very much,
and sorry for asking so many questions ):
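
A note on the second error: "sizes do not match" in `load_state_dict` means at least one tensor saved in the checkpoint has a different shape from the corresponding parameter of the model built at evaluation time. One possible cause, which is an assumption and not confirmed in this thread, is that the checkpoint was trained with a different vocabulary size (and therefore a different cell size) than the one passed when encoding. A minimal sketch for locating the mismatch by comparing shapes by name; `diff_state_shapes` and the parameter names and sizes in the example are hypothetical:

```python
def diff_state_shapes(checkpoint_shapes, model_shapes):
    """Return {name: (checkpoint_shape, model_shape)} for entries that differ.

    A shape of None means the name is missing on that side.
    """
    mismatches = {}
    for name in sorted(set(checkpoint_shapes) | set(model_shapes)):
        saved = checkpoint_shapes.get(name)
        expected = model_shapes.get(name)
        if saved != expected:
            mismatches[name] = (saved, expected)
    return mismatches


if __name__ == "__main__":
    # With PyTorch, the two dictionaries could be gathered roughly like this:
    #   saved = {k: tuple(v.size()) for k, v in checkpoint["m0"].items()}
    #   built = {k: tuple(v.size()) for k, v in m0.state_dict().items()}
    # The names and numbers below are made up for illustration only.
    saved = {"embedding.weight": (500, 256), "rnn.weight": (256, 256)}
    built = {"embedding.weight": (300, 256), "rnn.weight": (256, 256)}
    for name, (s, b) in diff_state_shapes(saved, built).items():
        print(f"{name}: checkpoint {s} vs model {b}")
```

If the only mismatched entries are those whose first dimension is the vocabulary size, the checkpoint and the evaluation arguments most likely refer to different preprocessing runs.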
