batch size #13
Which batch sizes? During training or testing? |
thank you for that fast response!
this issue happens during testing, when I use the match method.
I load the images using PIL and use torch.stack to create a batch.
When I use a different batch size, the certainty value changes between runs on the same images:
roma_model.match(torch.stack([im1] * 16), torch.stack([im2] * 16))
roma_model.match(torch.stack([im1] * 8), torch.stack([im2] * 8))
the certainty values of these two runs are different.
thanks! |
Aha. There might be a bug in the match method when not using image paths. I think the best approach is to simply use the model forward for now. Make sure that you use the same data preprocessing as us (load and resize with PIL bicubic, normalize with ImageNet mean and std).
I'll try to clean up the code in the coming weeks.
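For illustration, here is a minimal sketch of the preprocessing described above (PIL bicubic resize, ImageNet mean/std normalization) before passing tensors to the model. The 560x560 resolution is taken from later in this thread, and the file names are placeholders; this is not the exact RoMa API.

```python
# Hedged sketch: load, resize with PIL bicubic, normalize with ImageNet stats.
# Resolution and file names are assumptions for illustration only.
import torch
from PIL import Image
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((560, 560), interpolation=transforms.InterpolationMode.BICUBIC),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet mean
                         std=[0.229, 0.224, 0.225]),   # ImageNet std
])

im_A = preprocess(Image.open("im1.jpg").convert("RGB"))
im_B = preprocess(Image.open("im2.jpg").convert("RGB"))
batch_A = im_A.unsqueeze(0)  # add a batch dimension: (1, 3, 560, 560)
batch_B = im_B.unsqueeze(0)
```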
|
i think i found the issue. |
Hi @Parskatt, with my 11 GB of GPU memory I run out of memory when trying batching. Does that make sense, or am I doing something wrong?
I am loading my image batches to the device, using the ImageNet mean and std, and resizing them to 560x560 pixels.
This is how I call it:
Currently, even batch_size=2 is too much... In the
It runs out of memory... I tried calling
Thank you in advance |
Hiya! Did you forget to wrap forward in inference_mode/no_grad? For reference, during training a batch of 8 at res 560 fills up about 40 GB, so it would make sense that a batch of 2 can OOM 11 GB, since the weights also take some space. If you want to finetune I'd suggest freezing batchnorm and using batch size 1. |
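A hedged sketch of the two suggestions above: wrapping the forward pass in inference mode, and freezing batchnorm when finetuning with batch size 1. The `roma_model(batch)` call is a placeholder, not the exact RoMa forward signature.

```python
import torch

# Inference: no autograd graph (and no saved activations) is kept, so memory use
# stays close to what the weights plus a single forward pass need.
roma_model.eval()
with torch.inference_mode():
    out = roma_model(batch)  # placeholder forward call for illustration

# Finetuning with batch size 1: freeze batchnorm so its running statistics and
# affine parameters are not updated from single-sample batches.
for m in roma_model.modules():
    if isinstance(m, torch.nn.modules.batchnorm._BatchNorm):
        m.eval()
        if m.weight is not None:
            m.weight.requires_grad_(False)
        if m.bias is not None:
            m.bias.requires_grad_(False)
```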
@Parskatt Thanks for the very fast reply! Uh nice, yes with that
I am able to reduce the memory footprint (makes sense, I forgot it), and can process batches of 6 images. Thank you very much! |
Somehow I don't see the recommendation to apply model.eval here @Parskatt, but thank you, I changed my implementation to
|
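The thread does not show the exact change the commenter made, but a plausible sketch is switching the model into evaluation mode before matching, so batchnorm uses its stored running statistics instead of per-batch statistics; that would make the certainty independent of batch size. The (warp, certainty) return values are inferred from the thread's usage, and this is an assumption about the fix, not a confirmed RoMa recommendation.

```python
import torch

roma_model.eval()  # batchnorm now uses running statistics, so the output for a
                   # given pair no longer depends on which batch it is part of
with torch.no_grad():
    warp_16, cert_16 = roma_model.match(torch.stack([im1] * 16), torch.stack([im2] * 16))
    warp_8, cert_8 = roma_model.match(torch.stack([im1] * 8), torch.stack([im2] * 8))
# In eval mode, the certainty for the same image pair should agree across batch sizes.
```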
Yeah, GitHub bugged for me and showed my comment as duplicated; I removed one and both disappeared... |
alright, in any case thank you very much! |
@sushi31415926, thanks for reporting this! I had the same issue and your solution resolved it for me. By the way, could you let me know how this solution solves the problem? |
Hi @Parskatt, does the training batch size significantly affect the final performance of the model? I only have 24 GB GPUs, and the maximum batch size I can fit is 2 rather than the 8 you used as described in the paper.
|
In general, yes, lower batch size reduces results. I would not go below 8. It's difficult to give advice, but you can decrease the resolution for lower memory use. You can also reduce memory by writing a better local correlation kernel; I have done this, but it's part of a new project which I can't share yet. You could also try things like sync batchnorm and gradient accumulation. Batchnorm is scary though. |
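A hedged sketch of the two workarounds mentioned above (synchronized batchnorm and gradient accumulation). The function names, optimizer, loader, and loss function are illustrative placeholders, not RoMa's training code.

```python
import torch

def convert_to_sync_bn(model: torch.nn.Module) -> torch.nn.Module:
    # Synchronized batchnorm: share batch statistics across GPUs under DDP, so
    # e.g. 2 GPUs x local batch 4 behave like a global batch of 8 for BN layers.
    return torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)

def train_with_grad_accumulation(model, optimizer, loader, loss_fn, accum: int = 4):
    # Gradient accumulation: step the optimizer only every `accum` micro-batches,
    # simulating a larger effective batch for the gradients. Note this does NOT
    # help batchnorm, which still sees the small per-step batch.
    optimizer.zero_grad()
    for step, (inputs, targets) in enumerate(loader):
        loss = loss_fn(model(inputs), targets)
        (loss / accum).backward()
        if (step + 1) % accum == 0:
            optimizer.step()
            optimizer.zero_grad()
```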
Thank you for your fast response. Do you mean a batch size of 8 in total (global batch size) or 8 per GPU (local batch size)? |
I have verified that a batch size of 8 on 1 GPU works (global batch size). A local batch size of 4 on 2 GPUs with batchnorm synchronization should also give very similar results. I really would like to throw out bn, but it's just too good for the refiners to switch easily (at least in my older experiments this was the case). |
I see. bn is quite annoying! |
hello,
I noticed that when I use different batch sizes the certainty value changes.
Do you have any idea why?