
[performance] slogdet is slow on GPU #5

Open · Nintorac opened this issue Sep 9, 2020 · 3 comments

@Nintorac (Contributor) commented Sep 9, 2020

Hey, great codebase, thank you!

I was looking into performance bottlenecks and found the following, which gave me almost a 2x speedup (1.76 it/s -> 2.96 it/s) on the CIFAR-10 example.

The issue is in the Conv1x1 module: the computation of torch.slogdet is much slower on GPU than on CPU.

https://github.com/didriknielsen/survae_flows/blob/master/survae/transforms/bijections/conv1x1.py#L40
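For reference, the gap can be reproduced with a minimal timing sketch along these lines (not from the original report; the 96x96 size is an arbitrary stand-in for a Conv1x1 weight):

    import time
    import torch

    def time_slogdet(w, iters=1000):
        torch.slogdet(w)  # warm-up
        if w.is_cuda:
            torch.cuda.synchronize()
        start = time.time()
        for _ in range(iters):
            torch.slogdet(w)
        if w.is_cuda:
            torch.cuda.synchronize()
        return (time.time() - start) / iters

    w = torch.randn(96, 96)  # stand-in for a Conv1x1 weight matrix
    print('cpu :', time_slogdet(w))
    if torch.cuda.is_available():
        print('cuda:', time_slogdet(w.to('cuda')))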

This is the modified, faster _logdet:

    def _logdet(self, x_shape):
        b, c, h, w = x_shape
        # Compute the log-determinant on CPU, where slogdet is much
        # faster for a matrix of this size.
        _, ldj_per_pixel = torch.slogdet(self.weight.to('cpu'))
        # The 1x1 convolution acts at every spatial position, so the
        # total log-det is the per-pixel log-det times h * w.
        ldj = ldj_per_pixel * h * w
        # Broadcast to the batch and return on the weight's device.
        return ldj.expand([b]).to(self.weight.device)
@hmdolatabadi commented Sep 10, 2020

Hi,

Related to this issue, if you try a large network (e.g. the Glow architecture for CIFAR-10), then you may encounter an error in the middle of training which says:

File "./examples/cifar10_aug_flow.py", line 102, in <module>
    loss.backward()
  File "/home/user/.conda/envs/idf/lib/python3.7/site-packages/torch/tensor.py", line 185, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/user/.conda/envs/idf/lib/python3.7/site-packages/torch/autograd/__init__.py", line 127, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: svd_cuda: the updating process of SBDSDC did not converge (error: 23)

After looking it up, it seems to me that the SVD operation used in the backward of torch.slogdet may be responsible for this. A note in the official PyTorch documentation says:

Backward through slogdet() internally uses SVD results when the input is not invertible. In this case, double backward through slogdet() will be unstable when the input doesn't have distinct singular values. See svd() for details.
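As a quick illustration of that note (a sketch, not from the thread): for a singular input, slogdet returns sign 0 and logabsdet -inf, and its backward then has to fall back on SVD results, which is where the convergence error can arise:

    import torch

    A = torch.ones(3, 3, requires_grad=True)  # rank-1, hence singular
    sign, logabsdet = torch.slogdet(A)
    print(sign.item(), logabsdet.item())      # 0.0 -inf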

I haven't tested the above solution to see whether it has an effect.

UPDATE:
After trying the above solution, the same problem happened to me at epoch 40 while training a model:

File "./examples/cifar10_aug_flow.py", line 102, in <module>
    loss.backward()
  File "/home/user/.conda/envs/idf/lib/python3.7/site-packages/torch/tensor.py", line 185, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/user/.conda/envs/idf/lib/python3.7/site-packages/torch/autograd/__init__.py", line 127, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: svd_cpu: the updating process of SBDSDC did not converge (error: 23)

@didriknielsen (Owner) commented:

The issue is in the Conv1x1 module: the computation of torch.slogdet is much slower on GPU than on CPU.

Hi,

Thanks! This is gold. I tried it on my machine and also found a ~20% speedup from running torch.slogdet on the CPU.
I've added a slogdet_cpu argument to Conv1x1 with default True.
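Usage would then look something like this (a sketch; the num_channels name and import path are assumed from the library's examples, only slogdet_cpu is new):

    from survae.transforms import Conv1x1

    # slogdet_cpu=True (the new default) evaluates the log-determinant
    # on CPU; pass slogdet_cpu=False to keep the computation on the GPU.
    conv = Conv1x1(num_channels=96, slogdet_cpu=True)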

@didriknielsen (Owner) commented Sep 10, 2020

Related to this issue, if you try a large network (e.g. the Glow architecture for CIFAR-10), then you may encounter an error in the middle of training which says: [...]

The CIFAR-10 example uses the default scale_fn=lambda s: torch.exp(s) in the AffineCouplingBijection.
This choice can lead to instability during longer training since the scales output by the coupling networks can become very large.

I would suggest using something like

  • scale_fn=lambda s: torch.exp(2. * torch.tanh(s / 2.)) or
  • scale_fn=lambda s: torch.sigmoid(s+2.)+1e-3

instead, which keep the scales bounded.

The first choice is what we used in our image experiments; the second is what was used in the Glow code.
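For concreteness, a minimal sketch of passing a bounded scale_fn (the toy coupling net and channel sizes are made up for illustration; AffineCouplingBijection and its scale_fn argument are as referenced above):

    import torch
    import torch.nn as nn
    from survae.transforms import AffineCouplingBijection

    # exp(2*tanh(s/2)) is bounded in (e^-2, e^2), unlike the default exp(s).
    bounded_scale = lambda s: torch.exp(2. * torch.tanh(s / 2.))

    # Toy coupling net for a 4-channel input: it reads one half (2 channels)
    # and predicts shift and unconstrained scale for the other half (2 + 2).
    net = nn.Conv2d(2, 4, kernel_size=3, padding=1)
    coupling = AffineCouplingBijection(net, scale_fn=bounded_scale)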
