
Detailed architecture of dynamic instance normalization #18

Open
zy-xc opened this issue Dec 22, 2020 · 21 comments

@zy-xc

zy-xc commented Dec 22, 2020

Hello @ycjing
Thanks for your brilliant work! I am interested in the paper "Dynamic Instance Normalization for Arbitrary Style Transfer", but I don't know the detailed architecture of DIN and can't find the supplementary material.
Would you please provide the detailed network architecture of this paper?
Thank you!

@ycjing
Owner

ycjing commented Dec 23, 2020

Hi @zy-xc

Thank you for your interest in our work. Here is the link for the corresponding supplement: https://drive.google.com/file/d/1sBFXqWaWOeMuaaVHMM-ddBssKr3OmutW/view?usp=sharing

Please feel free to contact me if you have any other questions. Thank you!

Best,
Yongcheng

@zy-xc
Author

zy-xc commented Dec 23, 2020

Thank you for your reply!

I am a bit confused about the size of the weight generated by the Weight/Bias network. Does the dynamic convolution layer set groups = 64 (the number of channels of the content feature)?

It seems that the style image would have to be large if we set groups = 1. For example, consider standard DIN with kernel_size = 1: the weight generated by the Weight Net should have size 64 * 64 * 1 * 1, so the VGG feature of the style image should be at least 64 * 64 * 64 (C * H * W), and the style image should be at least 512 * 512. Then, to train standard DIN with kernel_size = 3, the style image would have to be at least 1536 * 1536.

Or does standard DIN set groups = 64, so that the generated weight has size 64 * kernel_size * kernel_size?

Thank you!
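For concreteness, the arithmetic in this question works out as follows (a quick sketch, assuming C = 64 content channels as above):

C = 64      # channels of the content feature
k = 1       # dynamic kernel size (standard DIN)

# groups = 1: the weight net must emit a full C x C x k x k kernel
print(C * C * k * k)   # 4096 values

# groups = C (depthwise): only C x 1 x k x k values are needed
print(C * 1 * k * k)   # 64 values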

@ycjing
Owner

ycjing commented Dec 24, 2020

Hi @zy-xc

Thank you for your interest in our work! Regarding your question: yes, we indeed set the group number equal to the number of feature channels, as indicated in the "Architecture Details" section of the supplement. Also, please note that the size of the generated weight and bias is not correlated with the input size, since we use an adaptive pooling layer in the corresponding weight and bias networks. You can set the desired size of the weight and bias by controlling the adaptive pooling layer.

Please let me know if there is any other question. Thank you.

Best,
Yongcheng
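For illustration, a minimal PyTorch sketch of this decoupling (the layer choices and shapes here are assumptions, not the released code): an adaptive pooling layer at the end of the weight net fixes the generated weight's size regardless of the style-feature resolution.

import torch
import torch.nn as nn

C, k = 64, 1  # assumed channel count and target dynamic-kernel size

# Toy weight net: a depthwise conv followed by adaptive pooling.
# The pooling target (k, k) fixes the output size, however large
# the encoded style feature is.
weight_net = nn.Sequential(
    nn.Conv2d(C, C, 3, padding=1, groups=C),
    nn.AdaptiveAvgPool2d((k, k)),
)

for hw in (32, 64, 128):                 # different style-feature resolutions
    w = weight_net(torch.rand(1, C, hw, hw))
    print(hw, tuple(w.shape))            # always (1, 64, 1, 1)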

@sonnguyen129
Contributor

I find the supplementary details confusing to implement. Has anyone implemented this in PyTorch yet? Can you help me?
Thank you so much

@ycjing
Owner

ycjing commented Dec 2, 2021

Hi @sonnguyen129

Thank you for your interest in our work! Could you please elaborate on which part exactly is confusing? I am more than happy to clarify it. Also, if you would like our source code, please drop me an email to apply for the necessary permission required by the company. Thanks!

Best,
Yongcheng

@sonnguyen129
Contributor

Hi @ycjing
I sent you an email. I hope to hear from you as soon as possible.
Thank you.

@sonnguyen129
Contributor

Hi @ycjing
I have a few questions:

1. As I understand it, this is the correspondence between the proposed architecture and the illustration. Am I correct? (Sorry for the bad drawing.)

[attached image: hand-drawn correspondence between the architecture and the illustration]

2. The res layer and the upsampling layer are quite lacking in information, and I don't know where they are in the illustration.
3. With the DIN module, during training, is the input each style image separately or a batch? If a batch, does the batch size need to match the content batch?
4. Shilei Wen's email in the paper ([email protected]) is currently incorrect.

I hope you can help. Thanks very much.

@ycjing
Owner

ycjing commented Dec 2, 2021

Hi @sonnguyen129

  1. Yes.
  2. Since it would be quite redundant to show the residual connections in the figure, I just use blocks to represent the corresponding residual modules. The residual blocks we use are no different from those in other tasks, just the most common ones.
  3. We follow the settings in AdaIN. Please refer to https://github.com/naoto0804/pytorch-AdaIN
  4. As I already mentioned in my email, you can alternatively contact Dr. Errui Ding. The other information is also provided in that mail.

Thanks again for your interest! Please feel free to reach out to me if anything else is unclear.

Cheers,
Yongcheng
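For reference, a sketch of such a common residual block (the InstanceNorm choice here is an assumption for the style-transfer setting, not a confirmed detail of this work):

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    # The most common two-conv residual block: y = x + F(x).
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.InstanceNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.InstanceNorm2d(channels),
        )

    def forward(self, x):
        return x + self.body(x)

print(ResidualBlock(64)(torch.rand(1, 64, 32, 32)).shape)  # [1, 64, 32, 32]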

@ycjing
Owner

ycjing commented Dec 3, 2021

Hi @sonnguyen129

Could you please provide the detailed log information? Thanks!

Best,

@sonnguyen129
Contributor

sonnguyen129 commented Dec 3, 2021

Here is my test case:

c = torch.rand(8,64,224,224)
s = torch.rand(8,64,224,224)
out = DIN(3)(c, s)
print(out)

Logs:

Traceback (most recent call last):
  File "model.py", line 136, in <module>
    out = DIN(3)(c, s)
  File "model.py", line 70, in __init__
    self.weight_bias = WeightAndBias(inp = inp)
  File "model.py", line 49, in __init__
    self.dwconv1 = DepthWiseConv2d(inp, 128, 3, 128, 2)
  File "model.py", line 10, in __init__
    groups = groups, stride = stride, padding = 1)
  File "/home/truongson/.local/bin/.virtualenvs/dl4cv/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 432, in __init__
    False, _pair(0), groups, bias, padding_mode, **factory_kwargs)
  File "/home/truongson/.local/bin/.virtualenvs/dl4cv/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 84, in __init__
    raise ValueError('in_channels must be divisible by groups')
ValueError: in_channels must be divisible by groups

@ycjing
Owner

ycjing commented Dec 3, 2021

Hi @sonnguyen129

As shown in the log, the group number is wrong; it should be equal to in_channels.

Best,
Yongcheng
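Concretely, PyTorch's grouped convolution requires in_channels to be divisible by groups, and a depthwise convolution sets groups equal to in_channels; a quick check:

import torch.nn as nn

# A depthwise convolution sets groups equal to in_channels:
ok = nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=64)

# A mismatched group count fails exactly as in the log above:
try:
    bad = nn.Conv2d(3, 3, kernel_size=3, groups=128)
except ValueError as err:
    print(err)  # in_channels must be divisible by groups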

@sonnguyen129
Contributor

Hi @ycjing
I have two questions:

1. Can you provide information about the AdaptivePooling layer, specifically the target size?
2. Is the 'add' operation in Fig. 4 a channel concatenation, or is it just like a basic residual block?

Thank you so much.

@sonnguyen129
Contributor

Hi @ycjing
I got an error.

Traceback (most recent call last):
  File "model.py", line 197, in <module>
    out = WeightAndBias(512)(out)
  File "/home/truongson/.local/bin/.virtualenvs/dl4cv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "model.py", line 79, in forward
    out = self.dwconv2(out)
  File "/home/truongson/.local/bin/.virtualenvs/dl4cv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "model.py", line 25, in forward
    out = self.pointwise(out)
  File "/home/truongson/.local/bin/.virtualenvs/dl4cv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/truongson/.local/bin/.virtualenvs/dl4cv/lib/python3.6/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/home/truongson/.local/bin/.virtualenvs/dl4cv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/truongson/.local/bin/.virtualenvs/dl4cv/lib/python3.6/site-packages/torch/nn/modules/instancenorm.py", line 59, in forward
    self.training or not self.track_running_stats, self.momentum, self.eps)
  File "/home/truongson/.local/bin/.virtualenvs/dl4cv/lib/python3.6/site-packages/torch/nn/functional.py", line 2325, in instance_norm
    _verify_spatial_size(input.size())
  File "/home/truongson/.local/bin/.virtualenvs/dl4cv/lib/python3.6/site-packages/torch/nn/functional.py", line 2292, in _verify_spatial_size
    raise ValueError("Expected more than 1 spatial element when training, got input size {}".format(size))
ValueError: Expected more than 1 spatial element when training, got input size torch.Size([8, 64, 1, 1])

Here is my code:

import torch
import torch.nn as nn
from torchvision.models import vgg19

class DepthWiseConv2d(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, groups, stride):
        super(DepthWiseConv2d, self).__init__()
        self.depthwise = nn.Sequential(
                nn.Conv2d(in_channels, in_channels, kernel_size = kernel_size,
                    groups = groups, stride = stride, padding = 1),
                nn.InstanceNorm2d(in_channels),
                nn.ReLU(True)
        )
        self.pointwise = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size = kernel_size,
                    stride = stride),
                nn.InstanceNorm2d(out_channels),
                nn.ReLU(True)
        )

    def forward(self, x):
        out = self.depthwise(x)
        out = self.pointwise(out)
        return out

class VGGEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        vgg = vgg19(pretrained=True).features
        self.slice1 = vgg[: 2]
        self.slice2 = vgg[2: 7]
        self.slice3 = vgg[7: 12]
        self.slice4 = vgg[12: 21]
        for p in self.parameters():
            p.requires_grad = False

    def forward(self, images, output_last_feature=False):
        h1 = self.slice1(images)
        h2 = self.slice2(h1)
        h3 = self.slice3(h2)
        h4 = self.slice4(h3)
        if output_last_feature:
            return h4
        else:
            return h1, h2, h3, h4

class WeightAndBias(nn.Module):
    """Weight/Bias Network"""

    def __init__(self, in_channels = 512):
        super(WeightAndBias,self).__init__()
        self.dwconv1 = DepthWiseConv2d(in_channels, 128, 3, 128, 2)
        self.dwconv2 = DepthWiseConv2d(128, 64, 3, 64, 2)
        # self.adapool1 = nn.AdaptiveMaxPool2d()
        self.dwconv3 = DepthWiseConv2d(64, 64, 3, 64, 2)
        # self.adapool2 = nn.AdaptiveMaxPool2d()

    def forward(self, x):
        out = self.dwconv1(x)
        out = self.dwconv2(out)
        print(out.shape)
        # out = self.adapool1(out)
        out = self.dwconv3(out)
        # out = self.adapool2(out)
        return out
# test case
s = torch.rand(8,3,256,256)
out = VGGEncoder()(s, True)
out = WeightAndBias(512)(out)
print(out.shape)

I hope you can help me. Thank you so much.

@ycjing
Owner

ycjing commented Dec 5, 2021

> 1. Can you provide information about the AdaptivePooling layer, specifically the target size?
> 2. Is the 'add' operation in Fig. 4 a channel concatenation, or is it just like a basic residual block?
  1. Our adaptive pooling layer is defined as follows:
     nn.AdaptiveAvgPool2d((1,1))

  2. Please note that the 'add' operation is not part of the residual blocks. It simply adds the output feature maps from the first few layers to those from the last few layers.
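Putting the two answers together, a rough sketch of a weight net whose early and late feature maps are added elementwise and then pooled (layer shapes are assumptions, not the released architecture):

import torch
import torch.nn as nn

class WeightNetSketch(nn.Module):
    # Illustrative only: the 'add' is a plain elementwise skip joining
    # earlier and later feature maps (not a residual block); the final
    # adaptive pooling reduces the sum to the target kernel size.
    def __init__(self, channels=64):
        super().__init__()
        self.early = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.late = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.pool = nn.AdaptiveAvgPool2d((1, 1))

    def forward(self, x):
        a = self.early(x)
        b = self.late(a)
        return self.pool(a + b)

print(WeightNetSketch()(torch.rand(8, 64, 28, 28)).shape)  # [8, 64, 1, 1]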

@ycjing
Owner

ycjing commented Dec 5, 2021

> (quoting the error report and code from the previous comment)

Hi @sonnguyen129

Please refer to my previous reply and be careful about the output dimensions.

Best,

@sonnguyen129
Contributor

Hi @ycjing
Thanks for your reply; with it I fixed the error. Despite reading the paper quite carefully, I still don't understand how the Weight/Bias Network generates the weight and bias. How do I get that weight and bias in PyTorch?
Thank you so much

@ycjing
Owner

ycjing commented Dec 7, 2021

Hi @sonnguyen129

Thank you for your interest. From your code, I think you have already got the point, i.e., dynamically predicting the weight and bias via the weight and bias networks. Could you please elaborate on your question further? Thanks!

Best,

@sonnguyen129
Contributor

Hi @ycjing
Sorry for my unclear question. As I understand it, the style image, after being encoded by VGG, goes through the weight and bias network. Are the generated weight and bias the weights and biases of the last conv layer of the weight/bias network (dwconv3 in my code)?
Thank you so much.

@ycjing
Owner

ycjing commented Dec 8, 2021

Hi @sonnguyen129

No problem. The weight and bias are, in fact, the outputs of the corresponding weight/bias networks, which is somewhat similar to the dynamic filter network (https://arxiv.org/abs/1605.09673).

Cheers,
Yongcheng
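In that spirit, a minimal sketch of applying a predicted per-channel weight and bias as a dynamic depthwise convolution (the shapes and the per-sample loop are assumptions for illustration, not the released implementation):

import torch
import torch.nn.functional as F

B, C, H, W = 8, 64, 32, 32
content = torch.rand(B, C, H, W)   # (in DIN the content feature would be
                                   #  instance-normalized first; omitted here)

# Pretend these came out of the weight/bias nets, one set per style image:
weight = torch.rand(B, C, 1, 1)    # one 1x1 filter per channel (groups = C)
bias = torch.rand(B, C)

# F.conv2d takes a single weight tensor, so apply it per sample:
outs = []
for i in range(B):
    outs.append(F.conv2d(content[i:i + 1],
                         weight[i].unsqueeze(1),  # (C, 1, 1, 1): depthwise
                         bias[i],
                         groups=C))
stylized = torch.cat(outs)         # (B, C, H, W)
print(stylized.shape)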

@sonnguyen129
Contributor

Hi @ycjing
I have already read the dynamic filter network paper. However, if the weight and bias are both outputs of the same network, the values would be the same, right? But from what I've read about dynamic convolution in PyTorch, the weight and bias should be different. I hope you can answer. Thank you so much.

@ycjing
Owner

ycjing commented Dec 14, 2021

Hi @sonnguyen129

Thank you for your interest. The values are, in fact, not the same. As demonstrated in the figure and explained in the paper, we use a separate weight net and bias net to produce the corresponding weight and bias.

Best,
Yongcheng
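Schematically, that separation might look like the following sketch (two identically shaped but independently parameterized heads; the exact layers are assumptions):

import torch
import torch.nn as nn

C = 64

def make_head():
    return nn.Sequential(
        nn.Conv2d(C, C, 3, padding=1, groups=C),
        nn.AdaptiveAvgPool2d((1, 1)),
    )

weight_net = make_head()   # same shape, independent parameters
bias_net = make_head()

style_feat = torch.rand(1, C, 32, 32)
w = weight_net(style_feat)
b = bias_net(style_feat)
print(torch.allclose(w, b))  # False: separate nets give separate values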
