Question: why is the output different after the structural conversion? #12
Comments
You need to compare under model.eval(), because the structures are now different: under model.train(), the means and variances used by the BN layers will differ somewhat. |
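A small self-contained illustration of that difference (a sketch using a bare BatchNorm2d; the shapes are arbitrary):

import torch
from torch import nn

bn = nn.BatchNorm2d(8)
x = torch.randn(4, 8, 16, 16)

bn.train()
y_train = bn(x)   # normalizes with this batch's statistics and updates running_mean/running_var
bn.eval()
y_eval = bn(x)    # normalizes with the stored running_mean/running_var
# The two outputs generally differ, so before/after-conversion comparisons must both use eval().
print((y_train - y_eval).abs().max())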
I have only used the first method myself, but you can experiment more with different pruning paradigms; there may well be better approaches than the one I used. |
I only removed the channels that are comparatively poor once the two are added together; this kind of detail handling is like hyperparameter tuning and needs plenty of comparison. |
Thank you for your answers, I feel I have benefited a lot and learned a great deal ~ |
Hi, did you manage to get the results before and after conversion to match? I have already set eval mode but the results are still different. Could I discuss it with you? I am not sure whether my understanding is off. My WeChat is 17809207817, thank you very much! |
Note that both the model before conversion and the model after conversion need to be set to eval mode. If that checks out, paste your conversion code and results and I will take a look at where the problem is. |
I converted resnet100, whose structure is different from the resnet you provide; I added the deploy function myself based on my understanding of the paper:
import torch
from torch import nn
from torch.nn import functional as F
__all__ = ['iresnet18', 'iresnet34', 'iresnet50', 'iresnet100', 'iresnet200']
def conv3x3(in_planes, out_planes, stride=1, groups=1, dilation=1):
"""3x3 convolution with padding"""
return nn.Conv2d(in_planes,
out_planes,
kernel_size=3,
stride=stride,
padding=dilation,
groups=groups,
bias=False,
dilation=dilation)
def conv1x1(in_planes, out_planes, stride=1):
"""1x1 convolution"""
return nn.Conv2d(in_planes,
out_planes,
kernel_size=1,
stride=stride,
bias=False)
class IBasicBlock(nn.Module):
expansion = 1
def __init__(self, inplanes, mid_planes, planes, stride=1, downsample=None,
groups=1, base_width=64, dilation=1):
super(IBasicBlock, self).__init__()
if groups != 1 or base_width != 64:
raise ValueError('BasicBlock only supports groups=1 and base_width=64')
if dilation > 1:
raise NotImplementedError("Dilation > 1 not supported in BasicBlock")
self.in_planes = inplanes
self.planes = planes
self.stride = stride
self.mid_planes = mid_planes - planes + self.in_planes
self.bn1 = nn.BatchNorm2d(self.in_planes, eps=1e-05,)
self.conv1 = conv3x3(inplanes, self.planes)
self.bn2 = nn.BatchNorm2d(self.planes, eps=1e-05,)
self.prelu = nn.PReLU(self.planes)
self.relu = nn.ReLU(inplace=True)
self.conv2 = conv3x3(self.planes, self.planes, stride)
self.bn3 = nn.BatchNorm2d(planes, eps=1e-05,)
self.downsample = downsample
def forward(self, x):
identity = x
out = self.bn1(x)
out = self.conv1(out)
out = self.bn2(out)
out = self.prelu(out)
out = self.conv2(out)
out = self.bn3(out)
if self.downsample is not None:
identity = self.downsample(x)
print('self.downsample', self.downsample)
out += identity
return out
def deploy(self, merge_bn):
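# Reparameterize this residual block into a plain conv stack: idconv1 is a widened 3x3 conv whose
# leading in_planes output channels carry the block input through unchanged (Dirac kernels plus
# neutral BN statistics), while its remaining channels take over conv1 (and, further down, bn1's
# statistics); idconv2 then maps the widened feature back to planes channels, fusing in the
# downsample branch when in_planes != planes.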
idconv1 = nn.Conv2d(self.in_planes, self.mid_planes, kernel_size=3, stride=self.stride, padding=1,
bias=False).eval()
idbn1 = nn.BatchNorm2d(self.mid_planes).eval()
nn.init.dirac_(idconv1.weight.data[:self.in_planes])
idbn1.weight.data[:self.in_planes] = 1
idbn1.bias.data[:self.in_planes] = 0
idbn1.running_mean.data[:self.in_planes] = 0
idbn1.running_var.data[:self.in_planes] = 1
idconv1.weight.data[self.in_planes:] = self.conv1.weight.data
idbn1.weight.data[self.planes:] = self.bn1.weight.data
idbn1.bias.data[self.planes:] = self.bn1.bias.data
idbn1.running_mean.data[self.planes:] = self.bn1.running_mean
idbn1.running_var.data[self.planes:] = self.bn1.running_var
idconv2 = nn.Conv2d(self.mid_planes, self.planes, kernel_size=3, stride=1, padding=1, bias=False).eval()
# idbn2 = nn.BatchNorm2d(self.planes).eval()
downsample_bias = 0
if self.in_planes == self.planes:
nn.init.dirac_(idconv2.weight.data[:, :self.in_planes])
else:
idconv2.weight.data[:, :self.in_planes], downsample_bias = self.fuse(
F.pad(self.downsample[0].weight.data, [1, 1, 1, 1]), self.downsample[1].running_mean,
self.downsample[1].running_var, self.downsample[1].weight, self.downsample[1].bias,
self.downsample[1].eps)
# if merge_bn:
# return [torch.nn.utils.fuse_conv_bn_eval(idconv1, idbn1), self.relu,
# torch.nn.utils.fuse_conv_bn_eval(idconv2, idbn2), self.relu]
# else:
return [idconv1, idbn1, self.relu, idconv2]
def fuse(self, conv_w, bn_rm, bn_rv, bn_w, bn_b, eps):
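# Fold BatchNorm statistics into the given conv weights: each output channel is scaled by
# bn_w / sqrt(bn_rv + eps); the second return value is bn_rm * bn_w / sqrt(bn_rv + eps) - bn_b
# (note the sign, which is the negative of a conventional fused conv bias).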
bn_var_rsqrt = torch.rsqrt(bn_rv + eps)
conv_w = conv_w * (bn_w * bn_var_rsqrt).reshape([-1] + [1] * (len(conv_w.shape) - 1))
conv_b = bn_rm * bn_var_rsqrt * bn_w - bn_b
return conv_w, conv_b
class Flatten(nn.Module):
def forward(self, input):
return input.view(input.size(0), -1)
class IResNet(nn.Module):
fc_scale = 7 * 7
def __init__(self,
block, layers, dropout=0, num_features=128, zero_init_residual=False,
groups=1, width_per_group=64, replace_stride_with_dilation=None, fp16=False):
super(IResNet, self).__init__()
self.fp16 = fp16
self.inplanes = 64
self.dilation = 1
if replace_stride_with_dilation is None:
replace_stride_with_dilation = [False, False, False]
if len(replace_stride_with_dilation) != 3:
raise ValueError("replace_stride_with_dilation should be None "
"or a 3-element tuple, got {}".format(replace_stride_with_dilation))
self.groups = groups
self.base_width = width_per_group
self.conv1 = nn.Conv2d(3, self.inplanes, kernel_size=3, stride=1, padding=1, bias=False)
self.bn1 = nn.BatchNorm2d(self.inplanes, eps=1e-05)
self.prelu = nn.PReLU(self.inplanes)
self.layer1 = self._make_layer(block, 64, layers[0], stride=2)
self.layer2 = self._make_layer(block,
128,
layers[1],
stride=2,
dilate=replace_stride_with_dilation[0])
self.layer3 = self._make_layer(block,
256,
layers[2],
stride=2,
dilate=replace_stride_with_dilation[1])
self.layer4 = self._make_layer(block,
512,
layers[3],
stride=2,
dilate=replace_stride_with_dilation[2])
self.bn2 = nn.BatchNorm2d(512 * block.expansion, eps=1e-05,)
self.dropout = nn.Dropout(p=dropout, inplace=True)
self.flatten = Flatten()
self.fc = nn.Linear(512 * block.expansion * self.fc_scale, num_features)
self.features = nn.BatchNorm1d(num_features, eps=1e-05)
nn.init.constant_(self.features.weight, 1.0)
self.features.weight.requires_grad = False
for m in self.modules():
if isinstance(m, nn.Conv2d):
nn.init.normal_(m.weight, 0, 0.1)
elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
nn.init.constant_(m.weight, 1)
nn.init.constant_(m.bias, 0)
if zero_init_residual:
for m in self.modules():
if isinstance(m, IBasicBlock):
nn.init.constant_(m.bn2.weight, 0)
def _make_layer(self, block, planes, blocks, stride=1, dilate=False):
downsample = None
previous_dilation = self.dilation
if dilate:
self.dilation *= stride
stride = 1
if stride != 1 or self.inplanes != planes * block.expansion:
downsample = nn.Sequential(
conv1x1(self.inplanes, planes * block.expansion, stride),
nn.BatchNorm2d(planes * block.expansion, eps=1e-05, ),
)
layers = []
layers.append(
block(self.inplanes, planes*2, planes, stride, downsample, self.groups,
self.base_width, previous_dilation))
self.inplanes = planes * block.expansion
for _ in range(1, blocks):
layers.append(
block(self.inplanes,
planes * 2,
planes,
groups=self.groups,
base_width=self.base_width,
dilation=self.dilation))
return nn.Sequential(*layers)
def forward(self, x):
with torch.cuda.amp.autocast(self.fp16):
x = self.conv1(x)
x = self.bn1(x)
x = self.prelu(x)
x = self.layer1(x)
x = self.layer2(x)
x = self.layer3(x)
x = self.layer4(x)
x = self.bn2(x)
# x = torch.flatten(x, 1)
x = self.flatten(x)
x = self.dropout(x)
x = self.fc(x.float() if self.fp16 else x)
x = self.features(x)
return x
# def deploy(self):
# for m in self.modules:
# if isinstance(m, IBasicBlock):
# m.deploy()
def deploy(self, merge_bn=False):
def foo(net):
global blocks
childrens = list(net.children())
if isinstance(net, IBasicBlock):
blocks += net.deploy(merge_bn)
elif not childrens:
if isinstance(net, nn.BatchNorm2d) and isinstance(blocks[-1], nn.Conv2d):
blocks[-1] = torch.nn.utils.fuse_conv_bn_eval(blocks[-1], net)
else:
print('net', net)
blocks += [net]
else:
for c in childrens:
foo(c)
global blocks
blocks = []
foo(self.eval())
return nn.Sequential(*blocks)
def _iresnet(arch, block, layers, pretrained, progress, **kwargs):
model = IResNet(block, layers, **kwargs)
if pretrained:
raise ValueError()
return model
def iresnet18(pretrained=False, progress=True, **kwargs):
return _iresnet('iresnet18', IBasicBlock, [2, 2, 2, 2], pretrained,
progress, **kwargs)
def iresnet34(pretrained=False, progress=True, **kwargs):
return _iresnet('iresnet34', IBasicBlock, [3, 4, 6, 3], pretrained,
progress, **kwargs)
def iresnet50(pretrained=False, progress=True, **kwargs):
return _iresnet('iresnet50', IBasicBlock, [3, 4, 14, 3], pretrained,
progress, **kwargs)
def iresnet100(pretrained=False, progress=True, **kwargs):
return _iresnet('iresnet100', IBasicBlock, [3, 13, 30, 3], pretrained, progress, **kwargs)
def iresnet200(pretrained=False, progress=True, **kwargs):
return _iresnet('iresnet200', IBasicBlock, [6, 26, 60, 6], pretrained,
progress, **kwargs)
That is the residual structure of my ResNet, as shown in the code above.
|
When you need to modify the code, I suggest first finishing the basic IBasicBlock and verifying that it is equivalent before and after conversion, and only then completing the code for the whole model.
The forward pass of your IBasicBlock is (bn1, conv1, bn2, prelu, conv2, bn3), but in your deploy:
1. bn1 comes first and conv1 second, so you should not copy my implementation as-is. Suggested fixes:
- Merge bn1 into conv1; see https://github.com/fxmeng/RMNet/blob/0829642895f23c7787ff18b56507851faaa15331/models/rmobilenet.py#L22 for reference. Do not use torch.nn.utils.fuse_conv_bn for this, because its order (conv before BN) is also the reverse of yours.
- Or keep the residual unchanged through bn1 and conv1; in that case conv1 cannot be Dirac-initialized and instead has to apply the inverse of the transformation bn1 applies to its input. (The benefit is that bn1 can be kept after conversion, which makes fine-tuning work better, but it is harder to implement; try it only after you fully understand the algorithm.)
2. What you are doing to bn1 should actually be done to bn2, and you do nothing at all for bn3. Handling bn3 properly is fairly involved; my suggestion is to merge conv2 and bn3, so that bn3 no longer needs separate treatment and you only have to handle conv2's weight and bias. See how bn2 is handled in my code.
3. Your handling of prelu is wrong:
- If you want to train with prelu, follow my mobilenet implementation.
- If you want to train with relu, follow my resnet implementation.
4. During training there is no relu after the residual addition, which may hurt accuracy. If that is intentional, then the previous point leaves only the prelu option, and in that case the two consecutive 3x3 convolutions can be merged into a single 5x5 convolution.
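A minimal per-block check of the kind suggested above (a sketch only, assuming the IBasicBlock code posted earlier in this thread; shapes and sizes are illustrative):

import torch
from torch import nn

block = IBasicBlock(inplanes=64, mid_planes=128, planes=64).eval()
deployed = nn.Sequential(*block.deploy(merge_bn=False)).eval()
x = torch.randn(1, 64, 56, 56)
with torch.no_grad():
    y_orig = block(x)
    y_deploy = deployed(x)
# The max abs difference should be ~0 (floating-point noise) once the conversion is correct.
print((y_orig - y_deploy).abs().max())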
|
OK, thank you very much for the pointers; I will go back and revise my code carefully.
|
Hello, may I ask what criterion you use to judge which channels are comparatively poor? I don't quite understand this and would appreciate your guidance, thanks. |
There are many ways to evaluate how good a channel is, and each has its own rationale. The method I use here is to multiply the channels to be pruned by a mask and sparsify that mask during training; when the value corresponding to a channel has been sparsified to nearly zero, that channel is deleted. |
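A rough sketch of that mask idea (illustrative only; names like ChannelMask, the penalty weight, and the threshold are made up here, and the actual RMNet pruning code may differ):

import torch
from torch import nn

class ChannelMask(nn.Module):
    # A learnable per-channel scale placed on the channels that are candidates for pruning.
    def __init__(self, num_channels):
        super().__init__()
        self.mask = nn.Parameter(torch.ones(num_channels))

    def forward(self, x):
        return x * self.mask.view(1, -1, 1, 1)

mask = ChannelMask(64)
x = torch.randn(2, 64, 8, 8)
out = mask(x)
# During training, an L1 penalty on the mask is added to the task loss to drive entries toward 0:
sparsity_penalty = 1e-4 * mask.mask.abs().sum()
# After training, channels whose mask value has been driven close to 0 are the ones to delete:
keep = mask.mask.abs() > 1e-3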
Hello, I recently read your paper, which takes a novel view on current re-parameterization work, and I wanted to study the code further, but I found that with the same input values I don't know why the output is different after the structural conversion?