Question: why is the output different after the structural conversion? #12
Comments
You need to compare under model.eval(), because the structures are now different: under model.train(), the means and variances used by the BN layers will differ somewhat. |
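A small self-contained illustration of that difference (a sketch using a bare BatchNorm2d; the shapes are arbitrary):

import torch
from torch import nn

bn = nn.BatchNorm2d(8)
x = torch.randn(4, 8, 16, 16)

bn.train()
y_train = bn(x)   # normalizes with this batch's statistics and updates running_mean/running_var
bn.eval()
y_eval = bn(x)    # normalizes with the stored running_mean/running_var
# The two outputs generally differ, so before/after-conversion comparisons must both use eval().
print((y_train - y_eval).abs().max())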
I have only used the first method myself, but you can experiment more with different pruning paradigms; there may well be better approaches than the one I used. |
I only removed the channels that are comparatively poor once the two are added together; this kind of detail handling is like hyperparameter tuning and needs plenty of comparison. |
Thank you for your answers, I feel I have benefited a lot and learned a great deal ~ |
Hi, did you manage to get the results before and after conversion to match? I have already set eval mode but the results are still different. Could I discuss it with you? I am not sure whether my understanding is off. My WeChat is 17809207817, thank you very much! |
Note that both the model before conversion and the model after conversion need to be set to eval mode. If that checks out, paste your conversion code and results and I will take a look at where the problem is. |
I converted resnet100, whose structure is different from the resnet you provide; I added the deploy function myself based on my understanding of the paper:
import torch
from torch import nn
from torch.nn import functional as F
__all__ = ['iresnet18', 'iresnet34', 'iresnet50', 'iresnet100', 'iresnet200']
def conv3x3(in_planes, out_planes, stride=1, groups=1, dilation=1):
"""3x3 convolution with padding"""
return nn.Conv2d(in_planes,
out_planes,
kernel_size=3,
stride=stride,
padding=dilation,
groups=groups,
bias=False,
dilation=dilation)
def conv1x1(in_planes, out_planes, stride=1):
"""1x1 convolution"""
return nn.Conv2d(in_planes,
out_planes,
kernel_size=1,
stride=stride,
bias=False)
class IBasicBlock(nn.Module):
expansion = 1
def __init__(self, inplanes, mid_planes, planes, stride=1, downsample=None,
groups=1, base_width=64, dilation=1):
super(IBasicBlock, self).__init__()
if groups != 1 or base_width != 64:
raise ValueError('BasicBlock only supports groups=1 and base_width=64')
if dilation > 1:
raise NotImplementedError("Dilation > 1 not supported in BasicBlock")
self.in_planes = inplanes
self.planes = planes
self.stride = stride
self.mid_planes = mid_planes - planes + self.in_planes
self.bn1 = nn.BatchNorm2d(self.in_planes, eps=1e-05,)
self.conv1 = conv3x3(inplanes, self.planes)
self.bn2 = nn.BatchNorm2d(self.planes, eps=1e-05,)
self.prelu = nn.PReLU(self.planes)
self.relu = nn.ReLU(inplace=True)
self.conv2 = conv3x3(self.planes, self.planes, stride)
self.bn3 = nn.BatchNorm2d(planes, eps=1e-05,)
self.downsample = downsample
def forward(self, x):
identity = x
out = self.bn1(x)
out = self.conv1(out)
out = self.bn2(out)
out = self.prelu(out)
out = self.conv2(out)
out = self.bn3(out)
if self.downsample is not None:
identity = self.downsample(x)
print('self.downsample', self.downsample)
out += identity
return out
def deploy(self, merge_bn):
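# Reparameterize this residual block into a plain conv stack: idconv1 is a widened 3x3 conv whose
# leading in_planes output channels carry the block input through unchanged (Dirac kernels plus
# neutral BN statistics), while its remaining channels take over conv1 (and, further down, bn1's
# statistics); idconv2 then maps the widened feature back to planes channels, fusing in the
# downsample branch when in_planes != planes.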
idconv1 = nn.Conv2d(self.in_planes, self.mid_planes, kernel_size=3, stride=self.stride, padding=1,
bias=False).eval()
idbn1 = nn.BatchNorm2d(self.mid_planes).eval()
nn.init.dirac_(idconv1.weight.data[:self.in_planes])
idbn1.weight.data[:self.in_planes] = 1
idbn1.bias.data[:self.in_planes] = 0
idbn1.running_mean.data[:self.in_planes] = 0
idbn1.running_var.data[:self.in_planes] = 1
idconv1.weight.data[self.in_planes:] = self.conv1.weight.data
idbn1.weight.data[self.planes:] = self.bn1.weight.data
idbn1.bias.data[self.planes:] = self.bn1.bias.data
idbn1.running_mean.data[self.planes:] = self.bn1.running_mean
idbn1.running_var.data[self.planes:] = self.bn1.running_var
idconv2 = nn.Conv2d(self.mid_planes, self.planes, kernel_size=3, stride=1, padding=1, bias=False).eval()
# idbn2 = nn.BatchNorm2d(self.planes).eval()
downsample_bias = 0
if self.in_planes == self.planes:
nn.init.dirac_(idconv2.weight.data[:, :self.in_planes])
else:
idconv2.weight.data[:, :self.in_planes], downsample_bias = self.fuse(
F.pad(self.downsample[0].weight.data, [1, 1, 1, 1]), self.downsample[1].running_mean,
self.downsample[1].running_var, self.downsample[1].weight, self.downsample[1].bias,
self.downsample[1].eps)
# if merge_bn:
# return [torch.nn.utils.fuse_conv_bn_eval(idconv1, idbn1), self.relu,
# torch.nn.utils.fuse_conv_bn_eval(idconv2, idbn2), self.relu]
# else:
return [idconv1, idbn1, self.relu, idconv2]
def fuse(self, conv_w, bn_rm, bn_rv, bn_w, bn_b, eps):
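# Fold BatchNorm statistics into the given conv weights: each output channel is scaled by
# bn_w / sqrt(bn_rv + eps); the second return value is bn_rm * bn_w / sqrt(bn_rv + eps) - bn_b
# (note the sign, which is the negative of a conventional fused conv bias).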
bn_var_rsqrt = torch.rsqrt(bn_rv + eps)
conv_w = conv_w * (bn_w * bn_var_rsqrt).reshape([-1] + [1] * (len(conv_w.shape) - 1))
conv_b = bn_rm * bn_var_rsqrt * bn_w - bn_b
return conv_w, conv_b
class Flatten(nn.Module):
def forward(self, input):
return input.view(input.size(0), -1)
class IResNet(nn.Module):
fc_scale = 7 * 7
def __init__(self,
block, layers, dropout=0, num_features=128, zero_init_residual=False,
groups=1, width_per_group=64, replace_stride_with_dilation=None, fp16=False):
super(IResNet, self).__init__()
self.fp16 = fp16
self.inplanes = 64
self.dilation = 1
if replace_stride_with_dilation is None:
replace_stride_with_dilation = [False, False, False]
if len(replace_stride_with_dilation) != 3:
raise ValueError("replace_stride_with_dilation should be None "
"or a 3-element tuple, got {}".format(replace_stride_with_dilation))
self.groups = groups
self.base_width = width_per_group
self.conv1 = nn.Conv2d(3, self.inplanes, kernel_size=3, stride=1, padding=1, bias=False)
self.bn1 = nn.BatchNorm2d(self.inplanes, eps=1e-05)
self.prelu = nn.PReLU(self.inplanes)
self.layer1 = self._make_layer(block, 64, layers[0], stride=2)
self.layer2 = self._make_layer(block,
128,
layers[1],
stride=2,
dilate=replace_stride_with_dilation[0])
self.layer3 = self._make_layer(block,
256,
layers[2],
stride=2,
dilate=replace_stride_with_dilation[1])
self.layer4 = self._make_layer(block,
512,
layers[3],
stride=2,
dilate=replace_stride_with_dilation[2])
self.bn2 = nn.BatchNorm2d(512 * block.expansion, eps=1e-05,)
self.dropout = nn.Dropout(p=dropout, inplace=True)
self.flatten = Flatten()
self.fc = nn.Linear(512 * block.expansion * self.fc_scale, num_features)
self.features = nn.BatchNorm1d(num_features, eps=1e-05)
nn.init.constant_(self.features.weight, 1.0)
self.features.weight.requires_grad = False
for m in self.modules():
if isinstance(m, nn.Conv2d):
nn.init.normal_(m.weight, 0, 0.1)
elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
nn.init.constant_(m.weight, 1)
nn.init.constant_(m.bias, 0)
if zero_init_residual:
for m in self.modules():
if isinstance(m, IBasicBlock):
nn.init.constant_(m.bn2.weight, 0)
def _make_layer(self, block, planes, blocks, stride=1, dilate=False):
downsample = None
previous_dilation = self.dilation
if dilate:
self.dilation *= stride
stride = 1
if stride != 1 or self.inplanes != planes * block.expansion:
downsample = nn.Sequential(
conv1x1(self.inplanes, planes * block.expansion, stride),
nn.BatchNorm2d(planes * block.expansion, eps=1e-05, ),
)
layers = []
layers.append(
block(self.inplanes, planes*2, planes, stride, downsample, self.groups,
self.base_width, previous_dilation))
self.inplanes = planes * block.expansion
for _ in range(1, blocks):
layers.append(
block(self.inplanes,
planes * 2,
planes,
groups=self.groups,
base_width=self.base_width,
dilation=self.dilation))
return nn.Sequential(*layers)
def forward(self, x):
with torch.cuda.amp.autocast(self.fp16):
x = self.conv1(x)
x = self.bn1(x)
x = self.prelu(x)
x = self.layer1(x)
x = self.layer2(x)
x = self.layer3(x)
x = self.layer4(x)
x = self.bn2(x)
# x = torch.flatten(x, 1)
x = self.flatten(x)
x = self.dropout(x)
x = self.fc(x.float() if self.fp16 else x)
x = self.features(x)
return x
# def deploy(self):
# for m in self.modules:
# if isinstance(m, IBasicBlock):
# m.deploy()
def deploy(self, merge_bn=False):
def foo(net):
global blocks
childrens = list(net.children())
if isinstance(net, IBasicBlock):
blocks += net.deploy(merge_bn)
elif not childrens:
if isinstance(net, nn.BatchNorm2d) and isinstance(blocks[-1], nn.Conv2d):
blocks[-1] = torch.nn.utils.fuse_conv_bn_eval(blocks[-1], net)
else:
print('net', net)
blocks += [net]
else:
for c in childrens:
foo(c)
global blocks
blocks = []
foo(self.eval())
return nn.Sequential(*blocks)
def _iresnet(arch, block, layers, pretrained, progress, **kwargs):
model = IResNet(block, layers, **kwargs)
if pretrained:
raise ValueError()
return model
def iresnet18(pretrained=False, progress=True, **kwargs):
return _iresnet('iresnet18', IBasicBlock, [2, 2, 2, 2], pretrained,
progress, **kwargs)
def iresnet34(pretrained=False, progress=True, **kwargs):
return _iresnet('iresnet34', IBasicBlock, [3, 4, 6, 3], pretrained,
progress, **kwargs)
def iresnet50(pretrained=False, progress=True, **kwargs):
return _iresnet('iresnet50', IBasicBlock, [3, 4, 14, 3], pretrained,
progress, **kwargs)
def iresnet100(pretrained=False, progress=True, **kwargs):
return _iresnet('iresnet100', IBasicBlock, [3, 13, 30, 3], pretrained, progress, **kwargs)
def iresnet200(pretrained=False, progress=True, **kwargs):
return _iresnet('iresnet200', IBasicBlock, [6, 26, 60, 6], pretrained,
progress, **kwargs)
That is the residual structure of my ResNet, as shown in the code above.
|
When you need to modify the code, I suggest first finishing the basic IBasicBlock and verifying that it is equivalent before and after conversion, and only then completing the code for the whole model.
The forward pass of your IBasicBlock is (bn1, conv1, bn2, prelu, conv2, bn3), but in your deploy:
1. bn1 comes first and conv1 second, so you should not copy my implementation as-is. Suggested fixes:
- Merge bn1 into conv1; see https://github.com/fxmeng/RMNet/blob/0829642895f23c7787ff18b56507851faaa15331/models/rmobilenet.py#L22 for reference. Do not use torch.nn.utils.fuse_conv_bn for this, because its order (conv before BN) is also the reverse of yours.
- Or keep the residual unchanged through bn1 and conv1; in that case conv1 cannot be Dirac-initialized and instead has to apply the inverse of the transformation bn1 applies to its input. (The benefit is that bn1 can be kept after conversion, which makes fine-tuning work better, but it is harder to implement; try it only after you fully understand the algorithm.)
2. What you are doing to bn1 should actually be done to bn2, and you do nothing at all for bn3. Handling bn3 properly is fairly involved; my suggestion is to merge conv2 and bn3, so that bn3 no longer needs separate treatment and you only have to handle conv2's weight and bias. See how bn2 is handled in my code.
3. Your handling of prelu is wrong:
- If you want to train with prelu, follow my mobilenet implementation.
- If you want to train with relu, follow my resnet implementation.
4. During training there is no relu after the residual addition, which may hurt accuracy. If that is intentional, then the previous point leaves only the prelu option, and in that case the two consecutive 3x3 convolutions can be merged into a single 5x5 convolution.
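A minimal per-block check of the kind suggested above (a sketch only, assuming the IBasicBlock code posted earlier in this thread; shapes and sizes are illustrative):

import torch
from torch import nn

block = IBasicBlock(inplanes=64, mid_planes=128, planes=64).eval()
deployed = nn.Sequential(*block.deploy(merge_bn=False)).eval()
x = torch.randn(1, 64, 56, 56)
with torch.no_grad():
    y_orig = block(x)
    y_deploy = deployed(x)
# The max abs difference should be ~0 (floating-point noise) once the conversion is correct.
print((y_orig - y_deploy).abs().max())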
|
OK, thank you very much for the pointers; I will go back and revise my code carefully.
|
Hello, may I ask what criterion you use to judge which channels are comparatively poor? I don't quite understand this and would appreciate your guidance, thanks. |
There are many ways to evaluate how good a channel is, and each has its own rationale. The method I use here is to multiply the channels to be pruned by a mask and sparsify that mask during training; when the value corresponding to a channel has been sparsified to nearly zero, that channel is deleted. |
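A rough sketch of that mask idea (illustrative only; names like ChannelMask, the penalty weight, and the threshold are made up here, and the actual RMNet pruning code may differ):

import torch
from torch import nn

class ChannelMask(nn.Module):
    # A learnable per-channel scale placed on the channels that are candidates for pruning.
    def __init__(self, num_channels):
        super().__init__()
        self.mask = nn.Parameter(torch.ones(num_channels))

    def forward(self, x):
        return x * self.mask.view(1, -1, 1, 1)

mask = ChannelMask(64)
x = torch.randn(2, 64, 8, 8)
out = mask(x)
# During training, an L1 penalty on the mask is added to the task loss to drive entries toward 0:
sparsity_penalty = 1e-4 * mask.mask.abs().sum()
# After training, channels whose mask value has been driven close to 0 are the ones to delete:
keep = mask.mask.abs() > 1e-3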
Hello, I recently read your paper, which takes a novel view on current re-parameterization work, and I wanted to study the code further, but I found that with the same input values I don't know why the output is different after the structural conversion?