
What does 'convq_layer' mean in net_pruner.py and net_skipper.py? #22

Open
zhujiacheng opened this issue Mar 12, 2018 · 4 comments

@zhujiacheng

Hi,
In my opinion there should be some Python scripts that remove all-zero-weight filters (row sparsity) directly, to accelerate GPU inference without any CPU subroutines. Are net_pruner.py and net_skipper.py meant for that? Or can you give me some advice?
Also, I cannot figure out what 'convq_layer' and 'convq_param_key' mean in net_pruner.py and net_skipper.py; for example, there is obviously no 'conv1q' key in src_net.params.
Thanks a lot for your help!
```python
import re
import caffe
import caffeparser  # prototxt parsing helper shipped alongside these scripts

# srcproto and srcmodel are set earlier in the script (elided here)
src_net = caffe.Net(srcproto, srcmodel, caffe.TEST)
print("src net:\n blobs {}\nparams {}\n".format(src_net.blobs.keys(), src_net.params.keys()))
src_net_parser = caffeparser.CaffeProtoParser(srcproto)
net_msg = src_net_parser.readProtoNetFile()

layer_idx = 0
loop_layers = net_msg.layer[:] #adding : implicitly makes a copy to avoid being modified in the loop
convxq_positions = []
convxq_m = []
convxq_add_layers = []
position_idx = 0

total_all_zero_counter = 0

# generate and save dst prototxt

for cur_layer in loop_layers:
    # only handle plain conv layers named conv1, conv2, ... (not their 'convXq' companions)
    if 'Convolution'==cur_layer.type and re.match("^conv[0-9]+$",cur_layer.name):
        # the layer immediately before 'convX' is expected to be its companion 'convXq'
        convq_layer = net_msg.layer._values[position_idx-1]
        convq_param_key = cur_layer.name+"q"
        param_key = cur_layer.name
        # detach both layers from the net message, keeping modifiable copies
        convx_ptr = net_msg.layer._values.pop(position_idx)
        convx_ptr.CopyFrom(cur_layer)
        convxq_ptr = net_msg.layer._values.pop(position_idx-1)
        convxq_ptr.CopyFrom(convq_layer)

        # 'convXq' must carry exactly one parameter blob (weights, no bias)
        assert len(src_net.params[convq_param_key])==1
        weights_convxq = src_net.params[convq_param_key][0].data
        weights_convx = src_net.params[param_key][0].data
        # 'convX' is expected to be a 1x1 convolution
        assert weights_convx.shape[3]==1 and weights_convx.shape[2]==1

        orig_grp_num = weights_convxq.shape[0]/weights_convx.shape[1]  # dead value: overwritten two lines below
        cur_m = convq_layer.convolution_param.group
        orig_grp_num = cur_layer.convolution_param.group
        num_per_orig_grp = (cur_m/orig_grp_num)  # integer division (the script is Python 2)
        cur_sxs = weights_convx.shape[1]*orig_grp_num/cur_m

```

@wenwei202
Owner

Hello, those scripts are deprecated and not used for anything anymore. The LOWERED_CCNMM conv_mode enables all-zero weight removal. Please also check the other issues for some implementation details. Specifically, the CPU mode is fully supported, while the GPU mode uses some CPU functions as a temporary test.
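
For reference, it is enabled per conv layer in the deploy prototxt, roughly like this (a sketch only; the layer shown is conv1 from the cifar10_full example, with other fields such as pad and stride left as in the original):

```
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 32
    kernel_size: 5
    conv_mode: LOWERED_CCNMM  # squeeze all-zero rows/columns out of the lowered weight matrix
  }
}
```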

@zhujiacheng
Author

zhujiacheng commented Mar 13, 2018

@wenwei202 Thanks for your help.
But when I test the inference time with examples/cifar10_classifier.py on an NVIDIA 1070 GPU, the model's sparsity and the results are as follows.

cifar10_full_ssl_200000.caffemodel sparsity

```
I0313 15:25:18.214439 3057 base_conv_layer.cpp:17] layer conv1 has sparsity of 0.610833
I0313 15:25:18.215625 3057 base_conv_layer.cpp:61] ConvolutionParameter_ConvMode_LOWERED_CCNMM
I0313 15:25:18.215688 3057 base_conv_layer.cpp:80] concatenating weight matrix
I0313 15:25:18.215701 3057 base_conv_layer.cpp:88] conv1 left_cols=75 left_rows=14
I0313 15:25:18.215739 3057 base_conv_layer.cpp:91] squeezing weight matrix
I0313 15:25:18.215749 3057 base_conv_layer.cpp:102] conv1 squeezing to 14x75
I0313 15:25:18.215775 3057 base_conv_layer.cpp:114] weight matrix squeezed
I0313 15:25:18.215785 3057 base_conv_layer.cpp:180] weights lying in all-zero groups of conv1 are frozen
I0313 15:25:18.216166 3057 base_conv_layer.cpp:17] layer conv2 has sparsity of 0.848477
I0313 15:25:18.226200 3057 base_conv_layer.cpp:61] ConvolutionParameter_ConvMode_LOWERED_CCNMM
I0313 15:25:18.226290 3057 base_conv_layer.cpp:80] concatenating weight matrix
I0313 15:25:18.226305 3057 base_conv_layer.cpp:88] conv2 left_cols=270 left_rows=20
I0313 15:25:18.226348 3057 base_conv_layer.cpp:91] squeezing weight matrix
I0313 15:25:18.226358 3057 base_conv_layer.cpp:102] conv2 squeezing to 20x270
I0313 15:25:18.226404 3057 base_conv_layer.cpp:114] weight matrix squeezed
I0313 15:25:18.226415 3057 base_conv_layer.cpp:180] weights lying in all-zero groups of conv2 are frozen
I0313 15:25:18.227262 3057 base_conv_layer.cpp:17] layer conv3 has sparsity of 0.660352
I0313 15:25:18.249153 3057 base_conv_layer.cpp:61] ConvolutionParameter_ConvMode_LOWERED_CCNMM
I0313 15:25:18.249279 3057 base_conv_layer.cpp:80] concatenating weight matrix
I0313 15:25:18.249299 3057 base_conv_layer.cpp:88] conv3 left_cols=486 left_rows=62
I0313 15:25:18.249359 3057 base_conv_layer.cpp:91] squeezing weight matrix
I0313 15:25:18.249370 3057 base_conv_layer.cpp:102] conv3 squeezing to 62x486
I0313 15:25:18.249470 3057 base_conv_layer.cpp:114] weight matrix squeezed
I0313 15:25:18.249481 3057 base_conv_layer.cpp:180] weights lying in all-zero groups of conv3 are frozen
I0313 15:25:18.249981 3057 inner_product_layer.cpp:12] layer ip1 has sparsity of 0.153613
I0313 15:25:18.254674 3057 inner_product_layer.cpp:20] weights lying in all-zero groups of ip1 are frozen
I0313 15:25:18.254782 3057 net.cpp:895] Ignoring source layer loss
```

inference times (batch_size=32)

| caffemodel | cifar10_full.prototxt | cifar10_full_ccnmm.prototxt (conv_mode: LOWERED_CCNMM) |
| --- | --- | --- |
| cifar10_full_baseline.caffemodel | 5 ms, Top 1: 81.52%, Top 5: 99.04% | 31 ms, Top 1: 81.52%, Top 5: 99.05% |
| cifar10_full_ssl_200000.caffemodel | 5 ms, Top 1: 80.37%, Top 5: 98.90% | 31 ms, Top 1: 80.37%, Top 5: 98.90% |

So why is the inference time much higher with conv_mode: LOWERED_CCNMM, and why does the inference time not drop when using cifar10_full_ssl_200000.caffemodel?

@wenwei202
Owner

To duplicate the results, please refer here for how I measured speed. I only counted the time of the matrix-matrix multiplication and excluded everything else. For example, in CPU mode, the lowering process (im2col) consumes 80% of the time. I didn't want such inefficient implementations of those auxiliary functionalities to deteriorate the results.
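
As a rough illustration of what is being compared, here is a minimal NumPy sketch (not the actual measurement code, which lives in the C++ conv layer; the conv2 shapes come from your log above, and N is a placeholder for batch_size × output height × width):

```python
import time
import numpy as np

# conv2 in cifar10_full: 32 filters over 32 input channels with 5x5 kernels,
# so the lowered weight matrix is 32 x 800; CCNMM squeezes it to 20 x 270 (see log)
N = 32 * 32 * 32  # placeholder: batch_size * output height * output width
pairs = [
    ("full", np.random.rand(32, 800).astype(np.float32),
     np.random.rand(800, N).astype(np.float32)),
    ("squeezed", np.random.rand(20, 270).astype(np.float32),
     np.random.rand(270, N).astype(np.float32)),
]
for name, W, X in pairs:
    t0 = time.time()
    for _ in range(100):
        W.dot(X)  # only this GEMM is timed; the im2col/lowering cost is excluded
    print("%s GEMM: %.3f ms/iter" % (name, (time.time() - t0) * 10.0))
```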

@zhujiacheng
Author

@wenwei202 Thanks for your help, I get it now.
Then I will make some effort to cut the all-zero filters and channels directly out of the caffemodel and prototxt, according to the row sparsity, maybe when the caffemodel is saved at the end of training.
Is that a good way to avoid those kinds of inefficient implementations? A rough sketch of what I have in mind is below.
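
(A minimal pycaffe sketch only, assuming a conv layer 'conv2' feeding 'conv3' as in cifar10_full, that conv2 has a bias blob, and placeholder file names; writing the pruned arrays back into a resized prototxt/caffemodel is left as a comment.)

```python
import numpy as np
import caffe

# placeholder paths for the dense net
net = caffe.Net("cifar10_full.prototxt", "cifar10_full_ssl_200000.caffemodel", caffe.TEST)

# find the output filters of conv2 whose weights are entirely zero (row sparsity)
W = net.params["conv2"][0].data                 # shape: (num_output, channels, kh, kw)
keep = np.nonzero(np.abs(W).sum(axis=(1, 2, 3)) != 0)[0]
print("conv2: keeping %d of %d filters" % (len(keep), W.shape[0]))

W_pruned = W[keep]                              # surviving conv2 filters
b_pruned = net.params["conv2"][1].data[keep]    # matching biases (assumes bias_term: true)
W_next = net.params["conv3"][0].data[:, keep]   # drop the matching input channels of conv3

# these arrays would then be copied into a second Net built from a prototxt
# where conv2 declares num_output = len(keep), and saved with net.save(...)
```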
