This repository has been archived by the owner on Nov 1, 2024. It is now read-only.
In the paper, I noticed there is a lot of emphasis on allowing the Non-Local Block to be initialized as an identity block so that it can be inserted into pre-trained architectures without adverse effects. For instance:
Section 4.1: "The scale parameter of this BN layer is initialized as zero, following [17]. This ensures that the initial state of the entire non-local block is an identity mapping, so it can be inserted into any pre-trained networks while maintaining its initial behavior"
Section 3.3: "The residual connection allows us to insert a new non-local block into any pre-trained model, without breaking its initial behavior (e.g., if Wz is initialized as zero)"
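As a side note, the identity-at-initialization property quoted above can be checked with a small numpy sketch (hypothetical sizes, not the authors' code): with `W_z` (here a plain matrix standing in for the 1x1x1 convolution) initialized to zero, the residual form `z = W_z(y) + x` collapses to the identity no matter what the non-local response `y` is.

```python
import numpy as np

# Hypothetical sizes: THW flattened space-time positions, C channels.
THW, C = 64, 512
rng = np.random.default_rng(0)

x = rng.standard_normal((THW, C))    # block input
y = rng.standard_normal((THW, C))    # arbitrary non-local response
W_z = np.zeros((C, C))               # zero-initialized 1x1x1 conv as a matrix

z = y @ W_z + x                      # residual connection
assert np.allclose(z, x)             # block is an identity mapping at init
```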
Separately, section 3.3 describes a subsampling trick that inserts pooling layers after `phi` and `g` from figure 2. I am struggling to see how pooling can be used while maintaining the identity initialization of a Non-Local Block described above. If pooled, then the output of `f(i, j) • g(x)` goes from shape `THW x 512` to `TH'W' x 512`, where `H' < H` and `W' < W`. With smaller spatial dimensions, this output can no longer be summed element-wise with the input `X` for the residual connection.

What additional operations are used in your implementation to enable an element-wise sum when subsampling? Do you upsample the `f(i, j) • g(x)` output before applying the 1x1 convolution in `W_z`?
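For context, one reading of figure 2 is that pooling only shrinks the *second* dimension of the affinity matrix, since `theta` is left unpooled. A numpy shape sketch of the embedded-Gaussian instantiation under that reading (hypothetical sizes, not the authors' implementation):

```python
import numpy as np

# Hypothetical sizes: THW full positions, TH'W' pooled positions, C channels.
THW, THW_pooled, C = 64, 16, 512
rng = np.random.default_rng(0)

theta = rng.standard_normal((THW, C // 2))         # theta(x): no pooling
phi   = rng.standard_normal((THW_pooled, C // 2))  # phi(x): after pooling
g     = rng.standard_normal((THW_pooled, C // 2))  # g(x): after pooling

f = theta @ phi.T                                  # (THW, TH'W') affinities
f -= f.max(axis=1, keepdims=True)                  # numerically stable softmax
attn = np.exp(f) / np.exp(f).sum(axis=1, keepdims=True)
y = attn @ g                                       # (THW, C/2)

# y retains the full THW positions, so the residual sum is still well-formed.
assert y.shape == (THW, C // 2)
```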