This repository has been archived by the owner on Nov 1, 2024. It is now read-only.
In the paper, I noticed there is a lot of emphasis on allowing the Non-Local Block to be initialized as an identity block so that it can be inserted into pre-trained architectures without adverse effects. For instance:
Section 4.1: "The scale parameter of this BN layer is initialized as zero, following [17]. This ensures that the initial state of the entire non-local block is an identity mapping, so it can be inserted into any pre-trained networks while maintaining its initial behavior"
Section 3.3: "The residual connection allows us to insert a new non-local block into any pre-trained model, without breaking its initial behavior (e.g., if Wz is initialized as zero)"
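As a side note, the identity-at-initialization property quoted above can be checked with a small numpy sketch (hypothetical sizes, not the authors' code): with `W_z` (here a plain matrix standing in for the 1x1x1 convolution) initialized to zero, the residual form `z = W_z(y) + x` collapses to the identity no matter what the non-local response `y` is.

```python
import numpy as np

# Hypothetical sizes: THW flattened space-time positions, C channels.
THW, C = 64, 512
rng = np.random.default_rng(0)

x = rng.standard_normal((THW, C))    # block input
y = rng.standard_normal((THW, C))    # arbitrary non-local response
W_z = np.zeros((C, C))               # zero-initialized 1x1x1 conv as a matrix

z = y @ W_z + x                      # residual connection
assert np.allclose(z, x)             # block is an identity mapping at init
```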
Separately, section 3.3 describes a subsampling trick that inserts pooling layers after `phi` and `g` from figure 2. I am struggling to see how pooling can be used while maintaining the identity initialization of a Non-Local Block described above. If pooled, then the output of `f(i, j) • g(x)` goes from shape `THW x 512` to `TH'W' x 512`, where `H' < H` and `W' < W`. With smaller spatial dimensions, this output can no longer be summed element-wise with the input `X` for the residual connection.

What additional operations are used in your implementation to enable an element-wise sum when subsampling? Do you upsample the `f(i, j) • g(x)` output before applying the 1x1 convolution in `W_z`?
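For context, one reading of figure 2 is that pooling only shrinks the *second* dimension of the affinity matrix, since `theta` is left unpooled. A numpy shape sketch of the embedded-Gaussian instantiation under that reading (hypothetical sizes, not the authors' implementation):

```python
import numpy as np

# Hypothetical sizes: THW full positions, TH'W' pooled positions, C channels.
THW, THW_pooled, C = 64, 16, 512
rng = np.random.default_rng(0)

theta = rng.standard_normal((THW, C // 2))         # theta(x): no pooling
phi   = rng.standard_normal((THW_pooled, C // 2))  # phi(x): after pooling
g     = rng.standard_normal((THW_pooled, C // 2))  # g(x): after pooling

f = theta @ phi.T                                  # (THW, TH'W') affinities
f -= f.max(axis=1, keepdims=True)                  # numerically stable softmax
attn = np.exp(f) / np.exp(f).sum(axis=1, keepdims=True)
y = attn @ g                                       # (THW, C/2)

# y retains the full THW positions, so the residual sum is still well-formed.
assert y.shape == (THW, C // 2)
```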