jozef-mokry changed the title from "Asymetry in read gate application" to "Asymmetry in read gate application" on Nov 2, 2016
This probably does not make much difference, but I noticed that the read gates r1 and r2 in the gru_cond_layer method are applied slightly differently.

Here (https://github.com/nyu-dl/dl4mt-tutorial/blob/master/session3/nmt.py#L448) the first hidden state is computed as:

h1 = tanh(xx_ + r1*(Ux*h))

where xx_ is Wx*state_below + bx. [Notice that the read gate r1 is not applied to the bias bx.]

However, the second hidden state h2 at (https://github.com/nyu-dl/dl4mt-tutorial/blob/master/session3/nmt.py#L477) is computed as:

h2 = tanh(Wcx*ctx_ + r2*(Ux_nl*h1 + bx_nl))

[Notice that the read gate r2 is applied to the bias bx_nl.]

If r2 "kills" some dimensions of the bias term bx_nl, then some decision hyperplanes of Wcx are forced to go through the origin.

Is this asymmetry intended?
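To make the difference concrete, here is a minimal NumPy sketch of the two formulas as written in this issue. It is not the Theano code from nmt.py: the matrix-vector form, the random weights, and the function names are assumptions for illustration only, and `h2_bias_outside` is a hypothetical variant that treats bx_nl the way the first step treats bx.

```python
import numpy as np


def h1_update(xx_, r1, Ux, h):
    # First step as written in this issue: the bias bx is already inside xx_,
    # so the gate r1 only scales Ux*h.
    return np.tanh(xx_ + r1 * (Ux @ h))


def h2_update(Wcx, ctx_, r2, Ux_nl, h1, bx_nl):
    # Second step as written in this issue: the gate r2 scales the bias bx_nl too.
    return np.tanh(Wcx @ ctx_ + r2 * (Ux_nl @ h1 + bx_nl))


def h2_bias_outside(Wcx, ctx_, r2, Ux_nl, h1, bx_nl):
    # Hypothetical symmetric variant (not in nmt.py): bias kept outside the gate.
    return np.tanh(Wcx @ ctx_ + bx_nl + r2 * (Ux_nl @ h1))


# Toy setup with made-up sizes and random weights.
dim = 3
rng = np.random.default_rng(0)
Wcx = rng.standard_normal((dim, dim))
Ux_nl = rng.standard_normal((dim, dim))
h1 = rng.standard_normal(dim)
bx_nl = np.array([0.5, -0.3, 0.8])

ctx_ = np.zeros(dim)            # evaluate at the origin of the context space
r2 = np.array([0.0, 1.0, 1.0])  # the gate "kills" dimension 0

gated = h2_update(Wcx, ctx_, r2, Ux_nl, h1, bx_nl)
ungated = h2_bias_outside(Wcx, ctx_, r2, Ux_nl, h1, bx_nl)

# In dimension 0 the gated form has no offset left, so its pre-activation is
# just Wcx*ctx_, while the variant still carries bx_nl[0].
print("bias inside gate :", gated)
print("bias outside gate:", ungated)
```

With r2 closed in dimension 0 and ctx_ at the origin, the gated form gives exactly zero in that dimension, while the variant gives tanh(bx_nl[0]). In the dimensions where r2 is 1 the two forms agree.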