Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Asymmetry in read gate application #68

Open
jozef-mokry opened this issue Nov 2, 2016 · 0 comments
Open

Asymmetry in read gate application #68

jozef-mokry opened this issue Nov 2, 2016 · 0 comments

Comments

@jozef-mokry
Copy link

This probably does not make much difference, but I noticed that the read gates r1 and r2 in gru_cond_layer method are used slightly differently:

Here (https://github.com/nyu-dl/dl4mt-tutorial/blob/master/session3/nmt.py#L448) the hidden state is computed as:

h1 = tanh(xx_ + r1*(Ux*h)) where xx_ is Wx*state_below + bx [Notice that the read gate r1 is not applied onto the bias bx]

However, when computing the second hidden state h2 at (https://github.com/nyu-dl/dl4mt-tutorial/blob/master/session3/nmt.py#L477) the hidden state is computed as:

h2 = tanh(Wcx*ctx_ + r2*(Ux_nl*h1 + bx_nl)) [Notice that the read gate r2 is applied onto the bias bx_nl]
If r2 "kills" some dimensions of the bias term bx_nl then some decision hyperplanes of Wcx are forced to go through origin.

Is this asymmetry intended?

@jozef-mokry jozef-mokry changed the title Asymetry in read gate application Asymmetry in read gate application Nov 2, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant