Thanks for a great project; it has helped me build models on top of it.
I was wondering one thing: it seems that you do not implement skip connections (residual connections) in the Transformer?
Is it because you implemented them but did not observe an improvement?
Or is it simply because you did not implement them?
I ask because when I use more layers, I actually get worse performance. I am not sure whether that is expected (i.e. more layers simply do not help), or whether it is because I lack skip connections, which usually help when building a deeper model.
Best,
There are skip connections.
See Add() in EncoderLayer/DecoderLayer.
Training tricks (the lr scheduler, etc.) should still be used when the network is deep, even though skip connections are present.
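As a sketch of what that `Add()` step does (illustrative only, not the project's actual code): in a post-norm Transformer layer, each sublayer's output is added back onto the sublayer's input before layer normalization, so gradients can flow around the sublayer. A minimal NumPy version, with a fixed linear projection standing in for self-attention or the feed-forward sublayer:

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize over the feature dimension (last axis).
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)

def residual_block(x, sublayer):
    # The skip connection: add the sublayer's output back onto its
    # input, then normalize (post-norm, as in the original Transformer).
    return layer_norm(x + sublayer(x))

# Toy sublayer: a fixed linear projection standing in for
# self-attention or the position-wise feed-forward network.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))
x = rng.standard_normal((2, 4, 8))   # (batch, sequence, features)
out = residual_block(x, lambda h: h @ W)
```

Note the residual path requires the sublayer to preserve the feature dimension, which is why every sublayer in the Transformer maps d_model to d_model.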