This is a PyTorch implementation of "Effective Approaches to Attention-based Neural Machine Translation" that uses scheduled sampling during training to improve parameter estimation. The models are trained on tab-delimited bilingual sentence pairs acquired from here.
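As a rough illustration of the data format, the sketch below reads tab-delimited sentence pairs into Python tuples. The `normalize` and `read_pairs` helpers and the accent-stripping cleanup are assumptions made for this example, not necessarily what the repository does; the only detail taken from this README is that each line holds a sentence pair separated by a tab.

```python
import io
import re
import unicodedata


def normalize(text):
    """Lowercase, strip accents, and keep only a-z and spaces (assumed cleanup)."""
    text = unicodedata.normalize("NFD", text.lower().strip())
    # Drop combining marks so that e.g. an accented o becomes a plain o.
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Mn")
    text = re.sub(r"[^a-z ]+", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def read_pairs(handle):
    """Return a list of (source, target) tuples from 'source<TAB>target' lines."""
    pairs = []
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        pairs.append((normalize(fields[0]), normalize(fields[1])))
    return pairs


# In-memory stand-in for a file such as 'spa-eng.txt'.
sample = io.StringIO(
    "How are you doing?\t¿Cómo estás?\n"
    "She is a scientist.\tElla es científica.\n"
)
pairs = read_pairs(sample)
print(pairs)
```

To read a real file, the same function could be called on a handle opened with `open(path, encoding="utf-8")`.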
The model is trained end-to-end, using stacked RNNs for sequence encoding and decoding. The decoder is additionally conditioned on a context vector when predicting the next token in the sequence. This vector is computed by an attention mechanism at each time step. Intuitively, the decoder draws on the information gathered by the encoder by deciding how relevant each encoder output is at each step of the decoding process.
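The context vector computation can be sketched without any framework so the arithmetic stays visible. The example below assumes the dot-product score from the paper (the score of a decoder state against an encoder state is their inner product); which scoring variant this repository actually uses is not stated here, and the function names are hypothetical. In practice the same steps would run on PyTorch tensors.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """Return (weights, context) for one decoding step using dot-product scores."""
    # Alignment score: inner product of the decoder state with each encoder state.
    scores = [
        sum(d * e for d, e in zip(decoder_state, enc)) for enc in encoder_states
    ]
    # Attention weights: scores normalized into a distribution over source positions.
    weights = softmax(scores)
    # Context vector: weighted average of the encoder states.
    dim = len(decoder_state)
    context = [
        sum(w * enc[i] for w, enc in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]
weights, context = attend(decoder_state, encoder_states)
print(weights, context)
```

Here the first and third encoder states receive equal, larger weights because they align with the decoder state, while the second contributes little to the context vector.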
Input Sequence (English) | Output Sequence (Spanish) |
---|---|
how are you doing | estas haciendo |
i am going to the store | voy a la tienda |
she is a scientist | ella es cientifico |
he is an engineer | el es un ingeniero |
i am going out to the city | voy al la de la ciudad |
i am running out of ideas | me estoy quedando sin ideas |
To train a new language model, invoke train.py with the abbreviation of the language you would like to translate English into. For instance, to translate into Spanish, specify 'spa' as input; 'spa-eng.txt' in the data directory will then be used. Other languages can be acquired from here.
python train.py langname
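Scheduled sampling, mentioned at the top of this README, decides at each decoder step whether to feed the reference token or the model's own previous prediction. The following is a minimal sketch of that choice, assuming a linear decay schedule and a hypothetical `SOS` start token. The function names are illustrative, and the predictions are precomputed here only to keep the example short; a real training loop would produce them step by step.

```python
import random


def teacher_forcing_ratio(step, total_steps, floor=0.0):
    """Linearly decay the probability of feeding the reference token (assumed schedule)."""
    return max(floor, 1.0 - step / total_steps)


def choose_decoder_inputs(reference, predicted, ratio, rng, sos="SOS"):
    """Build the decoder inputs for one sequence.

    The input at step 0 is the start token. The input at step t > 0 is the
    reference token from step t - 1 with probability `ratio`, otherwise the
    token the model predicted at step t - 1.
    """
    inputs = [sos]
    for t in range(1, len(reference)):
        use_truth = rng.random() < ratio
        inputs.append(reference[t - 1] if use_truth else predicted[t - 1])
    return inputs


rng = random.Random(0)
reference = ["voy", "a", "la", "tienda"]
predicted = ["voy", "al", "la", "ciudad"]

always_truth = choose_decoder_inputs(reference, predicted, 1.0, rng)
always_model = choose_decoder_inputs(reference, predicted, 0.0, rng)
print(always_truth)
print(always_model)
print(teacher_forcing_ratio(50, 100))
```

Early in training the ratio is close to 1, so the decoder mostly sees reference tokens; as it decays, the decoder is increasingly exposed to its own outputs, which is closer to the conditions it faces at evaluation time.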
To translate an English input sequence into another language, invoke eval.py and specify the desired language and sentence. The program will exit if the language model parameters are not found in the data directory or if the language prefix is mistyped.
python eval.py langname 'some english words'
- Attention `nn` module responsible for computing the alignment scores.
- Recurrent neural network that uses gated recurrent units (GRUs) and attention to decode the encoded inputs into a translation.
- Recurrent neural network that encodes a given input sequence.
- Helper functions for data extraction, transformation, and loading.
- eval.py: script for evaluating the sequence-to-sequence model.
- General helper functions.
- Class that keeps a record of a corpus. Attributes such as vocabulary counts and tokens are stored in instances of this class.
- train.py: script for training a new sequence-to-sequence model.
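To make the corpus class in the list above more concrete, here is a minimal sketch of a class that records tokens, their indices, and their counts. The class name `Vocabulary`, its attributes, and the `SOS`/`EOS` marker tokens are assumptions for illustration and may not match the repository's actual class.

```python
from collections import Counter


class Vocabulary:
    """Hypothetical stand-in for the corpus class: tracks tokens, indices, and counts."""

    def __init__(self, name):
        self.name = name
        # Reserve the first two indices for start and end markers (assumed convention).
        self.token_to_index = {"SOS": 0, "EOS": 1}
        self.index_to_token = ["SOS", "EOS"]
        self.counts = Counter()

    def add_sentence(self, sentence):
        """Register every whitespace-separated token in the sentence."""
        for token in sentence.split():
            self.counts[token] += 1
            if token not in self.token_to_index:
                self.token_to_index[token] = len(self.index_to_token)
                self.index_to_token.append(token)

    def encode(self, sentence):
        """Map a sentence to indices and append EOS; raises KeyError on unseen tokens."""
        indices = [self.token_to_index[t] for t in sentence.split()]
        return indices + [self.token_to_index["EOS"]]


vocab = Vocabulary("eng")
vocab.add_sentence("i am going to the store")
vocab.add_sentence("i am running out of ideas")
print(len(vocab.index_to_token), vocab.counts["i"], vocab.encode("i am going"))
```

One instance per language (source and target) would typically be built from the sentence pairs before training, so that sentences can be converted to index sequences for the encoder and decoder.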