# Keras and TensorFlow Implementation of Siamese Recurrent Architectures for Learning Sentence Similarity
A Keras implementation of the paper *Siamese Recurrent Architectures for Learning Sentence Similarity*, which uses a Siamese LSTM architecture to provide a state-of-the-art yet simple model for the Semantic Textual Similarity (STS) task.
- Input: Two sentences.
- Output: A semantic similarity score between the two input sentences.
- Sentences are encoded using 300-dimensional word2vec embeddings (download from here).
- Siamese network with a single shared LSTM: both the left and right branches use the same weights (a minimal Keras sketch follows this list).
- Similarity is measured by the L1 (Manhattan) distance between the two sentence representations.
- The LSTM learns a mapping from the space of variable-length sequences of 300-dimensional word2vec vectors into a 50-dimensional sentence representation.
- Parameters are optimized using Adadelta.
- This method does not require extensive manual feature engineering beyond the separately trained word2vec vectors.
- The Siamese network is trained using backpropagation-through-time under the mean squared error (MSE) loss function, after rescaling the training-set relatedness labels to lie in [0, 1] (e.g. a SICK score y in [1, 5] maps to (y - 1) / 4).
- LSTM weights are initialized with small random Gaussian entries.
- The network is pre-trained on separate sentence-pair data provided for the earlier SemEval 2013 Semantic Textual Similarity task.
- The training data is augmented by replacing words with synonyms from a thesaurus.
- Visualization of the learned weights, as shown in the paper.
- We plan to provide a PyTorch implementation.
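The core architecture can be summarized with a minimal Keras sketch. This is an illustrative outline rather than the repository's actual code; the function name and argument defaults are assumptions, while the shapes and hyperparameters (300-dimensional inputs, 50 hidden units, Manhattan distance, MSE loss, Adadelta) follow the paper:

```python
from keras.layers import Input, LSTM, Lambda
from keras.models import Model
import keras.backend as K

def build_malstm(max_len, embedding_dim=300, hidden_units=50):
    # Two inputs: padded sequences of 300-dim word2vec vectors.
    left_input = Input(shape=(max_len, embedding_dim))
    right_input = Input(shape=(max_len, embedding_dim))

    # One shared LSTM: both branches use exactly the same weights.
    shared_lstm = LSTM(hidden_units)
    left_output = shared_lstm(left_input)
    right_output = shared_lstm(right_input)

    # MaLSTM similarity: exp(-||h_left - h_right||_1), which lies in (0, 1].
    malstm_similarity = Lambda(
        lambda t: K.exp(-K.sum(K.abs(t[0] - t[1]), axis=1, keepdims=True))
    )([left_output, right_output])

    model = Model(inputs=[left_input, right_input], outputs=malstm_similarity)
    # MSE loss against relatedness labels rescaled to [0, 1], optimized with Adadelta.
    model.compile(loss='mean_squared_error', optimizer='adadelta')
    return model
```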
- Although we provide a correct Keras-backend implementation of `pearson_correlation`, you shouldn't rely on the `pearson_correlation` result returned by the `evaluate` function unless you specify a `batch_size` >= the size of the test set. This is because Keras applies metrics per batch rather than over the whole set!
- We provide the `pearson_correlation` implementation using the Keras backend in order to visualize the learning curves. It gives only an indication, not the correct `pearson_correlation` measure (a sketch of such a metric follows this list).
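For reference, a batch-wise Pearson correlation metric can be written with the Keras backend along these lines. This is a minimal sketch assuming the classic `keras.backend` API, not necessarily the repository's exact implementation:

```python
import keras.backend as K

def pearson_correlation(y_true, y_pred):
    # Center both tensors around their batch means.
    x = y_true - K.mean(y_true)
    y = y_pred - K.mean(y_pred)
    # Normalized covariance; K.epsilon() guards against division by zero.
    return K.sum(x * y) / (K.sqrt(K.sum(K.square(x)) * K.sum(K.square(y))) + K.epsilon())
```

Because the metric is computed per batch and then averaged, it only matches the true test-set correlation when a single batch covers the whole set, which is exactly the caveat above.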
Training arguments (`train.py`):

- `--word2vec` or `-w`: Path to the word2vec `.bin` file with 300 dims.
- `--data` or `-d`: Path to the SICK data used for training.
- `--pretrained` or `-p`: Path to pre-trained weights.
- `--epochs` or `-e`: Number of epochs.
- `--save` or `-s`: Folder path to save both the trained model and its weights.
- `--cudnnlstm` or `-c`: Use CuDNN LSTM for fast training. This requires a GPU and CUDA (see the sketch after the training command).
```bash
python train.py --word2vec=/path/to/word2vec/GoogleNews-vectors-negative300.bin --data=/path/to/sick/SICK.txt --epochs=50 --cudnnlstm=true
```
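The `--cudnnlstm` flag presumably switches the recurrent layer to Keras's `CuDNNLSTM`, which only runs on a CUDA-capable GPU. A sketch of how such a switch might look; the helper name is illustrative, not the repository's actual code:

```python
from keras.layers import LSTM

try:
    # CuDNNLSTM is available in Keras 2.x and requires a GPU with CUDA/cuDNN.
    from keras.layers import CuDNNLSTM
except ImportError:
    CuDNNLSTM = None

def make_recurrent_layer(hidden_units, use_cudnn=False):
    # Fall back to the plain LSTM when CuDNN is not requested or unavailable.
    if use_cudnn and CuDNNLSTM is not None:
        return CuDNNLSTM(hidden_units)
    return LSTM(hidden_units)
```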
Testing arguments (`test.py`):

- `--model` or `-p`: Path to the trained model.
- `--word2vec` or `-w`: Path to the word2vec `.bin` file with 300 dims.
- `--data` or `-d`: Path to the SICK data used for testing.
- `--save` or `-s`: CSV file path to save the test output.
```bash
python test.py --model=/path/to/model/model.h5 --word2vec=/path/to/word2vec/GoogleNews-vectors-negative300.bin --data=/path/to/sick/SICK.txt --save=/path/to/save/location/test.csv
```