
Recurrent Neural Networks

Recurrent neural networks are great at encapsulating sequence or time-series data, because the connections between nodes form a directed graph along the sequence. A great resource that explains LSTMs better than I could is found here; check it out.
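As a minimal sketch of that directed graph (sizes and weights below are arbitrary, nothing here is from the study), the core of a vanilla RNN is a single recurrence in which each hidden state depends on the previous one:

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim, seq_len = 8, 16, 5

W_x = rng.normal(scale=0.1, size=(hidden_dim, input_dim))   # input weights
W_h = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))  # recurrent weights
b = np.zeros(hidden_dim)

x = rng.normal(size=(seq_len, input_dim))  # one input sequence
h = np.zeros(hidden_dim)                   # initial hidden state

for t in range(seq_len):
    # h_t = tanh(W_x x_t + W_h h_{t-1} + b): the edge from h_{t-1} to h_t
    h = np.tanh(W_x @ x[t] + W_h @ h + b)

print(h.shape)  # (16,)
```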

There has been a great study on the optimal hyperparameters for LSTM networks in sequence labelling tasks here. Section 7 compares different features and additions and the impact each has on the network. I will summarize them here:

High Impact

Word Embeddings

Word embeddings improved accuracy across the board.
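In Keras this is just an `Embedding` layer, optionally initialised from pre-trained vectors. A small sketch; the sizes and the random stand-in matrix are hypothetical:

```python
import numpy as np
from tensorflow.keras.layers import Embedding

vocab_size, embedding_dim = 10_000, 100  # hypothetical sizes

# Stand-in for a matrix of pre-trained vectors (e.g. one GloVe row per
# vocabulary word); random here only to keep the snippet runnable.
embedding_matrix = np.random.normal(size=(vocab_size, embedding_dim))

embedding = Embedding(input_dim=vocab_size,
                      output_dim=embedding_dim,
                      weights=[embedding_matrix],  # initialise from pre-trained vectors
                      trainable=False,             # freeze, or True to fine-tune
                      mask_zero=True)              # treat index 0 as padding
```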

Optimizer

Adam and Nadam proved best, followed by RMSProp.
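The optimizer is just an argument to `compile`; the tiny model below is a stand-in to show the wiring:

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.optimizers import Nadam

# Stand-in model: sequences with 50 features per step, 10 output classes.
model = Sequential([LSTM(100, input_shape=(None, 50)),
                    Dense(10, activation="softmax")])

# Nadam is Adam with Nesterov momentum; its default learning rate is 0.001.
model.compile(optimizer=Nadam(), loss="categorical_crossentropy",
              metrics=["accuracy"])
```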

Classifier (not relevant for multi-label tasks)

A CRF output layer instead of a softmax proved better.
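A rough sketch of a BiLSTM-CRF, assuming the third-party keras-contrib package (the `CRF` layer and its `loss_function`/`accuracy` attributes come from that library; all sizes are hypothetical):

```python
from keras.models import Model
from keras.layers import Input, Embedding, Bidirectional, LSTM
from keras_contrib.layers import CRF  # third-party keras-contrib package

num_tags = 10  # hypothetical tag-set size

inp = Input(shape=(None,), dtype="int32")
x = Embedding(10_000, 100, mask_zero=True)(inp)
x = Bidirectional(LSTM(100, return_sequences=True))(x)
crf = CRF(num_tags)  # CRF layer replaces the per-token softmax
out = crf(x)

model = Model(inp, out)
# The CRF layer brings its own sequence-level loss and accuracy.
model.compile(optimizer="nadam", loss=crf.loss_function, metrics=[crf.accuracy])
```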

Dropout

Variational Dropout performed significantly better than naive or no dropout.
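Keras's recurrent layers take `dropout` and `recurrent_dropout` arguments whose masks are reused at every timestep, which is the essence of the variational scheme (Gal & Ghahramani); the rates below are illustrative:

```python
from tensorflow.keras.layers import LSTM

layer = LSTM(100,
             dropout=0.25,            # dropout on the input connections
             recurrent_dropout=0.25,  # dropout on the recurrent connections
             return_sequences=True)
```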

Gradient Clipping / Normalization

Gradient clipping did not help at all; however, gradient normalization with a threshold of T = 1 significantly increased accuracy.
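In Keras this corresponds to the optimizer's `clipnorm` argument, which rescales the whole gradient when its L2 norm exceeds the threshold, as opposed to `clipvalue`, which is the element-wise clipping that did not help:

```python
from tensorflow.keras.optimizers import Nadam

# Rescale gradients whose global L2 norm exceeds 1.0 (T = 1 above).
optimizer = Nadam(clipnorm=1.0)
```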

Medium Impact

Tagging Scheme

The BIO and IOBES tagging schemes performed consistently better than the IOB tagging scheme.
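For reference, an illustrative converter from BIO to IOBES (my own helper, not from the study; it assumes well-formed BIO input):

```python
def bio_to_iobes(tags):
    """Convert a BIO tag sequence to IOBES: S- marks single-token
    entities, E- marks the last token of a multi-token entity."""
    iobes = []
    for i, tag in enumerate(tags):
        if tag == "O":
            iobes.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        nxt = tags[i + 1] if i + 1 < len(tags) else "O"
        ends_here = not nxt.startswith("I-")  # entity stops at this token
        if prefix == "B":
            iobes.append(("S-" if ends_here else "B-") + label)
        else:  # prefix == "I"
            iobes.append(("E-" if ends_here else "I-") + label)
    return iobes

print(bio_to_iobes(["B-PER", "I-PER", "O", "B-LOC"]))
# ['B-PER', 'E-PER', 'O', 'S-LOC']
```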

No. of LSTM Layers

If the number of recurrent units is kept constant, two stacked BiLSTM layers resulted in the best performance.
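A sketch of that configuration in Keras, with hypothetical vocabulary and tag-set sizes, and about 100 units per layer as per the rule of thumb further down this page:

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import (Embedding, Bidirectional, LSTM,
                                     TimeDistributed, Dense)

num_tags = 10  # hypothetical tag-set size

model = Sequential([
    Embedding(10_000, 100, mask_zero=True),
    Bidirectional(LSTM(100, return_sequences=True)),  # first BiLSTM layer
    Bidirectional(LSTM(100, return_sequences=True)),  # second, stacked on top
    TimeDistributed(Dense(num_tags, activation="softmax")),
])
```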

Mini-batch size

The optimal size for the mini-batch appears to depend on the task. For POS tagging and event recognition, a size of 1 was optimal; for chunking, a size of 8; and for NER and entity recognition, a size of 31.
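The batch size is just an argument to `fit`. A toy, self-contained call (the model and random data are stand-ins):

```python
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Toy data: 64 sequences of length 20 with 50 features, 10 classes.
X = np.random.normal(size=(64, 20, 50)).astype("float32")
y = np.random.randint(0, 10, size=(64,))

model = Sequential([LSTM(100, input_shape=(20, 50)),
                    Dense(10, activation="softmax")])
model.compile(optimizer="nadam", loss="sparse_categorical_crossentropy")

# batch_size is the task-dependent knob: e.g. 1 for POS tagging,
# 8 for chunking, 31 for NER, per the study's findings.
model.fit(X, y, batch_size=8, epochs=1)
```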

Character representation

In many of the tested configurations, character-based representations were not especially helpful and could not improve the performance of the network.
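For completeness, a sketch of how a character-based representation is usually wired in: a small BiLSTM over each token's characters, concatenated with the word embedding (all sizes hypothetical):

```python
from tensorflow.keras import Model, Input
from tensorflow.keras.layers import (Embedding, Bidirectional, LSTM,
                                     TimeDistributed, Concatenate, Dense)

max_word_len, num_chars = 15, 80     # hypothetical sizes throughout
vocab_size, num_tags = 10_000, 10

# Word-level input: one index per token.
words = Input(shape=(None,), dtype="int32")
word_emb = Embedding(vocab_size, 100)(words)

# Char-level input: one index per character of each token; a small
# BiLSTM over the characters yields one extra vector per token.
chars = Input(shape=(None, max_word_len), dtype="int32")
char_emb = TimeDistributed(Embedding(num_chars, 30))(chars)
char_repr = TimeDistributed(Bidirectional(LSTM(25)))(char_emb)

# Concatenate both representations and tag as usual.
combined = Concatenate()([word_emb, char_repr])
hidden = Bidirectional(LSTM(100, return_sequences=True))(combined)
out = TimeDistributed(Dense(num_tags, activation="softmax"))(hidden)

model = Model([words, chars], out)
```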

Low/No Impact

Recurrent Units

The number of recurrent units, provided it is neither far too large nor far too small, has only a minor effect on the results. A value of about 100 per LSTM layer appears to be a good rule of thumb for the tested tasks.

Backend

Theano and TensorFlow performed equally well in terms of test performance.
