Pipeline for training NER models using PyTorch.
ONNX export supported.
Instead of writing custom code for a specific NER task, you only need to:
- install the pipeline:
  pip install pytorch-ner
- run the pipeline:
  - either in the terminal:
    pytorch-ner-train --path_to_config config.yaml
  - or in Python:
    import pytorch_ner
    pytorch_ner.train(path_to_config="config.yaml")
The user interface consists of a single file: config.yaml.
Edit it to create the desired configuration (a short sketch of editing it programmatically follows the default config below).
Default config.yaml:
torch:
  device: 'cpu'
  seed: 42

data:
  train_data:
    path: 'data/conll2003/train.txt'
    sep: ' '
    lower: true
    verbose: true
  valid_data:
    path: 'data/conll2003/valid.txt'
    sep: ' '
    lower: true
    verbose: true
  test_data:
    path: 'data/conll2003/test.txt'
    sep: ' '
    lower: true
    verbose: true

token2idx:
  min_count: 1
  add_pad: true
  add_unk: true

dataloader:
  preprocess: true
  token_padding: '<PAD>'
  label_padding: 'O'
  percentile: 100
  batch_size: 256

model:
  embedding:
    embedding_dim: 128
  rnn:
    rnn_unit: LSTM  # GRU, RNN
    hidden_size: 256
    num_layers: 1
    dropout: 0
    bidirectional: true

optimizer:
  optimizer_type: Adam  # torch.optim
  clip_grad_norm: 0.1
  params:
    lr: 0.001
    weight_decay: 0
    amsgrad: false

train:
  n_epoch: 10
  verbose: true

save:
  path_to_folder: 'models'
  export_onnx: true
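If you prefer to adjust the config programmatically rather than by hand, a minimal sketch could look like the following (assuming PyYAML is installed; the overridden keys and the config_custom.yaml file name are just examples):

    import yaml
    import pytorch_ner

    # load the default config shown above
    with open("config.yaml") as f:
        config = yaml.safe_load(f)

    # override a few parameters, e.g. a larger RNN and more epochs
    config["model"]["rnn"]["hidden_size"] = 512
    config["train"]["n_epoch"] = 20

    # write the modified config and launch training with it
    with open("config_custom.yaml", "w") as f:
        yaml.safe_dump(config, f)

    pytorch_ner.train(path_to_config="config_custom.yaml")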
NOTE: to export the trained model to ONNX, use the following config parameter:
save:
  export_onnx: true
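Once exported, the ONNX model can be served without PyTorch. Below is a minimal inference sketch, assuming onnxruntime is installed and that the exported graph takes a batch of token indices; the actual input names, shapes, and dtypes depend on how the model was exported, so inspect the session inputs rather than guessing:

    import numpy as np
    import onnxruntime as ort

    # path is an assumption based on the default save folder shown above
    session = ort.InferenceSession("models/model.onnx")

    # discover the input name instead of hard-coding it
    input_name = session.get_inputs()[0].name

    # toy batch of token indices with shape (batch_size, seq_len)
    tokens = np.array([[2, 5, 7, 1]], dtype=np.int64)

    # assumed output: per-token label logits
    logits = session.run(None, {input_name: tokens})[0]
    print(logits.argmax(axis=-1))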
The pipeline works with a text file where each line contains a token and its label separated by a delimiter (the sep parameter in the config). Sentences are separated by an empty line. Labels should already be in the required format, e.g. IO, BIO, BILUO, ...
Example:
token_11 label_11
token_12 label_12

token_21 label_21
token_22 label_22
token_23 label_23

...
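For illustration only (this is not the pipeline's own loader), a small sketch of reading such a file into per-sentence token and label lists, with the path and separator matching the default config:

    def read_ner_file(path, sep=" "):
        """Parse a token-label-per-line file with empty lines between sentences."""
        token_seqs, label_seqs = [], []
        tokens, labels = [], []
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.rstrip("\n")
                if not line:  # empty line marks the end of a sentence
                    if tokens:
                        token_seqs.append(tokens)
                        label_seqs.append(labels)
                        tokens, labels = [], []
                    continue
                token, label = line.split(sep)
                tokens.append(token)
                labels.append(label)
        if tokens:  # file may not end with an empty line
            token_seqs.append(tokens)
            label_seqs.append(labels)
        return token_seqs, label_seqs

    sentences, tags = read_ner_file("data/conll2003/train.txt")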
After training the model, the pipeline saves the following files to the specified folder:
- model.pth - PyTorch NER model
- model.onnx - ONNX NER model (optional)
- token2idx.json - mapping from token to its index
- label2idx.json - mapping from label to its index
- config.yaml - config that was used to train the model
- logging.txt - logging file
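A minimal inference sketch using these artifacts, assuming model.pth stores the whole serialized nn.Module (if only a state_dict is stored, the model class must be instantiated first) and that the forward pass takes a batch of token indices; the unknown-token name and lowercasing mirror the default config:

    import json
    import torch

    # vocabularies saved by the pipeline
    with open("models/token2idx.json") as f:
        token2idx = json.load(f)
    with open("models/label2idx.json") as f:
        label2idx = json.load(f)
    idx2label = {idx: label for label, idx in label2idx.items()}

    # assumption: model.pth contains the full module, not just weights
    model = torch.load("models/model.pth", map_location="cpu")
    model.eval()

    # lowercase tokens to match lower: true; fall back to the <UNK> index
    tokens = ["fc", "barcelona", "won", "in", "london"]
    unk = token2idx.get("<UNK>", 0)
    indices = torch.tensor([[token2idx.get(t, unk) for t in tokens]])

    with torch.no_grad():
        # the forward signature may differ (e.g. also require sequence lengths)
        logits = model(indices)
    predicted = [idx2label[i] for i in logits.argmax(dim=-1).squeeze(0).tolist()]
    print(list(zip(tokens, predicted)))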
List of implemented models:
- BiLSTM
- BiLSTMCRF
- BiLSTMAttn
- BiLSTMAttnCRF
- BiLSTMCNN
- BiLSTMCNNCRF
- BiLSTMCNNAttn
- BiLSTMCNNAttnCRF
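To give a feel for the simplest model in this list, here is a minimal BiLSTM token classifier sketched in plain PyTorch with the default hyperparameters from the config above; it is an illustration, not the library's exact implementation:

    import torch
    import torch.nn as nn

    class BiLSTMTagger(nn.Module):
        """Embedding -> bidirectional LSTM -> per-token linear classifier."""

        def __init__(self, vocab_size, num_labels,
                     embedding_dim=128, hidden_size=256):
            super().__init__()
            self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=0)
            self.lstm = nn.LSTM(
                embedding_dim, hidden_size,
                num_layers=1, batch_first=True, bidirectional=True,
            )
            # bidirectional -> forward and backward states are concatenated
            self.classifier = nn.Linear(2 * hidden_size, num_labels)

        def forward(self, token_ids):
            embedded = self.embedding(token_ids)   # (batch, seq, embedding_dim)
            hidden, _ = self.lstm(embedded)        # (batch, seq, 2 * hidden_size)
            return self.classifier(hidden)         # (batch, seq, num_labels)

    # toy usage: batch of 2 sentences, max length 5
    model = BiLSTMTagger(vocab_size=10_000, num_labels=9)
    logits = model(torch.randint(0, 10_000, (2, 5)))
    print(logits.shape)  # torch.Size([2, 5, 9])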
All results were obtained on the CoNLL-2003 dataset. We did not search for the best hyperparameters.
Model | Train F1-weighted | Validation F1-weighted | Test F1-weighted |
---|---|---|---|
BiLSTM | 0.968 | 0.928 | 0.876 |
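Here "F1-weighted" is read as the token-level F1 score averaged over labels, weighted by label frequency. A short sketch of computing such a score with scikit-learn on flattened predictions (not necessarily the pipeline's own evaluation code):

    from sklearn.metrics import f1_score

    # flattened token-level gold and predicted labels for a toy example
    y_true = ["O", "B-PER", "I-PER", "O", "B-LOC"]
    y_pred = ["O", "B-PER", "O", "O", "B-LOC"]

    # per-label F1 averaged with weights proportional to label frequency
    print(f1_score(y_true, y_pred, average="weighted"))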
Requirements: Python >= 3.6
If you use pytorch_ner in a scientific publication, we would appreciate a reference using the following BibTeX entry:
@misc{dayyass2020ner,
    author       = {El-Ayyass, Dani},
    title        = {Pipeline for training NER models using PyTorch},
    howpublished = {\url{https://github.com/dayyass/pytorch_ner}},
    year         = {2020}
}