Training spaCy Model: Train / Dev / Test Datasets - rel_component #7003
Replies: 3 comments
-
I feel the term 'dev' dataset is used for the evaluation/validation set, if I am not wrong.
-
When I use a "dev" set, I typically mean a part of the training dataset that is used to evaluate hyperparameters such as the number of epochs to run the training for, or more generally which configuration setting works best. That means that typically I'd have a "train", "dev" and "test" set.

If you have just two datasets, "training" and "evaluation", I would split the "training" one into "train" and "dev". While the model is being trained on the "train" portion, you'll get the loss scores on that same train portion, while the accuracy numbers and other evaluation scores reported during training are computed on the "dev" set. At some point, when you're happy with the performance on the "dev" set, you probably want to test your model on a realistic "test" or "evaluation" set to determine the final performance of your final model.

Hope that kind of clears things up? If not, happy to discuss further!
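For instance, a minimal sketch of that split (assuming the training annotations are a Prodigy JSONL export; the file names and the 80/20 ratio here are placeholders, not anything prescribed by spaCy or Prodigy) could look like this:

```python
import json
import random

# Hypothetical file name, e.g. a Prodigy export created with:
#   prodigy db-out your_training_dataset > annotations_train.jsonl
SOURCE = "annotations_train.jsonl"
DEV_FRACTION = 0.2  # roughly 20% of the training examples become the dev set

# Read one JSON example per line
with open(SOURCE, encoding="utf8") as f:
    examples = [json.loads(line) for line in f if line.strip()]

random.seed(0)  # fixed seed so the split is reproducible
random.shuffle(examples)
split = int(len(examples) * (1 - DEV_FRACTION))

with open("train.jsonl", "w", encoding="utf8") as f:
    for eg in examples[:split]:  # portion used to compute the training loss
        f.write(json.dumps(eg) + "\n")

with open("dev.jsonl", "w", encoding="utf8") as f:
    for eg in examples[split:]:  # portion used for the scores reported during training
        f.write(json.dumps(eg) + "\n")
```

The two resulting files can then be converted to whatever format your training pipeline expects (for example the binary `.spacy` format) and used as the training and development corpora in your config, while your original evaluation dataset stays untouched as the final test set.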
-
What is considered a good performance range one needs to see when training a model?
-
Hello,
I want to train an ML model for the pipeline component rel_component. I've collected annotations in Prodigy and have split the data into a training and an evaluation dataset. In the tutorial https://youtu.be/8HL-Ap5_Axo?t=2040 it is said that the dev set is used to evaluate the F-score during training.
My question is: what is a practical approach to get a dev dataset if I have the "classic approach" with training and evaluation datasets?