Training spaCy Model: Train / Dev / Test Datasets - rel_component #7003
Replies: 3 comments
-
I feel the term 'dev' dataset is used for the evaluation/validation set, if I am not wrong.
-
When I use a "dev" set, I typically mean a part of the training dataset that is used to evaluate hyperparameters such as the number of epochs to run the training for, or more generally which configuration setting works best. That means that typically I'd have a "train", "dev" and "test" set.

If you have just two datasets, "training" and "evaluation", I would split the "training" one into "train" and "dev". While the model is being trained on the "train" portion, you'll get the loss scores on that same train portion, while the accuracy numbers and other evaluation scores reported during training are computed on the "dev" set. At some point, when you're happy with the performance on the "dev" set, you probably want to test your model on a realistic "test" or "evaluation" set to determine the final performance of your final model.

Hope that kind of clears things up? If not, happy to discuss further!
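For instance, a minimal sketch of that split (assuming the training annotations are a Prodigy JSONL export; the file names and the 80/20 ratio here are placeholders, not anything prescribed by spaCy or Prodigy) could look like this:

```python
import json
import random

# Hypothetical file name, e.g. a Prodigy export created with:
#   prodigy db-out your_training_dataset > annotations_train.jsonl
SOURCE = "annotations_train.jsonl"
DEV_FRACTION = 0.2  # roughly 20% of the training examples become the dev set

# Read one JSON example per line
with open(SOURCE, encoding="utf8") as f:
    examples = [json.loads(line) for line in f if line.strip()]

random.seed(0)  # fixed seed so the split is reproducible
random.shuffle(examples)
split = int(len(examples) * (1 - DEV_FRACTION))

with open("train.jsonl", "w", encoding="utf8") as f:
    for eg in examples[:split]:  # portion used to compute the training loss
        f.write(json.dumps(eg) + "\n")

with open("dev.jsonl", "w", encoding="utf8") as f:
    for eg in examples[split:]:  # portion used for the scores reported during training
        f.write(json.dumps(eg) + "\n")
```

The two resulting files can then be converted to whatever format your training pipeline expects (for example the binary `.spacy` format) and used as the training and development corpora in your config, while your original evaluation dataset stays untouched as the final test set.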
-
What is considered a good performance range one needs to see when training a model?
-
Hello,
I want to train an ML model for the pipeline component rel_component. I've collected annotations in Prodigy and have split the data into a training and an evaluation dataset. In the tutorial https://youtu.be/8HL-Ap5_Axo?t=2040 it is said that the dev set is used to evaluate the F-score during training.
My question is: what is a practical approach to get a dev dataset if I have the "classic approach" with training and evaluation datasets?