Tok2Vec loss is increasing during training. Bad performance on short sentences #13505
skarokin asked this question in Help: Model Advice
I'm training tok2vec, parser, tagger, and morphologizer with `vectors = null`. I'm trying to train the parser and tagger on ungrammatical sentences, so I can't use pretrained vectors. I generated my config with `spacy init config` using `--optimize efficiency` and `--gpu`, and left all parameters at their defaults except the ones described below.

My training data is OntoNotes 5.0, with ~30% of the data augmented with some predetermined grammatical errors and then copied and appended to the dataset as new `.conllu` files containing ~120 augmented sentences each.

I've tried training many times with learning rates ranging from 0.01 to 0.0005, batch sizes ranging from 128 to 1024, and the width and depth of `[components.tok2vec.model.encode]` set to `width = 96` or `128` and `depth = 4` or `6`.
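For reference, this is roughly what that block looks like in my config. It's a minimal sketch based on the default CNN tok2vec settings that `init config` generates; the architecture name and the `window_size`/`maxout_pieces` values are whatever the generator produced for my version, so treat them as assumptions rather than exact values:

```ini
# Hypothetical reconstruction of the generation command, not copied verbatim:
# python -m spacy init config config.cfg --lang en \
#   --pipeline tok2vec,tagger,morphologizer,parser --optimize efficiency --gpu

[components.tok2vec.model.encode]
@architectures = "spacy.MaxoutWindowEncoder.v2"
# I have tried width = 96 and 128, depth = 4 and 6
width = 96
depth = 4
window_size = 1
maxout_pieces = 3
```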
Another odd thing is that my model seems to perform better on test data when the Tok2Vec loss is higher. For example, the model with a Tok2Vec loss of ~200,000 performs much better on both my test data and my domain data than the model with a loss of ~40,000. Is loss generally not an important metric for this use case? If it helps, all of the models I've trained fall within 1% of the metrics below.
Below is my most recent training run, with a learning rate of 0.001, width 96, depth 4, and batch size 1000.
One final thing to note is that my model performs very poorly on short sentences. For example, using `en_core_web_sm`, this is the result of printing `token.text, token.tag_, token.dep_, token.head.text` for "She is beautiful." When using the model I showed metrics for above, I get this for `model-best` and `model-last` respectively.
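For context, the comparison I'm running is essentially the following (the `training/model-best` path is a placeholder for my trained pipeline directory, so treat it as an assumption):

```python
import spacy

# Baseline pipeline vs. my trained pipeline.
# "training/model-best" is a placeholder path for the custom pipeline.
pipelines = {
    "en_core_web_sm": spacy.load("en_core_web_sm"),
    "model-best": spacy.load("training/model-best"),
}

text = "She is beautiful."

for name, nlp in pipelines.items():
    print(f"--- {name} ---")
    for token in nlp(text):
        # Token text, fine-grained tag, dependency label, and syntactic head.
        print(token.text, token.tag_, token.dep_, token.head.text)
```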