Improve custom NER model performance for different input texts #13314
Hi Dave! As always, it depends a bit on the actual data & annotation scheme, but here are some ideas from my end:
I would only train separate models if the input types are really significantly different. If you believe that a model trained on input type X has valuable information to transfer to input type Y, then I would probably try to train a single model on both types.
The easiest, and likely best-performing, approach will be to just train from scratch, mixing in data from both the original type of text and the new ones. Don't just run another training iteration on the mistakes or on the new data only, as you risk running into the infamous catastrophic forgetting problem. We've had an experimental "rehearse" function in the library to combat this, but honestly I would just run regular training from scratch with a dataset that is updated to cover your new input types. If you don't have access to your original annotations for the old data type anymore, you could use your current model to produce annotations for those texts, but at a 70% F-score those "silver" annotations are not a great source to train on. If your NER were at >90%, it would be a much safer bet.
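In case it helps, here's a minimal sketch of that "silver" annotation idea. Model and file names are hypothetical, and it assumes the new gold annotations (e.g. exported from Prodigy with `data-to-spacy`) already live in their own `.spacy` file:

```python
# Sketch only: generate "silver" annotations for the old job-posting texts with
# the existing model, then merge them with new gold data into one training file.
# Model path, file names and texts are placeholders.
import spacy
from spacy.tokens import DocBin

nlp = spacy.load("my_jobposting_ner")          # the existing custom NER model

old_texts = ["...job posting 1...", "...job posting 2..."]  # old, unannotated texts

silver = DocBin()
for doc in nlp.pipe(old_texts):
    silver.add(doc)                            # predicted entities become "silver" labels

# New gold annotations for the new input types, in DocBin format
combined = DocBin().from_disk("gold_new_types.spacy")
for doc in silver.get_docs(nlp.vocab):
    combined.add(doc)

combined.to_disk("train_combined.spacy")       # point [paths.train] in the config here
```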
If you do end up training different models, then ideally the pipeline will only run one NER model for each input text. A component earlier in the pipeline should decide which type of input text it is, and then you'd run the correct NER model accordingly. This earlier component can be rule-based, can look at the doc's metadata, or it could be a supervised textcat component, for instance. Finally, one more comment on your labeling scheme...
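To make that routing idea concrete, here's a rough sketch with a crude keyword heuristic standing in for a textcat component or a metadata check (model paths and keywords are made up):

```python
# Sketch: pick the right NER pipeline per input type before running it.
# Model paths and the keyword heuristic are illustrative only.
import spacy

nlp_posting = spacy.load("ner_jobpostings")
nlp_cv = spacy.load("ner_curricula")
nlp_syllabus = spacy.load("ner_syllabi")

def detect_input_type(text: str) -> str:
    lowered = text.lower()
    if "work experience" in lowered or "references available" in lowered:
        return "cv"
    if "course objectives" in lowered or "grading policy" in lowered:
        return "syllabus"
    return "posting"

def extract_skills(text: str):
    nlp = {"posting": nlp_posting, "cv": nlp_cv, "syllabus": nlp_syllabus}[detect_input_type(text)]
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]
```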
Have you tried, as a baseline, merging these labels into a single SKILL label and recognizing that first? Your F-score could be significantly higher, and you could then try to determine HARD vs SOFT in a post-processing script, using some sort of dictionary-based approach, a binary textcat model or even an entity-linking type of algorithm. For NER, I can imagine that some soft skills look a lot like hard skills, both in the way they're used in a sentence and in their lexical features, so I wouldn't be surprised if the NER is actually good at finding the offsets of these entities, but not the exact label.
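As a rough illustration of the dictionary-based post-processing idea, assuming an NER model retrained on the merged SKILL label (the model path and skill lists below are made-up examples):

```python
# Sketch: relabel SKILL entities as HARDSKILL / SOFTSKILL via simple lookups.
# The model path and skill dictionaries are placeholders.
import spacy
from spacy.tokens import Span

SOFT_SKILLS = {"communication", "teamwork", "leadership"}
HARD_SKILLS = {"python", "sql", "docker"}

nlp = spacy.load("ner_skill_merged")

def refine_labels(doc):
    new_ents = []
    for ent in doc.ents:
        key = ent.text.lower()
        if ent.label_ == "SKILL" and key in SOFT_SKILLS:
            new_ents.append(Span(doc, ent.start, ent.end, label="SOFTSKILL"))
        elif ent.label_ == "SKILL" and key in HARD_SKILLS:
            new_ents.append(Span(doc, ent.start, ent.end, label="HARDSKILL"))
        else:
            new_ents.append(ent)               # leave ambiguous or non-SKILL spans as-is
    doc.ents = new_ents
    return doc

doc = refine_labels(nlp("Strong Python and teamwork skills required."))
print([(ent.text, ent.label_) for ent in doc.ents])
```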
Hello,
I trained a custom NER model using spaCy 3.6.0 a while ago, specialized in recognizing two types of labels (HARDSKILL, SOFTSKILL) in 15K manually labeled Job Posting texts. It performed acceptably (i.e., an F1-score of 71%, which was my metric of interest) when the input text was a Job Posting, but its quality degraded when the input was something else (e.g., a curriculum or a syllabus). To improve the performance, I know I must do some further training, but I have the following queries:
1. Should I train one isolated model per type of input text (i.e., one more custom NER model for syllabi, and another one for curricula), or can I "resume" the training of my current model, gathering samples for the input texts where my current model is performing poorly (i.e., re-train my current NER model with more samples of syllabi and curricula texts)?
2. If I need to train "one specialized model per type of input text", how do I "chain" the predictions? (i.e., how do I "intersect" the entities retrieved by model1 + model2 + model3 without having overlapping spans?) I was thinking of something like this, but I do not know if it would be the right approach.
3. If, on the contrary, I can "re-train my current custom NER model with more text of the poorly-performing types", is there any command I could use to do this re-training of the model? Any additional recommendations or reading material? BTW, I will probably label the new texts using Prodigy.
Thanks and BR,
Dave.