How is the model retrained by spaCy? #4008
-
Earlier, the token 'Modi' was recognised as an ORG by spaCy, so I retrained it with the following code:
And I got the following answer:
It changes Modi to PERSON, but at the same time it makes incorrect NER predictions compared to the previous model. In the previous model, Amazon was recognized as ORG, but now it is recognized as GPE.
But it looks like this breaks my model, and I get the following result:
Please let me know the reason behind this, and also how I can make sure that only the entity I label changes, while all other entities are still predicted the way spaCy predicted them before.
Replies: 5 comments
-
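If you're really only using 1 or 2 examples as TRAIN_DATA, the problem lies there. The NER pipeline is more than just an advanced regex, so you will need more input data to train it. The docs on Training an additional entity type say: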
Another thing is that you don't just have to give spaCy examples of the new entities, but also examples of the entities it was already trained on, and examples with no entities at all. Otherwise spaCy will most likely forget patterns it has already learned.
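To make the point about mixing examples concrete, here is a minimal pure-Python sketch of how a training set could be assembled from three groups: sentences with the new or changed entity, sentences with entities the model already knows, and sentences with no entities. The names NEW_EXAMPLES, REVISION_EXAMPLES, NO_ENTITY_EXAMPLES and build_training_set are hypothetical, not part of spaCy; the (text, {"entities": [...]}) tuples follow the TRAIN_DATA format used in this thread. The "revision" annotations are hand-written here as an assumption; in practice you would usually collect them by running the original, unmodified model over raw text (an approach often called pseudo-rehearsal).

```python
import random

# Sentences containing the entity you want to (re)teach.
# Offsets are character positions: in "I met Modi ...", Modi spans 6..10.
NEW_EXAMPLES = [
    ("I met Modi at the conference.", {"entities": [(6, 10, "PERSON")]}),
]

# Sentences with entities the model ALREADY predicts correctly.
# Assumption: hand-written here; in practice you would collect these
# annotations by running the original, unmodified model over raw text.
REVISION_EXAMPLES = [
    ("Amazon sells books online.", {"entities": [(0, 6, "ORG")]}),
]

# Sentences with no entities at all, so the model also sees negative examples.
NO_ENTITY_EXAMPLES = [
    ("I really like this song.", {"entities": []}),
]


def build_training_set(new, revision, empty, seed=0):
    """Hypothetical helper: concatenate the three groups and shuffle them,
    so new and already-learned patterns are interleaved during training."""
    data = list(new) + list(revision) + list(empty)
    random.Random(seed).shuffle(data)
    return data


TRAIN_DATA = build_training_set(NEW_EXAMPLES, REVISION_EXAMPLES, NO_ENTITY_EXAMPLES)

# Sanity check: print every annotated span so you can see it matches the text.
for text, annotations in TRAIN_DATA:
    for start, end, label in annotations["entities"]:
        print(f"{label:8} {text[start:end]!r}")
```

A real training set would contain many more sentences in each group; one per group only shows the shape. Every entity in a sentence should be annotated, because when training from entity offsets, tokens outside the listed spans are treated as "not an entity", which can teach the model to stop predicting entities it used to find.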
-
Since you're using

But I missed an important point before: you need to put your entity examples into full sentences before spaCy can learn them. The model learns all kinds of patterns from the example sentences, which it then uses to find entities and other information in new sentences after training. So if you give spaCy just a word and say that it is an entity, spaCy most likely won't find it in a full sentence, because there is so much more context in the sentence.

Instead of this:

    TRAIN_DATA = [
        (u"Modi", {"entities": [(0, 4, "CELEBRITY")]}),
    ]

the data should look more like:

    TRAIN_DATA = [
        (u"On last Saturday Modi was in my hometown.", {"entities": [(17, 21, "CELEBRITY")]}),
    ]

You should be good if you read through the docs and start your training with a few example sentences of your new entity.
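Turning a single word into sentence examples by hand means counting character offsets, which is easy to get wrong. The following is a minimal sketch, assuming you write a few sentence templates yourself: make_example, TEMPLATES and the "{ent}" placeholder are hypothetical names for illustration, not spaCy API. The helper returns the same (text, {"entities": [...]}) tuples as the TRAIN_DATA above; for the reply's own sentence, "On last Saturday {ent} was in my hometown.", it produces the same (17, 21) span.

```python
# Hypothetical helper (not part of spaCy): build full-sentence training
# examples from an entity string and a few sentence templates, computing
# the character offsets automatically instead of counting them by hand.
def make_example(template, entity_text, label, placeholder="{ent}"):
    """Return (text, {"entities": [(start, end, label)]}) for one template."""
    start = template.index(placeholder)  # ValueError if the placeholder is missing
    text = template.replace(placeholder, entity_text, 1)
    end = start + len(entity_text)
    # The span must cover exactly the entity, or the model learns wrong boundaries.
    assert text[start:end] == entity_text
    return (text, {"entities": [(start, end, label)]})


# These templates deliberately avoid other entities (dates, places, ...),
# because anything left unannotated would be treated as "not an entity".
TEMPLATES = [
    "I met {ent} at the conference.",
    "{ent} gave a long speech.",
    "Everyone was talking about {ent}.",
]

TRAIN_DATA = [make_example(t, "Modi", "CELEBRITY") for t in TEMPLATES]

for example in TRAIN_DATA:
    print(example)
```

Templates are only a starting point: a model trained on a handful of fixed sentence patterns may overfit to that wording, so real sentences from your own texts are preferable where you have them.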
-
Yes, see @BreakBB's comment above. You might also want to check out the documentation and read a bit about named entity recognition in general. My free spaCy course also has a chapter on training that explains all of this in more detail: https://course.spacy.io/chapter4
If you're really only using 1 or 2 examples as TRAIN_DATA, the problem lies there. The NER pipeline is more than just an advanced regex, so you will need more input data to train it. The docs on Training an additional entity type say: Another thing is that you don't just have to give spaCy examples of the new entities, but of…