How is the model retrained by spaCy? #4008

imsaiful · 2019-07-23T08:21:20Z

Previously, spaCy recognised the token 'Modi' as an ORG, so I retrained the model with the following code:

import spacy 
import random
nlp = spacy.load('en')
nlp.entity.add_label('CELEBRITY')
TRAIN_DATA = [
        (u"Modi", {"entities": [(0, 4, "PERSON")]}),
        (u"India", {"entities": [(0, 5, "GPE")]})]

optimizer = nlp.begin_training()
for i in range(20):
    random.shuffle(TRAIN_DATA)
    for text, annotations in TRAIN_DATA:
        nlp.update([text], [annotations],drop=0.3, sgd=optimizer)


text = "But Modi is starting India. The company made a late push\ninto hardware, and Apple’s Siri and Google available on iPhones, and Amazon’s Alexa\nsoftware, which runs on its Echo and Dot devices, have clear leads in\nconsumer adoption."
doc = nlp(text)
for ent in doc.ents:
    print(ent.text,ent.label_)

And I got the following answer:

Modi PERSON
India GPE
Apple’s Siri ORG
Google ORG
iPhones ORG
Amazon GPE
Echo PERSON
Dot PERSON

This changes Modi to PERSON, but the NER results are now worse than with the original model. For example, the original model recognised Amazon as ORG, but it is now labelled GPE.
Next, I added the extra label CELEBRITY and labelled Modi as CELEBRITY with the following code:


import spacy 
import random
nlp = spacy.load('en')
nlp.entity.add_label('CELEBRITY')
TRAIN_DATA = [
        (u"Modi", {"entities": [(0, 4, "CELEBRITY")]})]

optimizer = nlp.begin_training()
for i in range(20):
    random.shuffle(TRAIN_DATA)
    for text, annotations in TRAIN_DATA:
        nlp.update([text], [annotations],drop=0.3, sgd=optimizer)


text = "But Modi is starting India. The company made a late push\ninto hardware, and Apple’s Siri and Google available on iPhones, and Amazon’s Alexa\nsoftware, which runs on its Echo and Dot devices, have clear leads in\nconsumer adoption."
doc = nlp(text)
for ent in doc.ents:
    print(ent.text,ent.label_)

But this seems to have broken my model; I get the following result:

But CELEBRITY
Modi CELEBRITY
is CELEBRITY
starting CELEBRITY
India GPE
. CELEBRITY
The CELEBRITY
company CELEBRITY
made CELEBRITY
a CELEBRITY
late CELEBRITY
push CELEBRITY
into CELEBRITY
hardware CELEBRITY
, CELEBRITY
and CELEBRITY
Apple CELEBRITY

Please let me know the behind-the-scenes reason for this, and how I can make sure that only the entity I label changes while all others stay as spaCy predicts them.

BreakBB · 2019-07-23T08:41:03Z

If you're really only using one or two examples as TRAIN_DATA, that is where the problem lies. The NER pipeline is more than just an advanced regex, so you will need more input data to train it. The docs for "Training an additional entity type" say:

To keep the example short and simple, only a few sentences are provided as examples. In practice, you’ll need many more — a few hundred would be a good start. You will also likely need to mix in examples of other entity types, which might be obtained by running the entity recognizer over unlabelled sentences, and adding their annotations to the training set.

Another thing is that you have to give spaCy examples not only of the new entities, but also of entity types it was already trained on and of texts with no entities at all. Otherwise spaCy will most likely forget the patterns it has already learned.
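
The docs passage quoted above suggests getting examples of other entity types by running the existing entity recognizer over unlabelled sentences. A minimal sketch of that idea follows. It assumes `nlp` is a loaded spaCy pipeline (any callable that returns an object with `.ents` will do), and the helper name `make_revision_data` is hypothetical, not part of spaCy.

```python
def make_revision_data(nlp, texts):
    """Label raw texts with the model's current predictions.

    Returns (text, {"entities": [(start, end, label), ...]}) tuples in the
    same format as TRAIN_DATA, so they can be mixed with the new examples.
    """
    revision_data = []
    for text in texts:
        doc = nlp(text)
        # start_char / end_char are character offsets into the original text
        entities = [(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents]
        revision_data.append((text, {"entities": entities}))
    return revision_data
```

This would have to run before any update, so that the predictions come from the original model. Sentences in which the model finds nothing come back with an empty entity list, which covers the "no entities at all" case.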

BreakBB · 2019-07-23T09:18:34Z

Since you're using random.shuffle(TRAIN_DATA) the order doesn't matter.

But I missed an important point before: you need to put your entity examples into full sentences before spaCy can learn from them. The model learns all kinds of patterns from the example sentences, which it then uses to find entities and other information in new sentences after training.

So if you give spaCy just a single word and say that it is an entity, spaCy most likely won't find it in a full sentence, because there is so much more in the sentence.

Instead of this:

TRAIN_DATA = [
        (u"Modi", {"entities": [(0, 4, "CELEBRITY")]})]

the data should look more like:

TRAIN_DATA = [
        (u"On last Saturday Modi was in my hometown.", {"entities": [(17, 21, "CELEBRITY")]})]

You should be good if you read through the docs and start your training with a few example sentences for your new entity.
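
Counting character offsets by hand is error-prone, so here is a small sketch of how the tuple above could be built programmatically. The helper `make_example` is a hypothetical name for illustration, and it only marks the first occurrence of the entity text.

```python
def make_example(sentence, entity_text, label):
    """Build one training tuple, locating the entity by character offset."""
    start = sentence.find(entity_text)
    if start == -1:
        raise ValueError("%r does not occur in %r" % (entity_text, sentence))
    end = start + len(entity_text)  # the end offset is exclusive
    return (sentence, {"entities": [(start, end, label)]})


example = make_example("On last Saturday Modi was in my hometown.", "Modi", "CELEBRITY")
print(example)
# ('On last Saturday Modi was in my hometown.', {'entities': [(17, 21, 'CELEBRITY')]})
```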

imsaiful · 2019-07-23T09:44:14Z

Author

I have more than 1000 rows in my Excel file with only two columns: the first is the entity name and the second is its type, as in the image below.

So I need to convert each row into a sentence, one by one, like the following:

TRAIN_DATA = [
        (u"Github is the open source community", {"entities": [(0, 6, "Community")]})]
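
One way to approach this conversion, sketched under several assumptions: the sheet is exported as a two-column CSV without a header row, and a separate list of real sentences is available in which the names occur. The function names and the sample rows are illustrative only. Sentences that contain none of the names are kept with an empty entity list, in line with the earlier advice to include examples with no entities.

```python
import csv
import io


def load_entity_rows(csv_file):
    """Read (entity, label) pairs from a two-column CSV export of the sheet."""
    return [(row[0].strip(), row[1].strip())
            for row in csv.reader(csv_file) if len(row) >= 2]


def annotate_sentences(sentences, entity_rows):
    """Mark the first occurrence of every known entity in each sentence."""
    train_data = []
    for sentence in sentences:
        entities = []
        for name, label in entity_rows:
            start = sentence.find(name)
            if start != -1:
                entities.append((start, start + len(name), label))
        train_data.append((sentence, {"entities": sorted(entities)}))
    return train_data


# Stand-in for the exported spreadsheet (hypothetical contents)
sheet = io.StringIO("Github,Community\nModi,CELEBRITY\n")
rows = load_entity_rows(sheet)
sentences = ["Github is the open source community", "It rained all day."]
TRAIN_DATA = annotate_sentences(sentences, rows)
```

This sketch does not check for overlapping spans or repeated mentions, so the generated annotations would still need a review before training.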

ines · 2019-07-23T11:28:05Z

Maintainer

Yes, see @BreakBB's comment above. You might also want to check out the documentation and read a bit about named entity recognition in general.

My free spaCy course also has a chapter on training that explains all of this in more detail: https://course.spacy.io/chapter4

imsaiful · 2019-07-23T12:25:55Z

Author

Thank you both, @ines and @BreakBB. Understood now.
