PhraseMatcher persistence issue #5493

tahirahmad2030 · 2020-05-22T18:26:37Z

Feature description

I create a PhraseMatcher and use a blank model, nlp = spacy.blank('en'), to create the Doc objects for the patterns. The PhraseMatcher works well.

I then save the PhraseMatcher in pickle format. After loading the pickle file, I again use nlp = spacy.blank('en') to create the Doc to match against, because the PhraseMatcher needs a Doc as input, but the PhraseMatcher returns an empty list.

How do I persist a PhraseMatcher so that I can load it at any time and match phrases without rebuilding the patterns from scratch?

Could the feature be a custom component or spaCy plugin?

If so, we will tag it as project idea so other users can take it on.

Answered by adrianeboyd

May 25, 2020

Hi, after unpickling the PhraseMatcher, I think the missing step is to initialize your new nlp pipeline with the same vocab like this:

```python
import pickle
import spacy
from spacy.matcher import PhraseMatcher

nlp = spacy.blank("en")
pm = PhraseMatcher(nlp.vocab)
pm.add("A", [nlp("a"), nlp("the")])
b = pickle.dumps(pm)

pm2 = pickle.loads(b)
nlp2 = spacy.blank("en", vocab=pm2.vocab)  # vocab from reloaded PhraseMatcher

# Match with the new pipeline, which shares the reloaded matcher's vocab
print(pm2(nlp2("the dog is not a cat")))
# [(14862748245026736845, 0, 1), (14862748245026736845, 4, 5)]
```
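The same idea can be applied when the matcher lives in a file rather than in an in-memory bytes object. The following is only a sketch: the helper names `save_matcher` and `load_matcher` are hypothetical and not part of spaCy, the temporary directory and file name are placeholders, and it assumes, as in the snippet above, that `spacy.blank` accepts a `vocab` argument and that the unpickled matcher exposes its vocab as `.vocab`.

```python
import pickle
import tempfile
from pathlib import Path

import spacy
from spacy.matcher import PhraseMatcher


def save_matcher(matcher, path):
    """Pickle the matcher to a file (hypothetical helper, not a spaCy API)."""
    Path(path).write_bytes(pickle.dumps(matcher))


def load_matcher(path, lang="en"):
    """Unpickle the matcher and build a blank pipeline on the matcher's vocab."""
    matcher = pickle.loads(Path(path).read_bytes())
    nlp = spacy.blank(lang, vocab=matcher.vocab)
    return matcher, nlp


# Build the matcher once, with placeholder patterns.
nlp = spacy.blank("en")
pm = PhraseMatcher(nlp.vocab)
pm.add("A", [nlp("a"), nlp("the")])

with tempfile.TemporaryDirectory() as tmp:
    pkl_path = Path(tmp) / "phrase_matcher.pkl"
    save_matcher(pm, pkl_path)
    # Reload the matcher together with a pipeline that shares its vocab.
    pm2, nlp2 = load_matcher(pkl_path)

doc = nlp2("the dog is not a cat")
matched_texts = [doc[start:end].text for _, start, end in pm2(doc)]
print(matched_texts)
```

Returning the matcher and the pipeline as a pair makes it harder to accidentally match against a Doc built from an unrelated vocab. As with any pickle file, this should only be used on files you created yourself, since unpickling can execute arbitrary code.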

tahirahmad2030 · 2020-05-22T18:32:09Z

It works when I persist both the PhraseMatcher and nlp together:

```python
import pickle
from os.path import join

nlp.to_disk(location_to_save)
# Save the matcher as a pickle
with open(join(location_to_save, 'phrase_matcher.pkl'), 'wb') as f:
    pickle.dump(phrase_matcher, f)
```

When I load both from the same location (location_to_save), matching works.

But if only the PhraseMatcher is persisted and nlp is initialised fresh from spaCy, matching returns an empty list.
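For reference, the full round trip described here (save both, then load both from the same location) might look like the sketch below. The temporary directory, the sample patterns and the match key "A" are placeholders that do not come from the post above, and `spacy.load` is assumed as the way to restore a pipeline written with `nlp.to_disk`.

```python
import pickle
import tempfile
from os.path import join

import spacy
from spacy.matcher import PhraseMatcher

# Build a blank pipeline and a matcher that share one vocab.
nlp = spacy.blank("en")
phrase_matcher = PhraseMatcher(nlp.vocab)
phrase_matcher.add("A", [nlp("a"), nlp("the")])  # placeholder patterns

with tempfile.TemporaryDirectory() as location_to_save:
    # Save the pipeline and the matcher side by side.
    nlp.to_disk(location_to_save)
    with open(join(location_to_save, "phrase_matcher.pkl"), "wb") as f:
        pickle.dump(phrase_matcher, f)

    # Load both from the same location.
    loaded_nlp = spacy.load(location_to_save)
    with open(join(location_to_save, "phrase_matcher.pkl"), "rb") as f:
        loaded_matcher = pickle.load(f)

# Match with the reloaded pair and keep only the token offsets.
matches = loaded_matcher(loaded_nlp("the dog is not a cat"))
spans = [(start, end) for _, start, end in matches]
print(spans)
```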

tahirahmad2030 · 2020-05-25T13:35:52Z

Thanks @adrianeboyd
