Custom POS pipeline #5417
Greetings,

    import your_custom_model
    from spacy.symbols import POS
    from spacy.tokens import Doc
    import numpy

    def custom_model_wrapper(doc):
        words = [token.text for token in doc]
        pos = your_custom_model(words)
        # Convert the strings to integers and add them to the string store
        pos = [doc.vocab.strings.add(label) for label in pos]
        # Create a new Doc from a numpy array
        attrs = [POS]
        arr = numpy.array(list(zip(pos)), dtype="uint64")
        new_doc = Doc(doc.vocab, words=words).from_array(attrs, arr)
        return new_doc

After that, I tried to access the tags like this:

    for token in new_doc:
        print(token.pos_)

With the code above I get an error.

    nlp.add_pipe(custom_model_wrapper, name="custom_pos_tagger", first=True)

The model is a blank xx_ent_wiki_sm that I then trained with POS+WORD for NER, and it only has the ner pipeline besides the custom pipeline I added.

Additional question: do I need a tokenizer pipeline for NER to work properly?

Environment Information:
Replies: 2 comments
Thanks for the quick reply,

    pos = [doc.vocab.strings.add(label) for label in pos]
    for index, token in enumerate(doc):
        token.tag = pos[index]
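
For anyone trying the same thing, here is a minimal sketch of a full component that writes custom labels to the fine-grained tag in place, rather than building a new Doc. The function `hypothetical_tagger` is only a stand-in for your own model and is not part of spaCy; the example builds a Doc from a bare `Vocab` so that it runs without a trained pipeline.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab


def hypothetical_tagger(words):
    # Stand-in for a custom model (an assumption for illustration):
    # returns exactly one label per word.
    return ["MY_NUM" if word.isdigit() else "MY_WORD" for word in words]


def custom_tag_component(doc):
    # Predict one custom label per token.
    words = [token.text for token in doc]
    labels = hypothetical_tagger(words)
    # Assign through the string attribute tag_; spaCy stores the
    # string in the vocab's string store and sets the integer tag.
    for token, label in zip(doc, labels):
        token.tag_ = label
    return doc


# Build a Doc directly from a list of words, then apply the component.
doc = Doc(Vocab(), words=["I", "bought", "3", "apples"])
doc = custom_tag_component(doc)
tags = [token.tag_ for token in doc]
print(tags)
```

Setting `token.tag_` with the string should be equivalent to adding the label to `doc.vocab.strings` and assigning the resulting integer to `token.tag`, as in the snippet above.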
A spacy token has two attributes that store POS information, the fine-grained tag (`tag`) and the coarse-grained Universal POS (`pos`). The `tag` can be any tag, but the `pos` is restricted to being a UPOS tag from this tag set: https://universaldependencies.org/u/pos/index.html. (I looked again through the related docs and this should be explained more clearly!)

For custom tags, you want to use `tag` instead of `pos`. Trying to use `Doc.from_array` as a workaround is just kind of masking the underlying problem. As a check, you should be able to set any attribute in the `Doc` directly that you also want to load with `Doc.from_array`.
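
To illustrate the distinction, here is a small sketch in plain Python that separates labels that are valid UPOS values from custom ones that would have to go into `tag`. The set below lists the 17 tags from the Universal Dependencies page linked above; the helper name `split_labels` is made up for this example and is not a spaCy function.

```python
# The 17 Universal POS tags from universaldependencies.org/u/pos/.
UPOS_TAGS = frozenset({
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
})


def split_labels(labels):
    """Partition labels into UPOS tags and custom (non-UPOS) tags."""
    upos = [label for label in labels if label in UPOS_TAGS]
    custom = [label for label in labels if label not in UPOS_TAGS]
    return upos, custom


# "MY_TAG" and "NN" are not UPOS tags, so they belong in tag, not pos.
upos, custom = split_labels(["NOUN", "VERB", "MY_TAG", "NN"])
print(upos, custom)
```

If `custom` comes back non-empty for your model's label set, that is a sign the labels should be written to `tag` rather than `pos`.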