Matcher IS_PUNCT does not match on periods. #5874
-
How to reproduce the behaviourFor some reason the Matcher
Your EnvironmentspaCy version 2.3.0 |
Beta Was this translation helpful? Give feedback.
Replies: 1 comment
-
Remember to check that your pattern aligns with the underlying tokenization. The default English tokenizer tokenizes You can either change the matcher pattern or the tokenizer. In this case, there's a tokenizer exception that handles |
Beta Was this translation helpful? Give feedback.
Remember to check that your pattern aligns with the underlying tokenization. The default English tokenizer tokenizes
t.
as one token, which is why this pattern isn't matching like you'd expect. See the highlighted note near the top of this section: https://spacy.io/usage/rule-based-matching#adding-patternsYou can either change the matcher pattern or the tokenizer. In this case, there's a tokenizer exception that handles
t.
, so that's where you'd need to look if you want to modify the tokenizer. See the docs starting around here for more details: https://spacy.io/usage/linguistic-features#tokenizer-debug