
Matcher IS_PUNCT does not match on periods. #5874


Remember to check that your pattern aligns with the underlying tokenization. The default English tokenizer keeps `t.` as a single token, so a pattern that expects the period to be its own `IS_PUNCT` token won't match the way you'd expect. See the highlighted note near the top of this section: https://spacy.io/usage/rule-based-matching#adding-patterns
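A quick way to confirm this is to print the tokens before writing the pattern. This is a minimal sketch assuming spaCy v3 (the `matcher.add` signature differs in v2) and a blank English pipeline, which only runs the tokenizer. The example sentence and the `T_DOT` pattern are hypothetical stand-ins for the pattern from the question.

```python
import spacy
from spacy.matcher import Matcher

# A blank English pipeline is enough: the "t." special case lives in the tokenizer.
nlp = spacy.blank("en")
doc = nlp("Version t. is here")

# Inspect the tokens first; "t." comes out as one token, not "t" + ".".
tokens = [token.text for token in doc]
print(tokens)

# The tokenizer can also report which rule produced each token.
print(nlp.tokenizer.explain("t."))

# Hypothetical pattern: "t" followed by a separate punctuation token.
matcher = Matcher(nlp.vocab)
matcher.add("T_DOT", [[{"ORTH": "t"}, {"IS_PUNCT": True}]])
matches = matcher(doc)
print(len(matches))  # no match, because there is no separate "." token
```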

You can either change the matcher pattern or the tokenizer. In this case, there's a tokenizer exception (special case) that handles `t.`, so that's where you'd need to look if you want to modify the tokenizer. See the docs starting here for more details: https://spacy.io/usage/linguistic-features#tokenizer-debug
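Both options could look roughly like the sketch below, again assuming spaCy v3. The regex in option 1 is an illustrative choice, and option 2 drops only the `t.` special case by assigning a filtered copy of `nlp.tokenizer.rules` back to the tokenizer; neither is the only way to do it.

```python
import spacy
from spacy.matcher import Matcher

text = "Version t. is here"

# Option 1: keep the tokenizer and match the single "t." token instead.
# The regex (one lowercase letter plus a period) is an illustrative assumption.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)
matcher.add("LETTER_DOT", [[{"TEXT": {"REGEX": r"^[a-z]\.$"}}]])
doc = nlp(text)
option1 = [doc[start:end].text for _, start, end in matcher(doc)]
print(option1)

# Option 2: remove the "t." special case so the suffix rules split off the period.
nlp2 = spacy.blank("en")
rules = dict(nlp2.tokenizer.rules)
rules.pop("t.", None)
nlp2.tokenizer.rules = rules  # setting the property reloads the special cases
doc2 = nlp2(text)
tokens2 = [token.text for token in doc2]
print(tokens2)

# Now a two-token pattern with IS_PUNCT can match "t" followed by ".".
matcher2 = Matcher(nlp2.vocab)
matcher2.add("T_DOT", [[{"ORTH": "t"}, {"IS_PUNCT": True}]])
option2 = [doc2[start:end].text for _, start, end in matcher2(doc2)]
print(option2)
```

Changing the pattern is the less invasive option: removing the special case changes how `t.` is tokenized everywhere that pipeline processes text.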

Answer selected by ines
Labels
feat / matcher Feature: Token, phrase and dependency matcher
This discussion was converted from issue #5874 on December 11, 2020 00:06.