How would you define a Matcher based on a regular expression? #11233
Replies: 1 comment 3 replies
-
For IDs with a regular structure, like IBANs, it makes sense to use regular expressions instead of NER models. If some strings match your regex but are not actually IDs, an NER model can help, but I think IBANs are too long for that to be an issue. For a guide to writing Matcher patterns, which are not just regular expressions, see the rule-based matching documentation, which also covers using matchers with NER. I'm not really familiar with IBANs, but based on your regex above, it looks like the number of tokens may vary. In that case it can sometimes make sense to use a regex directly against the text of the doc and use |
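To illustrate running a regex directly against the raw text, here is a minimal sketch using only Python's standard `re` module. The pattern `IBAN_RE` and the helper `find_ibans` are hypothetical names, and the pattern is a simplified assumption (country code, two check digits, then alphanumeric groups with optional spaces), not the regex from the question. The character offsets it returns could then be mapped back onto the doc's tokens, for example with spaCy's `Doc.char_span`.

```python
import re

# Hypothetical, simplified IBAN pattern (an assumption, not the asker's regex):
# two letters, two check digits, then 2-7 groups of four alphanumerics and an
# optional shorter tail, each optionally preceded by a single space.
IBAN_RE = re.compile(
    r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]{4}){2,7}(?: ?[A-Z0-9]{1,3})?\b"
)


def find_ibans(text):
    """Return (start, end, normalized) for each IBAN-like match in text."""
    results = []
    for match in IBAN_RE.finditer(text):
        # Strip the optional grouping spaces to get a canonical form.
        normalized = match.group().replace(" ", "")
        results.append((match.start(), match.end(), normalized))
    return results


text = "Please transfer to IBAN DE89 3704 0044 0532 0130 00 (BIC COBADEFFXXX) by Friday."
matches = find_ibans(text)
for start, end, iban in matches:
    print(start, end, iban)
```

Because the match is made on characters rather than tokens, it does not matter whether the tokenizer splits a space-separated IBAN into one token or several. A pattern this loose can still over-match on invoices with long uppercase codes, so it would likely need tightening for real data.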
-
Hi,
I have a question about using the matcher feature of spaCy.
Is it possible to define a matcher based on a regular expression? I want to extract IBAN/BIC from an invoice text.
For example, a regular expression for IBAN looks like this:
Background: I currently use NER and train my model on invoices annotated with the positions of IBAN/BIC. Even though I have trained on several thousand examples, the recognition of IBAN/BIC is not as good as I had hoped, which may be due to the lack of surrounding text context in an invoice document.
Now I am wondering whether it would be better to detect IBAN/BIC by patterns using the matcher feature. I have seen pattern attributes like LIKE_EMAIL, and I guess this is also some kind of regular expression. But how can I define my own matcher pattern? And can I combine the matcher with my NER? Or do I need two models for that?
Can someone give me a hint on how to get started?