Some date entities being tagged as ORG #5647
-
I was recently working on a project using spaCy and noticed something wrong. When I checked all the entities in a sentence, dates like "27th September" and "7th October" were tagged as DATE, but when I changed the date to "5th September", it was labeled as ORG. Is there an explanation for this behavior?
How do I make sure spaCy correctly recognizes such dates as dates? Any help with this would be appreciated.
-
Statistical models aren't going to be perfect, and I think "5th September" isn't a common date format in the training data (OntoNotes, which contains a lot of American newspaper-type text). One option is to update the model with additional training examples (https://spacy.io/usage/training#ner). For dates with a consistent format like this, you might get better results with less effort using the EntityRuler rather than trying to update an NER model.
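For illustration, here is a minimal EntityRuler sketch, assuming spaCy v3 (`nlp.add_pipe("entity_ruler")`; spaCy v2 constructed `EntityRuler(nlp)` directly instead) and assuming the tokenizer keeps "5th" as a single token. The pattern name `ORDINAL_DAY_MONTH` and the helper `build_pipeline` are hypothetical. It uses a blank English pipeline so it runs without downloading a trained model.

```python
import spacy

# Month names, matched case-insensitively through the LOWER token attribute
MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]

# Hypothetical token pattern for "<day><ordinal suffix> <Month>", e.g. "5th September".
# Assumes "5th" is tokenized as one token.
ORDINAL_DAY_MONTH = [
    {"TEXT": {"REGEX": r"^\d{1,2}(st|nd|rd|th)$"}},
    {"LOWER": {"IN": MONTHS}},
]


def build_pipeline():
    # Blank English pipeline: only the ruler assigns entities here.
    nlp = spacy.blank("en")
    ruler = nlp.add_pipe("entity_ruler")
    ruler.add_patterns([{"label": "DATE", "pattern": ORDINAL_DAY_MONTH}])
    return nlp


if __name__ == "__main__":
    nlp = build_pipeline()
    doc = nlp("The deadline moved from 27th September to 5th September.")
    print([(ent.text, ent.label_) for ent in doc.ents])
```

With a trained pipeline such as `en_core_web_sm`, you would load it with `spacy.load(...)` and call `nlp.add_pipe("entity_ruler", before="ner")`, so the statistical NER respects the rule-based DATE spans instead of overwriting them.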
-
Hey, thanks for the response. But if we were to make use of the …
-
Well, that depends on your data. As Adriane says, if the format is consistent, you can define patterns that match your type of dates in your text. Please refer to the docs on the different types of rule-based matching included in spaCy; they also include many examples that should help you get started. In general, the pretrained models will only get you so far. For any practical application, you probably want to retrain them, or add custom preprocessing, postprocessing, or rule-based matching to better cater to your specific use case and domain text.
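As one possible postprocessing step, the sketch below runs a `Matcher` over the document and relabels any predicted entity that overlaps a date match as DATE. It assumes spaCy v3, and the helpers `make_date_matcher` and `fix_date_entities` are hypothetical names. Because it uses a blank pipeline, the demo sets `doc.ents` by hand to simulate the ORG misprediction from the question.

```python
import spacy
from spacy.matcher import Matcher
from spacy.tokens import Span
from spacy.util import filter_spans

MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]


def make_date_matcher(vocab):
    # Hypothetical rule: "<day><ordinal suffix> <Month>", e.g. "5th September"
    matcher = Matcher(vocab)
    matcher.add("ORDINAL_DATE", [[
        {"TEXT": {"REGEX": r"^\d{1,2}(st|nd|rd|th)$"}},
        {"LOWER": {"IN": MONTHS}},
    ]])
    return matcher


def fix_date_entities(doc, matcher):
    # Turn every rule match into a DATE span
    date_spans = [Span(doc, start, end, label="DATE") for _, start, end in matcher(doc)]
    # Drop model entities (e.g. a wrong ORG) that overlap any rule match
    kept = [ent for ent in doc.ents
            if not any(ent.start < d.end and d.start < ent.end for d in date_spans)]
    # filter_spans removes any remaining overlaps and returns spans sorted by start
    doc.ents = filter_spans(kept + date_spans)
    return doc


if __name__ == "__main__":
    nlp = spacy.blank("en")
    matcher = make_date_matcher(nlp.vocab)
    doc = nlp("Acme Corp announced the launch on 5th September.")
    # Simulated model output: "5th September" wrongly tagged as ORG
    doc.ents = [Span(doc, 0, 2, label="ORG"), Span(doc, 6, 8, label="ORG")]
    fix_date_entities(doc, matcher)
    print([(ent.text, ent.label_) for ent in doc.ents])
```

Unlike adding a ruler before the NER, this approach leaves the model untouched and corrects its output afterwards, which can be easier to audit when you only need to override a narrow, well-defined pattern.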