-
Hi @SingTeng, the current context-aware enhancement logic matches single words (tokens), not phrases (see also #1043). A multi-word context term such as "date of birth" therefore never matches. Passing ["date", "birth"] instead should work. Could you please try it and let us know if it helps?
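To illustrate why a phrase fails where single words succeed, here is a minimal sketch of token-level context matching. It is a simplified stand-in, not Presidio's actual implementation: the function name `supporting_context_word` and the regex-based tokenizer are assumptions made for illustration, and a real NLP pipeline would also lemmatize the tokens.

```python
import re


def supporting_context_word(surrounding_text, context_terms):
    """Return the first context term equal to a single token of the text, else ''.

    Hypothetical helper: it only mimics the idea of comparing context
    terms against individual tokens, one token at a time.
    """
    # Split the surrounding text into lowercase single-word tokens.
    tokens = set(re.findall(r"\w+", surrounding_text.lower()))
    for term in context_terms:
        # A multi-word term can never equal a single token.
        if term.lower() in tokens:
            return term
    return ""


surrounding = "My date of birth is"

# Phrase-style context terms: none of them is a single token of the text.
phrase_match = supporting_context_word(surrounding, ["date of birth", "dob", "d.o.b."])

# Single-word context terms: "date" is a token of the text.
token_match = supporting_context_word(surrounding, ["date", "birth", "dob"])

print(repr(phrase_match))
print(repr(token_match))
```

With the phrase list no supporting word is found, which is consistent with the empty 'supportive_context_word' in the decision process output; with the single-word list, "date" matches.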
-
Hi,
I did not find a date of birth recognizer in the supported entities (https://microsoft.github.io/presidio/supported_entities/), so I created a custom one.
I tried to use the context word to increase detection accuracy, but the context word does not seem to work for me.
==============
My code:
import pprint
from presidio_analyzer import Pattern, PatternRecognizer
text = "My date of birth is 2000/01/01. I am thinking of going to the cinema on 06/05/2023 to watch the movie"
# My DOB recognizer; it simply recognizes dates
dob_pattern = Pattern(name="dob_pattern (weak)", regex=r"(\d{2}(-|/)\d{2}(-|/)\d{4})|(\d{4}(-|/)\d{2}(-|/)\d{2})", score=0.01)
# I add context words, hoping they can differentiate between a normal date and a DOB
dob_recognizer = PatternRecognizer(supported_entity="DOB", patterns=[dob_pattern], context=["date of birth", "dob", "d.o.b."])
# Calling the analyzer
result = dob_recognizer.analyze(text=text, entities=["DOB"])
print("Result:")
print(result)
# Looking at the decision process
decision_process = result[0].analysis_explanation
pp = pprint.PrettyPrinter()
print("Decision process output:\n")
pp.pprint(decision_process.__dict__)
==========
My result:
Result:
[type: DOB, start: 20, end: 30, score: 0.01, type: DOB, start: 72, end: 82, score: 0.01]
Decision process output:
{'original_score': 0.01,
'pattern': '(\d{2}(-|\/)\d{2}(-|\/)\d{4})|(\d{4}(-|\/)\d{2}(-|\/)\d{2})',
'pattern_name': 'dob_pattern (weak)',
'recognizer': 'PatternRecognizer',
'score': 0.01,
'score_context_improvement': 0,
'supportive_context_word': '',
'textual_explanation': None,
'validation_result': None}
==============
It seems to have recognized both dates as DOB, with the same score.
I expected the first date to get a higher DOB score than the second date.
Have I done something wrong?
Please advise.
Thanks.