Cannot add custom PatternRecognizer for language X? #1165

mitch99 · 2023-09-05T07:35:51Z

Hi,

I'm trying to create a custom PatternRecognizer to detect Swedish personal identification numbers, but it doesn't seem to get registered properly for use by the analyzer. Testing the recognizer on its own works fine. Any clues as to what I'm doing wrong?

# Install and imports
!pip install presidio_analyzer
!pip install -U spacy_stanza

import stanza
stanza.download("sv")
from presidio_analyzer import AnalyzerEngine, RecognizerRegistry, PatternRecognizer, EntityRecognizer, Pattern, RecognizerResult
from presidio_analyzer.nlp_engine import NlpEngineProvider

# Custom recognizer
pnr_pattern = Pattern(name="pnr_pattern", regex=r"\d{6}(?:\d{2})?[-\s]?\d{4}", score=0.8)
pnr_recognizer = PatternRecognizer(supported_entity="PNR", patterns=[pnr_pattern])

# Create configuration, engine etc.
configuration = { "nlp_engine_name": "stanza", "models": [{"lang_code": "sv", "model_name": "sv"}] }
provider = NlpEngineProvider(nlp_configuration=configuration)
nlp_engine = provider.create_engine()
analyzer = AnalyzerEngine(nlp_engine=nlp_engine, supported_languages=["sv"])

# Add PNR-recognizer
analyzer.registry.add_recognizer(pnr_recognizer)

# Testing recognizer alone, works fine
text_to_analyze = "Mvh Adam Svensson 821011-0217. Ring på 073-1212123."
pnr_result = pnr_recognizer.analyze(text=text_to_analyze, entities=["PNR"])
print(pnr_result) 
# prints: '[type: PNR, start: 18, end: 29, score: 0.8]'

# Testing analyzer, doesn't work as expected...
text_to_analyze = "Mvh Adam Svensson 821011-0217. Ring på 073-1212123."
results = analyzer.analyze(text=text_to_analyze, language="sv", entities=['PHONE_NUMBER', 'PNR'])
print(results)
# prints: WARNING:presidio-analyzer:Entity PNR doesn't have the corresponding recognizer in language : sv
#         [type: PHONE_NUMBER, start: 18, end: 29, score: 0.4, type: PHONE_NUMBER, start: 39, end: 50, score: 0.4]

# Listing the recognizers yields the corresponding result (i.e. no PNR-recognizer)
recs = analyzer.get_recognizers(language='sv')
for rec in recs:
  print(f"-{rec.name}: {rec.supported_entities}")
# prints: 
# -IpRecognizer: ['IP_ADDRESS']
# -EmailRecognizer: ['EMAIL_ADDRESS']
# -IbanRecognizer: ['IBAN_CODE']
# -CreditCardRecognizer: ['CREDIT_CARD']
# -MedicalLicenseRecognizer: ['MEDICAL_LICENSE']
# -StanzaRecognizer: ['DATE_TIME', 'NRP', 'LOCATION', 'PERSON']
# -CryptoRecognizer: ['CRYPTO']
# -UrlRecognizer: ['URL']
# -PhoneRecognizer: ['PHONE_NUMBER']
# -DateRecognizer: ['DATE_TIME']

Answered by omri374

omri374 (Maintainer) · 2023-09-05T12:21:04Z

Hi, each recognizer instance supports only one language, and PatternRecognizer defaults to "en" when supported_language is not given. The only thing missing is to set the language on pnr_recognizer to sv:

pnr_pattern = Pattern(name="pnr_pattern", regex=r"\d{6}(?:\d{2})?[-\s]?\d{4}", score=0.8)
pnr_recognizer = PatternRecognizer(supported_entity="PNR", patterns=[pnr_pattern], supported_language="sv")

Please check and let us know if you still experience issues.
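
To illustrate why the recognizer went missing, here is a simplified, standard-library-only model of the language filtering described above. It is a sketch, not Presidio's actual implementation: ToyRecognizer and ToyRegistry are hypothetical names invented for this example, and the only behavior borrowed from the thread is that a recognizer without an explicit language falls back to "en" and is therefore skipped when analyzing "sv" text.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ToyRecognizer:
    """Hypothetical stand-in for a pattern recognizer (not the Presidio class)."""
    entity: str
    regex: str
    language: str = "en"  # assumed default, mirroring the behavior in the thread

    def find(self, text):
        # Return (entity, start, end) for every regex match in the text.
        return [(self.entity, m.start(), m.end())
                for m in re.finditer(self.regex, text)]


@dataclass
class ToyRegistry:
    """Hypothetical registry that selects recognizers by language."""
    recognizers: list = field(default_factory=list)

    def add(self, recognizer):
        self.recognizers.append(recognizer)

    def for_language(self, language):
        # Only recognizers declared for the requested language are used.
        return [r for r in self.recognizers if r.language == language]

    def analyze(self, text, language):
        results = []
        for recognizer in self.for_language(language):
            results.extend(recognizer.find(text))
        return results


PNR_REGEX = r"\d{6}(?:\d{2})?[-\s]?\d{4}"
TEXT = "Mvh Adam Svensson 821011-0217. Ring på 073-1212123."

registry = ToyRegistry()

# Language omitted: the recognizer is registered for "en" and ignored for "sv".
registry.add(ToyRecognizer("PNR", PNR_REGEX))
missing = registry.analyze(TEXT, language="sv")

# Language set explicitly: the recognizer is now picked up for "sv".
registry.add(ToyRecognizer("PNR", PNR_REGEX, language="sv"))
found = registry.analyze(TEXT, language="sv")

print(missing)  # []
print(found)    # [('PNR', 18, 29)]
```

In the real library the equivalent fix is the one shown in the answer: pass supported_language="sv" when constructing the PatternRecognizer.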

2 replies

mitch99 Sep 5, 2023
Author

Great, it works as expected now, many thanks!

dhelly Oct 25, 2024

Thank you very much!!! It helped me too!!!
