Enhancing the decision process text when working with images #1361

NuiMrme · 2024-04-22T08:40:40Z

Is your feature request related to a problem? Please describe.
The decision process output prints the entity_type, start_position, end_position and the score. When working with longer texts or with images, a line such as start = 204 end = 217 conveys very little on its own, and it is hard to see which part of the text it refers to.

Describe the solution you'd like
Add an entity_text field so that the text in question is printed as well, for example: start = 204 end = 217 entity_text = "Saint Antonio"

I solved this on my version by adding

entity_text: str,

to the init function in recognizer_result.py, which in turn required changes to image_analyzer_engine.py, image_recognizer_results.py, spacy_recognizer.py and pattern_recognizer.py,
but the output is much more readable.
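To illustrate the idea, here is a minimal sketch of a result object that carries an optional entity_text next to the positions. It is not Presidio's actual class: `SketchResult` and `with_entity_text` are hypothetical names used only for this example, and the field is assumed to default to None so that existing callers are unaffected.

```python
from dataclasses import asdict, dataclass, replace
from typing import Optional


@dataclass
class SketchResult:
    """Hypothetical stand-in for a recognizer result (not Presidio's class)."""

    entity_type: str
    start: int
    end: int
    score: float
    entity_text: Optional[str] = None  # assumed new field, optional by default

    def to_dict(self) -> dict:
        # Plain dict view of the result, including entity_text.
        return asdict(self)


def with_entity_text(result: SketchResult, text: str) -> SketchResult:
    """Return a copy of `result` whose entity_text is the slice text[start:end]."""
    return replace(result, entity_text=text[result.start:result.end])


# Usage: the positions from the example above now come with readable text.
text = "x" * 204 + "Saint Antonio" + " and more text"
result = with_entity_text(SketchResult("LOCATION", 204, 217, 0.85), text)
print(result.to_dict())
```

Deriving the text from the slice keeps the positions as the single source of truth, so the field cannot drift out of sync with start and end.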


NuiMrme · 2024-04-23T13:18:38Z

While at it, I modified line 222 of analyzer_engine.py so that every result is printed on its own line, which is even more readable:
json.dumps([str(result.to_dict()) for result in results], indent=2),
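The effect of indent=2 can be seen in isolation with a small sketch. The plain dicts below are made-up stand-ins for whatever result.to_dict() returns, and `format_results` is a hypothetical helper name; only json.dumps itself is taken from the line above.

```python
import json

# Made-up stand-ins for the dicts returned by result.to_dict().
results = [
    {"entity_type": "PERSON", "start": 0, "end": 5, "score": 0.85},
    {"entity_type": "LOCATION", "start": 204, "end": 217, "score": 0.85},
]


def format_results(results: list) -> str:
    """Serialize each result as a string, one list item per line."""
    # str(...) turns each dict into a single JSON string, so indent=2
    # breaks the list between results but never inside one.
    return json.dumps([str(r) for r in results], indent=2)


print(format_results(results))
```

Because every item is a string rather than a nested object, the indentation only separates results from each other and each result still fits on a single line.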

omri374 · 2024-04-23T13:51:38Z

@NuiMrme are you asking specifically for images, or for any text?

omri374 · 2024-04-23T13:55:08Z

Does this help? #925 (comment)

NuiMrme · 2024-04-23T14:45:28Z

Sorry, that wasn't well explained. I'm not reporting a bug but rather a feature I implemented on my version of Presidio that might help others too. When you work with images or a lot of text with log_decision_process=True, the log contains an entry for every single detection and quickly becomes unreadable. Please note that this is printed automatically; no explicit print command is used, unlike in the comment you shared above.
With a one-line example that's fine, I can quickly see what the positions refer to. But when many of these are stacked together because they come from an image of a document, you no longer know what is what. So I made the modifications mentioned above to make the output more readable.

Every new case now begins on a new line, and there is now an 'entity_text' that shows the text that was detected (I covered it with red for the obvious reasons). You no longer have to guess which line in the image it was, what position, etc. This is more readable and helps the analysis of the anonymization results.

before

after

omri374 · 2024-04-24T13:30:20Z

One of the reasons we intentionally left out the actual identified text is that it is essentially PII you might not want to log or return. If you have a suggestion on how to allow this, perhaps not as a default setting, we'd be happy to hear it.

I totally agree that there are cases, especially with the images module, where returning or logging the actual text is needed.

NuiMrme · 2024-04-25T10:06:57Z

One of the reasons we intentionally left out the actual identified text is that it is essentially PII you might not want to log or return. If you have a suggestion on how to allow this, perhaps not as a default setting, we'd be happy to hear it.

I totally agree that there are cases, especially with the images module, where returning or logging the actual text is needed.

Well, they are already printed out at the beginning of the log anyway:
[2024-04-24 12:46:05,853][decision_process][INFO][None][nlp artifacts:{"entities": ["Travaux", "Forage D'Eau Du", ...

omri374 · 2024-04-25T13:08:37Z

Good catch. I guess that for return_decision_process=True, it makes sense to be more verbose and return the actual values, but for the production version (where return_decision_process is likely disabled), it makes sense to omit it. Would you be interested in proposing a change through a pull request?
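One way such an opt-in could look is sketched below. This is only an illustration of the idea, not the change that was proposed or merged: `describe_result` and its `include_text` flag are hypothetical names, and the flag is assumed to be off by default so that the matched text (potential PII) is never emitted unless explicitly requested.

```python
from typing import Any, Dict


def describe_result(
    entity_type: str,
    start: int,
    end: int,
    score: float,
    text: str,
    include_text: bool = False,  # hypothetical opt-in flag, off by default
) -> Dict[str, Any]:
    """Build a log record; add the matched text only when asked to."""
    record: Dict[str, Any] = {
        "entity_type": entity_type,
        "start": start,
        "end": end,
        "score": score,
    }
    if include_text:
        # The matched text is PII, so it is only added on explicit request.
        record["entity_text"] = text[start:end]
    return record


text = "Visit Saint Antonio today"
print(describe_result("LOCATION", 6, 19, 0.85, text))
print(describe_result("LOCATION", 6, 19, 0.85, text, include_text=True))
```

Keeping the default at False means a production configuration that never sets the flag behaves exactly as before.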

NuiMrme · 2024-04-25T13:52:47Z

Absolutely

NuiMrme mentioned this issue Apr 26, 2024

Enhancing the log by adding an entity_text and creating a new line for every case #1369

Closed
