Running in GPU instance #1185
-
Have you looked at the batch option? Presidio has a [...] Which model are you using? Please also share the code, as 4 minutes for 1 MB sounds like too much. Are you initializing the [...]
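For reference, here is a minimal sketch of the batch option, assuming the texts are passed in as a plain list of strings. `BatchAnalyzerEngine` and `analyze_iterator` are Presidio's batch API; the `batched` and `analyze_in_batches` helpers and the chunk size of 256 are hypothetical choices for illustration, not tuned values.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batched(texts: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` texts (hypothetical helper)."""
    it = iter(texts)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def analyze_in_batches(texts: Iterable[str], size: int = 256, language: str = "en"):
    """Yield one list of recognizer results per input text."""
    # Imported lazily so the helper above works without Presidio installed.
    from presidio_analyzer import AnalyzerEngine, BatchAnalyzerEngine

    analyzer = AnalyzerEngine()  # build once, never inside the loop
    batch_analyzer = BatchAnalyzerEngine(analyzer_engine=analyzer)
    for chunk in batched(texts, size):
        # analyze_iterator returns a list with one result list per text.
        yield from batch_analyzer.analyze_iterator(chunk, language=language)
```

With a DataFrame, something like `list(analyze_in_batches(df['content'].tolist()))` would collect the results in row order.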
-
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine
from presidio_anonymizer.entities import RecognizerResult, OperatorConfig
import datetime
import pandas as pd
import spacy
import re

spacy.require_gpu()
analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()
data_sample = pd.read_parquet('xxx.parquet')
for index, row in data_sample.iterrows():
    content = row['content']
    if len(content) < 1000000:
        results = analyzer.analyze(text=content,
                                   language='en')
        """for result in results:
            print("Type:", result.entity_type)
            #print("Value:", result.text)
            #print("Score:", result.score)
            print(result)"""
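To check whether the time really goes into the analysis call, a rough timing sketch like the one below may help. It assumes nothing about Presidio itself: `analyze_fn` is a hypothetical stand-in for whatever call is being measured, for example `lambda t: analyzer.analyze(text=t, language='en')`.

```python
import time
from typing import Callable, Iterable


def seconds_per_megabyte(texts: Iterable[str], analyze_fn: Callable[[str], object]) -> float:
    """Return wall-clock seconds spent inside analyze_fn per MB of UTF-8 input."""
    total_bytes = 0
    elapsed = 0.0
    for text in texts:
        start = time.perf_counter()
        analyze_fn(text)  # only this call is timed
        elapsed += time.perf_counter() - start
        total_bytes += len(text.encode("utf-8"))
    if total_bytes == 0:
        return 0.0  # nothing to measure
    return elapsed / (total_bytes / 1_000_000)
```

Running it on a few hundred rows with and without `spacy.require_gpu()` would show whether the GPU is making any difference.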
-
I used the batch analyzer, but it takes the same amount of time.
-
Is there any other way we can process the data?
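One other option that may be worth trying is a smaller spaCy model, since Presidio's default English pipeline is the large `en_core_web_lg`. This is a minimal sketch, assuming `en_core_web_sm` is installed and that its lower NER accuracy is acceptable for the data. `spacy_config` and `build_analyzer` are hypothetical helper names.

```python
from typing import Any, Dict


def spacy_config(model_name: str = "en_core_web_sm", lang: str = "en") -> Dict[str, Any]:
    """Build the NLP configuration dict that NlpEngineProvider accepts."""
    return {
        "nlp_engine_name": "spacy",
        "models": [{"lang_code": lang, "model_name": model_name}],
    }


def build_analyzer(model_name: str = "en_core_web_sm"):
    """Create an English AnalyzerEngine backed by the given spaCy model."""
    # Lazy imports: Presidio and the spaCy model must be installed to call this.
    from presidio_analyzer import AnalyzerEngine
    from presidio_analyzer.nlp_engine import NlpEngineProvider

    provider = NlpEngineProvider(nlp_configuration=spacy_config(model_name))
    return AnalyzerEngine(
        nlp_engine=provider.create_engine(),
        supported_languages=["en"],
    )
```

The analyzer returned by `build_analyzer()` can be used in place of the default `AnalyzerEngine()`; whether it is fast enough is something only a measurement on the real data can tell.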
-
I'm running Presidio on around 60 lakh (6 million) records. It takes more than 4 minutes to process 1000k of data. Is there any way to run it as fast as possible?