Develop (#26)
* Initial commit of squeaky clean text

* updated the sct.py script with modular code

* updated the sct.py script with a pipeline method, which should make changes to the processing easier

* removed unnecessary direction code

* adding to do list

* adding to do list

* added requirements.txt file

* added setup.py file

* added test cases

* updated config file

* merging back

* Develop (#2) (#3)

* Initial commit of squeaky clean text

* updated the sct.py script with modular code

* updated the sct.py script with a pipeline method, which should make changes to the processing easier

* removed unnecessary direction code

* adding to do list

* adding to do list

* added requirements.txt file

* added setup.py file

* added test cases

* updated config file

* merging back

* rebase

* update the license

* added German and Spanish support

* Updated file for pypi

* Updated readme file

* Add GitHub Actions workflow for publishing to PyPI

* Updated readme file

* Updated readme file

* added the username to the publish.yml

* update the API variable name

* update the API user name

* Bump version to 0.1.1

* updated the readme file

* updated the version

* Update NER Process and added tag removal

* Updated config file

* updated the code to have the option to not output language

* fixed the NER bug that referenced the wrong model variable names; added GPU support

* fixed the Anonymizer Engine

* fixed the Anonymizer Engine

* added the test.yml file

* added the test.yml file

* added the test.yml file

* added the German and Spanish language support in lingua

* added the ability in the config to change the model name

* added the ability in the config to change the model name
rhnfzl authored Aug 17, 2024
1 parent 3812255 commit 94bb6fe
Showing 4 changed files with 11 additions and 12 deletions.
14 changes: 7 additions & 7 deletions requirements.txt
@@ -1,9 +1,9 @@
-ftfy
-nltk
-emoji
+ftfy==6.1.1
+nltk==3.8.1
+emoji==2.8.0
 torch
-Unidecode
-transformers
-beautifulsoup4
-presidio_anonymizer
+Unidecode==1.3.6
+transformers==4.30.0
+beautifulsoup4==4.12.2
+presidio_anonymizer==2.2.355
 lingua-language-detector
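
The requirements.txt change pins most packages to exact versions while leaving `torch` and `lingua-language-detector` unpinned. As a rough illustration (not part of the commit), the sketch below splits the post-commit requirement lines into pinned and unpinned entries. The helper name `split_pins` and the regular expression are assumptions made for this example; it only recognizes the simple `name==version` form.

```python
import re

# Requirement lines as they stand after this commit (copied from the diff).
REQUIREMENTS = """\
ftfy==6.1.1
nltk==3.8.1
emoji==2.8.0
torch
Unidecode==1.3.6
transformers==4.30.0
beautifulsoup4==4.12.2
presidio_anonymizer==2.2.355
lingua-language-detector
"""

# Matches only the simple "name==version" form; anything else counts as unpinned.
PIN_RE = re.compile(r"^([A-Za-z0-9._-]+)\s*==\s*([A-Za-z0-9.]+)$")


def split_pins(text):
    """Hypothetical helper: return ({name: version}, [unpinned lines])."""
    pinned, unpinned = {}, []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        match = PIN_RE.match(line)
        if match:
            pinned[match.group(1)] = match.group(2)
        else:
            unpinned.append(line)
    return pinned, unpinned


pinned, unpinned = split_pins(REQUIREMENTS)
print("pinned:", sorted(pinned))
print("unpinned:", unpinned)
```

A check like this could run in CI to flag dependencies that float freely, though whether `torch` should stay unpinned is a project decision the commit does not explain.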
3 changes: 1 addition & 2 deletions sct/config.py
@@ -52,5 +52,4 @@
 NER_MODELS_LIST = ["FacebookAI/xlm-roberta-large-finetuned-conll03-english",
 "FacebookAI/xlm-roberta-large-finetuned-conll02-dutch",
 "FacebookAI/xlm-roberta-large-finetuned-conll03-german",
-"FacebookAI/xlm-roberta-large-finetuned-conll03-spanish",
-"Babelscape/wikineural-multilingual-ner"]
+"FacebookAI/xlm-roberta-large-finetuned-conll02-spanish",
2 changes: 1 addition & 1 deletion sct/utils/ner.py
@@ -36,7 +36,7 @@ def __init__(self):
 model_name = ["FacebookAI/xlm-roberta-large-finetuned-conll03-english",
 "FacebookAI/xlm-roberta-large-finetuned-conll02-dutch",
 "FacebookAI/xlm-roberta-large-finetuned-conll03-german",
-"FacebookAI/xlm-roberta-large-finetuned-conll03-spanish",
+"FacebookAI/xlm-roberta-large-finetuned-conll02-spanish",
 "Babelscape/wikineural-multilingual-ner"]

 english_model_name = model_name[0]
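
The hunk above selects models by list position (`model_name[0]` for English), which makes it easy to mix up entries such as the Spanish identifier this commit corrects. One possible alternative, shown here only as a sketch, is a lookup keyed by language code. The two-letter codes, the `pick_model` helper, and the fallback to the multilingual model are all assumptions for illustration and are not taken from the repository.

```python
# Model identifiers as they appear in sct/utils/ner.py after this commit.
# The language-code keys are hypothetical; the repository indexes a list instead.
NER_MODELS = {
    "en": "FacebookAI/xlm-roberta-large-finetuned-conll03-english",
    "nl": "FacebookAI/xlm-roberta-large-finetuned-conll02-dutch",
    "de": "FacebookAI/xlm-roberta-large-finetuned-conll03-german",
    "es": "FacebookAI/xlm-roberta-large-finetuned-conll02-spanish",
}
MULTILINGUAL_MODEL = "Babelscape/wikineural-multilingual-ner"


def pick_model(lang_code):
    """Hypothetical helper: return the model for a language code.

    Falling back to the multilingual model for unknown codes is an
    assumption of this sketch, not documented behavior of the package.
    """
    return NER_MODELS.get(lang_code.lower(), MULTILINGUAL_MODEL)


print(pick_model("es"))
print(pick_model("fr"))
```

Keying by language keeps each identifier next to the language it serves, so a wrong dataset suffix (conll03 versus conll02) is easier to spot in review.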
4 changes: 2 additions & 2 deletions setup.py
@@ -11,14 +11,14 @@
 license='MIT',
 packages=find_packages(),
 install_requires=[
-'lingua-language-detector>=2.0.0,<2.1',
+'lingua-language-detector',
 'nltk>=3.8,<3.9',
 'emoji>=2.8,<2.9',
 'ftfy>=6.1,<6.2',
 'Unidecode>=1.3,<1.4',
 'beautifulsoup4>=4.12,<4.13',
 'transformers>=4.30,<4.31',
-'torch>=2.0,<2.1',
+'torch',
 'presidio_anonymizer>=2.2.355,<2.3',
 ],
 extras_require={
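
Because this commit pins exact versions in requirements.txt while setup.py keeps `>=,<` ranges, the two files can drift apart. The sketch below checks that each exact pin falls inside the matching setup.py range. It assumes purely numeric dotted versions and uses hand-rolled tuple comparison rather than a packaging library; the helper names `parse_version` and `in_range` are invented for this example.

```python
# Exact pins from requirements.txt after this commit.
PINS = {
    "ftfy": "6.1.1",
    "nltk": "3.8.1",
    "emoji": "2.8.0",
    "Unidecode": "1.3.6",
    "transformers": "4.30.0",
    "beautifulsoup4": "4.12.2",
    "presidio_anonymizer": "2.2.355",
}

# (inclusive lower bound, exclusive upper bound) from setup.py's install_requires.
RANGES = {
    "ftfy": ("6.1", "6.2"),
    "nltk": ("3.8", "3.9"),
    "emoji": ("2.8", "2.9"),
    "Unidecode": ("1.3", "1.4"),
    "transformers": ("4.30", "4.31"),
    "beautifulsoup4": ("4.12", "4.13"),
    "presidio_anonymizer": ("2.2.355", "2.3"),
}


def parse_version(text):
    """Turn '4.30.0' into (4, 30, 0). Assumes numeric components only."""
    return tuple(int(part) for part in text.split("."))


def _pad(a, b):
    """Right-pad the shorter tuple with zeros so 3.8 and 3.8.0 compare equal."""
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)), b + (0,) * (width - len(b))


def in_range(pin, lower, upper):
    """Return True if lower <= pin < upper."""
    p, lo = _pad(parse_version(pin), parse_version(lower))
    if p < lo:
        return False
    p, hi = _pad(parse_version(pin), parse_version(upper))
    return p < hi


out_of_range = [name for name, pin in PINS.items()
                if not in_range(pin, *RANGES[name])]
print("pins outside setup.py ranges:", out_of_range)
```

`torch` and `lingua-language-detector` are omitted because, after this commit, neither file constrains their versions.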
