Develop #26

Merged: 50 commits, Aug 17, 2024
35ac870
Intial commit of squeaky clean text
rhnfzl Jun 15, 2024
e285bc4
updated the sct.py script with modular code
rhnfzl Jun 15, 2024
b80aeae
updated the sct.py script with pipeline method, which would ideally w…
rhnfzl Jun 15, 2024
ee9f47e
removed unnecessary direction code
rhnfzl Jun 15, 2024
a8abf5c
adding to do list
rhnfzl Jun 15, 2024
ab921e2
adding to do list
rhnfzl Jun 15, 2024
64c9851
added requiremnt.txt file
rhnfzl Jun 15, 2024
cabb678
added setup.py file
rhnfzl Jun 15, 2024
5b6e759
added test cases
rhnfzl Jun 15, 2024
b46b2dc
updated config file
rhnfzl Jun 15, 2024
d97a01e
merging back
rhnfzl Jun 16, 2024
f7d8cfd
Merge branch 'main' into develop
rhnfzl Jun 16, 2024
90f743c
Develop (#2) (#3)
rhnfzl Jun 16, 2024
d746606
Merge branch 'main' of https://github.com/rhnfzl/SqueakyCleanText int…
rhnfzl Jun 16, 2024
d03567f
rebase
rhnfzl Jun 16, 2024
8182b94
update the license
rhnfzl Jun 16, 2024
f4c6add
added German and Spanish support
rhnfzl Jun 16, 2024
5f8ab49
Updated file for pypi
rhnfzl Jun 16, 2024
4e65a24
Updated readme file
rhnfzl Jun 16, 2024
e2a0973
Add GitHub Actions workflow for publishing to PyPI
rhnfzl Jun 16, 2024
f1e1cc9
Updated readme file
rhnfzl Jun 16, 2024
c3ed47b
Merge branch 'main' into develop
rhnfzl Jun 16, 2024
54f2714
Updated readme file
rhnfzl Jun 16, 2024
fb90a31
added the username to the publish.yml
rhnfzl Jun 16, 2024
30587e4
update the API vriable name
rhnfzl Jun 16, 2024
883d309
Merge branch 'main' into develop
rhnfzl Jun 16, 2024
1394a97
update the API user name
rhnfzl Jun 16, 2024
2b3d8fb
Bump version to 0.1.1
rhnfzl Jun 16, 2024
5bb8285
Merge branch 'main' into develop
rhnfzl Jun 16, 2024
b7f7ca5
updated the readme file
rhnfzl Jun 16, 2024
f3ef342
updated the version
rhnfzl Jun 16, 2024
2823846
Merge branch 'main' into develop
rhnfzl Jun 16, 2024
a2458d3
Update NER Process and added tag removal
rhnfzl Aug 9, 2024
0687a6f
Updated congig file
rhnfzl Aug 9, 2024
4d67b00
Merge branch 'main' into develop
rhnfzl Aug 9, 2024
d2aeb02
updated the code to have the option to not output language
rhnfzl Aug 16, 2024
8d171e9
fixed the bug for NER which was refrencing to the wrong model variabl…
rhnfzl Aug 17, 2024
ed3ce21
Merge branch 'main' into develop
rhnfzl Aug 17, 2024
6940597
fixed the Anonomyser Engine
rhnfzl Aug 17, 2024
03ef4e0
fixed the Anonomyser Engine
rhnfzl Aug 17, 2024
fb90dcd
added the test.yml file
rhnfzl Aug 17, 2024
9d82e47
Merge branch 'main' into develop
rhnfzl Aug 17, 2024
cc59d71
added the test.yml file
rhnfzl Aug 17, 2024
eeaec6b
Merge branch 'main' into develop
rhnfzl Aug 17, 2024
e905ef1
added the test.yml file
rhnfzl Aug 17, 2024
dede5a2
added the German and Spanish language support in lingua
rhnfzl Aug 17, 2024
17cc400
Merge branch 'main' into develop
rhnfzl Aug 17, 2024
8f849f2
added the ability in the config to change the model name
rhnfzl Aug 17, 2024
b0c3f8b
added the ability in the config to change the model name
rhnfzl Aug 17, 2024
a2a3335
Merge branch 'main' into develop
rhnfzl Aug 17, 2024
14 changes: 7 additions & 7 deletions requirements.txt
@@ -1,9 +1,9 @@
-ftfy
-nltk
-emoji
+ftfy==6.1.1
+nltk==3.8.1
+emoji==2.8.0
 torch
-Unidecode
-transformers
-beautifulsoup4
-presidio_anonymizer
+Unidecode==1.3.6
+transformers==4.30.0
+beautifulsoup4==4.12.2
+presidio_anonymizer==2.2.355
 lingua-language-detector
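
The hunk above pins seven packages to exact versions and leaves two unpinned. As a rough illustration only (the helper name `split_pins` is hypothetical and not part of this PR), a requirements file in this shape can be separated into pinned and unpinned entries with the standard library alone. The sketch handles only bare names and `==` pins, not the full requirement syntax.

```python
import re

# Lines as they appear on the new side of the requirements.txt hunk.
REQUIREMENTS = """\
ftfy==6.1.1
nltk==3.8.1
emoji==2.8.0
torch
Unidecode==1.3.6
transformers==4.30.0
beautifulsoup4==4.12.2
presidio_anonymizer==2.2.355
lingua-language-detector
"""

# Matches "name==version" only; anything else is treated as unpinned.
PIN_PATTERN = re.compile(r"([A-Za-z0-9._-]+)==([A-Za-z0-9.*+!_-]+)")


def split_pins(text):
    """Return (pinned, unpinned): a dict of name -> version and a list of bare lines."""
    pinned, unpinned = {}, []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        match = PIN_PATTERN.fullmatch(line)
        if match:
            pinned[match.group(1)] = match.group(2)
        else:
            unpinned.append(line)
    return pinned, unpinned


pinned, unpinned = split_pins(REQUIREMENTS)
print(unpinned)
```

Run against the new side of the hunk, this reports `torch` and `lingua-language-detector` as the entries without an exact pin.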
3 changes: 1 addition & 2 deletions sct/config.py
@@ -52,5 +52,4 @@
 NER_MODELS_LIST = ["FacebookAI/xlm-roberta-large-finetuned-conll03-english",
 "FacebookAI/xlm-roberta-large-finetuned-conll02-dutch",
 "FacebookAI/xlm-roberta-large-finetuned-conll03-german",
-"FacebookAI/xlm-roberta-large-finetuned-conll03-spanish",
-"Babelscape/wikineural-multilingual-ner"]
+"FacebookAI/xlm-roberta-large-finetuned-conll02-spanish",
2 changes: 1 addition & 1 deletion sct/utils/ner.py
@@ -36,7 +36,7 @@ def __init__(self):
 model_name = ["FacebookAI/xlm-roberta-large-finetuned-conll03-english",
 "FacebookAI/xlm-roberta-large-finetuned-conll02-dutch",
 "FacebookAI/xlm-roberta-large-finetuned-conll03-german",
-"FacebookAI/xlm-roberta-large-finetuned-conll03-spanish",
+"FacebookAI/xlm-roberta-large-finetuned-conll02-spanish",
 "Babelscape/wikineural-multilingual-ner"]
 
 english_model_name = model_name[0]
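
The hunk above selects the English model by list position (`model_name[0]`). A minimal sketch of the same positional lookup for the other languages follows. The language keys, the `pick_model` helper, and the fallback to the multilingual model are assumptions made for illustration; the diff does not show how the repository actually chooses a model.

```python
# Model identifiers copied from the new side of the ner.py hunk.
MODEL_NAMES = [
    "FacebookAI/xlm-roberta-large-finetuned-conll03-english",
    "FacebookAI/xlm-roberta-large-finetuned-conll02-dutch",
    "FacebookAI/xlm-roberta-large-finetuned-conll03-german",
    "FacebookAI/xlm-roberta-large-finetuned-conll02-spanish",
    "Babelscape/wikineural-multilingual-ner",
]

# Hypothetical mapping from language name to list position (an assumption).
LANGUAGE_TO_INDEX = {"english": 0, "dutch": 1, "german": 2, "spanish": 3}
MULTILINGUAL_INDEX = 4  # assumed fallback for any other language


def pick_model(language):
    """Return the model identifier for a language, falling back to the multilingual one."""
    index = LANGUAGE_TO_INDEX.get(language.lower(), MULTILINGUAL_INDEX)
    return MODEL_NAMES[index]


print(pick_model("Spanish"))
```

Keeping the identifiers in one list and resolving them through a single lookup means a renamed checkpoint, such as the Spanish one in this PR, only has to be corrected in one place.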
4 changes: 2 additions & 2 deletions setup.py
@@ -11,14 +11,14 @@
 license='MIT',
 packages=find_packages(),
 install_requires=[
-'lingua-language-detector>=2.0.0,<2.1',
+'lingua-language-detector',
 'nltk>=3.8,<3.9',
 'emoji>=2.8,<2.9',
 'ftfy>=6.1,<6.2',
 'Unidecode>=1.3,<1.4',
 'beautifulsoup4>=4.12,<4.13',
 'transformers>=4.30,<4.31',
-'torch>=2.0,<2.1',
+'torch',
 'presidio_anonymizer>=2.2.355,<2.3',
 ],
 extras_require={
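
The ranges kept in `install_requires` all have the form `>=X,<Y`. The sketch below is a deliberately simplified, standard-library-only check of a dotted numeric version against such a range. The `satisfies` helper is hypothetical, it supports only the `>=` and `<` operators seen in this hunk, and it is not a substitute for full PEP 440 handling.

```python
import re

# Only the two operators that appear in the setup.py hunk are supported.
CLAUSE_PATTERN = re.compile(r"(>=|<)(\d+(?:\.\d+)*)")


def _key(text, width):
    """Turn '3.8' into a zero-padded integer tuple, e.g. (3, 8, 0) for width 3."""
    parts = [int(part) for part in text.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def satisfies(version, spec):
    """Check a dotted numeric version against a spec such as '>=3.8,<3.9'."""
    for clause in spec.split(","):
        match = CLAUSE_PATTERN.fullmatch(clause.strip())
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        operator, bound = match.groups()
        width = max(len(version.split(".")), len(bound.split(".")))
        current, limit = _key(version, width), _key(bound, width)
        if operator == ">=" and not current >= limit:
            return False
        if operator == "<" and not current < limit:
            return False
    return True


# The exact pins from requirements.txt fall inside the setup.py ranges.
print(satisfies("3.8.1", ">=3.8,<3.9"))
print(satisfies("4.30.0", ">=4.30,<4.31"))
```

Versions are zero-padded before comparison so that `3.9` and `3.9.0` are treated as equal at the upper bound.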