Develop (#33)
* Initial commit of squeaky clean text

* updated the sct.py script with modular code

* updated the sct.py script with a pipeline method, which should make changes to the processing easier

* removed unnecessary direction code

* adding to do list

* adding to do list

* added requirements.txt file

* added setup.py file

* added test cases

* updated config file

* merging back

* Develop (#2) (#3)

* Initial commit of squeaky clean text

* updated the sct.py script with modular code

* updated the sct.py script with a pipeline method, which should make changes to the processing easier

* removed unnecessary direction code

* adding to do list

* adding to do list

* added requirements.txt file

* added setup.py file

* added test cases

* updated config file

* merging back

* rebase

* update the license

* added German and Spanish support

* Updated file for pypi

* Updated readme file

* Add GitHub Actions workflow for publishing to PyPI

* Updated readme file

* Updated readme file

* added the username to the publish.yml

* update the API variable name

* update the API user name

* Bump version to 0.1.1

* updated the readme file

* updated the version

* Update NER Process and added tag removal

* Updated config file

* updated the code to have the option to not output language

* fixed the NER bug that referenced the wrong model variable names; added GPU support

* fixed the Anonymizer Engine

* fixed the Anonymizer Engine

* added the test.yml file

* added the test.yml file

* added the test.yml file

* added the German and Spanish language support in lingua

* added the ability in the config to change the model name

* added the ability in the config to change the model name

* added the ability in the config to change the model name and fixed the Spanish model name

* squashed some bugs

* added the language passing support

* Refactored the code

* fixed typing issue

* reverted the refactor

* Added the flow diagram of the package to the README
rhnfzl authored Aug 18, 2024
1 parent 97e3bb0 commit e1f3ada
Showing 2 changed files with 2 additions and 0 deletions.
2 changes: 2 additions & 0 deletions README.md
@@ -19,6 +19,8 @@ SqueakyCleanText simplifies the process by automatically addressing common text
- Supports English, Dutch, German, and Spanish languages.
- Provides text formatted for both Language Model processing and Statistical Model processing.

![Default Flow of cleaning Text](resources/sct_flow.png)

##### Benefits for Statistical Models
When working with statistical models, further optimization is often required, such as removing stopwords, special symbols, and punctuation.
SqueakyCleanText streamlines this process, ensuring your text data is in optimal shape for classification and other downstream tasks.
Binary file added resources/sct_flow.png