Commit

Merge branch 'main' into develop
zsh:1: command not found: wq
rhnfzl committed Nov 13, 2024
2 parents 7d51e44 + 5c4722f commit 990668d
Showing 4 changed files with 36 additions and 7 deletions.
4 changes: 3 additions & 1 deletion .github/workflows/publish.yml
@@ -4,13 +4,15 @@ on:
push:
branches:
- main
+ tags:
+ - 'v*'

jobs:
test:
runs-on: ubuntu-latest
strategy:
matrix:
- python-version: ["3.10", "3.11", "3.12"]
+ python-version: ["3.10"]

steps:
- uses: actions/checkout@v4
4 changes: 2 additions & 2 deletions .github/workflows/python-package.yml
@@ -5,9 +5,9 @@ name: Python package

on:
push:
- branches: [ "main" ]
+ branches: [ "main", "develop" ]
pull_request:
- branches: [ "main" ]
+ branches: [ "main", "develop" ]

jobs:
build:
13 changes: 11 additions & 2 deletions README.md
@@ -1,6 +1,15 @@
- # `SqueakyCleanText`
+ <div align="center">

- [![PyPI](https://img.shields.io/pypi/v/squeakycleantext.svg)](https://pypi.org/project/squeakycleantext/) [![PyPI - Downloads](https://img.shields.io/pypi/dm/squeakycleantext)](https://pypistats.org/packages/squeakycleantext)
+ # SqueakyCleanText
+
+ [![PyPI](https://img.shields.io/pypi/v/squeakycleantext.svg)](https://pypi.org/project/squeakycleantext/)
+ [![PyPI - Downloads](https://img.shields.io/pypi/dm/squeakycleantext)](https://pypistats.org/packages/squeakycleantext)
+ [![Python package](https://github.com/rhnfzl/SqueakyCleanText/actions/workflows/python-package.yml/badge.svg)](https://github.com/rhnfzl/SqueakyCleanText/actions/workflows/python-package.yml)
+ [![Python Versions](https://img.shields.io/badge/Python-3.10%20|%203.11%20|%203.12-blue)](https://pypi.org/project/squeakycleantext/)
+ [![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
+
+ A comprehensive text cleaning and preprocessing pipeline for machine learning and NLP tasks.
+ </div>

In the world of machine learning and natural language processing, clean and well-structured text data is crucial for building effective downstream models and managing token limits in language models.

22 changes: 20 additions & 2 deletions tests/test_sct.py
@@ -60,17 +60,25 @@ class TextCleanerTest(unittest.TestCase):
def setUpClass(cls):
if os.getenv('GITHUB_ACTIONS'):
cls.ner = None
+ # Initialize empty processing classes for GitHub Actions
+ cls.ProcessContacts = None
+ cls.ProcessDateTime = None
+ cls.ProcessSpecialSymbols = None
+ cls.NormaliseText = None
+ cls.ProcessStopwords = None
+ cls.fake = None
return

try:
with timeout(1200): # 20 minute timeout
config.CHECK_NER_PROCESS = False
+ # Initialize all the processing classes
cls.ProcessContacts = contact.ProcessContacts()
cls.ProcessDateTime = datetime.ProcessDateTime()
cls.ProcessSpecialSymbols = special.ProcessSpecialSymbols()
cls.NormaliseText = normtext.NormaliseText()
cls.ProcessStopwords = stopwords.ProcessStopwords()
- cls.fake = Faker()
+ cls.fake = Faker() # Initialize Faker

# Override default models with smaller model for testing
test_models = ["dslim/bert-base-NER"] * 5 # Same small model for all languages
@@ -99,8 +107,18 @@ def setUpClass(cls):
raise

def setUp(self):
+ """Set up test fixtures before each test method."""
config.CHECK_NER_PROCESS = True
- # Use the class-level NER instance instead of creating a new one
+ if os.getenv('GITHUB_ACTIONS'):
+ self.skipTest("Skipping test in GitHub Actions")
+
+ # Copy class-level attributes to instance level
+ self.ProcessContacts = self.__class__.ProcessContacts
+ self.ProcessDateTime = self.__class__.ProcessDateTime
+ self.ProcessSpecialSymbols = self.__class__.ProcessSpecialSymbols
+ self.NormaliseText = self.__class__.NormaliseText
+ self.ProcessStopwords = self.__class__.ProcessStopwords
+ self.fake = self.__class__.fake
self.ner = self.__class__.ner

@settings(deadline=None)
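
Note on the `tests/test_sct.py` change: the pattern in `setUpClass` and `setUp` (build expensive fixtures once per class, and skip every test when the `GITHUB_ACTIONS` environment variable is set) can be sketched in isolation as below. This is a minimal illustration, not the project's test code: `ExpensiveResource`, `SharedFixtureTest`, and `run_suite` are hypothetical names standing in for the real processing classes and test runner.

```python
import os
import unittest


class ExpensiveResource:
    """Hypothetical stand-in for a slow-to-build helper (for example an NER model)."""

    def clean(self, text: str) -> str:
        # Collapse any run of whitespace into a single space.
        return " ".join(text.split())


class SharedFixtureTest(unittest.TestCase):
    resource = None  # populated once per class, unless running in CI

    @classmethod
    def setUpClass(cls):
        # GitHub Actions sets GITHUB_ACTIONS in the runner environment;
        # return early there so the costly setup never runs.
        if os.getenv("GITHUB_ACTIONS"):
            return
        cls.resource = ExpensiveResource()

    def setUp(self):
        # Skip each test individually when running in GitHub Actions.
        if os.getenv("GITHUB_ACTIONS"):
            self.skipTest("Skipping test in GitHub Actions")
        # Class attributes are already reachable through self; the explicit
        # copy only makes the dependency visible in one place.
        self.resource = self.__class__.resource

    def test_clean_collapses_whitespace(self):
        self.assertEqual(self.resource.clean("a   b \n c"), "a b c")


def run_suite() -> unittest.TestResult:
    """Run the test case programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SharedFixtureTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Because `unittest` resolves `self.attr` through the class when no instance attribute exists, the instance-level copies in `setUp` are optional; they mainly document which shared fixtures each test relies on.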
