0 parents
commit 6f61fee
Showing 19 changed files with 1,561 additions and 0 deletions.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
name: Upload Python Package | ||
|
||
on: | ||
  push: | ||
    branches: | ||
      - main | ||
|
||
permissions: | ||
  contents: read | ||
|
||
jobs: | ||
  deploy: | ||
|
||
    runs-on: ubuntu-latest | ||
|
||
    steps: | ||
    - uses: actions/checkout@v4 | ||
    - name: Set up Python | ||
      uses: actions/setup-python@v3 | ||
      with: | ||
        python-version: '3.x' | ||
    - name: Install dependencies | ||
      run: | | ||
        python -m pip install --upgrade pip | ||
        pip install build | ||
        pip install setuptools | ||
        pip install twine | ||
    - name: Build package | ||
      run: python -m build | ||
    - name: Publish package | ||
      uses: pypa/gh-action-pypi-publish@27b31702a0e7fc50959f5ad993c78deac1bdfc29 | ||
      with: | ||
        user: __token__ | ||
        password: ${{ secrets.PYPI_API_TOKEN }} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
build/* | ||
dist/* | ||
*.egg-info |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
recursive-include file_genie *.py |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,87 @@ | ||
## FileGenie SDK | ||
FileGenie SDK is a Python library that simplifies parsing of various file formats (e.g. TEXT, CSV, EXCEL, ZIP, XML, PDF) and transforms the parsed payloads as required. It offers seamless integration, efficient file handling, and the flexibility to handle edge cases with user-defined logic applied while transforming entries. | ||
|
||
### Features | ||
- **Multi-format Support:** Parse TEXT, CSV, EXCEL, ZIP, XML and PDF files from AWS S3. | ||
- **Multi-format Response:** Return parsed data in the form you need: DATAFRAME, JSON or FILE. | ||
- **Password-Protected Support:** Parse password-protected files. | ||
- **Customizable Edge Case Handling:** Define and apply custom functions for specific parsing requirements. Typical edge cases handled while transforming entries include sanitise_str_column, convert_amount_as_per_currency and convert_date_format. | ||
- **S3 Integration:** Fetch files directly from AWS S3 buckets using an IAM role. | ||
- **Simple Configuration:** Initialize with a straightforward configuration; no additional setup files are needed. | ||
|
||
### Installation | ||
Install the SDK using pip: | ||
``` | ||
pip install file_genie | ||
``` | ||
|
||
### Prerequisites | ||
- **Your application should be deployed on AWS EKS to enable the SDK to utilize AWS S3 credentials.** | ||
- **Python:** >= 3.8 (Pandas 2.0.0 does not support older Python versions) | ||
- **Pandas:** 2.0.0 | ||
|
||
### Getting Started | ||
- **Define Custom Edge Cases:** | ||
When parsing needs custom functions, the SDK imports them from your project using the import shown in the snippet that follows. Create an edgeCases folder in your project, add a file named user_edge_cases.py, define your custom functions in it, and reference them by name in the edge_case section of file_config. | ||
``` | ||
from edgeCases import user_edge_cases | ||
self.edge_cases = user_edge_cases | ||
``` | ||
|
||
- **Define the configuration required for file parsing logic and S3 bucket names** | ||
``` | ||
s3_config = { | ||
    "upload_bucket": "reconciliation-live", | ||
    "download_bucket": "reconciliation-live", | ||
} | ||
file_config = { | ||
    "file_source_1": { | ||
        "read_from_s3_func": "read_complete_excel_file", | ||
        "parameters_for_read_s3": None, | ||
        "file_dtype": { | ||
            "Order_Number": str, | ||
            "Added On": str, | ||
            "Added By": str, | ||
        }, | ||
        "columns_mapping": { | ||
            # "Column name in file": "Column name required in output" | ||
            "Transaction Type": "TransactionType", | ||
            "Cust Name": "CustomerName", | ||
            "Cust ID": "CustomerId", | ||
            "Transaction Amount": "Amount", | ||
            "OrderNumber": "TransactionReference", | ||
            "Reference ID": "CustomerReferenceId", | ||
            "Target Date": "TargetDate", | ||
            "TransactionDate": "TransactionDate", | ||
            "FeeAmount": "ServiceCharge", | ||
            "TaxAmount": "ServiceTax", | ||
            "NetAmount": "NetAmount", | ||
        }, | ||
        "edge_case": { | ||
            # Key: name of an edge case function defined in user_edge_cases.py. | ||
            # Value: the parameter passed to that function (it can be a dict, list or str). | ||
            # Here convert_amount_as_per_currency is applied while transforming the entries, | ||
            # and "Amount" is the column on which the currency conversion runs. | ||
            "convert_amount_as_per_currency": "Amount", | ||
        }, | ||
    }, | ||
} | ||
``` | ||
|
||
- **Define a ParsedDataResponseType enum** | ||
``` | ||
import enum | ||
class ParsedDataResponseType(enum.Enum): | ||
    DATAFRAME = "DATAFRAME" | ||
    FILE = "FILE" | ||
    JSON = "JSON" | ||
``` | ||
|
||
- **Import and initialise the file genie** | ||
``` | ||
from file_genie import FileGenie | ||
file_genie = FileGenie(config={"s3_config": s3_config, "file_config": file_config}) | ||
file_source = "file_source_1"  # a key defined in file_config | ||
parsed_data = file_genie.parse("s3://your-bucket-name/path/to/your/file.csv", file_source, ParsedDataResponseType.DATAFRAME.value) | ||
# By default the SDK returns the response as a DATAFRAME | ||
``` | ||
|
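The `columns_mapping` and `edge_case` sections of the README can be read as a two-step transform: rename the columns, then call each named function with its configured parameter. The following is a minimal sketch of that idea using plain dictionaries instead of DataFrames. The helper `apply_file_config`, the `(rows, param)` signature of the edge case function, the choice to drop unmapped columns, and the divide-by-100 conversion are all illustrative assumptions, not the SDK's actual internals.

```python
from types import SimpleNamespace


def convert_amount_as_per_currency(rows, column):
    """Hypothetical edge case: convert minor units (e.g. paise) to major units."""
    for row in rows:
        row[column] = int(row[column]) / 100
    return rows


# Stand-in for the user's edgeCases/user_edge_cases.py module (assumption).
user_edge_cases = SimpleNamespace(
    convert_amount_as_per_currency=convert_amount_as_per_currency
)


def apply_file_config(rows, source_config, edge_cases):
    """Rename columns per columns_mapping, then run each configured edge case."""
    mapping = source_config["columns_mapping"]
    # Assumption: only mapped columns are kept, under their output names.
    renamed = [
        {mapping[key]: value for key, value in row.items() if key in mapping}
        for row in rows
    ]
    for func_name, param in source_config.get("edge_case", {}).items():
        func = getattr(edge_cases, func_name)  # look the function up by name
        renamed = func(renamed, param)
    return renamed


source_config = {
    "columns_mapping": {"Cust Name": "CustomerName", "Transaction Amount": "Amount"},
    "edge_case": {"convert_amount_as_per_currency": "Amount"},
}
rows = [{"Cust Name": "Asha", "Transaction Amount": "12550", "Ignored": "x"}]
result = apply_file_config(rows, source_config, user_edge_cases)
```

Looking functions up by name with `getattr` is what lets the configuration stay a plain dictionary of strings while the logic lives in the user's own module.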
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
# src/__init__.py | ||
from .service.file_parser import FileParser | ||
|
||
class FileGenie: | ||
    def __init__(self, config): | ||
        """ | ||
        Initialize the SDK with configuration. | ||
        Args: | ||
            config (dict): Configuration dictionary (e.g., file_config, s3_config). | ||
        """ | ||
        self.config = config | ||
        self.parser = FileParser(config) | ||
|
||
    def parse(self, file_path: str, file_source: str = None, response_type: str = None): | ||
        """ | ||
        Parse and transform a file as per the mapping defined in the configuration. | ||
        Args: | ||
            file_path (str): Path to the file to be parsed. | ||
            file_source (str): File source name in the configuration. | ||
            response_type (str): A ParsedDataResponseType value. Default: DATAFRAME. | ||
        Returns: | ||
            The parsed data in the requested ParsedDataResponseType. | ||
        """ | ||
        return self.parser.parse_file(file_path, file_source, response_type) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
import enum | ||
|
||
class FilterType(enum.Enum): | ||
    EQUALS = "equals" | ||
    STARTSWITH = "startswith" |
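The diff does not show where `FilterType` is consumed, so the following is only a sketch of how such an enum could drive string matching. The helper `matches` and the sample values are hypothetical and do not come from the SDK.

```python
import enum


class FilterType(enum.Enum):
    EQUALS = "equals"
    STARTSWITH = "startswith"


def matches(value: str, expected: str, filter_type: FilterType) -> bool:
    """Return True when value satisfies the filter (hypothetical helper)."""
    if filter_type is FilterType.EQUALS:
        return value == expected
    if filter_type is FilterType.STARTSWITH:
        return value.startswith(expected)
    raise ValueError(f"Unsupported filter type: {filter_type}")


values = ["REFUND-001", "SALE-002", "REFUND"]
prefixed = [v for v in values if matches(v, "REFUND", FilterType.STARTSWITH)]
exact = [v for v in values if matches(v, "REFUND", FilterType.EQUALS)]
```

Because each member's value is a plain string, a filter named in configuration can be turned into a member with `FilterType("startswith")`.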
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
import enum | ||
|
||
class LogLevel(enum.Enum): | ||
    INFO = "INFO" | ||
    WARNING = "WARNING" | ||
    WARN = "WARN" | ||
    ERROR = "ERROR" | ||
    EXCEPTION = "EXCEPTION" | ||
    CRITICAL = "CRITICAL" | ||
    DEBUG = "DEBUG" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
import enum | ||
|
||
class ParsedDataResponseType(enum.Enum): | ||
    DATAFRAME = "DATAFRAME" | ||
    FILE = "FILE" | ||
    JSON = "JSON" |
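`FileGenie.parse` accepts `response_type` as a string and documents DATAFRAME as the default. A minimal sketch of how a string could be resolved against this enum with that default follows. The helper `resolve_response_type` is hypothetical, and the parser's real handling is not shown in this diff.

```python
import enum
from typing import Optional


class ParsedDataResponseType(enum.Enum):
    DATAFRAME = "DATAFRAME"
    FILE = "FILE"
    JSON = "JSON"


def resolve_response_type(value: Optional[str]) -> ParsedDataResponseType:
    """Map a response_type string to the enum, defaulting to DATAFRAME."""
    if value is None:
        return ParsedDataResponseType.DATAFRAME
    try:
        return ParsedDataResponseType(value)  # lookup by member value
    except ValueError:
        allowed = ", ".join(member.value for member in ParsedDataResponseType)
        raise ValueError(
            f"Unknown response type {value!r}; expected one of: {allowed}"
        ) from None


default_type = resolve_response_type(None)
json_type = resolve_response_type("JSON")
```

Validating early gives callers a readable error listing the allowed values instead of a failure deep inside the parsing code.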
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,62 @@ | ||
class FileParserException(Exception): | ||
    """Base exception for file parser errors.""" | ||
    title = "UNKNOWN_ERROR" | ||
    code = 520 | ||
|
||
    def __init__(self, message="An unknown error occurred"): | ||
        super().__init__(message) | ||
        self.message = message | ||
|
||
class ConfigMissingException(FileParserException): | ||
    """Raised when configuration is missing.""" | ||
    title = "CONFIG_EXCEPTION" | ||
    code = 500 | ||
|
||
    def __init__(self, message="Configuration is missing"): | ||
        super().__init__(message) | ||
|
||
|
||
class FileProcessFailException(FileParserException): | ||
    """Raised when file processing fails.""" | ||
|
||
    def __init__(self, message="File processing failed"): | ||
        super().__init__(message) | ||
|
||
|
||
class FileReadException(FileParserException): | ||
    """Raised when file reading fails.""" | ||
|
||
    def __init__(self, message="File reading error occurred"): | ||
        super().__init__(message) | ||
|
||
|
||
class ResourceNotFoundException(FileParserException): | ||
    """Raised when a resource is not found.""" | ||
    title = "NOT_FOUND" | ||
    code = 404 | ||
|
||
    def __init__(self, message="Requested resource not found"): | ||
        super().__init__(message) | ||
|
||
|
||
class S3Exception(FileParserException): | ||
    """Raised for S3-related issues.""" | ||
    title = "S3_EXCEPTION" | ||
    code = 500 | ||
|
||
    def __init__(self, message="An S3 error occurred"): | ||
        super().__init__(message) | ||
|
||
class MissingResourceException(Exception): | ||
    """Exception raised when a required resource is missing, such as configuration or file source.""" | ||
|
||
    def __init__(self, resource_name, message="Required resource is missing"): | ||
        self.resource_name = resource_name | ||
        self.message = f"{message}: {resource_name}" | ||
        super().__init__(self.message) | ||
|
||
class NoTemplateFoundForFile(FileParserException): | ||
    """Raised when no template is found.""" | ||
|
||
    def __init__(self, message="No template found for processing"): | ||
        super().__init__(message) |
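The class-level `title` and `code` attributes make it possible to turn any `FileParserException` into a uniform error payload. The sketch below redefines a trimmed copy of three of the classes so that it runs on its own. The helper `to_error_payload` is hypothetical and only illustrates one way a caller might use these attributes.

```python
class FileParserException(Exception):
    """Base exception for file parser errors."""
    title = "UNKNOWN_ERROR"
    code = 520

    def __init__(self, message="An unknown error occurred"):
        super().__init__(message)
        self.message = message


class ResourceNotFoundException(FileParserException):
    """Raised when a resource is not found."""
    title = "NOT_FOUND"
    code = 404

    def __init__(self, message="Requested resource not found"):
        super().__init__(message)


class FileReadException(FileParserException):
    """Raised when file reading fails; defines no title or code of its own."""

    def __init__(self, message="File reading error occurred"):
        super().__init__(message)


def to_error_payload(exc: FileParserException) -> dict:
    """Build a uniform error dictionary from any parser exception."""
    return {"title": exc.title, "code": exc.code, "message": exc.message}


try:
    raise ResourceNotFoundException("file_source_1 is not configured")
except FileParserException as exc:
    not_found = to_error_payload(exc)

read_error = to_error_payload(FileReadException())
```

Subclasses that do not set `title` and `code`, such as `FileReadException`, inherit `UNKNOWN_ERROR` and 520 from the base class. `MissingResourceException` derives from `Exception` directly, so an `except FileParserException` clause like the one above would not catch it.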