CoffeeITWorks/csv_normalizer: Ensure your CSV files always have the same number of columns and format

Info

This is a simple script that ensures all your .csv files always have the same columns, in the same order. It addresses two of the most common issues with .csv files:

Some systems don't respect the column order
Some systems don't add a column when there is no data for it

The script/program resolves both cases in a simple way.

Process:

Normalize: ensure the columns are always in the same order, and add any missing columns with empty data.

For example, suppose you have a weather station that should always generate a .csv file with the following columns:

Temperature, Humidity, Radiation, Wind, Wind gust

But sometimes one of the sensors has no data, and instead of writing all the columns to the .csv, the station generates a partial .csv:

Temperature, Humidity, Wind, Wind gust

In that case the software that processes the .csv could fail, so you can use csv_normalizer to ensure the .csv file always has these columns:

Temperature, Humidity, Radiation, Wind, Wind gust

csv_normalizer adds the missing column with empty data and also ensures the columns are always in the same order.
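As a rough illustration of the normalize step, here is a minimal sketch using only Python's standard csv module. It is not csv_normalizer's own code (the project appears to use pandas, judging by the read_csv link in the config section). The function name normalize_csv_text is hypothetical, and dropping columns that are not in the expected list is an assumption of this sketch.

```python
import csv
import io

# Expected layout from the weather-station example above.
EXPECTED_HEADERS = ["Temperature", "Humidity", "Radiation", "Wind", "Wind gust"]


def normalize_csv_text(text, headers, delimiter=","):
    """Rewrite CSV `text` so its columns are exactly `headers`, in that order."""
    # skipinitialspace handles headers written as "Temperature, Humidity, ..."
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter, skipinitialspace=True)
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=headers,
        delimiter=delimiter,
        restval="",             # missing columns are written as empty values
        extrasaction="ignore",  # columns not in `headers` are dropped (assumption)
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


partial = "Temperature, Humidity, Wind, Wind gust\n21.5, 60, 3.2, 7.9\n"
print(normalize_csv_text(partial, EXPECTED_HEADERS), end="")
# Temperature,Humidity,Radiation,Wind,Wind gust
# 21.5,60,,3.2,7.9
```

With pandas, DataFrame.reindex(columns=headers) has a similar effect, filling the missing columns with NaN instead of empty strings.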

csv_normalizer always returns a dict (JSON-like) with the 'ok' and 'failed' lists of processed files. Examples:

{'failed': [],
 'ok': [{'export_path': 'C:\\temp\\csv_export\\business-financial-data-jun-2021-quarter.csv',
         'import_path': 'C:\\temp\\csv_import\\business-financial-data-jun-2021-quarter.csv'}]}

# Example when nothing was processed:
{'failed': [], 'ok': []}
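The result dict can be checked mechanically, for example to set an exit code in a wrapper script. The summarize function below is a hypothetical helper that assumes exactly the shape shown above; it does not look inside 'failed' entries because their structure is not documented here.

```python
def summarize(result):
    """Return (exit_code, message) for a result dict shaped like the examples above.

    Hypothetical helper: assumes the keys 'ok' and 'failed', and that each 'ok'
    entry has 'import_path' and 'export_path'.
    """
    ok = result.get("ok", [])
    failed = result.get("failed", [])
    if failed:
        return 1, f"{len(failed)} file(s) failed, {len(ok)} processed"
    if not ok:
        return 0, "nothing to process"
    return 0, "\n".join(f"{item['import_path']} -> {item['export_path']}" for item in ok)


example = {
    "failed": [],
    "ok": [
        {
            "export_path": "C:\\temp\\csv_export\\business-financial-data-jun-2021-quarter.csv",
            "import_path": "C:\\temp\\csv_import\\business-financial-data-jun-2021-quarter.csv",
        }
    ],
}
code, message = summarize(example)
print(code)  # 0
print(message)
```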

Example config:

[common]
csv_import_folder = C:/temp/csv_import
csv_export_folder = C:/temp/csv_export
csv_export_headers = 'Series_reference', 'Period', 'ELEE'
csv_delimiter = ;
csv_encoding = utf-8
# You can set column types such as int64 or np.float64 if you want to specify them,
# or use type object if you don't want conversion or want to avoid NaN errors.
# https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
# Example: {'column name': 'object'}
'dtype' = {}
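The config above is standard INI syntax, so Python's configparser can read it. The sketch below, with the hypothetical load_settings function, shows one way to turn csv_export_headers into a list using ast.literal_eval. This is an assumption about how the value could be parsed, not necessarily how csv_normalizer parses it, and the dtype entry is left out.

```python
import ast
import configparser

# Same keys as the example config above ('dtype' left out of this sketch).
EXAMPLE_INI = """
[common]
csv_import_folder = C:/temp/csv_import
csv_export_folder = C:/temp/csv_export
csv_export_headers = 'Series_reference', 'Period', 'ELEE'
csv_delimiter = ;
csv_encoding = utf-8
"""


def load_settings(ini_text):
    """Read the [common] section into a plain dict (hypothetical helper)."""
    config = configparser.ConfigParser()
    config.read_string(ini_text)
    common = config["common"]
    # "'A', 'B', 'C'" evaluates to a tuple; a single "'A'" evaluates to a str.
    headers = ast.literal_eval(common["csv_export_headers"])
    if isinstance(headers, str):
        headers = (headers,)
    return {
        "import_folder": common["csv_import_folder"],
        "export_folder": common["csv_export_folder"],
        "headers": list(headers),
        "delimiter": common["csv_delimiter"],  # ";" is kept: no inline comment prefixes by default
        "encoding": common.get("csv_encoding", "utf-8"),
    }


settings = load_settings(EXAMPLE_INI)
print(settings["headers"])  # ['Series_reference', 'Period', 'ELEE']
print(settings["delimiter"])  # ;
```

A dtype value such as {'column name': 'object'} could be parsed the same way with ast.literal_eval and passed as the dtype argument of pandas.read_csv.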

Usage

usage: csv_normalizer [-h] [-c [CONFIG_INI]] [--version [VERSION]] [--no_rename [NO_RENAME_OLD]] [--write_config]

optional arguments:
  -h, --help            show this help message and exit
  -c [CONFIG_INI], --config_ini [CONFIG_INI]
                        csv_normalizer ini configuration file
  --version [VERSION]   Print version and exit
  --no_rename [NO_RENAME_OLD]
                        Do not rename to .old the original file
  --write_config        Write configuration with default values, useful to get a config file to modify
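The help text looks like output from Python's argparse: bracketed forms such as [--no_rename [NO_RENAME_OLD]] are what argparse prints for options declared with nargs='?'. The parser below is a hypothetical reconstruction from that help text, not the project's source; it shows why --no_rename works with or without a value.

```python
import argparse


def build_parser():
    """Hypothetical parser rebuilt from the help text above (not the project's source)."""
    parser = argparse.ArgumentParser(prog="csv_normalizer")
    parser.add_argument("-c", "--config_ini", nargs="?",
                        help="csv_normalizer ini configuration file")
    parser.add_argument("--version", nargs="?", const=True, default=False,
                        help="Print version and exit")
    # nargs="?" with const=True: a bare "--no_rename" is stored as True.
    parser.add_argument("--no_rename", dest="no_rename_old", nargs="?", const=True,
                        default=False, help="Do not rename to .old the original file")
    parser.add_argument("--write_config", action="store_true",
                        help="Write configuration with default values, "
                             "useful to get a config file to modify")
    return parser


args = build_parser().parse_args(["-c", "./csv_normalizer.ini", "--no_rename"])
print(args.config_ini, args.no_rename_old)  # ./csv_normalizer.ini True
```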

Example usage on Linux:

csv_normalizer -c ./csv_normalizer.ini

On Windows:

csv_normalizer.exe -c .\csv_normalizer.ini

Using the option to not rename the original files:

csv_normalizer -c .\csv_normalizer.ini --no_rename

By default csv_normalizer renames the original files to .old, so if you run the program again it will not process the same files again.
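The rename step can be sketched as follows. This assumes .old is appended to the full file name (data.csv becomes data.csv.old) and that only *.csv files are picked up on the next run; the function names mark_processed and pending_csv_files are hypothetical.

```python
import tempfile
from pathlib import Path


def mark_processed(path, no_rename=False):
    """Rename a processed file to '<name>.old' unless no_rename is set.

    Returns the path where the original file ends up.
    """
    src = Path(path)
    if no_rename:
        return src
    dst = src.with_name(src.name + ".old")
    src.rename(dst)
    return dst


def pending_csv_files(folder):
    """List *.csv files waiting to be processed; 'x.csv.old' does not match."""
    return sorted(p.name for p in Path(folder).glob("*.csv"))


with tempfile.TemporaryDirectory() as folder:
    (Path(folder) / "station.csv").write_text("Temperature,Humidity\n21.5,60\n")
    print(pending_csv_files(folder))  # ['station.csv']
    mark_processed(Path(folder) / "station.csv")
    print(pending_csv_files(folder))  # []
```

On Windows, Path.rename raises FileExistsError if the .old file already exists (on POSIX it replaces it), so a real implementation has to decide how to handle a file name that is processed twice.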

Install

pip install --user csv_normalizer

# or, for the root account:

pip install csv_normalizer

Author: Pablo Estigarribia

Project site: https://github.com/CoffeeITWorks/csv_normalizer
