
Data cleaning is the process of fixing or removing incorrect, corrupted, incorrectly formatted, duplicate, or incomplete data within a dataset. When combining multiple data sources, there are many opportunities for data to be duplicated or mislabeled.


Zargham1214/Data-Cleaning


Data-Cleaning


Handling Missing Values

Dropping missing values, or filling them in with an automated workflow.

check here
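Below is a minimal sketch of both options. It assumes the data is loaded into a pandas DataFrame; pandas and the small made-up dataset are assumptions for illustration, not taken from the repository's notebooks. The code counts the gaps, then drops rows or columns that contain them. It also fills the gaps automatically: each gap takes the next valid value in its column, and anything still missing becomes 0.

```python
import numpy as np
import pandas as pd

# Small made-up dataset with gaps (None in a text column, NaN in a numeric one).
df = pd.DataFrame({
    "city": ["Lahore", "Karachi", None, "Islamabad"],
    "temp": [31.0, np.nan, 28.5, np.nan],
})

# 1. Inspect: number of missing values in each column.
missing_per_column = df.isnull().sum()

# Share of all cells that are missing, as a percentage.
percent_missing = missing_per_column.sum() / df.size * 100

# 2a. Drop every row that has at least one missing value.
rows_dropped = df.dropna()

# 2b. Or drop every column that has at least one missing value.
cols_dropped = df.dropna(axis=1)

# 3. Automated fill: take the next valid value further down the same
#    column, then use 0 for gaps with nothing after them (e.g. the last row).
filled = df.bfill(axis=0).fillna(0)
```

Dropping is the simplest approach, but it discards data. Filling keeps every row at the cost of inventing values, so the fill rule should match what the column actually means.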

Character Encodings

Avoiding UnicodeDecodeErrors when loading CSV files.

check here
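A UnicodeDecodeError usually means the file's bytes were written in one encoding (often Windows-1252 or Latin-1) but are being read as another (usually UTF-8). The sketch below uses only the standard library and a made-up two-row CSV. It reproduces the failure and then recovers by trying candidate encodings in order. The helper `decode_with_fallback` and its list of candidates are hypothetical, not part of any library.

```python
import csv
import io

# Made-up CSV content saved by a tool that used Windows-1252, not UTF-8.
# "é" becomes byte 0xE9 and "€" becomes byte 0x80, neither valid UTF-8 here.
raw = "name,price\nCafé,3€\n".encode("windows-1252")

def decode_with_fallback(data, encodings=("utf-8", "windows-1252")):
    """Try each candidate encoding in order; return (text, encoding_used)."""
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue  # these bytes are not valid in this encoding; try the next
    raise ValueError("none of the candidate encodings could decode the data")

text, used = decode_with_fallback(raw)       # UTF-8 fails, Windows-1252 works
rows = list(csv.reader(io.StringIO(text)))   # parse the decoded text as CSV
```

Strict UTF-8 goes first because it rejects most non-UTF-8 input. Windows-1252, by contrast, accepts almost any byte sequence, so trying it first would silently "succeed" on the wrong file. With pandas, the equivalent fix is to pass the detected encoding to `read_csv`, for example `pd.read_csv(path, encoding="windows-1252")`.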

Scaling and Normalization

Transforming numeric variables to have helpful properties.

check here
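Scaling and normalization change different things. Scaling changes the range of the values (for example to [0, 1]) without changing the shape of their distribution. Normalization changes the shape, usually so the data looks closer to a normal distribution. Below is a minimal NumPy sketch of both on made-up, right-skewed values. Min-max scaling handles the first case. For the second, a plain log transform is swapped in as a simple stand-in; Box-Cox (`scipy.stats.boxcox`) is a common, more flexible alternative.

```python
import numpy as np

# Made-up right-skewed values (e.g. prices); all strictly positive.
x = np.array([1.0, 2.0, 4.0, 8.0, 1000.0])

def min_max_scale(values):
    """Scaling: map values linearly onto [0, 1] via (v - min) / (max - min).
    The relative spacing between points, and so the shape, is unchanged."""
    lo, hi = values.min(), values.max()
    return (values - lo) / (hi - lo)

scaled = min_max_scale(x)

# Normalization (changing the shape): the log compresses the long right
# tail so the values sit closer to a bell shape. Requires values > 0.
normalized = np.log(x)
```

In this example the largest value is 250 times the median before the transform but only about 5 times the median after it. The scaled values, by contrast, are as lopsided as the raw ones; they just sit between 0 and 1.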

Parsing Dates

Recognizing dates as composed of day, month, and year.

check here
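Dates read from a CSV usually arrive as plain strings. The minimal pandas sketch below assumes month-first strings like the made-up ones shown. It parses them with an explicit format and pulls out components through the `.dt` accessor. It also uses `errors="coerce"` to turn malformed entries into NaT, so they can be counted instead of stopping the parse.

```python
import pandas as pd

# Made-up date strings stored as plain text, month first.
dates = pd.Series(["03/15/2021", "12/01/2020", "07/04/2019"])

# An explicit format removes ambiguity: "12/01/2020" is read as
# 1 December 2020, not 12 January 2020.
parsed = pd.to_datetime(dates, format="%m/%d/%Y")

# Once parsed, day, month, and year are available as separate fields.
day_of_month = parsed.dt.day
month = parsed.dt.month
year = parsed.dt.year

# Malformed or impossible dates (30 February) become NaT instead of raising.
messy = pd.to_datetime(
    pd.Series(["01/31/2021", "02/30/2021"]), format="%m/%d/%Y", errors="coerce"
)
n_bad = messy.isna().sum()
```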

Inconsistent Data Entry

Efficiently fixing typos in your data.

check here
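Typos in categorical columns often come in two layers. The first layer is cheap variants such as case and stray whitespace, which `strip()` and `lower()` handle. The second is genuine misspellings, which fuzzy matching against a list of canonical spellings can fix. This sketch uses the standard library's `difflib.get_close_matches` as a swapped-in stand-in for dedicated fuzzy-matching packages. The entries, the canonical list, the helper `snap_to_canonical`, and the 0.8 cutoff are all hypothetical.

```python
import difflib

# Made-up free-text entries for the same few countries.
raw = ["Germany", " germany", "GERMANY ", "Germny", "France", "france.", "Spain"]

# Layer 1: normalize case and surrounding whitespace.
cleaned = [s.strip().lower() for s in raw]

# Layer 2: snap remaining misspellings onto known spellings.
canonical = ["germany", "france", "spain"]

def snap_to_canonical(value, choices, cutoff=0.8):
    """Return the most similar canonical spelling, or the value unchanged
    if nothing reaches the similarity cutoff (so new values survive)."""
    matches = difflib.get_close_matches(value, choices, n=1, cutoff=cutoff)
    return matches[0] if matches else value

fixed = [snap_to_canonical(s, canonical) for s in cleaned]
```

A high cutoff is the cautious choice. A value with no close match is left alone for manual review rather than forced into the wrong category.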
