Data Cleaning, Feature Engineering, and Dimensionality Reduction in Machine Learning

This project showcases data cleaning, feature preprocessing, and feature selection techniques in machine learning. Each Jupyter notebook is a standalone illustration of the technique it covers.

Dependencies

This project requires Python and the following Python libraries:

pandas
numpy
seaborn
matplotlib
scikit-learn

It also requires software that can open and run Jupyter notebooks.
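As a quick sanity check before opening the notebooks, the snippet below is a minimal sketch that reports which of the libraries listed above cannot be found in the current Python environment. The helper name `missing_packages` and the `REQUIRED` list are illustrative and not part of this project; note that scikit-learn is imported under the name `sklearn`.

```python
from importlib.util import find_spec

# Import names for the libraries listed above (assumption: default import
# names; scikit-learn is imported as "sklearn").
REQUIRED = ["pandas", "numpy", "seaborn", "matplotlib", "sklearn"]


def missing_packages(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing libraries:", ", ".join(missing))
else:
    print("All required libraries were found.")
```

This only checks that each library can be located; it does not verify versions.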

Installation

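Clone the repo.
Download the data needed for the technique you want to run from the Data section below.
Navigate to the top-level project directory that contains this README file.
Go to the Source_Codes directory for the feature type you are interested in (For_Numerical_Features/Source_Codes or For_Text_Features/Source_Codes).
Run the following command: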
```
jupyter notebook
```
This opens a tab in your web browser.
Click on the notebook for the technique that you are interested in.

Methods

Missing Values Imputation Techniques
Handling Categorical Data
Zero-Variance Feature Removal
Multicollinearity Removal
Tokenization, Stemming, and Lemmatization
Forward Elimination / Backward Elimination / Stepwise Elimination
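
To give a feel for the first three methods in the list, here is a minimal sketch using pandas and scikit-learn. It is not taken from the notebooks: the toy data, the column names, and the choice of median imputation and one-hot encoding are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import VarianceThreshold
from sklearn.impute import SimpleImputer

# Hypothetical toy data: one missing value, one categorical column,
# and one constant (zero-variance) column.
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 31.0],
    "city": ["Oslo", "Lima", "Oslo", "Pune"],
    "constant": [1.0, 1.0, 1.0, 1.0],
})

# 1. Missing value imputation: replace NaN in "age" with the column median.
imputer = SimpleImputer(strategy="median")
df[["age"]] = imputer.fit_transform(df[["age"]])

# 2. Handling categorical data: one-hot encode the "city" column.
encoded = pd.get_dummies(df, columns=["city"], dtype=float)

# 3. Zero-variance feature removal: drop columns whose variance is 0.
selector = VarianceThreshold(threshold=0.0)
selector.fit(encoded)
kept = encoded.loc[:, selector.get_support()]

print(kept.columns.tolist())
```

The notebooks may use different strategies (for example, mean imputation or ordinal encoding); the sketch only shows the general shape of each step.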
