This project showcases different data cleaning, feature preprocessing, and feature selection techniques in machine learning. Each Jupyter notebook is a standalone illustration of the technique it covers.
This project requires Python and the following Python libraries:
- pandas
- numpy
- seaborn
- matplotlib
- scikit-learn
It also requires software that can open and execute a Jupyter Notebook.
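Before opening the notebooks, it can help to confirm that the libraries listed above are importable. The following is a minimal sketch, not part of the project itself; the helper name `missing_packages` is hypothetical, and it assumes the usual import names (scikit-learn is imported as `sklearn`).

```python
import importlib.util

# Import names for the libraries listed above; note that scikit-learn
# is imported as "sklearn".
REQUIRED = ["pandas", "numpy", "seaborn", "matplotlib", "sklearn"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

Any package reported as missing can then be installed with your usual package manager before launching Jupyter.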
- Clone the repo.
- Download the necessary data from the Data section below for the required technique.
- Navigate to the top-level project directory that contains this README file.
- Go to the Source_Codes directory.
- Run the following command:
jupyter notebook
- This will open a tab in your web browser.
- Click on the notebook for the technique that you are interested in.
- Missing Values Imputation Techniques
- Handling Categorical Data
- Zero-Variance Feature Removal
- Multicollinearity Removal
- Tokenization, Stemming, and Lemmatization
- Forward Elimination / Backward Elimination / Stepwise Elimination
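
To give a flavor of three of the techniques listed above (missing-value imputation, zero-variance feature removal, and categorical handling), here is a small sketch using pandas and scikit-learn. It is not taken from the notebooks: the toy DataFrame, its column names, and the choice of mean imputation and one-hot encoding are assumptions made purely for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import VarianceThreshold
from sklearn.impute import SimpleImputer

# Hypothetical toy data, not one of the project's datasets.
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 31.0],
    "constant": [1.0, 1.0, 1.0, 1.0],  # zero-variance column
    "city": ["Paris", "Lima", "Paris", "Oslo"],
})

numeric_cols = ["age", "constant"]

# 1. Missing-value imputation: replace NaN with the column mean.
imputer = SimpleImputer(strategy="mean")
numeric = pd.DataFrame(
    imputer.fit_transform(df[numeric_cols]), columns=numeric_cols
)

# 2. Zero-variance feature removal: keep only columns whose variance
#    is strictly greater than the threshold (0.0 here).
selector = VarianceThreshold(threshold=0.0)
selector.fit(numeric)
kept_cols = [c for c, keep in zip(numeric_cols, selector.get_support()) if keep]
numeric = numeric[kept_cols]

# 3. Categorical handling: one-hot encode the "city" column.
encoded = pd.get_dummies(df[["city"]], columns=["city"])

# Combine the cleaned numeric and encoded categorical features.
cleaned = pd.concat([numeric, encoded], axis=1)
print(cleaned)
```

The individual notebooks may use different strategies (for example, median imputation or ordinal encoding); the sketch only shows how the steps can fit together.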