This is an attempt to summarize the feature engineering methods I learned during graduate school.

being-aerys/Data_Processing_and_Feature_Engineering_in_Machine_Learning

Data Cleaning, Feature Engineering, and Dimensionality Reduction in Machine Learning

This project showcases various data cleaning, feature preprocessing, and feature selection techniques in machine learning. Each Jupyter notebook is a standalone illustration of the technique it covers.

Dependencies

This project requires Python and the following Python libraries:

  1. pandas
  2. numpy
  3. seaborn
  4. matplotlib
  5. scikit-learn

It also requires software that can open and run Jupyter notebooks, such as Jupyter Notebook or JupyterLab.

Installation

  1. Clone the repo.
  2. Download the data required for the technique you are interested in from the Data section below.
  3. Navigate to the top-level project directory (the one that contains this README file).
  4. Go to the Source_Codes directory.
  5. Run the following command:

         jupyter notebook

  6. This opens a tab in your web browser.
  7. Click the notebook for the technique you are interested in.

Methods

  1. Missing Values Imputation Techniques
  2. Handling Categorical Data
  3. Zero-Variance Feature Removal
  4. Multicollinearity Removal
  5. Tokenization, Stemming, and Lemmatization
  6. Forward Selection / Backward Elimination / Stepwise Selection
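The Methods list above only names the techniques. As a rough illustration of item 1 (missing values imputation), here is a minimal sketch of mean imputation using scikit-learn's `SimpleImputer`. The toy DataFrame and its column names are hypothetical and are not taken from the project's data; the notebook may use a different strategy (median, most frequent, model-based, and so on).

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data with missing entries (not from the project's datasets).
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 35.0],
    "income": [50_000.0, 62_000.0, np.nan, 58_000.0],
})

# Replace each NaN with the mean of the observed values in its column.
imputer = SimpleImputer(strategy="mean")
imputed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)

print(imputed)
```

In a real pipeline the imputer should be fitted on the training split only and then applied to the test split with `transform`, so that test-set statistics do not leak into training.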
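For item 2 (handling categorical data), a common pattern is to map ordinal categories to ordered integers and one-hot encode nominal categories. The sketch below does this with plain pandas. The columns, the category values, and the `size_order` mapping are hypothetical examples, and the notebook may use other encoders (for example scikit-learn's `OneHotEncoder`).

```python
import pandas as pd

# Hypothetical toy data: one nominal feature, one ordinal feature, one numeric feature.
df = pd.DataFrame({
    "color": ["red", "green", "blue", "green"],
    "size": ["S", "M", "L", "M"],
    "price": [10.0, 12.5, 9.0, 11.0],
})

# Ordinal feature: map categories to integers that preserve their natural order.
size_order = {"S": 0, "M": 1, "L": 2}
df["size"] = df["size"].map(size_order)

# Nominal feature: one-hot encode so no artificial order is implied.
# Each category becomes its own 0/1 column (color_blue, color_green, color_red).
encoded = pd.get_dummies(df, columns=["color"], dtype=int)

print(encoded)
```

One-hot encoding keeps a model from reading a spurious ranking into labels like "red" < "green", at the cost of one extra column per category.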
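For item 3 (zero-variance feature removal), a feature that takes the same value in every sample carries no information for a model. A minimal sketch using scikit-learn's `VarianceThreshold` follows. The column names and values are hypothetical.

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Hypothetical toy data: two constant columns and one informative column.
df = pd.DataFrame({
    "constant": [1, 1, 1, 1],
    "varying": [3, 5, 2, 8],
    "also_constant": [0, 0, 0, 0],
})

# threshold=0.0 keeps only features whose variance is strictly greater than zero.
selector = VarianceThreshold(threshold=0.0)
selector.fit(df)

# get_support() returns a boolean mask over the original columns.
kept = df.columns[selector.get_support()]
reduced = df[kept]

print(list(kept))  # ['varying']
```

Raising `threshold` above zero also removes near-constant features. Variance depends on a feature's scale, so a non-zero threshold is only meaningful when the features share comparable units.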
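For item 4 (multicollinearity removal), one simple approach is to drop one feature from every pair whose absolute Pearson correlation exceeds a threshold. This is not necessarily the method the notebook uses; variance inflation factors (VIF) are a common alternative. The helper `drop_correlated`, its 0.9 threshold, and the synthetic data are hypothetical.

```python
import numpy as np
import pandas as pd


def drop_correlated(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Hypothetical helper: drop the later feature of each pair with |r| > threshold."""
    corr = df.corr().abs()
    # Keep only the upper triangle (k=1 excludes the diagonal) so each pair is checked once.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    upper = corr.where(mask)
    # A column is dropped if it is highly correlated with any column to its left.
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)


# Synthetic data: x1_scaled is almost a linear copy of x1; x2 is independent noise.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
df = pd.DataFrame({
    "x1": x1,
    "x1_scaled": 2.0 * x1 + rng.normal(scale=0.01, size=200),
    "x2": rng.normal(size=200),
})

reduced = drop_correlated(df, threshold=0.9)
print(list(reduced.columns))  # ['x1', 'x2']
```

Pairwise correlation only detects redundancy between two features at a time. When one feature is a combination of several others, VIF is the more thorough check.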
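For item 6, scikit-learn (0.24 and later) provides `SequentialFeatureSelector`, which supports the forward and backward directions. It scores candidate subsets by cross-validated model performance rather than by p-values or AIC, so it differs from the classical statistical stepwise procedures, and it does not implement bidirectional stepwise selection. The notebook may use a different criterion. The synthetic dataset and the choice of `LinearRegression` with 3 selected features are assumptions made for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

# Synthetic data: 8 features, only 3 of which actually influence y.
X, y = make_regression(
    n_samples=200, n_features=8, n_informative=3, noise=5.0, random_state=0
)

selected = {}
for direction in ("forward", "backward"):
    # forward: start empty and greedily add the feature that most improves CV score.
    # backward: start with all features and greedily remove the least useful one.
    sfs = SequentialFeatureSelector(
        LinearRegression(),
        n_features_to_select=3,
        direction=direction,
        cv=5,
    )
    sfs.fit(X, y)
    selected[direction] = sfs.get_support(indices=True)
    print(direction, selected[direction])
```

Forward selection is cheaper when only a few features are wanted out of many. Backward elimination starts from the full model and can be preferable when most features are expected to be kept.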
