Skip to content

This repo features Supervised Machine Learning and its use in data analytics by analyzing credit card risk. Logistic Regression, Decision Tree, Random Forest, and Support Vector machine algorithms, in addition to ensemble and resampling techniques, were used.

Notifications You must be signed in to change notification settings

melindamalone/Credit_Risk_Analysis

Repository files navigation

Credit Risk Analysis

Overview of the Credit Risk Analysis

Purpose

In Module Seventeen, Supervised Machine Learning is introduced for use in data analytics. Logistic Regression, Decision Tree, Random Forest, and Support Vector machine algorithms, in addition to ensemble and resampling techniques, were implemented in this module to analyze credit card risk.

Results

Resampling

  • Naive Random Oversampling:
    • Balanced Accuracy Score 69%
    • F1 Score 75%

  • SMOTE Oversampling:
    • Balanced Accuracy Score 66%
    • F1 Score 81%

  • ClusterCentroids Resampler:
    • Balanced Accuracy Score 64%
    • F1 Score 72%

  • SMOTEENN:
    • Balanced Accuracy Score 64%
    • F1 Score 72%

Ensemble

  • Balanced Random Forest Classifier:
    • Balanced Accuracy Score 79%
    • F1 Score 93%

  • Easy Ensemble AdaBoost Classifier:
    • Balanced Accuracy Score 93%
    • F1 Score 97%

Summary

The resampling machine learning models produced moderate results ranging in F1 Scores of 70% - 80%. The ensemble machine learning models, BalancedRandomForestClassifier and Easy Ensemble AdaBoost Classifier, produced superior results at rates of 90% or above in comparison to the resampling methods. The ensemble machine learning method, BalancedRandomForestClassifier, also produced important features for the credit risk analysis. Therefore my recommendation for the credit risk analysis would be to implement the ensemble machine learning models, specifically AdaBoost Classifier, because of its compounded machine learning algorithms and higher F1 scores.

About

This repo features Supervised Machine Learning and its use in data analytics by analyzing credit card risk. Logistic Regression, Decision Tree, Random Forest, and Support Vector machine algorithms, in addition to ensemble and resampling techniques, were used.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published