MitsuSarkar/LendingClub-Data-Analysis

LendingClub-Data-Analysis

LendingClub is a peer-to-peer lending platform that connects borrowers with investors. As the platform has grown in popularity, identifying borrowers who are likely to default has become important. This report evaluates models for predicting borrower default on the LendingClub platform, analyzes the data to identify the variables that affect the likelihood of default, and presents and explains the best-performing model.

The data for this study comes from LendingClub's loan dataset, which includes information on over 1 million loans between 2007 and 2015. The data was analyzed to extract the most relevant features, and several machine learning models were trained to predict loan defaults. The models were evaluated on accuracy, precision, recall, and F1 score.

Machine learning algorithms, including logistic regression and random forest, were used to train and test models on the dataset. Each model was trained on a subset of the data and then tested on the remaining, held-out data. Performance was measured using accuracy, precision, and recall.
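
The train-then-test procedure described above can be sketched with scikit-learn. This is a minimal illustration, not the repository's actual code: the data is synthetic (generated with `make_classification`), and the 70/30 split, the feature scaling, and all hyperparameters are assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the loan table (assumption): 1 = default, 0 = repaid,
# with defaults as the minority class.
X, y = make_classification(
    n_samples=2000, n_features=8, n_informative=4,
    weights=[0.8, 0.2], random_state=0,
)

# Hold out 30% of the rows for testing; stratify so both parts keep
# the same share of defaults.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# The two algorithms named in the text; hyperparameters are illustrative.
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)        # train on the training subset
    y_pred = model.predict(X_test)     # predict on the held-out rows
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred, zero_division=0),
    }

for name, scores in results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

Scores on synthetic data say nothing about the real dataset; the sketch only shows the shape of the workflow.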

The analysis revealed that the most significant factors that impact loan default are the loan terms, loan amount, interest rate, and debt-to-income ratio. Other relevant features that contribute to the prediction of loan defaults include loan grade, loan subgrade, and home ownership status.
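
One common way to rank variables like these is the impurity-based feature importance of a fitted random forest. The README does not say which method was used, so the following is only a sketch under that assumption. The column names are hypothetical and the data is synthetic, so the printed ranking is arbitrary and does not reproduce the study's findings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical column names echoing the variables named in the text.
# In real data, categorical columns such as grade or home ownership
# would need to be encoded numerically first.
feature_names = [
    "term", "loan_amnt", "int_rate", "dti",
    "grade", "sub_grade", "home_ownership",
]

# Synthetic data (assumption): the columns have no real link to the names.
X, y = make_classification(
    n_samples=1000, n_features=len(feature_names),
    n_informative=4, random_state=0,
)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# feature_importances_ holds one non-negative weight per column, summing to 1.
ranked = sorted(
    zip(feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)

for name, importance in ranked:
    print(f"{name:15s} {importance:.3f}")
```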

The confusion matrix shows how well the random forest algorithm performed in predicting loan defaults. The true positives are the loans that were correctly predicted as default, while the false positives are the loans that were incorrectly predicted as default.
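
To make the confusion-matrix terms concrete, here is a small pure-Python sketch that counts the four cells and derives the evaluation metrics from them. The labels are made-up toy values (1 = default, 0 = no default), not results from the study, and the function names are illustrative.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn), treating label 1 as 'default'."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            tp += 1   # default correctly predicted as default
        elif predicted == 1 and actual == 0:
            fp += 1   # non-default incorrectly predicted as default
        elif predicted == 0 and actual == 1:
            fn += 1   # default that the model missed
        else:
            tn += 1   # non-default correctly predicted as non-default
    return tp, fp, fn, tn


def summarize(tp, fp, fn, tn):
    """Derive accuracy, precision, recall, and F1 from the four counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall) else 0.0
    )
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy labels for ten loans (illustrative only).
y_true = [1, 0, 1, 1, 0, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 0, 1, 0, 0]

counts = confusion_counts(y_true, y_pred)
metrics = summarize(*counts)
print(counts)    # (3, 1, 1, 5)
print(metrics)   # accuracy 0.8; precision, recall, and F1 all 0.75
```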

The ROC curve shows the performance of the random forest algorithm in predicting loan defaults across classification thresholds. The true positive rate is the share of loans that actually defaulted and were correctly predicted as default, while the false positive rate is the share of loans that did not default but were incorrectly predicted as default. The random forest algorithm achieved a high true positive rate and a low false positive rate, which means that it identified most defaults while keeping false alarms low.
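
As an illustration of how an ROC curve is built, the sketch below sweeps a decision threshold over predicted default scores and records the resulting (false positive rate, true positive rate) pairs, then estimates the area under the curve with the trapezoidal rule. The scores and labels are toy values, and the helper names are assumptions for this example. It also assumes both classes are present in `y_true`.

```python
def roc_points(y_true, scores):
    """Return [(fpr, tpr), ...] from sweeping the threshold high to low."""
    positives = sum(y_true)               # loans that actually defaulted
    negatives = len(y_true) - positives   # loans that did not default
    ordered = sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)

    points = [(0.0, 0.0)]   # threshold above every score: nothing flagged
    tp = fp = 0
    i = 0
    while i < len(ordered):
        threshold = ordered[i][0]
        # Loans with tied scores cross the threshold together.
        while i < len(ordered) and ordered[i][0] == threshold:
            if ordered[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy data: 1 = default; scores are predicted default probabilities.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(y_true, scores)
print(points)
print(round(auc(points), 4))   # 0.8889
```

An area close to 1 corresponds to the "high true positive rate, low false positive rate" behavior described above, while 0.5 corresponds to random guessing.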

The study found that the most significant variables that impact the likelihood of loan defaults on the LendingClub platform are loan terms, loan amount, interest rate, and debt-to-income ratio. The random forest algorithm was the best model for predicting loan defaults, with good out-of-sample predictive power. The results of the study can be used to improve the risk management process at LendingClub.

Screenshot 2023-04-20 031430

Screenshot 2023-04-20 032638 Screenshot 2023-04-20 025023
