Loans are the core business of loan companies, whose main profit comes directly from loan interest. A loan company grants a loan after an intensive process of verification and validation. Even so, it has no assurance that the applicant will be able to repay the loan without difficulties.
In this tutorial, we'll build a predictive model to determine whether an applicant will be able to repay the lending company. We will prepare the data using Watson Studio's Refinery and then build a model in two ways: using SPSS Modeler and using the new AutoAI feature of Watson Studio. Finally, we will deploy a web application that can use either of these two models.
After completing this tutorial, you’ll understand how to:
- Add and prepare your data
- Build a machine learning model using two different techniques
- Save & Deploy the models
- Use the models from a web application
In order to complete this tutorial, you will need:
- An IBM Cloud account
- An Object Storage service
- A Watson Studio service
- A Machine Learning service
Services will be deployed in the next steps.
Reading and following this tutorial should take approximately one hour.
The dataset is taken from Analytics Vidhya, but it is also included in the `data` folder of this repository for your convenience. The format of the training data in `train_loan.csv` is:
- `Loan_ID`: Unique loan ID
- `Gender`: Male / Female
- `Married`: Applicant married (Y/N)
- `Dependents`: Number of dependents
- `Education`: Applicant education (Graduate / Under Graduate)
- `Self_Employed`: Self-employed (Y/N)
- `ApplicantIncome`: Applicant income
- `CoapplicantIncome`: Coapplicant income
- `LoanAmount`: Loan amount in thousands
- `Loan_Amount_Term`: Term of loan in months
- `Credit_History`: Credit history meets guidelines
- `Property_Area`: Urban / Semiurban / Rural
- `Loan_Status`: Loan approved (Y/N). This is the target to predict.
The `test_loan.csv` file does not provide this field.
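As a quick sanity check, the schema above can be loaded and inspected with pandas. This is only an illustrative sketch: the two sample rows below are invented, not taken from the real dataset.

```python
# Sketch: inspect the training schema with pandas.
# The column names match train_loan.csv; the two rows are made-up examples.
import io
import pandas as pd

csv_text = """Loan_ID,Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area,Loan_Status
LP001002,Male,No,0,Graduate,No,5849,0.0,130,360,1,Urban,Y
LP001003,Male,Yes,1,Graduate,No,4583,1508.0,128,360,1,Rural,N
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)                   # (2, 13): 12 feature columns plus the target
print(sorted(df["Loan_Status"]))  # ['N', 'Y']
```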
- Create a project in Watson Studio
- Upload the dataset to Watson Studio
- Refine the train dataset, using Watson Studio Refinery capability
- Build a visual flow model and deploy it as a web service with no coding
- Build an alternative model using AutoAI capabilities of Watson Studio & Watson Machine Learning
- Deploy a client Web Application
If one does not exist yet, go to the IBM Cloud Catalog and create an instance of Watson Studio, selecting the Lite plan and a location such as **London** or **Dallas**.
Navigate to either https://eu-gb.dataplatform.cloud.ibm.com or https://dataplatform.cloud.ibm.com, depending on whether you want to work in London or Dallas, and log in with your IBM Cloud credentials.
From the Watson Studio main page, click **New project**. Choose **Create an empty project**. Once you enter your project name, if there is no **Cloud Object Storage** associated with the project, click **Add** to select a new storage service.
A new tab will open for the new Cloud Object Storage (COS) service. Select **New**, ensure the selected plan is **Lite**, and press **Create**. A pop-up will show the creation confirmation. You may change the name of the service instance; then press **Create**. The tab will close and you will return to the Watson Studio project creation screen. Press **Refresh** to load the newly created COS instance. Finally, press the **Create** button to create the project in Watson Studio.
In the new project screen, select the **Assets** tab at the top. It will open the **Find and add data** section in the right-side panel. In the **Load** area, drag and drop the two dataset files (`train_loan.csv` and `test_loan.csv`) from the `data` directory of your git clone.
- On the asset page, click on the `train_loan.csv` data asset. A new screen will open.

You can see that all the columns have been identified as *string*. Although there are different ways to fix this, let's use the Refinery capability to adjust the data without any programming.
- Press the **Refine** button at the top right. The tool will start, analyze the dataset, and present a new screen. Note that a list of transformation steps has been created and an initial **Convert column type** step has been added automatically. This step adjusts each column to its best-fit type: it has converted `ApplicantIncome` to *Integer*, `CoapplicantIncome` to *Decimal*, and `LoanAmount`, `Loan_Amount_Term`, and `Credit_History` all to *Integer*. You can revisit how the file is transformed at each step.
- Now, select the column `Loan_Status`, which is the target column we will predict, and press the **Operations** button.
- Scroll down under **Organize** and select **Conditional Replace**.
- Add two conditions: one where the field *is equal to* `Y`, replaced by `1`, and one replacing any remaining values with `0`. Press the **Apply** button at the bottom of the page. A new step is created. It reads: *Replaced values for Loan_Status: Loan_Status where value contains "Y" as "1". Replaced all remaining values with "0".*
- Finally, convert the column type of `Loan_Status` from *string* to *integer*. To do that, press the context button of the column, select **Convert Column Type**, and accept the suggested type *Integer*. There will now be three steps recorded.
- Press the **Save** button, and then the **Edit** button, to adjust the output options.
- In the new screen, press the **edit** icon to change the name of the output file. Enter `train_shaped.csv`; we will use this file later. You can also see that the tool supports different file formats. We will leave CSV as it is.
- Press the **Save** button to save the new names, and then press the **Done** button to return to the Refinery screen.
- Press the **Save** button again, and then the **play** button (select **Save and create a job**).
- A new job creation screen will open. Here you can configure a single run or schedule a recurrent run of the transformation flow. Enter a name for the job and press **Create and Run** at the bottom right of the screen.
- A new job execution window will show that the job is running. Return to the project assets page. You will see a new data asset named `train_shaped.csv`. You can preview it and validate that all the defined changes have been applied.
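For readers who prefer code, the two Refinery transformations above (the conditional replace and the type conversion) are equivalent to the following pandas sketch. The tutorial itself needs no programming; the sample values here are invented for illustration.

```python
# Pandas equivalent of the two Refinery steps applied to Loan_Status.
import pandas as pd

df = pd.DataFrame({"Loan_ID": ["LP001002", "LP001003"],
                   "Loan_Status": ["Y", "N"]})

# Step 1, conditional replace: "Y" -> "1", any remaining value -> "0".
df["Loan_Status"] = df["Loan_Status"].map(lambda v: "1" if v == "Y" else "0")

# Step 2, convert column type: string -> integer.
df["Loan_Status"] = df["Loan_Status"].astype(int)

print(df["Loan_Status"].tolist())  # [1, 0]
```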
- On the same Assets page, select **Add to Project** and from the options select **Modeler flows**.
- Under the **New Modeler Flow** screen, name your modeler flow `Loan Eligibility Predictive model`, and ensure the selected runtime is **IBM SPSS Modeler**.
- Click **Create**.
- Add data to the canvas using the **Data Asset** node.
- Double-click the node and click **Change Data Asset** to open the Asset Browser. Select `train_shaped.csv`, then click **OK** and **Save**.
Let's look into the summary statistics of our data using the **Data Audit** node.

- Drag and drop the **Data Audit** node and connect it to the **Data Asset** node. After running the node, you can see the audit report in the right-side panel.
We can see that some columns have missing values. Let's remove the rows that have null values using the **Select** node.

- Drag and drop the **Select** node, connect it to the **Data Asset** node, then right-click it and open the node.
- Select **Discard** mode and provide the condition below to remove rows with null values:

`(@NULL(Gender) or @NULL(Married) or @NULL(Dependents) or @NULL(Self_Employed) or @NULL(LoanAmount) or @NULL(Loan_Amount_Term) or @NULL(Credit_History))`
Now our data is clean, and we can proceed with building the model.
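The Select node's discard condition maps directly onto a pandas `dropna` call. A minimal sketch, with an invented three-row frame and a shortened column subset (the real flow runs inside SPSS Modeler):

```python
# Equivalent of the Select node in discard mode: drop any row where one of
# the listed columns is null. The subset list is shortened for illustration.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Gender":         ["Male", None, "Female"],
    "LoanAmount":     [130.0, 110.0, np.nan],
    "Credit_History": [1.0, 1.0, 1.0],
})

clean = df.dropna(subset=["Gender", "LoanAmount", "Credit_History"])
print(len(clean))  # 1: only the fully populated row survives
```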
- Drag and drop the **Type** node from the **Field Operations** palette to configure variable types.
- Double-click the node, or right-click it, to open it.
- Choose **Configure Types** to read the metadata.
- Change the Role of `Loan_Status` from **Input** to **Target** using the drop-down menu.
- Change the Role of `Loan_ID` from **None** to **Record ID**.
- Click **Save**.
The model predicts loan eligibility as one of two classes (either Y: yes or N: no). Thus, the choice of algorithm fell on Bayesian networks, since they are known to give good results on classification problems.
- Split the data into training and testing sets using the **Partition** node from the **Field Operations** palette. (We are not going to use `test_loan.csv`, as that file does not contain a target to validate the training.)
- Double-click the **Partition** node to customize the partition size to 80:20: change the **Training Partition** ratio to 80 and the **Testing Partition** to 20.
- Drag and drop the **Bayes Net** node from the **Modeling** palette.
- Double-click the node to look at the settings. This time we are not going to change anything.
- Run your **Bayes Net** node; you'll then see your model as an orange-colored node.
- Right-click the orange-colored node, then click **View**. Now you can see the **Network Graph** and other model information.
- Drag and drop the **Analysis** node from the **Output** section and connect it to the model. After running the node, you can see your analysis report in the right-side panel.

The analysis report shows we have achieved 75.22% accuracy on our test data partition with this model (your figure might differ). You can build more models within the same canvas until you get the result you want.
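The 80:20 split performed by the Partition node can be sketched in plain numpy. The Bayes Net training itself happens inside SPSS Modeler, so only the partitioning is shown; the row count and seed below are arbitrary stand-ins.

```python
# Sketch of an 80:20 random partition, as the Partition node performs.
import numpy as np

rng = np.random.default_rng(42)   # arbitrary seed, for reproducibility
n_rows = 100                      # stand-in for the cleaned dataset size

mask = rng.random(n_rows) < 0.8   # ~80% of rows flagged for training
train_idx = np.where(mask)[0]
test_idx = np.where(~mask)[0]

print(len(train_idx), len(test_idx))  # roughly 80 and 20
```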
Let's build the inference flow with the data structure that will be used during inference, which is different from the one used for training.
- As in step 4.2, add data to the canvas using a **Data Asset** node.
- Double-click the new node and click **Change Data Asset** to open the Asset Browser. Select `test_loan.csv`, then click **OK** and **Save**.
- Delete the connection from **Partition** to the **Loan Status** model (yellow node) by selecting it, opening the contextual menu with the right mouse button, and pressing **Delete**.
- Connect the new **Data Asset** node to the **Loan Status** model (yellow node).
- Drag and drop the **Table** node from the **Output** section and connect it to the model (yellow node).
Right-click the **Table** node and select **Save branch as a model**. If a Watson Machine Learning instance does not exist in the project, the following screen may appear. Click **Create a new Watson Machine Learning service instance**.
A new tab will open to create a new instance of Watson Machine Learning.
- Select **New**, select the Lite plan, and choose a location such as **London** or **Dallas**, ideally the same one you chose for Watson Studio.
- Then press **Create**. A new dialog pops up.
- Review the options, change the **Service name** if you want, and press **Confirm**. The tab will close and you will go back to the SPSS Flow screen. Repeat the right-click on the **Table** node and select **Save branch as a model**. The **Save Model** screen will open.
- Enter a meaningful model name, check that the **Table** branch is selected, and finally press **Save**. You will see a confirmation message; when accepted, you will return to the SPSS Flow editor. Click the project name at the top to return to the Assets page. There, under **Watson Machine Learning models**, you can access your saved model.
- Select the model. A new screen opens where you can see some model information, as well as the input and output schemas.
- Select the **Deployments** tab.
- Select the **Add Deployment** link.
- Add a name for the deployment (e.g. `SPSS Deploy model`).
- Click **Save**.

It will take some time, and you may need to refresh the screen before it shows:
Yes, it has failed! Why? Well, guessing why is part of the exercise! Let's look at the error message:

- Select the `SPSS Deploy model` link (or whatever name you gave to the model).
- In the new screen, select the `Details` tab and check the error message.

Think about it and answer the questions in the exercise guide!
Step 5: Build an alternative model using AutoAI capabilities of Watson Studio & Watson Machine Learning
- On the Assets page, select **Add to Project** and from the options select **AutoAI experiment**. The following screen appears.
- Enter a name for the model.
- Check that the WML instance created in the previous step is selected, and press **Create**.
- In the new screen, select **Select from Project** to choose a data asset as the training data. Currently this is limited to a single CSV file; soon it will also support selecting a database connection, including table joins. A new pop-up dialog appears.
- Select `train_shaped.csv`.
- Press **Select Asset**. The file is added as a data source, as shown in the next screen.
- In the right column, select the column that contains the target to predict (`Loan_Status`).
- Press **Experiment Settings** to further customize the experiment.
- In the configuration screen, adjust the train/test split to 85/15.
- Uncheck `Loan_ID`, as it is not a valid feature; it is only the record ID.
- Press **Prediction** to see the available configuration, but do not change anything.
- Press **General** to see the available configuration, but do not change anything.
- Finally, press **Save settings**. When the browser returns to the previous screen, press **Run experiment**.

This is the initial screen when AutoAI starts its calculation. After some minutes, the animation will evolve; when it finishes, it will look similar to the second picture.

- Click **Pipeline comparison** to visualize more metrics comparing the four experiments.
- Click **Holdout** to visualize the metrics with the 15% holdout data instead of cross-validation (results are slightly worse).
- Explore the comparison, then click the first and best pipeline (**Pipeline 3** in this case). Another screen will open.
- Check the different sections. The picture above shows the confusion matrix.
- Finally, select **Save as** and then **Model**. A pop-up dialog will appear. Accept the content as it is and press **Save**. Return to the **Assets** window; you will see the new model there.
- Click the model name to open it.
- Click **Deployments**.
- In the new screen, select the **Add deployment** link.
- Enter a deployment name, such as `AutoAI deployed model`, and click **Save**. As before, wait a minute and refresh the screen.
- Check that the status is **ready**.
- Click the deployment name (`AutoAI deployed model` in this case).
- In the new screen, click **Implementation** at the top.
- In the **Code Snippets** tab, select **Python** as the language.
- Copy the example code in Python and save it aside to be reused later.
We will use a notebook to validate that the deployed model works fine as a web service.
- Go to the main Assets page, click the **Add to Project** button, and select **Notebook**. A pop-up screen will appear.
- Press the **From URL** tab.
- Add the following URL, which points to a file in this same GitHub repository: https://raw.githubusercontent.com/jaumemir/watson_studio/master/assets/ScoringSimulation.ipynb. Name the notebook and select a runtime environment (the free one is enough).
- Press **Create Notebook**. Watson Studio will instantiate an environment and the Jupyter notebook screen will open.
- Press the **data** button. A side window on the right will show the data assets.
- Ensure the first empty cell is selected; under the `test_loan.csv` filename, expand the **Insert to code** drop list and select **Insert pandas DataFrame**.
- Ensure the last variable name generated is `df_data_1`. Rename any `df_data_2` to `df_data_1` if needed.
- Save the notebook by pressing the save button, then run the cell. You should get a table showing the first 5 rows.
- Run the second cell. It will show a description of the type of each column in the dataset.
- Run the third cell, which will sample some records as a Python list, ready for the next step.
- Run the fourth cell, which will prepare the payload message that will be sent for scoring.
- In the fifth cell, copy the WML credentials from the IBM Cloud WML instance into the cell and run it. If you don't have the credentials at hand, go to the IBM Cloud resource list and find the WML instance. Click on it and navigate to the **Credentials** menu, where you will find them. Once done, you should have a notebook like the one in the figure.
- Run the sixth cell, which will retrieve an authentication token from the IBM Cloud IAM service.
- In the seventh cell, paste the code you saved in step 5.3.
- From the pasted code, remove or comment out the line starting with `payload_scoring =`.
- Execute that cell.

You should get the scoring results: an array where each element contains the predicted target and the confidence (or probability) for each of the two possible classes (0, 1).
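The payload the notebook builds for the `/v4/deployments/.../predictions` endpoint follows the WML v4 shape: a list of `fields` plus a list of row `values`. Below is a hedged sketch; the field names are taken from the dataset, but the exact schema your deployment expects may differ, and the HTTP call is left commented out because it needs a live token and deployment URL.

```python
# Sketch of a WML v4 scoring payload for the loan model.
fields = ["Gender", "Married", "Dependents", "Education", "Self_Employed",
          "ApplicantIncome", "CoapplicantIncome", "LoanAmount",
          "Loan_Amount_Term", "Credit_History", "Property_Area"]
values = [["Male", "Yes", "0", "Graduate", "No",
           5849, 0.0, 130, 360, 1, "Urban"]]

payload_scoring = {"input_data": [{"fields": fields, "values": values}]}

# The actual request (needs a valid IAM token and your deployment URL):
# import requests
# response = requests.post(wml_url, json=payload_scoring,
#                          headers={"Authorization": "Bearer " + token})

print(len(payload_scoring["input_data"][0]["values"][0]))  # 11 feature values
```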
In this step, you will deploy a web application that will call the published AutoAI web service endpoint to get the loan granting decision.
Click the **Deploy to IBM Cloud** button above to deploy the application to IBM Cloud. IBM Cloud DevOps will open and a new toolchain will be created for you.

- Click **Delivery Pipeline** and then the **New** key icon to generate a new key. Press **OK** in the dialog that appears.
- Region, organization, and space are populated automatically. If organization and space are not populated, change the region until all fields are populated. Check that they are correct and then press **Create** at the top right.
The toolchain and delivery pipeline will be created and its execution launched. Click **Delivery Pipeline** to access and monitor how the toolchain builds the application and deploys it as an IBM Cloud Foundry Node.js application. When it completes, click **View console**; or, if there is any problem, go to the **Resource List**, find the application `watson_studio-202001nnnnnnnnn` (where `nnn` are digits), and open it.

- Click the **Runtime** menu.
- Click **Environment Variables** and scroll down.
- Fill in the three environment variables. The needed values can be found in the notebook from Step 5:
  - Find `APIKEY` in the `wml_credentials` dictionary, in cell 5.
  - Find `ML_INSTANCE_ID` also in the `wml_credentials` dictionary, in cell 5.
  - Find `WML_URL` in cell 7, in the code `response_scoring = requests.post('https://eu-gb.ml.cloud.ibm.com/v4/deployments/b67c9df3-535f-4b98-ba55-71dc811e36f5/predictions', json=payload_scoring, headers=header)`. The URL is the value you need.
- Press **Save**. The application will restart.

Once restarted, click **Visit App URL**. The application screen will open.

- Enter some values at your discretion.
- Press **Send Data to Watson**.

You will see the results of the prediction.
You have learned how to create a complete predictive model without programming: from importing and preparing the data to training and saving the model. You also learned how to use SPSS Modeler and AutoAI, and how to export the AutoAI model to Watson Machine Learning, where you deployed it as a web service. Then you created a notebook to test the deployed web service, and finally you deployed a web application that consumes the web service and shows the model results.
- Adapted from the original tutorial from Hissah AlMuneef | Published January 18, 2019