Segmenting customers into clusters based on their interaction with the business

satishrath185/Customer-Segmentation

Customer-Segmentation

Business Value

Customer segmentation is the process of grouping customers by characteristics they share in their interactions with the business. In most cases this interaction is their purchase behaviour and patterns. These groups are useful for targeting marketing campaigns, identifying potentially profitable customers and developing customer loyalty.

Problem Statement

To identify clusters of customers based on their purchase behaviour, taking into account the recency, frequency and monetary value of their transactions.

Data

Each row of data represents a transaction and each column contains a transaction's attributes.

InvoiceNo : A unique identifier for the invoice. An invoice number shared across rows means that those transactions were performed in a single invoice (multiple purchases).

StockCode : Identifier for items contained in an invoice.

Description : Textual description of each stock item.

Quantity : The quantity of the item purchased.

InvoiceDate : Date of purchase.

UnitPrice : Price of a single unit of the item.

CustomerID : Identifier for customer making the purchase.

Country : Country of customer.

Approach

  • Loading Dependencies

  • Loading Data

  • Data Exploration

  • Data Processing

  • Focussing on One Market (UK in this case)

  • Building Recency Feature

  • Calculating Frequency and Monetary Values

  • Customer Segmentation (K-means algorithm, silhouette score metric)

  • Visualize Customer Segments

We use the silhouette score to choose the number of clusters: the candidate cluster count with the highest score is treated as optimal.
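The selection step can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: it assumes scikit-learn's `KMeans` and `silhouette_score`, runs on synthetic stand-in data instead of the real RFM table, and the helper name `silhouette_by_k` is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for a scaled RFM matrix (assumption: three well-separated groups).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0, 0.0], [10.0, 10.0, 10.0], [-10.0, 10.0, 0.0]])
features = np.vstack([c + rng.normal(scale=0.5, size=(50, 3)) for c in centers])


def silhouette_by_k(X, candidate_ks=(3, 4, 5), seed=0):
    """Fit K-means for each candidate k and return {k: silhouette score}."""
    scores = {}
    for k in candidate_ks:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return scores


scores = silhouette_by_k(features)
# The k with the highest silhouette score is taken as the preferred cluster count.
best_k = max(scores, key=scores.get)
```

The silhouette score lies between -1 and 1, with higher values indicating tighter, better-separated clusters, so picking the maximum over the candidate values of k is the usual convention.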

Data Exploration

Sales By Country

Top 15 customers contributing to 10.5% of total sales

Sales Recency

Processed Data

Data with Recency, Frequency and Monetary features
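As a rough illustration of how such a table could be derived from the transaction columns described in the Data section, here is a minimal pandas sketch. It makes several assumptions that the README does not confirm: the toy values are made up, the UK market is taken to be labelled `"United Kingdom"` in the `Country` column, recency is measured in days back from the latest invoice date in the data, frequency is the number of distinct invoices, and the helper name `build_rfm` is hypothetical.

```python
import pandas as pd

# Toy transactions using the column names from the Data section (values are made up).
transactions = pd.DataFrame({
    "InvoiceNo": ["A1", "A1", "A2", "B1", "C1"],
    "Quantity": [2, 1, 3, 5, 1],
    "InvoiceDate": pd.to_datetime(
        ["2011-01-05", "2011-01-05", "2011-03-01", "2011-02-10", "2011-03-02"]
    ),
    "UnitPrice": [10.0, 5.0, 2.0, 1.0, 100.0],
    "CustomerID": [1, 1, 1, 2, 3],
    "Country": ["United Kingdom"] * 4 + ["France"],
})


def build_rfm(df, country="United Kingdom"):
    """Return one row per customer with Recency, Frequency and Monetary columns."""
    # Focus on a single market (the country label is an assumption).
    market = df[df["Country"] == country].copy()
    # Amount spent on each transaction row.
    market["Amount"] = market["Quantity"] * market["UnitPrice"]
    # Assumption: recency is counted back from the latest purchase in the data.
    reference_date = market["InvoiceDate"].max()
    grouped = market.groupby("CustomerID")
    return pd.DataFrame({
        "Recency": (reference_date - grouped["InvoiceDate"].max()).dt.days,
        "Frequency": grouped["InvoiceNo"].nunique(),
        "Monetary": grouped["Amount"].sum(),
    })


rfm = build_rfm(transactions)
```

Counting distinct invoices rather than rows keeps a multi-item invoice from inflating a customer's frequency, which matches the note that rows sharing an `InvoiceNo` belong to one purchase.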

Model Building and Clustering

Models with 3, 4 and 5 clusters were developed and compared on their silhouette scores. The results are as follows:

Clusters 3

  • There is a stark difference in the monetary value of customers across clusters.

  • Cluster 2 contains the high-value customers who shop frequently, and is certainly an important segment for the business.

  • Clusters 0 and 1 contain customer groups with low and medium spend.

Clusters 4

  • The high-value customers are subdivided into two groups: one with lower spend and lower frequency (represented by cluster 0) and another with higher spend and higher frequency but lower recency (represented by cluster 1).

Clusters 5

  • With 5 clusters we again have two subgroups of higher-spend customers, plus three subgroups of lower-spend customers with varying frequency and recency.
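Cluster descriptions like the ones above are typically read off a per-cluster summary of the RFM features. The sketch below shows one way to build such a summary; the data is made up, the `Cluster` column name is an assumption, and `profile_clusters` is a hypothetical helper, not part of this repository.

```python
import pandas as pd

# Made-up RFM rows with cluster labels already assigned (assumed column name "Cluster").
rfm = pd.DataFrame({
    "Recency": [5, 10, 200, 220, 3],
    "Frequency": [20, 18, 1, 2, 25],
    "Monetary": [5000.0, 4500.0, 100.0, 150.0, 7000.0],
    "Cluster": [1, 1, 0, 0, 1],
})


def profile_clusters(df, label_col="Cluster"):
    """Mean Recency/Frequency/Monetary and customer count per cluster."""
    profile = df.groupby(label_col)[["Recency", "Frequency", "Monetary"]].mean()
    profile["Customers"] = df.groupby(label_col).size()
    # Highest-spending cluster first, to make the high-value segment easy to spot.
    return profile.sort_values("Monetary", ascending=False)


profile = profile_clusters(rfm)
```

Comparing the mean rows side by side makes statements such as "high spend, high frequency" checkable against numbers instead of plots alone.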

Visualizing Clusters

Amount vs Frequency

Recency vs Amount

Recency vs Frequency

Conclusion

Going by the mathematical metric alone, the silhouette score is highest for 3 clusters, suggesting that 3 is the optimal number of clusters for this dataset. However, business metrics and domain insights also need to be part of the modelling process to arrive at the data-focussed solution best suited to the business problem at hand :-)
