jajokine/Business-Cases

Business-Cases

Exploring various business cases by using a wide range of statistical methods, visualization and machine learning techniques to analyze and solve each problem.

Hypothesis Testing

The first case involves A/B testing: a randomized, controlled experiment with two variants, A and B, which makes it possible to establish a causal relationship with high probability and so determine which of the two variants is more effective. For a business, this kind of testing provides valuable information when something new is being implemented and we want to know whether the change is worthwhile, whether we should stick with the old version, or whether we should try something else. A typical example is measuring user engagement and satisfaction with a new online feature or product. The analysis relies on statistical hypothesis testing and helps companies better understand growth, increase revenue, and optimize customer satisfaction. A/B tests are therefore an indispensable tool for many large companies, and also for smaller companies and startups, where they can be combined with Agile software development or Minimum Viable Products (MVPs). The file a_b_testing.ipynb is the Jupyter Notebook that contains all the code, visualizations and analysis of the project.
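As a rough illustration of the kind of hypothesis test an A/B experiment relies on, the sketch below runs a two-sided two-proportion z-test on conversion counts. It is not taken from a_b_testing.ipynb; the function name and the counts are hypothetical, and the notebook may use a different test.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts, for illustration only.
z, p_value = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=250, n_b=2000)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

A small p-value suggests the difference between the variants is unlikely to be due to chance alone; the significance threshold should be chosen before the experiment is run.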

Customer Churn Prediction

The second case deals with customer churn prediction, using a data set from a telecom company. The churn rate is a critical metric for customer satisfaction and growth, as it tells you how many existing customers are leaving your business. Lowering the churn rate has a direct impact on revenue and growth potential. Predicting churn is especially important for Software as a Service (SaaS) and other subscription- or membership-based business models: customers use the company's services frequently, but competitors offer similar services, so anticipating dissatisfaction through churn becomes critical to success. The file customer_churn_prediction.ipynb is the Jupyter Notebook that contains all the code, visualizations and analysis of the project.
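To make the metric concrete, here is a minimal sketch that computes the churn rate as the share of customers flagged as churned, overall and per group. The record layout and the field names `contract` and `churned` are assumptions for illustration and do not come from the telecom data set.

```python
from collections import defaultdict


def churn_rate(records):
    """Share of customers in `records` that are flagged as churned."""
    if not records:
        raise ValueError("need at least one customer record")
    return sum(1 for r in records if r["churned"]) / len(records)


def churn_rate_by(records, key):
    """Churn rate for each distinct value of the field `key`."""
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(r)
    return {value: churn_rate(group) for value, group in groups.items()}


# Hypothetical customer records, for illustration only.
customers = [
    {"contract": "monthly", "churned": True},
    {"contract": "monthly", "churned": False},
    {"contract": "monthly", "churned": True},
    {"contract": "yearly", "churned": False},
    {"contract": "yearly", "churned": False},
]
overall = churn_rate(customers)
by_contract = churn_rate_by(customers, "contract")
print(overall, by_contract)
```

Breaking the rate down by a customer attribute is a common first step before fitting a predictive model, since it shows which groups churn most.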

Customer Segmentation

The third case explores customer segmentation with a data set from an eCommerce platform. The customers are segmented into groups with a marketing technique called RFM analysis (Recency, Frequency and Monetary value), a behavior-based approach that groups customers by their purchase history in order to rank them from the most valuable to the least valuable. This helps a company better understand its customers and meet their expectations. The analysis is then combined with an unsupervised machine learning technique that indicates what kind of goods we should have in our warehouse for the upcoming year in order to maximize revenue. For businesses, it is important to be able to segment customers and products into groups so that they know the preferences of their customers better. The more you know about your current and potential customers, the better your chances of meeting their expectations. Identifying the most promising and most valuable customers has an impact on customer satisfaction and retention, and makes it easier to acquire new ones. All of this leads to increased revenue and growth through better service that comes from understanding the needs of each customer segment. The file customer_segmentation.ipynb is the Jupyter Notebook that contains all the code, visualizations and analysis of the project.
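The three RFM values can be derived from a plain order history, as in the following sketch. It assumes each order is a `(customer_id, order_date, amount)` tuple and that recency is measured in days from a chosen snapshot date; both are illustrative choices, not the layout used in customer_segmentation.ipynb.

```python
from datetime import date


def rfm_table(orders, snapshot_date):
    """Aggregate (customer_id, order_date, amount) rows into R, F, M values."""
    table = {}
    for customer_id, order_date, amount in orders:
        row = table.setdefault(
            customer_id, {"last": order_date, "frequency": 0, "monetary": 0.0}
        )
        row["last"] = max(row["last"], order_date)  # most recent purchase
        row["frequency"] += 1                       # number of orders
        row["monetary"] += amount                   # total amount spent
    return {
        cid: {
            "recency": (snapshot_date - row["last"]).days,
            "frequency": row["frequency"],
            "monetary": row["monetary"],
        }
        for cid, row in table.items()
    }


# Hypothetical orders, for illustration only.
orders = [
    ("c1", date(2024, 1, 5), 50.0),
    ("c1", date(2024, 3, 1), 30.0),
    ("c2", date(2023, 11, 20), 200.0),
]
rfm = rfm_table(orders, snapshot_date=date(2024, 3, 31))
print(rfm)
```

In practice, each of the three values is usually binned into scores before customers are grouped, so that a low recency and a high frequency or monetary value all point in the same direction.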

Time Series Forecasting

The fourth, fifth and sixth cases focus on time series forecasting with machine learning. In the fourth, we use a Light Gradient Boosting Machine (LGBM) model to make a 28-day sales forecast for various products sold at Walmart, the biggest retail company and the largest company by revenue in the world. The data set covers hierarchical sales data from 10 stores located in 3 U.S. states over a timespan of five and a half years. The data set, which can be found on Kaggle (M5 Forecasting - Accuracy), includes explanatory variables such as price, promotions, day of the week, and special events, which are used to forecast the sales of all products in the stores for the next 28 days. The file sales_forecasting.ipynb is the Jupyter Notebook that contains all the code, visualizations and analysis of the project.
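Gradient boosting models work on tabular rows, so a sales series is typically turned into features first. The sketch below builds two such features for a 28-day horizon: the sales value 28 days earlier and a 7-day rolling mean ending at that day, so that no feature uses information from inside the forecast window. This is an assumed, generic feature scheme for illustration, not necessarily the one used in sales_forecasting.ipynb.

```python
def lag_features(sales, horizon=28, window=7):
    """Build one feature row per day using only data at least `horizon` days old."""
    rows = []
    for t in range(horizon + window - 1, len(sales)):
        lag_index = t - horizon
        # The `window` most recent values that end at the lagged day.
        recent = sales[lag_index - window + 1 : lag_index + 1]
        rows.append(
            {
                "day": t,
                "lag": sales[lag_index],
                "rolling_mean": sum(recent) / window,
                "target": sales[t],
            }
        )
    return rows


# Hypothetical daily sales, for illustration only.
sales = list(range(40))
rows = lag_features(sales)
print(rows[0])
```

Rows like these, joined with calendar, price and event columns, could then be passed to a gradient boosting regressor.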

The fifth case uses a data set of 5 years of sales of various products sold in 10 different eCommerce stores. The sales are modeled with different types of deep neural networks, such as a Recurrent Neural Network (RNN), a Long Short-Term Memory (LSTM) network, a Gated Recurrent Unit (GRU) network, and hybrid variants that include elements of a Convolutional Neural Network (CNN). The file deep_learning_forecasting.ipynb is the Jupyter Notebook that contains all the code, visualizations and analysis of the project.

The sixth case moves from forecasting one time step ahead to forecasting multiple steps, up to one year ahead, using a data set of product sales from a store over a timespan of 5 years. The file advanced_deep_learning_forecasting.ipynb is the Jupyter Notebook that contains all the code, visualizations and analysis of the project.
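The difference between one-step and multi-step forecasting shows up in how the training samples are cut from the series. The following sketch slices a series into input windows and output windows of configurable length; the function and parameter names are hypothetical and the notebook may prepare its data differently.

```python
def make_windows(series, n_inputs, n_outputs):
    """Slice a series into (input window, output window) training pairs."""
    pairs = []
    last_start = len(series) - n_inputs - n_outputs
    for start in range(last_start + 1):
        split = start + n_inputs
        # Inputs are the past values, outputs the next `n_outputs` values.
        pairs.append((series[start:split], series[split : split + n_outputs]))
    return pairs


# Hypothetical series, for illustration only.
series = list(range(10))
one_step = make_windows(series, n_inputs=4, n_outputs=1)
multi_step = make_windows(series, n_inputs=4, n_outputs=3)
print(one_step[0], multi_step[0])
```

Setting `n_outputs=1` gives the one-step-ahead setup, while a larger value makes the model predict a whole stretch of future values at once.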