Skip to content

dlon450/Retail-Store-Sales-Forecasting

Repository files navigation

Retail-Store-Sales-Forecasting

There are a multitude of factors that impact the sales of retail stores, including but not limited to holidays and significant events. As a result, an important question to ask is how stores located in various regions can improve their sales by $1 million over the next two years through the usage of prominent holidays and promotional markdown events.

Data

The data used was retrieved from here. 'Features data set.csv' contains data relating to the store, department and regional activity for different dates, 'sales data-set.csv' contains the sales data for different stores and departments from 2010-02-05 to 2012-11-01 and 'stores data-set.csv' contains data about each store type and size.

Getting Started

Open retail_workflow.yxmd using Alteryx Designer and set the datasets to their corresponding locations. Run the workflow to merge datasets and obtain various figures for exploratory analysis and forecasting.

Details on Implementation

Note that any figures can be found within the 'Figures' directory.

Preprocessing

We need to first preprocess and clean the given data so that it can be reliably used in our data analysis and forecasting. We found that null values were only present in the features data set, particularly the consumer price index (CPI) and unemployment rate columns which only contained null values from May 2013 onwards. We decided to impute the mean depending on the store from the previous year as they seemed to be highly dependent on the store number. Null values were also in the markdown columns, so we chose to impute each markdown’s median (as the markdowns were right skewed) and include imputed value indicator columns to provide a way for the analytical algorithms to identify which values were imputed. The temperature and fuel price attributes may contain a fair amount of noise due to there being significant variation throughout the week. Overfitting to this noise would cause a worse performance for our model - binning these attributes into small intervals was used to attempt to reduce this noise. We also removed an observation which had an abnormally high MarkDown5 value and was not close to the rest of the data. The three datasets were then joined on the store number and date to be inputted into our analytical algorithms as a full dataset.

Forecasting

Key statistics were also recorded during the data analysis process. We found that store 20 had the highest weekly sales, at around 2.11 million USD (assuming the data is from US stores since the holidays exist in the US), while store 33 had the lowest weekly sales, at around 260 thousand USD. Holidays also affected the store sales, with the average weekly sales 7% greater on holidays. The type of store also had a major influence on the weekly sales, with type A, B and C having an average sales of 20, 12 and 9 thousand USD respectively. To predict future sales for these stores, we first used common algorithms such as linear regression and random forests. However, we found that the mean squared prediction error (MSPE) was too high for both models, likely due to the erratic nature of the store sales. We then decided to use time series analysis and compared two models, an exponential smoothing method (ETS) and an autoregressive integrated moving average (ARIMA), using a holdout validation set. We found that the ARIMA model performed much better than the ETS one, and so used it to forecast future sales for each store. Note that an extension to this method would be a form of multivariate time series analysis to allow for the use of other attributes.

Recommendations

As a result of the previous analysis we recommend that stores provide average markdowns no greater than 5000 during the holiday period. This is because increasing average markdowns beyond 5000 does not actually have a significant impact on the average weekly sales when compared to the effects of average markdowns between 0-5000. Hence it is better for a store to provide smaller markdowns for customers, as they are likely to receive sales similar to a larger markdown’s sale. Likewise, we found that sales plateau at a relatively low value for the first half of the year before slowly climbing from July onwards, peaking in December. Consequently, focusing promotional efforts on the holiday months (in particular December), will enable a business to improve their sales.

About

Forecasting of retail store sales using Alteryx Designer.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published