Demand forecasting in retail - panel data

Title: Store Sales Forecasting
Author: Maria Starovska, Tangent Works
Industry: Retail
Area: Sales
Type: Forecasting

Description

Imagine your business has tens (or even hundreds) of stores spread across multiple locations. Regularly predicting sales for each store can be a challenging task.

This solution template demonstrates how to use TIM with a panel dataset. We will build one general model for all stores. If you are interested in modeling each store individually, check the solution template Store sales forecasting.

Setup

Credentials and logging

(Do not forget to fill in your credentials in the credentials.json file)
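A minimal sketch of loading the credentials file. The key names (email, password, server) are an assumption about the layout of credentials.json and may need adjusting to match your TIM account; here we also write a sample file ourselves so the snippet is self-contained, whereas in the notebook the file already exists next to the notebook.

```python
import json
import os
import tempfile

# Hypothetical credentials.json layout -- adjust the keys to your TIM setup.
sample = {
    "email": "user@example.com",
    "password": "secret",
    "server": "https://tim-platform.tangent.works/api/v5",
}

# In the notebook, skip this step: credentials.json is filled in by hand.
path = os.path.join(tempfile.gettempdir(), "credentials.json")
with open(path, "w") as f:
    json.dump(sample, f)

# Read the credentials back; these values are then used to log in to TIM.
with open(path) as f:
    credentials = json.load(f)

print(credentials["server"])
```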

Dataset

Description

The dataset contains weekly data from 45 stores from 2010-01-30 until 2012-10-20 (143 samples per store). The target variable is weekly sales, and the dataset contains the following additional predictors: an indicator of whether the calendar week contains a holiday, temperature, fuel price, CPI, unemployment rate, size of the store, type of the store (A, B, C), and a special holiday indicator.

Sampling period

Data are sampled weekly.

Data

| Column name | Description | Type | Availability |
| --- | --- | --- | --- |
| Store | The store number | Group key column | |
| Date | Timestamp | Timestamp column | |
| Weekly_Sales | Weekly sales of the store | Target | t+0 |
| IsHoliday | Binary value (0 or 1) indicating whether the calendar week contains a holiday | Predictor | t+4 |
| Temperature | Average temperature in the region | Predictor | t+4 |
| Fuel_Price | Cost of fuel in the region | Predictor | t+4 |
| CPI | The consumer price index | Predictor | t+4 |
| Unemployment | The unemployment rate | Predictor | t+4 |
| Size | Size of the store | Predictor | t+4 |
| isA | Binary indicator of store type A | Predictor | t+4 |
| isB | Binary indicator of store type B | Predictor | t+4 |
| isC | Binary indicator of store type C | Predictor | t+4 |
| SpecialHoliday | Binary indicator of whether the week precedes Christmas or Black Friday | Predictor | t+4 |

Forecasting situations

Our goal is to predict the next four sales samples (4 weeks). The data alignment of the last observations for each store indicates that all predictors are available throughout the whole prediction horizon.

Source

The dataset is from Kaggle. In the original data, each store contains weekly sales for individual departments. Since the department-level data contain gaps, we aggregated them to the store level.
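The aggregation from department level to store level can be sketched with pandas. The toy frame below mimics the shape of the original Kaggle file (Store, Dept, Date, Weekly_Sales); the values are made up for illustration.

```python
import pandas as pd

# Toy department-level data in the shape of the original Kaggle file.
dept = pd.DataFrame({
    "Store": [1, 1, 1, 1],
    "Dept": [1, 2, 1, 2],
    "Date": ["2010-02-05", "2010-02-05", "2010-02-12", "2010-02-12"],
    "Weekly_Sales": [100.0, 50.0, 80.0, None],  # department data can be gappy
})

# Summing over departments gives one row per store and week; min_count=1 keeps
# a week as long as at least one department reported sales.
store = (
    dept.groupby(["Store", "Date"], as_index=False)["Weekly_Sales"]
        .sum(min_count=1)
)
print(store)
```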

The CSV file used in this experiment can be downloaded here.

Read and preview dataset

Upload dataset to the TIM DB

To work with panel data, we have to specify the group keys when uploading the data. In our case, there is only one: the column Store. We must also set the timestamp column explicitly, since Date is the second column in our data, while TIM defaults to the first column.
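A sketch of the upload configuration described above. The field names follow the general shape of TIM's dataset API but are an assumption and may differ in your client version.

```python
# Hedged sketch of the dataset upload configuration.
upload_configuration = {
    "timestampFormat": "yyyy-mm-dd",
    "timestampColumn": "Date",   # Date is the second column, not the default first
    "groupKeys": ["Store"],      # marks the dataset as panel data
}
print(upload_configuration["groupKeys"])
```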

Data visualization

Experiment

Description

We will omit three stores (1, 3, and 6) from the training data, for two reasons. First, we would like to demonstrate zero-history models: if data have short or no history, we can use a model trained on similar datasets. Second, a model does not have to be trained on all groups to be accurate. If the amount of data and generated features were to exceed the worker's available RAM, TIM would discard the oldest observations and might switch off polynomial features. Training only on a subset shortens the model-building process and may even improve accuracy.
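Excluding the three holdout stores from the training data can be sketched as a simple pandas filter. The toy frame stands in for the full 45-store dataset.

```python
import pandas as pd

# Toy panel frame; in the notebook this would be the full 45-store dataset.
df = pd.DataFrame({"Store": [1, 2, 3, 5, 6, 7],
                   "Weekly_Sales": [10.0] * 6})

# Stores 1, 3 and 6 are held out to demonstrate zero-history forecasting.
holdout_stores = {1, 3, 6}
train = df[~df["Store"].isin(holdout_stores)]
print(sorted(train["Store"].unique()))  # stores the model will see
```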

We will set predictionTo to 4 samples ahead and rollingWindow to 1 sample, because we would like to obtain, for each timestamp, a prediction of each quality (S+1 to S+4). The out-of-sample period will be "2011-11-19" to "2012-10-20", i.e. the last 45 samples before the last target timestamp. The rest will be in-sample. Because we want to simulate short-history models, we will allow looking back at only 8 samples (weeks): we will not include moving averages among the features, and we will set offsetLimit to -10.

Build-model job configuration
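The settings described above can be collected into a configuration dictionary. The key names below mirror common TIM job configurations but are an assumption; check them against your API version before use.

```python
# Hedged sketch of the build-model job configuration described in the text.
configuration = {
    "predictionTo": {"baseUnit": "Sample", "value": 4},   # forecast 4 weeks ahead
    "rollingWindow": {"baseUnit": "Sample", "value": 1},  # every quality S+1..S+4 per timestamp
    "offsetLimit": -10,  # restrict how far back features may look
    # Moving averages intentionally excluded to simulate short history;
    # the feature names here are illustrative, not an exhaustive list.
    "features": ["TimeOffsets", "Intercept", "Identity"],
    "outOfSample": {"from": "2011-11-19", "to": "2012-10-20"},
}
print(configuration["predictionTo"]["value"])
```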

Build-model job execution

Predict job configuration and registration

The endpoint to register a predict job had not been implemented at the time of writing this notebook. Therefore, we will first authorize the user and then call the API directly. We will register the predict job using the model from the job with id _build_jobid.
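A sketch of what the direct API call could look like. The endpoint path, payload shape, and header layout are assumptions based on the text, not the documented TIM API; only the payload construction is shown here, with the actual HTTP request left as a comment.

```python
import json

# Placeholder for the id returned by the build-model job, as in the notebook.
build_job_id = "_build_jobid"

# Hypothetical payload telling the API which model to reuse for prediction.
payload = {"useModelFromJob": build_job_id}

# The bearer token would come from the earlier authorization step.
headers = {"Authorization": "Bearer <token>",
           "Content-Type": "application/json"}

# In the notebook this would be sent with e.g. the requests library:
# requests.post(f"{server}/predict-jobs", headers=headers, data=json.dumps(payload))
print(json.dumps(payload))
```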

Visualize Results

Forecasts Visualization

Visualization of stores on which model was not trained

Visualization of stores on which model was trained

Importances

Variable Importances

Feature Importances

Accuracies

Accuracies per store
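Per-store accuracy can be sketched with a groupby over the forecast results. The metric shown is MAPE, a common choice for sales forecasts; the template itself may report a different metric, and the numbers below are toy values.

```python
import pandas as pd

# Toy forecast/actual pairs per store (made-up values for illustration).
res = pd.DataFrame({
    "Store": [1, 1, 2, 2],
    "actual": [100.0, 200.0, 50.0, 100.0],
    "forecast": [110.0, 180.0, 60.0, 90.0],
})

# Absolute percentage error per row, then averaged within each store.
res["ape"] = (res["forecast"] - res["actual"]).abs() / res["actual"]
mape_per_store = res.groupby("Store")["ape"].mean() * 100
print(mape_per_store)
```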