Retail Sales

Problem description

Forecasting sales of a product or service plays an important role in the lifecycle of almost every retail company. There are multiple reasons why knowing future sales is essential: it can drive many management decisions, such as efficient inventory management, prevention or early detection of potential issues, price setting, marketing and others.

Data recommendation template

The dataset can contain various factors that a company tracks or estimates which could help in forecasting future sales. These could be internal factors, such as the number of stores, the number of employees, the level of product/service, or the size of the product portfolio, or external factors, such as economic and industry conditions or the rate of inflation. Of course, only those factors that are relevant to the company's business case and affect sales of the product should be selected. The second condition is not so strict – the TIM engine can deal with irrelevant predictors; they simply will not be included in model building. Some of these metrics can also be estimated into the future – for example, the company probably knows the price of the product or the number of stores in the near future.
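As a quick sanity check before handing candidate predictors to the engine, one can look at how each of them relates to sales. A minimal sketch with pandas, using made-up values and the column names from this example:

```python
import pandas as pd

# Illustrative dataset: daily sales plus two candidate predictors
data = pd.DataFrame({
    'product_sales': [70.0, 59.0, 93.0, 96.0, 88.0, 75.0, 80.0],
    'product_price': [1.29, 1.29, 1.29, 1.19, 1.19, 1.19, 1.19],
    'stock_price':   [4902.0, 4843.0, 4750.0, 4654.0, 4690.0, 4710.0, 4705.0],
})

# Pearson correlation of each candidate with the target
correlations = data.corr()['product_sales'].drop('product_sales')
print(correlations)
```

This is only a linear screening; TIM itself can exploit non-linear and lagged relationships, so a low correlation alone is not a reason to drop a predictor.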

TIM setup

TIM requires no setup of its mathematical internals and works well in business user mode. All that is required from the user is to let TIM know the forecasting routine and the desired prediction horizon. Product sales time series often contain seasonal patterns – usually daily, weekly or yearly. These will be detected by the TIM engine automatically if the WeekRest, PeriodicComponents and TimeOffsets dictionaries are turned on (they are by default).
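The kind of weekly pattern TIM detects automatically can also be spotted by hand with autocorrelation. A minimal sketch on a synthetic daily series (the data here is made up purely for illustration):

```python
import numpy as np
import pandas as pd

# Synthetic daily sales with a weekly cycle plus noise
rng = np.random.default_rng(0)
days = pd.date_range('2014-01-01', periods=365, freq='D')
weekly = 20 * np.sin(2 * np.pi * np.arange(365) / 7)
sales = pd.Series(100 + weekly + rng.normal(0, 2, 365), index=days)

# Autocorrelation at a lag of 7 days is high for a weekly pattern,
# while an off-cycle lag such as 3 days is not
acf_7 = sales.autocorr(lag=7)
acf_3 = sales.autocorr(lag=3)
print(acf_7, acf_3)
```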

Demo using Python API Client

Set up Python Libraries

In [1]:
import logging
import pandas as pd
import plotly as plt
import plotly.express as px
import plotly.graph_objects as go
import numpy as np
import json

import tim_client

Credentials and logging

(Do not forget to fill in your credentials in the credentials.json file)
In [2]:
with open('credentials.json') as f:
    credentials_json = json.load(f)                     # loading the credentials from credentials.json

TIM_URL = 'https://timws.tangent.works/v4/api'          # URL to which the requests are sent

SAVE_JSON = False                                       # if True - JSON requests and responses are saved to JSON_SAVING_FOLDER
JSON_SAVING_FOLDER = 'logs/'                            # folder where the requests and responses are stored

LOGGING_LEVEL = 'INFO'
In [3]:
level = logging.getLevelName(LOGGING_LEVEL)
logging.basicConfig(level=level, format='[%(levelname)s] %(asctime)s - %(name)s:%(funcName)s:%(lineno)s - %(message)s')
logger = logging.getLogger(__name__)
In [4]:
credentials = tim_client.Credentials(credentials_json['license_key'], credentials_json['email'], credentials_json['password'], tim_url=TIM_URL)
api_client = tim_client.ApiClient(credentials)

api_client.save_json = SAVE_JSON
api_client.json_saving_folder_path = JSON_SAVING_FOLDER
[INFO] 2020-10-30 10:02:26,483 - tim_client.api_client:save_json:74 - Saving JSONs functionality has been disabled
[INFO] 2020-10-30 10:02:26,486 - tim_client.api_client:json_saving_folder_path:89 - JSON destination folder changed to logs

Specify configuration

To let TIM know that we want to forecast one week ahead, we set "predictionTo" to 7 samples. The model will be built using the range from 2014-01-01 00:00:00 to 2015-12-31 00:00:00. Out-of-sample forecasts are made on the rest – from 2016-01-01 00:00:00 to 2016-07-24 00:00:00 (the last 206 samples). To get better insights from our model, we also want extended importances and prediction intervals to be returned.

In [5]:
configuration_backtest = {
    'usage': {                                 
        'predictionTo': { 
            'baseUnit': 'Sample',                # units that are used for specifying the prediction horizon length (one of 'Day', 'Hour', 'QuarterHour', 'Sample')
            'offset': 7                          # number of units we want to predict into the future (7 days in this case)
        },
        'backtestLength': 206                 # number of samples that are used for backtesting (note that these samples are excluded from model building period)
    },
    "predictionIntervals": {
        "confidenceLevel": 90                  # confidence level of the prediction intervals (in %)
    },
    'extendedOutputConfiguration': {
        'returnExtendedImportances': True      # flag that specifies if the importances of features are returned in the response
    }
}

Data description

Dataset used in this example has daily sampling rate and contains data from 2014-01-01 to 2016-07-31.

Target

The target variable represents daily sales of a product.

Predictor candidates

There are two predictor candidates – stock price and product price.

Forecasting scenario

In this example we will simulate a week-ahead scenario (7 samples – 7 days). Each day we wish to have forecasts for the 7 days starting from the next day. We assume that the last known value of the target is from the current day, that the last known value of the stock price predictor is from the previous day, and that the value of the product price predictor is known for the entire forecasting horizon.
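This scenario is encoded directly in the dataset: rows for the forecast horizon carry NaN in the columns whose values are unknown and actual values in those that are known, as the tail of the data below shows. A minimal sketch of how such a frame could be assembled (the values are illustrative):

```python
import numpy as np
import pandas as pd

dates = pd.date_range('2016-07-24', periods=8, freq='D')  # current day + 7-day horizon
frame = pd.DataFrame({
    'date': dates,
    'product_sales': [120.0] + [np.nan] * 7,   # target known up to the current day only
    'stock_price':   [4970.0] + [np.nan] * 7,  # stock price not available for the horizon
    'product_price': [2.39] * 8,               # product price known for the whole horizon
})
print(frame)
```

TIM infers the forecasting situation of each column from where its values end, so no extra per-predictor configuration is needed.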

In [6]:
data = tim_client.load_dataset_from_csv_file('data.csv', sep=',')                                  # loading data from data.csv
data                                                                                               # quick look at the data
Out[6]:
date product_sales stock_price product_price
0 2014-01-01 00:00:00 0.0 4972.0 1.29
1 2014-01-02 00:00:00 70.0 4902.0 1.29
2 2014-01-03 00:00:00 59.0 4843.0 1.29
3 2014-01-04 00:00:00 93.0 4750.0 1.29
4 2014-01-05 00:00:00 96.0 4654.0 1.29
... ... ... ... ...
938 2016-07-27 00:00:00 NaN NaN 2.39
939 2016-07-28 00:00:00 NaN NaN 2.39
940 2016-07-29 00:00:00 NaN NaN 2.39
941 2016-07-30 00:00:00 NaN NaN 2.39
942 2016-07-31 00:00:00 NaN NaN 2.39

943 rows × 4 columns

Run TIM

In [7]:
backtest = api_client.prediction_build_model_predict(data, configuration_backtest)                 # running the RTInstantML forecasting using data and defined configuration
backtest.status                                                                                    # status of the job
Out[7]:
'FinishedWithWarning'

Visualize backtesting

In [8]:
fig = plt.subplots.make_subplots(rows=1, cols=1, shared_xaxes=True, vertical_spacing=0.02)      # plot initialization

fig.add_trace(go.Scatter(x = data.loc[:, "date"], y=data.loc[:, "product_sales"],
                         name = "target", line=dict(color='black')), row=1, col=1)              # plotting the target variable

fig.add_trace(go.Scatter(x = backtest.prediction.index, 
                         y = backtest.prediction.loc[:, 'Prediction'],
                         name = "production forecast", 
                         line = dict(color='purple')), row=1, col=1)                            # plotting production prediction

fig.add_trace(go.Scatter(x = backtest.prediction_intervals_upper_values.index,
                         y = backtest.prediction_intervals_upper_values.loc[:, 'UpperValues'],
                         marker = dict(color="#444"),
                         line = dict(width=0),
                         showlegend = False), row=1, col=1)                           
fig.add_trace(go.Scatter(x = backtest.prediction_intervals_lower_values.index,
                         y = backtest.prediction_intervals_lower_values.loc[:, 'LowerValues'],
                         fill = 'tonexty',
                         line = dict(width=0),
                         showlegend = False), row=1, col=1)                                     # plotting confidence intervals

fig.add_trace(go.Scatter(x = backtest.aggregated_predictions[0]['values'].index, 
                         y = backtest.aggregated_predictions[0]['values'].loc[:, 'Prediction'],
                         name = "in-sample MAPE: " + str(round(backtest.aggregated_predictions[0]['accuracyMetrics']['MAPE'], 2)), 
                         line=dict(color='goldenrod')), row=1, col=1)                           # plotting in-sample prediction

fig.add_trace(go.Scatter(x = backtest.aggregated_predictions[1]['values'].index, 
                         y = backtest.aggregated_predictions[1]['values'].loc[:, 'Prediction'],
                         name = "out-of-sample MAPE: " + str(round(backtest.aggregated_predictions[1]['accuracyMetrics']['MAPE'], 2)), 
                         line = dict(color='red')), row=1, col=1)                               # plotting out-of-sample prediction

fig.update_layout(height=600, width=1000, 
                  title_text="Backtesting, modelling difficulty: " 
                  + str(round(backtest.data_difficulty, 2)) + "%" )                             # update size and title of the plot

fig.show()
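The MAPE values shown in the plot legend can be cross-checked by hand. A minimal sketch of the metric, with illustrative arrays standing in for the actuals and the out-of-sample forecasts:

```python
import numpy as np

def mape(actual, predicted):
    """Mean absolute percentage error in %; samples with zero actuals are excluded."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    mask = actual != 0
    return np.mean(np.abs((actual[mask] - predicted[mask]) / actual[mask])) * 100

actual = [100.0, 80.0, 90.0]     # illustrative actual sales
forecast = [110.0, 76.0, 90.0]   # illustrative forecasts
print(round(mape(actual, forecast), 2))  # errors of 10%, 5% and 0% average to 5.0
```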

Visualize predictor and feature importances

In [9]:
simple_importances = backtest.predictors_importances['simpleImportances']                                                                # get predictor importances
simple_importances = sorted(simple_importances, key = lambda i: i['importance'], reverse=True)                                           # sort by importance
extended_importances = backtest.predictors_importances['extendedImportances']                                                            # get feature importances
extended_importances = sorted(extended_importances, key = lambda i: i['importance'], reverse=True)                                       # sort by importance

si_df = pd.DataFrame(index=np.arange(len(simple_importances)), columns = ['predictor name', 'predictor importance (%)'])                 # initialize predictor importances dataframe
ei_df = pd.DataFrame(index=np.arange(len(extended_importances)), columns = ['feature name', 'feature importance (%)', 'time', 'type'])   # initialize feature importances dataframe
In [10]:
for (i, si) in enumerate(simple_importances):
    si_df.loc[i, 'predictor name'] = si['predictorName']                   # get predictor name
    si_df.loc[i, 'predictor importance (%)'] = si['importance']            # get importance of the predictor
    
for (i, ei) in enumerate(extended_importances):
    ei_df.loc[i, 'feature name'] = ei['termName']                          # get feature name
    ei_df.loc[i, 'feature importance (%)'] = ei['importance']              # get importance of the feature
    ei_df.loc[i, 'time'] = ei['time']                                      # get the forecasting step (sample ahead) to which the feature corresponds
    ei_df.loc[i, 'type'] = ei['type']                                      # get type of the feature
In [11]:
si_df.head()                                                               # predictor importances data frame
Out[11]:
predictor name predictor importance (%)
0 product_price 59.92
1 product_sales 31.71
2 stock_price 8.37
In [12]:
fig = go.Figure(go.Bar(x=si_df['predictor name'], y=si_df['predictor importance (%)']))      # plot the bar chart
fig.update_layout(height=400,                                                                # update size, title and axis titles of the chart
                  width=600, 
                  title_text="Importances of predictors",
                  xaxis_title="Predictor name",
                  yaxis_title="Predictor importance (%)")
fig.show()
In [13]:
ei_df.head()                                                               # first few of the feature importances
Out[13]:
feature name feature importance (%) time type
0 product_sales(t-1) 42.7 [1] TargetAndTargetTransformation
1 DoW(t) ≤ Fri & product_price 41.48 [3] Interaction
2 DoW(t) ≤ Fri & product_price 41.48 [4] Interaction
3 DoW(t) ≤ Fri & product_price 40.49 [6] Interaction
4 product_sales(t-5) & product_price(t-6) 39.71 [5] Interaction
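With the importances collected in a dataframe, they can be aggregated further, for instance by feature type. A minimal sketch using a few rows modelled on the table above (the numbers are illustrative, not the full output):

```python
import pandas as pd

# A small stand-in for ei_df, built from rows like those shown above
ei_df = pd.DataFrame({
    'feature name': ['product_sales(t-1)',
                     'DoW(t) <= Fri & product_price',
                     'product_sales(t-5) & product_price(t-6)'],
    'feature importance (%)': [42.7, 41.48, 39.71],
    'time': ['[1]', '[3]', '[5]'],
    'type': ['TargetAndTargetTransformation', 'Interaction', 'Interaction'],
})

# Total importance contributed by each feature type
by_type = ei_df.groupby('type')['feature importance (%)'].sum().sort_values(ascending=False)
print(by_type)
```

Note that rows belong to different per-step models (the 'time' column), so such sums are only a rough summary of which feature types dominate overall.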
In [14]:
time = '[1]'                                                                            # time for which the feature importances are visualized
fig = go.Figure(go.Bar(x=ei_df[ei_df['time'] == time]['feature name'],                       # plot the bar chart
                       y=ei_df[ei_df['time'] == time]['feature importance (%)']))
fig.update_layout(height=700,                                                                # update size, title and axis titles of the chart
                  width=1000,
                  title_text="Importances of features (for {}-sample ahead model)".format(time),
                  xaxis_title="Feature name",
                  yaxis_title="Feature importance (%)")
fig.show()