Motor Vehicle Sales

Problem description

A common task in retail is to forecast demand (sales) a couple of months, or even years, ahead. Without a proper forecast, it is difficult to keep the right amount of stock on hand at any given time. Too little stock might push customers to seek merchandise elsewhere; conversely, too much merchandise in the warehouse means more capital tied up in inventory. We will take the motor vehicle industry as an example and try to forecast aggregated demand for all passenger cars, including station wagons, one year ahead.

Data Recommendation Template

Our dataset is sampled monthly and contains domestic U.S. sales of vehicles assembled in the U.S., Canada, and Mexico. We will use all samples from January 1967 through August 2019. More data can be downloaded here. For the sake of simplicity, the data is univariate and not enhanced with any additional predictors. If you look closely, however, you can see that economic recessions had a large impact on demand. It is therefore reasonable to expect that combining this dataset with relevant economic indicators could improve accuracy significantly.

TIM Setup

TIM requires no setup of its mathematical internals and works well in the business user mode. All that is required from a user is to let TIM know the desired prediction horizon.

Demo using Python API Client

Set up Python Libraries

In [1]:
import logging
import pandas as pd
import plotly as plt
import plotly.subplots                                  # explicit import so that plt.subplots.make_subplots is available
import plotly.express as px
import plotly.graph_objects as go
import numpy as np
import json

import tim_client

Credentials and logging

(Do not forget to fill in your credentials in the credentials.json file)
In [2]:
with open('credentials.json') as f:
    credentials_json = json.load(f)                     # loading the credentials from credentials.json

TIM_URL = 'https://timws.tangent.works/v4/api'          # URL to which the requests are sent

SAVE_JSON = False                                       # if True - JSON requests and responses are saved to JSON_SAVING_FOLDER
JSON_SAVING_FOLDER = 'logs/'                            # folder where the requests and responses are stored

LOGGING_LEVEL = 'INFO'
In [3]:
level = logging.getLevelName(LOGGING_LEVEL)
logging.basicConfig(level=level, format='[%(levelname)s] %(asctime)s - %(name)s:%(funcName)s:%(lineno)s - %(message)s')
logger = logging.getLogger(__name__)
In [4]:
credentials = tim_client.Credentials(credentials_json['license_key'], credentials_json['email'], credentials_json['password'], tim_url=TIM_URL)
api_client = tim_client.ApiClient(credentials)

api_client.save_json = SAVE_JSON
api_client.json_saving_folder_path = JSON_SAVING_FOLDER
[INFO] 2020-10-29 22:41:32,425 - tim_client.api_client:save_json:74 - Saving JSONs functionality has been disabled
[INFO] 2020-10-29 22:41:32,429 - tim_client.api_client:json_saving_folder_path:89 - JSON destination folder changed to logs
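
Before sending any request, it can help to confirm that credentials.json actually contains the three keys the cell above reads ('license_key', 'email', 'password'). The helper below is a minimal sketch and not part of tim_client; the function name and the placeholder values are assumptions made for illustration.

```python
import json

# Keys that the notebook reads from credentials.json
REQUIRED_KEYS = ("license_key", "email", "password")


def missing_credential_keys(raw_json):
    """Return the required keys that are absent or empty in a credentials JSON string."""
    parsed = json.loads(raw_json)
    return [key for key in REQUIRED_KEYS if not parsed.get(key)]


# Placeholder values only (hypothetical); substitute your own credentials.
example = '{"license_key": "XXXX", "email": "user@example.com", "password": ""}'
print(missing_credential_keys(example))  # lists the keys that still need a value
```

An empty list means every required key is present and non-empty.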

Specify configuration

To let TIM know that we want to forecast one year ahead, we set "predictionTo" to 12 samples. With a backtest length of 20, the last 20 samples (2018-01 to 2019-08) are held out for out-of-sample forecasts, and the model is built on the remaining range, 1967-01 to 2017-12. The proper way to emulate the real accuracy of TIM when used month by month would be to create more configurations with a rolling training period, since the economy can change from one month to the next. For simplicity, we keep the building period static. To get better insights from the model, we also request extended importances and prediction intervals.

In [5]:
configuration_backtest = {
    'usage': {                                 
        'predictionTo': { 
            'baseUnit': 'Sample',                # units that are used for specifying the prediction horizon length (one of 'Day', 'Hour', 'QuarterHour', 'Sample')
            'offset': 12                       # number of units we want to predict into the future (12 samples, i.e. 12 months in this case)
        },
        'backtestLength': 20                 # number of samples that are used for backtesting (note that these samples are excluded from model building period)
    },
    "predictionIntervals": {
        "confidenceLevel": 90                  # confidence level of the prediction intervals (in %)
    },
    'extendedOutputConfiguration': {
        'returnExtendedImportances': True      # flag that specifies if the importances of features are returned in the response
    }
}
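
The rolling alternative mentioned above can be sketched without calling TIM at all: for every monthly forecast origin, list the last month used for training and the months to be forecast. This is only an illustration of the split logic; the function name `rolling_origins` is hypothetical, and the example assumes the series ends in 2019-08 with the first origin placed just before the last 20 samples.

```python
import pandas as pd


def rolling_origins(first_origin, last_sample, horizon=12):
    """Yield (last_training_month, forecast_months) for each monthly forecast origin."""
    origin = pd.Period(first_origin, freq="M")
    last = pd.Period(last_sample, freq="M")
    while origin < last:
        # Forecast up to `horizon` months ahead, truncated at the end of the data
        forecast_end = min(origin + horizon, last)
        yield origin, pd.period_range(origin + 1, forecast_end, freq="M")
        origin += 1


# Assumption: data ends in 2019-08 and the last 20 samples are held out
splits = list(rolling_origins("2017-12", "2019-08"))
for train_end, months in splits[:2]:
    print(train_end, "->", months[0], "to", months[-1])
```

Each split would correspond to one model build with its own training period, which is more expensive than the single static build used in this notebook.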

Data description

The dataset used in this example has a monthly sampling rate and contains data from 1967-01 to 2019-08.

Target

Aggregated demand of motor vehicles labeled DAUTONSA.

Predictor candidates

No predictors included.

Timestamp

The timestamp is the first column, and each timestamp value denotes the period it corresponds to. For example, the ‘DAUTONSA’ value in the row with timestamp 2011-01 is the total demand during the period between 2011-01-01 and 2011-01-31.
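
This period convention can be illustrated with pandas, which is already imported in this notebook. The snippet is a small sketch showing how a monthly label maps to its first and last day; it does not reflect how TIM parses timestamps internally.

```python
import pandas as pd

# A monthly label such as "2011-01" stands for the whole calendar month
month = pd.Period("2011-01", freq="M")

first_day = month.start_time.date()
last_day = month.end_time.date()
print(first_day, last_day)
```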

Forecasting scenario

In this example we will simulate a year-ahead scenario (12 samples, i.e. 12 months). Each month we wish to have forecasts for the 12 months ahead, starting from the next month. We assume that the demand of the preceding month is already known.
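
To make the scenario concrete, the months covered by a single forecast can be listed from the last known month. The helper below is an illustrative sketch; `forecast_months` is a hypothetical name and not part of tim_client.

```python
import pandas as pd


def forecast_months(last_known, horizon=12):
    """Return the months covered by a forecast issued after `last_known`."""
    start = pd.Period(last_known, freq="M") + 1   # forecasts start from the next month
    return pd.period_range(start, periods=horizon, freq="M")


# With demand known up to August 2019, the forecast spans the following 12 months
months = forecast_months("2019-08")
print(months[0], "to", months[-1])
```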

In [6]:
data = tim_client.load_dataset_from_csv_file('data.csv', sep=',')                                  # loading data from data.csv
data                                                                                               # quick look at the data
Out[6]:
DATE DAUTONSA
0 1967-01 564.100
1 1967-02 509.100
2 1967-03 670.400
3 1967-04 710.200
4 1967-05 744.800
... ... ...
627 2019-04 295.805
628 2019-05 340.160
629 2019-06 322.584
630 2019-07 275.677
631 2019-08 325.387

632 rows × 2 columns
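
A monthly series from 1967-01 to 2019-08 should contain exactly 632 samples, which matches the row count above. A quick way to check a series for gaps is sketched below; `missing_months` is a hypothetical helper, and the short sample list is made up purely to show the behaviour.

```python
import pandas as pd


def missing_months(dates):
    """Return the months missing between the first and last observed month."""
    observed = pd.PeriodIndex(dates, freq="M")
    expected = pd.period_range(observed.min(), observed.max(), freq="M")
    return expected.difference(observed)


# Expected number of monthly samples between 1967-01 and 2019-08
expected_rows = len(pd.period_range("1967-01", "2019-08", freq="M"))
print(expected_rows)

# Made-up sample with a gap, to show what the helper reports
sample = ["1967-01", "1967-02", "1967-04"]
print(list(missing_months(sample)))
```

Applied to the notebook's data, the helper could be called on the DATE column, assuming that column holds strings in the YYYY-MM format shown above.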

Run TIM

In [7]:
backtest = api_client.prediction_build_model_predict(data, configuration_backtest)                 # running the RTInstantML forecasting using data and defined configuration
backtest.status                                                                                    # status of the job
Out[7]:
'Finished'

Visualize backtesting

In [8]:
fig = plt.subplots.make_subplots(rows=1, cols=1, shared_xaxes=True, vertical_spacing=0.02)      # plot initialization

fig.add_trace(go.Scatter(x = data.loc[:, "DATE"], y=data.loc[:, "DAUTONSA"],
                         name = "target", line=dict(color='black')), row=1, col=1)              # plotting the target variable

fig.add_trace(go.Scatter(x = backtest.prediction.index, 
                         y = backtest.prediction.loc[:, 'Prediction'],
                         name = "production forecast", 
                         line = dict(color='purple')), row=1, col=1)                            # plotting production prediction

fig.add_trace(go.Scatter(x = backtest.prediction_intervals_upper_values.index,
                         y = backtest.prediction_intervals_upper_values.loc[:, 'UpperValues'],
                         marker = dict(color="#444"),
                         line = dict(width=0),
                         showlegend = False), row=1, col=1)                           
fig.add_trace(go.Scatter(x = backtest.prediction_intervals_lower_values.index,
                         y = backtest.prediction_intervals_lower_values.loc[:, 'LowerValues'],
                         fill = 'tonexty',
                         line = dict(width=0),
                         showlegend = False), row=1, col=1)                                     # plotting prediction intervals

fig.add_trace(go.Scatter(x = backtest.aggregated_predictions[0]['values'].index, 
                         y = backtest.aggregated_predictions[0]['values'].loc[:, 'Prediction'],
                         name = "in-sample MAPE: " + str(round(backtest.aggregated_predictions[0]['accuracyMetrics']['MAPE'], 2)), 
                         line=dict(color='goldenrod')), row=1, col=1)                           # plotting in-sample prediction

fig.add_trace(go.Scatter(x = backtest.aggregated_predictions[1]['values'].index, 
                         y = backtest.aggregated_predictions[1]['values'].loc[:, 'Prediction'],
                         name = "out-of-sample MAPE: " + str(round(backtest.aggregated_predictions[1]['accuracyMetrics']['MAPE'], 2)), 
                         line = dict(color='red')), row=1, col=1)                               # plotting out-of-sample prediction

fig.update_layout(height=600, width=1000, 
                  title_text="Backtesting, modelling difficulty: " 
                  + str(round(backtest.data_difficulty, 2)) + "%" )                             # update size and title of the plot

fig.show()
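
The legend entries above report MAPE (mean absolute percentage error) as returned by TIM. For reference, the textbook definition can be sketched as follows; this is an assumption about the general metric, and TIM's exact computation (for example, its handling of zero or missing values) may differ.

```python
import numpy as np


def mape(actual, predicted):
    """Mean absolute percentage error in percent (undefined when an actual value is zero)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100)


# Both forecasts are off by 10 % of the actual value
print(round(mape([100, 200], [110, 180]), 2))
```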

Visualize predictor and feature importances

In [9]:
simple_importances = backtest.predictors_importances['simpleImportances']                                                                # get predictor importances
simple_importances = sorted(simple_importances, key = lambda i: i['importance'], reverse=True)                                           # sort by importance
extended_importances = backtest.predictors_importances['extendedImportances']                                                            # get feature importances
extended_importances = sorted(extended_importances, key = lambda i: i['importance'], reverse=True)                                       # sort by importance

si_df = pd.DataFrame(index=np.arange(len(simple_importances)), columns = ['predictor name', 'predictor importance (%)'])                 # initialize predictor importances dataframe
ei_df = pd.DataFrame(index=np.arange(len(extended_importances)), columns = ['feature name', 'feature importance (%)', 'time', 'type'])   # initialize feature importances dataframe
In [10]:
for (i, si) in enumerate(simple_importances):
    si_df.loc[i, 'predictor name'] = si['predictorName']                   # get predictor name
    si_df.loc[i, 'predictor importance (%)'] = si['importance']            # get importance of the predictor
    
for (i, ei) in enumerate(extended_importances):
    ei_df.loc[i, 'feature name'] = ei['termName']                          # get feature name
    ei_df.loc[i, 'feature importance (%)'] = ei['importance']              # get importance of the feature
    ei_df.loc[i, 'time'] = ei['time']                                      # get the forecasting horizon (samples ahead) to which the feature corresponds
    ei_df.loc[i, 'type'] = ei['type']                                      # get type of the feature
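
The same table can also be built without an explicit loop, because pandas accepts a list of dictionaries directly. The sketch below uses two made-up records shaped like the 'extendedImportances' entries (keys 'termName', 'importance', 'time', 'type'); storing 'time' as a string is an assumption made here for illustration.

```python
import pandas as pd

# Hypothetical records mimicking the structure of the importances in the response
extended_importances = [
    {"termName": "DAUTONSA(t-12)", "importance": 48.38, "time": "[8]",
     "type": "TargetAndTargetTransformation"},
    {"termName": "DAUTONSA(t-12)", "importance": 100.0, "time": "[12]",
     "type": "TargetAndTargetTransformation"},
]

ei_df = (
    pd.DataFrame(extended_importances)                       # one row per record
    .sort_values("importance", ascending=False, ignore_index=True)
    .rename(columns={"termName": "feature name",
                     "importance": "feature importance (%)"})
    [["feature name", "feature importance (%)", "time", "type"]]
)
print(ei_df)
```

This keeps numeric columns numeric, whereas filling a pre-allocated frame cell by cell leaves them with the generic object dtype.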
In [11]:
si_df.head()                                                               # predictor importances data frame
Out[11]:
predictor name predictor importance (%)
0 DAUTONSA 100
In [12]:
fig = go.Figure(go.Bar(x=si_df['predictor name'], y=si_df['predictor importance (%)']))      # plot the bar chart
fig.update_layout(height=400,                                                                # update size, title and axis titles of the chart
                  width=600, 
                  title_text="Importances of predictors",
                  xaxis_title="Predictor name",
                  yaxis_title="Predictor importance (%)")
fig.show()
In [13]:
ei_df.head()                                                               # first few of the feature importances
Out[13]:
feature name feature importance (%) time type
0 DAUTONSA(t-12) 100 [10] TargetAndTargetTransformation
1 DAUTONSA(t-12) 100 [11] TargetAndTargetTransformation
2 DAUTONSA(t-12) 100 [12] TargetAndTargetTransformation
3 DAUTONSA(t-12) 96.42 [9] TargetAndTargetTransformation
4 DAUTONSA(t-12) 48.38 [8] TargetAndTargetTransformation
In [14]:
time = '[1]'                                                                            # time for which the feature importances are visualized
fig = go.Figure(go.Bar(x=ei_df[ei_df['time'] == time]['feature name'],                       # plot the bar chart
                       y=ei_df[ei_df['time'] == time]['feature importance (%)']))
fig.update_layout(height=700,                                                                # update size, title and axis titles of the chart
                  width=1000,
                  title_text="Importances of features (for {}-sample ahead model)".format(time),
                  xaxis_title="Feature name",
                  yaxis_title="Feature importance (%)")
fig.show()