Skip to content

Backtesting

Backtesting is the act of building a model on a set of historical data - called the in-sample period - and evaluating it on another part of the historical data - called the out-of-sample period - to get a feeling of how the model would perform in real production scenarios. In this section, the parameters that influence the shape of the backtesting and its methodology are discussed.

Data alignment

When producing a forecast, TIM always starts from the end (i.e. the last timestamp) of the target variable. Its length and the number of observations to skip can be controlled by the 'predictionTo' and 'predictionFrom' parameters.

TIM also takes the shape of other predictors into consideration. Some predictors might be available past the availability of the target, and some might have their observations delayed compared to the target. TIM will recognize this and create models that are perfectly fit to use all the available data. A more detailed explanation can be found in the model building section of this documentation.

When backtesting, data do not have to be aligned as they will be later in production. A common case is when all predictors have the same availability as target variable during backtesting. An examplary situation is displayed in the picture below.

image.png

The actual situation which should be backtested may be with Predictor 1 available for the whole prediction horizon, as displayed on the second picture below.

image.png

To allow for this, it is possible to set the data alignment in the job configuration. User may shift the last target observation. This will influence for which timestamps a production forecast will be calculated and how the rolling window will be applied. There is also a possibility to set the alignment for all other (non-target) variables. These alignments are given relative to the last target observation. If no alignment is given for some variable it will be copied from the original data. That means the relative distance between the last target observation in the original data (not the one set) and the last variable observation in the original data will be used.

Note: If a new alignment is set, it may happen that no forecasts will be calculated for some timestamps because the data are not actually available at the end of the dataset.

Copying the "situation"

When performing forecasting on the historical part of a dataset, the relative shape of predictors and targets potentially does not correctly represent a real situation, since for every target observation, all predictor observations are typically available as well (not considering missing observations). TIM is built to recognize this and recreates the same situation as observed at the end of the dataset. That way the historical forecast is ensured to produce a similar accuracy to the one expected to achieve when the production forecast can be evaluated.

image.png

To make this work, TIM exploits the fact that each of the samples in the forecasting horizon is forecasted with a different model from the Model Zoo. TIM remembers these models, and when creating a forecast on the historical part of the data, uses the same models in the same order. If a model cannot be evaluated, NaN is returned. This might happen if there are missing data around that part of the dataset.

image.png

Using the rolling window parameter

A rolling window controls the points from which the forecast on historical data is made. The first point is exactly "rollingWindow" observations distanced from the last target timestamp. It then "rolls" back over the data with this same distance until it reaches the start of the dataset. If the rolling window is smaller than the prediction horizon, the in-sample forecasts will overlap - this will cause the output table to contain multiple forecasts for the same timestamp.

image.png

The default rolling window for nondaily cycle data matches the prediction horizon (predictionTo). The default rolling window for daily cycle data is one day.

Backtesting with daily cycle data

Changing the rolling window for daily cycle data is tricky, because models used for producing the production forecast are built for a specific time of day. This means that they will most likely not be able to evaluate forecasts, if the rolling window would be changed to some value that is not a divisor or a multiple of a day. This also means that if a daily cycle dataset ends at 13:00, but the goal is to backtest how TIM would perform at 12:00, the last observations of the dataset should be removed accordingly. This applies still even if the Model Zoo already contains models built to forecast from 12:00, because the backtesting prioritizes using models that the production forecast was produced with, and the production forecast would be done from 13:00 in this case.

Understanding the in-sample and out-of-sample settings

These settings serve a user to mark parts of the data that the model should be built on, and parts on which the model should be evaluated. This does not influence the position of the rolling windows; hey will stay the same, but will be cut in places where the forecast (either in-sample or out-of-sample) is not desired.

image.png