Backtesting

Backtesting is the practice of building a model on one part of the historical data - called the in-sample period - and evaluating it on another part - called the out-of-sample period - to get a sense of how the model would perform in a real production scenario. This section discusses the parameters that influence the shape and methodology of backtesting.

Copying the "situation"

When producing a forecast, TIM always starts from the end (i.e. the last timestamp) of the target variable. The length of the forecast and the number of observations to skip can be controlled by the 'predictionTo' and 'predictionFrom' parameters.
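As a rough sketch of how these two parameters delimit the forecasted timestamps, counted in sampling periods after the last target observation (the helper below is an illustration only, not TIM's implementation):

```python
from datetime import datetime, timedelta

def forecast_range(last_target_ts, sampling_period, prediction_from, prediction_to):
    """Timestamps that would be forecasted, as sampling periods past the
    last target observation.

    prediction_from: first forecasted period (1 = the very next sample);
                     values above 1 skip observations.
    prediction_to:   last forecasted period, i.e. the prediction horizon.
    """
    return [last_target_ts + i * sampling_period
            for i in range(prediction_from, prediction_to + 1)]

last = datetime(2023, 5, 1, 13, 0)
hour = timedelta(hours=1)
# Forecast samples 2 through 4 ahead, skipping the first hour after 13:00.
print(forecast_range(last, hour, 2, 4))
```

With predictionFrom = 1 the forecast starts immediately after the last target timestamp; raising it shifts the whole forecasted range further into the future without lengthening it.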

TIM also takes the shape of the other predictors into consideration. Some might be available past the availability of the target, and some might have their observations delayed compared to the target. TIM recognizes this and creates models that are fit to use all the available data. However, when forecasting on the historical part of a dataset, this relative shape of predictors and target would not be the same, since for every target observation, all predictor observations are typically available as well (not considering missing observations). TIM is built to recognize this and recreates the same situation as observed at the end of the dataset. This ensures that the historical forecast produces an accuracy similar to the one expected when the production forecast can be evaluated.
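The idea of recreating the "situation" can be illustrated with a small sketch: measure where each predictor's last observation sits relative to the target's last observation, then cut the historical data to the same relative shape before forecasting from an earlier origin. The helper names and index-based representation below are hypothetical, chosen only to make the idea concrete:

```python
def relative_offsets(last_target_idx, last_predictor_idx):
    """Offset (in samples) of each predictor's last observation relative to
    the target's; positive means the predictor extends past the target."""
    return {name: idx - last_target_idx for name, idx in last_predictor_idx.items()}

def truncate_to_situation(forecast_origin_idx, offsets):
    """For a historical forecast origin, the index at which each predictor
    should be cut so it ends at the same relative position as at the end
    of the dataset."""
    return {name: forecast_origin_idx + off for name, off in offsets.items()}

# Target ends at index 100; a weather forecast extends 24 samples past it,
# a delayed predictor lags 2 samples behind it.
offsets = relative_offsets(100, {"temperature": 124, "sales": 98})
print(offsets)
print(truncate_to_situation(60, offsets))
```

Without this truncation, a historical forecast would see more predictor data than the production forecast ever will, and the backtest accuracy would be optimistically biased.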


To make this work, TIM exploits the fact that each of the samples in the forecasting horizon is forecasted with a different model from the Model Zoo. TIM remembers these models, and when creating a forecast on the historical part of the data, uses the same models in the same order. If a model cannot be evaluated, NaN is returned. This might happen if there are missing data around that part of the dataset.
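A minimal sketch of this reuse of models, with each model represented as a toy linear combination of target lags (this representation is an assumption for illustration; TIM's Model Zoo models are more elaborate):

```python
import math

def backtest_horizon(models, history, origin):
    """Apply the same per-sample models, in the same order, at a historical
    forecast origin. models[i] forecasts the sample at origin + i + 1; if
    an observation the model needs is missing, NaN is returned instead."""
    out = []
    for i, model in enumerate(models):
        target_idx = origin + i + 1
        lags = [history[target_idx - lag] for lag in model["lags"]]
        if any(v is None for v in lags):
            out.append(float("nan"))  # model cannot be evaluated here
        else:
            out.append(sum(c * v for c, v in zip(model["coefs"], lags)))
    return out

# History with one missing observation (None) at index 2.
history = [1.0, 2.0, None, 4.0, 5.0, 6.0]
# Two toy per-sample "models", one per step of the horizon.
models = [{"lags": [1], "coefs": [1.0]}, {"lags": [3], "coefs": [0.5]}]
print(backtest_horizon(models, history, 3))
```

Here the second step's model needs the observation at index 2, which is missing, so that sample of the historical forecast comes back as NaN while the first sample is evaluated normally.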


Using the rolling window parameter

A rolling window controls the points from which the forecasts on historical data are made. The first point lies exactly "rollingWindow" observations before the last target timestamp. The window then "rolls" back over the data by this same distance until it reaches the start of the dataset. If the rolling window is smaller than the prediction horizon, the in-sample forecasts will overlap - this causes the output table to contain multiple forecasts for the same timestamp.
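The placement of the forecast origins can be sketched as follows (indices rather than timestamps, for brevity; a simplification, not the engine's implementation):

```python
def rolling_origins(n_obs, rolling_window):
    """0-based indices of the historical forecast origins: the first is
    rolling_window observations back from the last target index, and the
    window rolls back by the same distance until the start of the data."""
    last = n_obs - 1
    origins = []
    origin = last - rolling_window
    while origin >= 0:
        origins.append(origin)
        origin -= rolling_window
    return list(reversed(origins))

# 10 observations, rolling window of 3 -> origins at indices 0, 3, 6
print(rolling_origins(10, 3))
```

If each origin is forecasted over a horizon longer than the rolling window (say a horizon of 4 with the window of 3 above), the forecast made from index 3 covers indices 4-7 while the forecast from index 6 covers 7-9, so index 7 appears twice in the output table.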


The default rolling window for non-daily cycle data matches the prediction horizon (predictionTo). The default rolling window for daily cycle data is one day.

Backtesting with daily cycle data

Changing the rolling window for daily cycle data is tricky, because the models used for producing the production forecast are built for a specific time of day. If the rolling window is changed to a value that is neither a divisor nor a multiple of a day, these models will most likely not be able to evaluate forecasts. This also means that if a daily cycle dataset ends at 13:00, but the goal is to backtest how TIM would perform at 12:00, the last observations of the dataset should be removed accordingly. This holds even if the Model Zoo already contains models built to forecast from 12:00, because backtesting prioritizes the models that the production forecast was produced with, and in this case the production forecast would be made from 13:00.
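Trimming the dataset to the desired forecast time of day can be as simple as dropping trailing observations (a hypothetical pre-processing step the user would perform before uploading the data, sketched here for hourly sampling):

```python
from datetime import datetime, timedelta

def trim_to_forecast_time(timestamps, forecast_hour):
    """Drop trailing observations so the dataset ends at the desired time
    of day, e.g. to backtest a 12:00 forecast on data ending at 13:00."""
    ts = list(timestamps)
    while ts and ts[-1].hour != forecast_hour:
        ts.pop()
    return ts

start = datetime(2023, 5, 1, 0, 0)
hourly = [start + timedelta(hours=i) for i in range(38)]  # ends 2023-05-02 13:00
trimmed = trim_to_forecast_time(hourly, 12)
print(trimmed[-1])  # 2023-05-02 12:00
```

After trimming, the production forecast - and therefore the backtest - starts from 12:00, using models built for that time of day.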

Understanding the in-sample and out-of-sample settings

These settings allow a user to mark the parts of the data that the model should be built on, and the parts on which the model should be evaluated. They do not influence the position of the rolling windows; the windows stay the same, but are cut in places where a forecast (either in-sample or out-of-sample) is not desired.
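The cutting behavior can be sketched as a simple filter over the forecast points produced by the rolling windows (indices stand in for timestamps; the helper is illustrative, not part of any TIM API):

```python
def split_forecasts(forecast_points, out_of_sample):
    """Assign each rolling-window forecast point to the in-sample or
    out-of-sample output; the window positions themselves do not move,
    points outside the desired region are simply cut."""
    in_s, out_s = [], []
    for t in forecast_points:
        (out_s if t in out_of_sample else in_s).append(t)
    return in_s, out_s

# Forecast points 8 and 9 fall in the out-of-sample region.
print(split_forecasts([3, 4, 8, 9], {8, 9}))
```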
