TIM forecasting output
A typical forecasting output might look like the following table:
Noticably, this table contains multiple different forecast values for a single timestamp, indicating that different predictions were made for this point in time. The next section explains why this happens.
Multiple forecasts for a single timestamp¶
When TIM is deployed in production, it looks at the current situation and selects the most appropriate model from the Model Zoo. TIM then applies this model to calculate the desired forecast.
For example, consider a Model Zoo prepared to forecast at 07:00 and 10:00, for one day ahead and two days ahead. Suppose a user wants to make a forecast at "2019-03-02 07:05:35" for "2019-03-03 15:00:00". In that case, TIM will automatically recognize that the data availability corresponds to the 07:00 scenario and that the desired forecast corresponds to the one day ahead scenario.
TIM's ability to recognize the current situation is essential; the desired forecast could also have been made in other situations, for example using the data availability scenario at 10:00 the previous day with the two days ahead scenario. This would result in less accurate forecasts, as this mode would ignore the most recent available data. TIM's ability to recognize situations thus empowers TIM to seamlessly select the best possible model to ensure the best possible forecasts are made. When using TIM to regularly use deployed models to forecast, TIM can automate this tedious situation awareness and optimal model selection.
However, the situation gets more complicated when doing so-called backtesting. A user might be interested in a model's performance before deploying it in production. Therefore, models are often tested on historical data before they are deployed. Every historical timestamp could be "forecasted" from different perspectives because the data is available to support these different views. That is why each timestamp might have multiple forecasts in multiple qualities.
The information in the output table¶
- datetime: the timestamp of the record for which the forecast was made. In cases where a user wants forecasts for records that are not part of his forecasting routine (e.g. forecast for 3 PM with a model trained to do 1 sample ahead at midnight), the timestamp won't be listed.
- date from: a specific date from which the forecast would have been made (backtesting) or was made (production).
- time from: a specific time from which the forecast would have been made (backtesting) or was made (production).
- target: the actual value. If this information is not available to TIM, the value missing is used.
- forecast: the forecast for the given record.
- forecast type: the type of the forecast; there are 3 possible values. 0 indicates that the forecast was made for a record that was used for model building (in-sample backtest), 1 indicates that the forecast was made for a record that was not used for model building but is still a known historical value (out-of-sample backtest), and 2 indicates that the forecast belongs to the desired forecasting horizon (production forecast). This indicates whether this is the best forecast possible for the given timestamp, which can only be true for out-of-sample records (see below).
- relative distance: the relative distance of the record from the date and time it was forecast from. In cases where data is sampled with a sampling period from the interval
, this is expressed in days (e.g. D+7); in cases where it is not, this is expressed in samples (e.g. S+5).
- model index: the index of the model from the Model Zoo that was used to create the forecast.
- samples ahead: the distance of the forecast from the last available target sample, measured in samples. "Available" here means "could be used in the simulated/current forecasting situation".
- lower bound: the lower bound of the prediction interval for the given forecast.
- upper bound: the upper bound of the prediction interval for the given forecast.
- bin: serves to divide forecasts into continuously plottable signals.
Extracting information from the output table¶
To meaningfully measure accuracy or plot the output structure, many filters can be used. Plots can be created only for filters with no duplicate datetime values. In general, it is preferable to distinguish between "out-of-sample" and "in-sample" values. This section lists potential filters with their limits and strengths.
- All forecasts with forecast type Production: This option returns the forecasts and records that should be looked at when doing a real-time production forecast.
- All forecasts: This filter can not be plotted; however, it can give a good idea about how the model performs in general by averaging its accuracy fields.
- Forecasts for a specific bin: This is the easiest filter to plot a continuous signal with. It is concatenated from different raw forecasts (see below) and split to different parts of the forecasting horizon so that the signals do not overlap. Earlier bins (bins with lower samples ahead) cover as many samples as possible so that the signal is continuous.
- Forecasts for a specific amount of samples ahead: This filtr can be plotted. It provides a basic idea about how the accuracy deteriorates with rising forecasting horizon. It's a good filter to use for backtesting.
- Forecasts with a specific time from and relative distance: This filter can be plotted. It is especially useful in daily repeating scenarios where multiple days ahead are forecasted at once. It provides answers to questions like "What would my day-ahead accuracy be like if I forecast every day at 7AM?". It's a good filter to use for backtesting.
- Forecasts from a specific date from and time from: This filter can be plotted. It shows how one particular forecast in the past would have looked like. The image below shows what this means visually.
- Forecasts with a specific model index: This filter can be plotted. It gives insight into the model creation and why the forecast looks like it does.It's excellent for discovering the origins of inaccurate forecasts.