Raw and Aggregated Predictions

Predictions

When TIM is deployed in production, it looks at the current situation and selects the most appropriate model from the Model Zoo.

For example, consider a Model Zoo prepared to forecast at both 07:00 and 10:00, for both one day ahead and two days ahead. If a user wants to make a forecast at "2019-03-02 07:05:35" for "2019-03-03 15:00:00", TIM will automatically recognize that the data availability corresponds to the 07:00 scenario and that the desired forecast corresponds to the one day ahead scenario. TIM stores this information, as well as the corresponding prediction, in a JSON-file called "prediction".

TIM's ability to recognize the current situation is important: the desired forecast could also have been made in other situations, for example by using the 10:00 data availability scenario of the previous day together with the two days ahead scenario. This would result in less accurate forecasts, as that model would ignore the most recent available data. TIM's ability to recognize situations thus allows it to seamlessly select the best possible model, ensuring that the best possible forecasts are made. When TIM is used to regularly make forecasts based on deployed models, all of this is automated.
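The selection described above can be sketched in a few lines of Python. This is only an illustration of the idea, not TIM's actual implementation: the function `select_scenario` and the constants `AVAILABILITY_TIMES` and `DAYS_AHEAD` are hypothetical names, and the sketch assumes that the most recent data availability scenario reached on the day of forecasting is the one that applies.

```python
from datetime import datetime, time

# Hypothetical description of the example Model Zoo:
# data availability scenarios at 07:00 and 10:00, horizons of one and two days ahead.
AVAILABILITY_TIMES = [time(7, 0), time(10, 0)]
DAYS_AHEAD = [1, 2]


def select_scenario(forecast_made_at: datetime, target: datetime):
    """Return the (availability time, days ahead) pair matching the situation."""
    # Keep only the availability scenarios that have already been reached today.
    reached = [t for t in AVAILABILITY_TIMES if t <= forecast_made_at.time()]
    if not reached:
        raise ValueError("no data availability scenario has been reached yet")
    # Assumption: the latest reached scenario uses the most recent data.
    availability = max(reached)

    # The horizon is the number of calendar days between forecasting and target.
    days_ahead = (target.date() - forecast_made_at.date()).days
    if days_ahead not in DAYS_AHEAD:
        raise ValueError(f"no model prepared for {days_ahead} day(s) ahead")
    return availability, days_ahead


scenario = select_scenario(
    datetime(2019, 3, 2, 7, 5, 35),  # time at which the forecast is made
    datetime(2019, 3, 3, 15, 0, 0),  # timestamp to be forecast
)
print(scenario)
```

For the example in the text, the sketch picks the 07:00 scenario and the one day ahead horizon.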

However, the situation gets more complicated when doing so-called backtesting. A user might be interested in a model's performance before actually deploying it in production. Therefore, models are often tested on historical data before they are deployed.

In the situation described above, four forecasts would be made for each timestamp: a two days ahead forecast using the data availability from 07:00 two days earlier, a two days ahead forecast using the data availability from 10:00 two days earlier, a one day ahead forecast using the data availability from 07:00 the previous day, and a one day ahead forecast using the data availability from 10:00 the previous day.
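The four forecasts per timestamp can be enumerated as follows. This is a minimal sketch under the assumptions of the example Model Zoo; the function name `backtest_forecast_times` is hypothetical and not part of TIM.

```python
from datetime import datetime, time, timedelta
from itertools import product


def backtest_forecast_times(target, availability_times, days_ahead):
    """List every (time of forecasting, days ahead) pair that covers `target`."""
    return [
        (datetime.combine(target.date() - timedelta(days=days), at), days)
        for days, at in product(days_ahead, availability_times)
    ]


combinations = backtest_forecast_times(
    datetime(2019, 3, 3, 15, 0, 0),        # timestamp being backtested
    [time(7, 0), time(10, 0)],             # data availability scenarios
    [1, 2],                                # one and two days ahead
)
for made_at, days in combinations:
    print(f"{days} day(s) ahead, forecast made at {made_at}")
```

Each timestamp in the backtesting period is covered by one entry per combination of data availability scenario and horizon, which gives four entries here.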

The most accurate prediction for each of the timestamps will again be stored in the JSON-file called "prediction", although this does not have much value during backtesting, since the actual value is also known.

How this looks in JSON

"prediction": {
    "values": {
        "2009-02-01T00:00:00.000Z": 39.2595,
        "2009-02-01T01:00:00.000Z": 37.8478,
        "2009-02-01T02:00:00.000Z": 36.9604,
        "2009-02-01T03:00:00.000Z": 36.6577,
        "2009-02-01T04:00:00.000Z": 36.9604,
        "2009-02-01T05:00:00.000Z": 37.8478,
        "2009-02-01T06:00:00.000Z": 39.2595,
        "2009-02-01T07:00:00.000Z": 41.0993,
        "2009-02-01T08:00:00.000Z": 43.2418,
        "2009-02-01T09:00:00.000Z": 45.541,
        "2009-02-01T10:00:00.000Z": 47.8402,
        "2009-02-01T11:00:00.000Z": 49.9826
    }
}

The following image can provide a convenient overview of the scenarios discussed as examples below.

Aggregated and Raw.png

Raw predictions

Raw predictions are stored in the same way they would be stored in real production. For each time of forecasting, each possible forecast will be stored. A unique key for each time of forecasting is used to assign all the respective forecasts to it.

For example, reconsider the previously described Model Zoo. The result from the 07:00 forecast would be stored as illustrated in the following JSON example.

How this looks in JSON

"rawPredictions": [
    {
      "predictionDateTime": "2018-10-30T07:00:00.000Z",
      "values": {
        "2018-11-01T00:00:00.000Z": -0.548916,
        "2018-11-01T01:00:00.000Z": -0.557343,
        "2018-11-01T02:00:00.000Z": -0.553979,
        "2018-11-01T03:00:00.000Z": -0.567279,
        "2018-11-01T04:00:00.000Z": -0.575839
      }
    }
]
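Once returned, raw predictions can be indexed by their time of forecasting. The following is a minimal sketch, assuming the `rawPredictions` array is wrapped in a JSON object and uses the timestamp format shown in the example; the helper `index_raw_predictions` is a hypothetical name, not part of TIM or TIMConnector.

```python
import json
from datetime import datetime

# Timestamp layout used in the example, e.g. "2018-10-30T07:00:00.000Z".
ISO_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"

# Shortened copy of the example, wrapped in braces to make it valid JSON.
payload = json.loads("""
{
  "rawPredictions": [
    {
      "predictionDateTime": "2018-10-30T07:00:00.000Z",
      "values": {
        "2018-11-01T00:00:00.000Z": -0.548916,
        "2018-11-01T01:00:00.000Z": -0.557343
      }
    }
  ]
}
""")


def index_raw_predictions(raw_predictions):
    """Map each time of forecasting to its {target timestamp: value} forecasts."""
    indexed = {}
    for entry in raw_predictions:
        made_at = datetime.strptime(entry["predictionDateTime"], ISO_FORMAT)
        indexed[made_at] = {
            datetime.strptime(timestamp, ISO_FORMAT): value
            for timestamp, value in entry["values"].items()
        }
    return indexed


raw_by_time = index_raw_predictions(payload["rawPredictions"])
for made_at, forecasts in raw_by_time.items():
    for target, value in forecasts.items():
        print(made_at, "->", target, value)
```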

Aggregated predictions

Aggregated predictions are stored in a more advanced way, using two keys: one representing how many days ahead the forecast looks ("day") and one representing the time of day at which the forecast is made ("predictionTime").

In the previous example, this would result in one day ahead forecasts made at 07:00, two days ahead forecasts made at 07:00, one day ahead forecasts made at 10:00, and two days ahead forecasts made at 10:00.

This way of storing forecasts is often the most convenient, as it makes it easy to answer common questions such as "What is the day ahead accuracy?" The following JSON example illustrates how the result from the 07:00 forecast for the two days ahead scenario would be stored.

How this looks in JSON

"aggregatedPredictions": [
    {
      "day": 2,
      "predictionTime": "07:00:00",
      "values": {
        "2018-11-01T00:00:00.000Z": -0.548916,
        "2018-11-01T01:00:00.000Z": -0.557343,
        "2018-11-01T02:00:00.000Z": -0.553979,
        "2018-11-01T03:00:00.000Z": -0.567279,
        "2018-11-01T04:00:00.000Z": -0.575839
      }
    }
]
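A question such as "What is the day ahead accuracy?" can then be answered per scenario. The sketch below computes the mean absolute error for each ("day", "predictionTime") pair. It is an illustration only: the `actuals` values are made up for the example and do not come from TIM, and `mean_absolute_error` is a hypothetical helper.

```python
# Shortened copy of the aggregated predictions example.
aggregated_predictions = [
    {
        "day": 2,
        "predictionTime": "07:00:00",
        "values": {
            "2018-11-01T00:00:00.000Z": -0.548916,
            "2018-11-01T01:00:00.000Z": -0.557343,
        },
    }
]

# Hypothetical actual values, invented purely to make the sketch runnable.
actuals = {
    "2018-11-01T00:00:00.000Z": -0.5,
    "2018-11-01T01:00:00.000Z": -0.6,
}


def mean_absolute_error(entry, actual_values):
    """Average absolute difference between forecasts and known actual values."""
    errors = [
        abs(predicted - actual_values[timestamp])
        for timestamp, predicted in entry["values"].items()
        if timestamp in actual_values  # skip timestamps without an actual value
    ]
    if not errors:
        raise ValueError("no overlapping timestamps between forecasts and actuals")
    return sum(errors) / len(errors)


# One accuracy figure per (days ahead, time of forecasting) scenario.
mae_by_scenario = {
    (entry["day"], entry["predictionTime"]): mean_absolute_error(entry, actuals)
    for entry in aggregated_predictions
}
for (day, prediction_time), mae in mae_by_scenario.items():
    print(f"{day} day(s) ahead at {prediction_time}: MAE = {mae:.6f}")
```

Because each entry already groups forecasts by scenario, no further filtering is needed before computing the metric.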

How to enable this in TIMConnector

TIM only returns these JSON outputs if the user enables them in the task file. In TIMConnector, this is done as illustrated below:

forecasting:
  configuration:
    extendedOutputConfiguration:
      returnAggregatedPredictions: true
      returnRawPredictions: true