
Data updates

To get the most out of the data, TIM needs to understand what data will be available at the moment of forecasting. When a model is built on historical data, usually all data is available for all timestamps. The same holds for backtesting, because the validation set is also known in advance. In production, however, this may differ: some predictor values might not yet be available when the forecast is made.

When building a model, TIM tries to understand what the data availability situation will be once the model is put into production. TIM then translates this into constraints on the models. This ensures that no model is trained on data that would be unavailable when the model is used in production. As a result, backtesting scenarios also more closely approximate production scenarios, so that users form realistic expectations. If more data turns out to be available than expected, the model can still be used; however, the additional data will not be taken into account, as the model was not built to use it.

To allow TIM to correctly understand the data availability in all possible situations, the user must indicate, for each predictor, when it updates and the scope of these updates (up to which timestamp values become available). This information is given in cron notation and in relative time notation, respectively. If the user does not provide this information, TIM uses default values, which assume that all historical values up to the moment the forecast is made are available.

The following examples illustrate how to specify the correct information concerning data updates.

Example 1 – Electricity consumption

In this example, the data has a sampling rate of one hour. Every day at 08:20, a dispatcher wants to forecast the electricity consumption for the next day (i.e. the 24 hours from 00:00 to 23:00). At that moment, the latest record of electricity consumption is from 06:00 the same day. The dataset contains only one predictor, temperature, whose forecast is available up until the end of the following day. Both consumption and temperature are updated right before the forecast is made.

Predictor | Updates at (cron) | Updates until (relative)
Consumption | * 8 20 0 | Hour-2
Temperature | * 8 20 0 | Day+1
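One way to read the relative "Updates until" notation that reproduces the numbers in this example: truncate the update moment to the base unit, shift by the offset, and take the last sample inside that unit. The sketch below is an assumption inferred from the examples on this page, not TIM's actual implementation; the names `UNITS`, `floor_to_unit` and `last_available` are hypothetical, and the date is arbitrary.

```python
from datetime import datetime, timedelta

# Assumed durations of the base units used in the "Updates until" column.
UNITS = {
    "QuarterHour": timedelta(minutes=15),
    "Hour": timedelta(hours=1),
    "Day": timedelta(days=1),
}


def floor_to_unit(t: datetime, unit: timedelta) -> datetime:
    """Truncate t to the start of the unit it falls in (08:20 -> 08:00 for Hour)."""
    midnight = t.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + ((t - midnight) // unit) * unit


def last_available(update_time: datetime, base_unit: str, offset: int,
                   sampling: timedelta) -> datetime:
    """Hypothetical reading of 'BaseUnit+offset': the last sample inside the unit
    that lies `offset` units away from the unit containing update_time."""
    unit = UNITS[base_unit]
    start = floor_to_unit(update_time, unit) + offset * unit
    return start + unit - sampling


# Example 1: both series update at 08:20, data sampled hourly.
update = datetime(2024, 1, 1, 8, 20)
hourly = timedelta(hours=1)
print(last_available(update, "Hour", -2, hourly))  # 2024-01-01 06:00:00 (consumption)
print(last_available(update, "Day", 1, hourly))    # 2024-01-02 23:00:00 (temperature)
```

Under this reading, Hour-2 at 08:20 gives 06:00 and Day+1 gives 23:00 on the following day, matching the description above.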

Example 2 – System imbalance

This example dataset has a sampling rate of 15 minutes. Every hour from 05:50 to 15:50, a trader wants to forecast the system imbalance for the next 2 hours, i.e. 8 quarter-hour values, starting at 06:00 for the first forecast. At the time of forecasting, the system imbalance in the database has been updated up until 20 minutes earlier (thus 05:30 for the first forecasting time). There are two additional predictors: temperature and windspeed. The temperature forecasts always cover the entire forecasting horizon and are updated 3 minutes before each forecast. Windspeed is updated every day at midnight, and its most recent record is then from 23:00 of the previous day.

Predictor | Updates at (cron) | Updates until (relative)
System imbalance | * 5-15 50 0 | QuarterHour-1
Temperature | * 5-15 47 0 | QuarterHour+8
Windspeed | * 0 0 0 | QuarterHour-4
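The "Updates at" column uses cron-style fields. Judging from "* 8 20 0" meaning 08:20, the second field appears to be the hour and the third the minute; the sketch below expands only those two fields (the first and fourth are not modeled). It handles the field syntax seen on this page (`*`, single values, ranges such as `5-15`, steps such as `0-23/6`, lists such as `7,11`). `expand_field` and `update_times` are hypothetical names, not part of TIM.

```python
def expand_field(value: str, low: int, high: int) -> list[int]:
    """Expand one cron-style field ('*', 'n', 'a-b', 'a-b/step', 'a,b') into sorted ints."""
    result = set()
    for part in value.split(","):
        rng, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if rng == "*":
            start, end = low, high
        elif "-" in rng:
            first, last = rng.split("-")
            start, end = int(first), int(last)
        else:
            # A single value; with a step ('n/step') it runs to the field maximum.
            start = int(rng)
            end = high if step_text else start
        result.update(range(start, end + 1, step))
    return sorted(result)


def update_times(hour_field: str, minute_field: str) -> list[str]:
    """All HH:MM times of day matched by the hour and minute fields."""
    hours = expand_field(hour_field, 0, 23)
    minutes = expand_field(minute_field, 0, 59)
    return [f"{h:02d}:{m:02d}" for h in hours for m in minutes]


# Example 2
print(update_times("5-15", "50"))  # system imbalance: 05:50, 06:50, ..., 15:50
print(update_times("5-15", "47"))  # temperature: 3 minutes before each forecast
print(update_times("0", "0"))      # windspeed: ['00:00']
```

The 11 system imbalance update times line up with the eleven hourly forecasts from 05:50 to 15:50.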

Example 3 – Full scale forecasting

The data in this example has a sampling rate of one hour. Every hour from 06:00 to 16:00, the expected electricity load for the following 10 days needs to be predicted, including the remaining samples of the current day. Consumption is updated hourly, and its latest available record is from the hour before the forecast is made. The data contains one predictor, temperature, whose forecast is available up until the end of the forecast horizon and is updated every day at 05:00.

Predictor | Updates at (cron) | Updates until (relative)
Consumption | * * 0 0 | Hour-1
Temperature | * 5 0 0 | Day+10

Example 4 – Solar production

This example deals with data sampled every 15 minutes. Every hour, a forecast has to be made of the production of a photovoltaic (solar) plant, PV_prod, starting 3 hours ahead and running until the end of the day two days ahead. Every day at 11:15, the values of PV_prod up to the end of the previous day become available. The predictors are forecasts of GHI, DNI and DIF (global horizontal, direct normal and diffuse horizontal irradiance), which are updated every six hours starting at midnight and are available for the next 84 hours. A forecast of TEMP can also be used; it is updated at the same times as the other predictors and is available for the next 3 days (until the end of the day 3 days ahead).

Predictor | Updates at (cron) | Updates until (relative)
PV_prod | * 11 15 * | Day-1
GHI | * 0-23/6 0 0 | Hour+84
DNI | * 0-23/6 0 0 | Hour+84
DIF | * 0-23/6 0 0 | Hour+84
TEMP | * 0-23/6 0 0 | Day+3
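Because these predictors update at times other than when the forecast is made, what is available at forecast time depends on the most recent update before that moment. The sketch below combines both notations under stated assumptions: the second cron field is the hour and the third the minute; "Updates until" means the last sample inside the unit shifted by the offset; and the offset is counted from the update moment rather than the forecast moment (the reading under which Example 2's windspeed, updated at midnight with QuarterHour-4, ends at 23:00). All function names are hypothetical, and the forecast date is arbitrary.

```python
from datetime import datetime, timedelta

# Assumed durations of the base units used in the "Updates until" column.
UNITS = {"QuarterHour": timedelta(minutes=15), "Hour": timedelta(hours=1), "Day": timedelta(days=1)}


def expand_field(value: str, low: int, high: int) -> list[int]:
    """Expand a cron-style field ('*', 'n', 'a-b', 'a-b/step', 'a,b') into sorted ints."""
    result = set()
    for part in value.split(","):
        rng, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if rng == "*":
            start, end = low, high
        elif "-" in rng:
            first, last = rng.split("-")
            start, end = int(first), int(last)
        else:
            start = int(rng)
            end = high if step_text else start
        result.update(range(start, end + 1, step))
    return sorted(result)


def latest_update(t: datetime, hour_field: str, minute_field: str) -> datetime:
    """Most recent update moment at or before t (searches the current and previous day)."""
    hours = expand_field(hour_field, 0, 23)
    minutes = expand_field(minute_field, 0, 59)
    for days_back in (0, 1):
        day = (t - timedelta(days=days_back)).replace(hour=0, minute=0, second=0, microsecond=0)
        candidates = (day + timedelta(hours=h, minutes=m) for h in hours for m in minutes)
        past = [c for c in candidates if c <= t]
        if past:
            return max(past)
    raise ValueError("no update found in the current or previous day")


def available_until(t: datetime, hour_field: str, minute_field: str,
                    base_unit: str, offset: int, sampling: timedelta) -> datetime:
    """Last available timestamp at forecast time t (hypothetical reading of the tables)."""
    unit = UNITS[base_unit]
    update = latest_update(t, hour_field, minute_field)
    midnight = update.replace(hour=0, minute=0, second=0, microsecond=0)
    unit_start = midnight + ((update - midnight) // unit) * unit + offset * unit
    return unit_start + unit - sampling


# Example 4 at an arbitrary forecasting moment, 15-minute sampling.
t = datetime(2024, 6, 3, 10, 0)
q = timedelta(minutes=15)
for name, hour, minute, base, offset in [
    ("PV_prod", "11", "15", "Day", -1),
    ("GHI/DNI/DIF", "0-23/6", "0", "Hour", 84),
    ("TEMP", "0-23/6", "0", "Day", 3),
]:
    print(name, available_until(t, hour, minute, base, offset, q))
```

Under these assumptions, a forecast made at 10:00 only sees PV_prod up to the end of the day before yesterday, because yesterday's values arrive at 11:15; from 11:15 onwards the cutoff moves forward by one day.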

How it looks in a TIM API request

"data": [
    {
        "uniqueName": "SoldItems",
        "type": "Target",
        "updateTime": [
            {
                "type": "Day",
                "value": "*"
            },
            {
                "type": "Hour",
                "value": "7,11"
            },
            {
                "type": "Minute",
                "value": "0"
            }
        ],
        "updateUntil": {
            "baseUnit": "Sample",
            "offset": 0
        },
        "values": {
            ...
        }
    }
]
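If requests are built programmatically, each row of the tables above maps onto one element of the "data" array. The Python sketch below reproduces the SoldItems entry using only the standard `json` module; `data_entry` is a hypothetical helper, and the "values" part of the entry is left out because its content is not shown here.

```python
import json


def data_entry(unique_name: str, kind: str, day: str, hour: str, minute: str,
               base_unit: str, offset: int) -> dict:
    """Hypothetical helper: one element of the "data" array, without "values"."""
    return {
        "uniqueName": unique_name,
        "type": kind,
        # One object per cron field, as in the request shown above.
        "updateTime": [
            {"type": "Day", "value": day},
            {"type": "Hour", "value": hour},
            {"type": "Minute", "value": minute},
        ],
        # Relative notation: base unit plus signed offset.
        "updateUntil": {"baseUnit": base_unit, "offset": offset},
    }


# Target updated every day at 07:00 and 11:00, with updateUntil Sample offset 0.
request = {"data": [data_entry("SoldItems", "Target", "*", "7,11", "0", "Sample", 0)]}
print(json.dumps(request, indent=4))
```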

Default values

By default, every predictor has updateTime equal to usageTime and updateUntil equal to predictionTo, which means that all predictor values are available for the whole forecast horizon at the moment the forecast is made. For the target, the default updateTime is likewise usageTime, but the default updateUntil is S-1 (one sample back), which means that all historical target values are available up until the moment the forecast is made.
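The default rules can be written as a small resolution step. In this sketch, `resolve_defaults` is a hypothetical name; `usage_time` and `prediction_to` stand for whatever the request's usageTime and predictionTo settings contain (their structure is not shown here, so they are passed through unchanged); and S-1 is assumed to correspond to `{"baseUnit": "Sample", "offset": -1}` in the updateUntil format of the request shown above.

```python
def resolve_defaults(entry: dict, usage_time, prediction_to) -> dict:
    """Fill in missing updateTime/updateUntil following the default rules."""
    resolved = dict(entry)
    # Target and predictors alike: updateTime defaults to usageTime.
    resolved.setdefault("updateTime", usage_time)
    if "updateUntil" not in resolved:
        if entry.get("type") == "Target":
            # Assumed encoding of S-1: history up to the forecasting moment.
            resolved["updateUntil"] = {"baseUnit": "Sample", "offset": -1}
        else:
            # Predictors: available for the whole forecast horizon.
            resolved["updateUntil"] = prediction_to
    return resolved


# Hypothetical placeholder settings, for illustration only.
usage_time = "USAGE_TIME"
prediction_to = "PREDICTION_TO"
print(resolve_defaults({"uniqueName": "SoldItems", "type": "Target"}, usage_time, prediction_to))
print(resolve_defaults({"uniqueName": "Temperature"}, usage_time, prediction_to))
```

Explicitly provided values are kept as they are; only missing settings are filled in.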

Lower sampling rates

For datasets sampled monthly or less frequently, the only valid base unit is Sample, because Days, Hours and Minutes do not make sense at such sampling rates. The update time is ignored entirely, and each predictor is assumed to be updated at the moment the forecast is made.
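For monthly data the rule reduces to counting samples. The sketch below assumes one sample per month, represents each month by its first day, and takes no update time at all, in line with the rule that it is ignored; `last_available_month` is a hypothetical name, and an offset of -1 is read as "up to the month before the forecasting month".

```python
from datetime import date


def last_available_month(forecast_month: date, base_unit: str, offset: int) -> date:
    """Last available monthly sample, counted in samples from the forecasting month."""
    if base_unit != "Sample":
        # Days, Hours and Minutes are not meaningful for monthly data.
        raise ValueError(f"base unit {base_unit!r} is not valid for monthly data")
    # Convert to a running month index so the offset can cross year boundaries.
    index = forecast_month.year * 12 + (forecast_month.month - 1) + offset
    return date(index // 12, index % 12 + 1, 1)


print(last_available_month(date(2024, 1, 1), "Sample", -1))  # 2023-12-01
```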