# Data updates

To get the most out of the data, TIM needs to understand what data will be available when forecasting. When building your model on historical data, usually all data is available for all timestamps. The same is true for backtesting tasks, because the validation set is also known in advance. This might differ for production tasks, as some of the values of predictors might not immediately be available for use.

When building a model, TIM tries to understand what the data availability situation will be once the model is put into production. TIM then translates this into constraints on the models. This ensures that no model is trained on data that would be unavailable when the model is used in production. As a result, backtesting scenarios also approximate production scenarios more closely, so that users have realistic expectations. If more data turns out to be available than expected, the model can still be used. The additional data will, however, not be taken into account, as the model was not built to use it.

To allow TIM to correctly understand the data availability in all possible situations, the user must indicate for each predictor when it updates and the scope of these updates (the timestamp until which it updates). This information is communicated with a **cron notation** and a **relative time notation**, respectively. If the user does not provide this information, TIM uses default values. These defaults assume that all historical values up until the moment the forecast is made are available.

The following examples illustrate how users should specify this information about data updates.

## Example 1 – Electricity consumption

In this example, the data has a sampling rate of one hour. Every day at 08:20, a dispatcher wants to forecast the electricity consumption for the next day (i.e. for 24 hours, from 00:00 to 23:00). The latest record of electricity consumption is from 06:00 that day. The dataset contains only one predictor, temperature, and its forecast is available up until the end of the following day. Both consumption and temperature update right before the forecast is made.

| Predictor | Updates at | Updates until |
|---|---|---|
| Consumption | `* 8 20 0` | `Hour-2` |
| Temperature | `* 8 20 0` | `Day+1` |
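To make the relative time notation concrete, the following minimal sketch resolves an "Updates until" entry to a timestamp at a given forecasting moment. It assumes one particular reading of the notation that is consistent with the examples on this page: the current time is floored to the unit and then shifted by the offset, and for `Day` the result is the last sample of the shifted day. The function name `last_available` and the date used are purely illustrative and not part of TIM.

```python
import re
from datetime import datetime, timedelta

# Step sizes for the sub-daily units used in the tables on this page.
_STEPS = {
    "QuarterHour": timedelta(minutes=15),
    "Hour": timedelta(hours=1),
}


def last_available(now: datetime, notation: str, sampling: timedelta) -> datetime:
    """Resolve a relative notation such as 'Hour-2' or 'Day+1' at time `now`.

    Assumed reading (not confirmed by the TIM documentation): floor `now`
    to the unit, then shift by the offset. For 'Day', return the last
    sample of the shifted day.
    """
    match = re.fullmatch(r"(QuarterHour|Hour|Day)([+-]\d+)", notation)
    if match is None:
        raise ValueError(f"unsupported notation: {notation!r}")
    unit, offset = match.group(1), int(match.group(2))
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if unit == "Day":
        # Start of the day after the shifted day, minus one sample.
        return midnight + timedelta(days=offset + 1) - sampling
    step = _STEPS[unit]
    floored = midnight + ((now - midnight) // step) * step
    return floored + offset * step


# Example 1: forecasting at 08:20 on an arbitrary date, hourly data.
forecast_time = datetime(2021, 3, 10, 8, 20)
hourly = timedelta(hours=1)
print(last_available(forecast_time, "Hour-2", hourly))  # 2021-03-10 06:00:00
print(last_available(forecast_time, "Day+1", hourly))   # 2021-03-11 23:00:00
```

Under this reading, `Hour-2` at 08:20 gives 06:00 for consumption, and `Day+1` gives 23:00 on the following day for temperature, matching the description of the example.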

## Example 2 – System imbalance

The dataset in this example has a sampling rate of 15 minutes. Every hour from 05:50 to 15:50, a trader wants to forecast the system imbalance during the next 2 hours. Each forecast thus covers 8 samples, the first one starting at 06:00. At the time of forecasting, records of system imbalance are available up until 20 minutes earlier (thus 05:30 for the first forecasting time). There are two additional predictors: temperature and wind speed. The temperature forecasts are always available for the entire forecasting horizon and are updated 3 minutes before forecasting. The most recent record of wind speed is from 23:00 the previous day, and wind speed is updated every day at midnight.

| Predictor | Updates at | Updates until |
|---|---|---|
| System imbalance | `* 5-15 50 0` | `QuarterHour-1` |
| Temperature | `* 5-15 47 0` | `QuarterHour+8` |
| Windspeed | `* 0 0 0` | `QuarterHour-4` |

## Example 3 – Full scale forecasting

The data in this example has a sampling rate of one hour. Every hour from 06:00 to 16:00, the expected electricity load during the following 10 days needs to be predicted, including the remaining samples of the current day. The latest available record of electricity consumption is from the last hour before the forecast is made, and consumption updates hourly. The data includes one predictor, temperature, and its forecast is available up until the end of the forecast horizon. This predictor is updated every day at 05:00.

| Predictor | Updates at | Updates until |
|---|---|---|
| Consumption | `* * 0 0` | `Hour-1` |
| Temperature | `* 5 0 0` | `Day+10` |

## Example 4 – Solar production

This example deals with data that has a sampling rate of 15 minutes. Every hour, a forecast has to be made of the production of a photovoltaic plant (PV_prod), starting 3 hours ahead and running until the end of the day, two days ahead. Values of PV_prod for the previous days become available every day at 11:15. As predictors, forecasts of GHI, DNI and DIF are used; they are updated every six hours, starting at midnight, and are available for the next 84 hours. A forecast of TEMP can also be used and is updated at the same times as the other predictors. This forecast is available for the next 3 days (until the end of the day, 3 days ahead).

| Predictor | Updates at | Updates until |
|---|---|---|
| PV_Prod | `* 11 15 *` | `Day-1` |
| GHI | `* 0-23/6 0 0` | `Hour+84` |
| DNI | `* 0-23/6 0 0` | `Hour+84` |
| DIF | `* 0-23/6 0 0` | `Hour+84` |
| TEMP | `* 0-23/6 0 0` | `Day+3` |
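The "Updates at" entries combine wildcards, ranges, steps and comma lists. As a rough illustration of how such a field can be read, the sketch below expands a single field into the values it matches, following common cron conventions. Whether TIM interprets every form exactly this way is an assumption, and `expand_field` is a hypothetical helper, not a TIM function.

```python
def expand_field(field: str, low: int, high: int) -> list:
    """Expand one cron-style field into the sorted values it matches.

    Supports '*', single values, 'a-b' ranges, comma lists and a '/step'
    suffix on '*' or on a range. `low` and `high` bound the '*' wildcard.
    """
    values = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start_text, end_text = base.split("-")
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(base)
        values.update(range(start, end + 1, step))
    return sorted(values)


print(expand_field("0-23/6", 0, 23))  # [0, 6, 12, 18]
print(expand_field("5-15", 0, 23))    # every hour from 5 through 15
print(expand_field("7,11", 0, 23))    # [7, 11]
```

Read this way, the hour field `0-23/6` used for GHI, DNI, DIF and TEMP corresponds to updates at 00:00, 06:00, 12:00 and 18:00, in line with "every six hours, starting at midnight".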

## How it looks in a TIM API request

```
"data": [
  {
    "uniqueName": "SoldItems",
    "type": "Target",
    "updateTime": [
      {
        "type": "Day",
        "value": "*"
      },
      {
        "type": "Hour",
        "value": "7,11"
      },
      {
        "type": "Minute",
        "value": "0"
      }
    ],
    "updateUntil": {
      "baseUnit": "Sample",
      "offset": 0
    },
    "values": {
      ...
    }
  }
]
```
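To connect the table notation from the examples with the request structure above, here is a minimal sketch that converts one table row into `updateTime` and `updateUntil` entries. Several things in it are assumptions rather than documented behavior: that the four fields of the "Updates at" column are day, hour, minute and second in that order, that the seconds field can be dropped because the request above has no such entry, and that the unit name from the table can be passed through as `baseUnit` unchanged. The helpers `to_update_time` and `to_update_until` are hypothetical.

```python
import json
import re


def to_update_time(cron: str) -> list:
    """Turn a table entry such as '* 8 20 0' into an `updateTime` list.

    Assumption: the fields are day, hour, minute, second, in that order.
    The seconds field is dropped, as the request above has no such entry.
    """
    day, hour, minute, _second = cron.split()
    return [
        {"type": "Day", "value": day},
        {"type": "Hour", "value": hour},
        {"type": "Minute", "value": minute},
    ]


def to_update_until(notation: str) -> dict:
    """Turn 'Hour-2' into {'baseUnit': 'Hour', 'offset': -2}.

    Assumption: the unit name is passed through unchanged; the spelling
    that the API accepts may differ.
    """
    match = re.fullmatch(r"([A-Za-z]+)([+-]\d+)", notation)
    if match is None:
        raise ValueError(f"unsupported notation: {notation!r}")
    return {"baseUnit": match.group(1), "offset": int(match.group(2))}


# The Consumption row of Example 1, without its "values".
entry = {
    "uniqueName": "Consumption",
    "type": "Target",
    "updateTime": to_update_time("* 8 20 0"),
    "updateUntil": to_update_until("Hour-2"),
}
print(json.dumps(entry, indent=2))
```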

## Default values

By default, all predictors have `updateTime` equal to `usageTime` and `updateUntil` equal to `predictionTo`, which means that all their values are available for the whole forecast horizon at the moment the forecast is made. For the target, the default `updateTime` is also the same as `usageTime`, but the default `updateUntil` is S-1, which means that all historical values of the target are available up until the moment the forecast is made.
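The default rules can be summarized in a few lines of code. The sketch below is a client-side illustration only, not TIM's implementation: `apply_defaults` is a hypothetical helper, the shapes of `usage_time` and `prediction_to` are assumed to match the `updateTime` and `updateUntil` structures of the request, and S-1 is assumed to mean the `Sample` base unit with an offset of -1.

```python
def apply_defaults(entry: dict, usage_time: list, prediction_to: dict) -> dict:
    """Fill in missing `updateTime` / `updateUntil` following the defaults.

    Assumptions: `usage_time` has the shape of an `updateTime` list,
    `prediction_to` has the shape of an `updateUntil` object, and S-1
    stands for base unit "Sample" with offset -1.
    """
    completed = dict(entry)
    completed.setdefault("updateTime", usage_time)
    if entry.get("type") == "Target":
        # Target: history is known up to one sample before the forecast.
        completed.setdefault("updateUntil", {"baseUnit": "Sample", "offset": -1})
    else:
        # Predictors: assumed available for the whole forecast horizon.
        completed.setdefault("updateUntil", prediction_to)
    return completed


# Hypothetical usage time and forecast horizon, for illustration only.
usage_time = [
    {"type": "Day", "value": "*"},
    {"type": "Hour", "value": "8"},
    {"type": "Minute", "value": "20"},
]
prediction_to = {"baseUnit": "Day", "offset": 1}

target = apply_defaults({"uniqueName": "Consumption", "type": "Target"},
                        usage_time, prediction_to)
predictor = apply_defaults({"uniqueName": "Temperature", "type": "Predictor"},
                           usage_time, prediction_to)
print(target["updateUntil"])     # {'baseUnit': 'Sample', 'offset': -1}
print(predictor["updateUntil"])  # {'baseUnit': 'Day', 'offset': 1}
```

Values that the user provides explicitly are left untouched; only missing keys are filled in.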

## Lower sampling rates

For datasets that are sampled monthly or less frequently, the only valid base unit is `Sample`, because `Days`, `Hours` and `Minutes` do not make sense in this case. The specification of when the predictor gets updated is ignored entirely; the update is assumed to happen at the moment the forecast is made.