Overview

We have put a lot of effort into creating a fully automatic model building engine, but sometimes, despite our best efforts, some models do not reach the accuracy they could. By adjusting the exposed parameters of the algorithm, you can make sure that even the toughest dataset is modeled properly.

The following subsections go through all the available settings of TIM Forecasting. The table below lists all configuration parameters available for TIM RTInstantML Forecasting, together with their defaults.

Parameter type | Configuration parameter | Default
Experiment setup | Prediction to | Sample + 1
Experiment setup | Prediction from | Sample + 1
Experiment setup | In sample rows | All records except Out-of-sample
Experiment setup | Out of sample rows | No records
Experiment setup | Rolling window | 1 day (daily cycle) / Prediction to (nondaily cycle)
Experiment setup | Rebuilding policy | New situations
Tuning parameters | Model quality | Combined (Very High for D+0 and D+1, High otherwise)
Tuning parameters | Features | Polynomial, Time offsets, Identity, Intercept, Rest of week, Piecewise linear, Exponential moving average, Periodic
Tuning parameters | Normalization | true
Tuning parameters | Model complexity | automatic
Tuning parameters | Daily cycle | automatic
Tuning parameters | Prediction intervals | 90%
Tuning parameters | Prediction boundaries | automatic
Tuning parameters | Memory limit check | true
Dataset manipulation | Target column | First non-timestamp column
Dataset manipulation | Holiday column | none
Dataset manipulation | Columns | all
Dataset manipulation | Imputation | Linear for gaps no longer than 6
Dataset manipulation | Time scale | Originally estimated from dataset

Prediction to

Defines the forecasting horizon. It consists of a baseUnit (one of Month, Day, Hour, Minute, Second and Sample) and a value (a non-negative integer). If not set, TIM defaults to one Sample ahead.

"predictionTo": {
  "baseUnit": "Day",
  "value": 2
}

Defining PredictionTo with Samples

This is the easiest way to define the forecasting horizon. The engine forecasts the next value samples, starting from the last target observation in the dataset and spaced by the sampling period estimated from the dataset (or stored in the model).

Defining PredictionTo with Month, Day, Hour, Minute and Second

Often one wishes to forecast the whole following day without counting how many samples this represents (the count changes depending on where the last target observation currently is). This notation works relative to the last target observation. If the user sets "predictionTo" to Day+1, TIM forecasts up until the last observation of the following day, regardless of where within the current day the target ends: the parts of the last target timestamp that are measured in a finer granularity than the baseUnit are ignored. The same logic applies to baseUnit Hour and QuarterHour - see the table below with examples.

PredictionTo | Last target observation | Denotes all samples up until
D+1 | 28-01-2012 22:13:56 | 29-01-2012 23:59:59
D+0 | 28-01-2012 22:13:56 | 28-01-2012 23:59:59
H+1 | 28-01-2012 22:13:56 | 28-01-2012 23:59:59
H+0 | 28-01-2012 22:13:56 | 28-01-2012 22:59:59
Q+1 | 28-01-2012 22:13:56 | 28-01-2012 22:29:59
Q+0 | 28-01-2012 22:13:56 | 28-01-2012 22:14:59
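
The rule behind the table can be illustrated with a short Python sketch. It only mirrors the behavior described above and is not TIM's implementation; the function name horizon_end and the use of the "QuarterHour" label for Q are assumptions made for this example.

```python
from datetime import datetime, timedelta


def horizon_end(last_obs: datetime, base_unit: str, value: int) -> datetime:
    """Return the last second covered by a relative predictionTo setting.

    Hypothetical helper: truncates the last target timestamp to the
    granularity of base_unit, then moves (value + 1) units forward and
    steps back one second.
    """
    if base_unit == "Day":
        start = last_obs.replace(hour=0, minute=0, second=0, microsecond=0)
        step = timedelta(days=1)
    elif base_unit == "Hour":
        start = last_obs.replace(minute=0, second=0, microsecond=0)
        step = timedelta(hours=1)
    elif base_unit == "QuarterHour":
        quarter_minute = last_obs.minute - last_obs.minute % 15
        start = last_obs.replace(minute=quarter_minute, second=0, microsecond=0)
        step = timedelta(minutes=15)
    else:
        raise ValueError(f"unsupported base unit in this sketch: {base_unit}")
    return start + (value + 1) * step - timedelta(seconds=1)


last = datetime(2012, 1, 28, 22, 13, 56)
print(horizon_end(last, "Day", 1))          # D+1
print(horizon_end(last, "QuarterHour", 0))  # Q+0
```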

Prediction from

Complements 'predictionTo' and allows skipping the first samples of the forecasting horizon. If not set, TIM defaults to one Sample ahead, which skips nothing.

"predictionFrom": {
  "baseUnit": "Sample",
  "value": 3
}
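
When both settings are given in samples, their interplay can be pictured as a simple range of offsets. The sketch below is an illustration under that assumption only; forecast_steps is a hypothetical helper, not part of TIM.

```python
def forecast_steps(prediction_to: int, prediction_from: int = 1) -> list[int]:
    """Offsets (in samples after the last target observation) that are forecast.

    Hypothetical helper: both arguments are expressed in samples, and
    prediction_from defaults to 1, which skips nothing.
    """
    return list(range(prediction_from, prediction_to + 1))


# predictionTo = Sample + 5 with predictionFrom = Sample + 3 skips offsets 1 and 2.
print(forecast_steps(5, 3))
```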

In-sample rows

Defines which samples should be used for model building. The user can specify in-sample timestamps as an array of timestamp ranges. If not set, all timestamps except the ones defined in 'outOfSample' will be used.

"inSample": [
  {
   "from": "2009-06-01 00:00:00",
   "to": "2009-06-10 23:00:00"
  },
  {
   "from": "2009-05-01 00:00:00",
   "to": "2009-05-10 23:00:00"  
  }
]

Alternatively, a relative notation can be used: an integer number n with a base unit (one of Month, Day, Hour, Minute, Second and Sample). It defines the time range starting from the end of the dataset (the newest observations of the target variable) going backwards.

"outOfSample": {
  "baseUnit": "Day",
  "value": 2
}

Out-of-sample rows

Defines which samples should be used to backtest the Model Zoo. These samples are not used during model building, so the forecast accuracy on this region is closer to what can be expected in a real production setup. If not set, none will be used. There are two ways to set out-of-sample timestamps:

  • Array of timestamp ranges.

"outOfSample": [
  {
   "from": "2020-06-01 00:00:00",
   "to": "2020-06-10 23:00:00"
  },
  {
   "from": "2020-05-01 00:00:00",
   "to": "2020-05-10 23:00:00"  
  }
]
  • Integer number n with base unit (one of Month, Day, Hour, Minute, Second and Sample). Defines the time range starting from the end of the dataset (the newest observations of the target variable) going backwards.
"outOfSample": {
  "baseUnit": "Day",
  "value": 2
}
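
To make the default relationship between the two settings concrete, the following Python sketch splits a list of timestamps into in-sample and out-of-sample parts from an array of ranges. It assumes that both range ends are inclusive, which the text above does not state; the helper names are hypothetical.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def in_any_range(ts: datetime, ranges: list[dict]) -> bool:
    """True if ts falls inside at least one {"from", "to"} range (ends inclusive, an assumption)."""
    return any(
        datetime.strptime(r["from"], FMT) <= ts <= datetime.strptime(r["to"], FMT)
        for r in ranges
    )


def split_rows(timestamps: list[datetime], out_of_sample: list[dict]):
    """Default behavior as described: in-sample is everything not out-of-sample."""
    out_rows = [t for t in timestamps if in_any_range(t, out_of_sample)]
    in_rows = [t for t in timestamps if not in_any_range(t, out_of_sample)]
    return in_rows, out_rows


ranges = [{"from": "2020-06-01 00:00:00", "to": "2020-06-10 23:00:00"}]
stamps = [datetime(2020, 5, 31, 23), datetime(2020, 6, 1, 0), datetime(2020, 6, 11, 0)]
in_rows, out_rows = split_rows(stamps, ranges)
print(len(in_rows), len(out_rows))
```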

Rolling window

When TIM evaluates the models on the in-sample and out-of-sample data, it rolls backwards from where the target ends to the start of the dataset and forecasts the whole forecasting horizon at each step. The user can set the length of this rolling window to control the size of the output (using any number of months, days, hours, minutes, seconds and samples). By default, datasets that have a daily cycle use a rolling window of 1 day. The rest use a rolling window of 1 sample.

"rollingWindow": {
  "baseUnit": "Day",
  "value": 2
}
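
One way to read the rolling behavior is as a sequence of forecast origins stepping backwards through the dataset. The sketch below shows that reading with sample indices; it is an interpretation for illustration, not TIM's actual procedure, and rolling_origins is a hypothetical name.

```python
def rolling_origins(n_samples: int, window: int) -> list[int]:
    """Indices of the last known observation for each backtest forecast, newest first.

    Hypothetical helper: starts at the end of the target and steps back
    by `window` samples until the start of the dataset is reached.
    """
    if window < 1:
        raise ValueError("window must be at least 1 sample")
    return list(range(n_samples - 1, -1, -window))


# A longer window produces fewer forecasts, hence a smaller output.
print(rolling_origins(10, 3))
print(len(rolling_origins(10, 1)))
```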

Rebuilding policy

The rebuilding policy controls which models in the attached Model Zoo are dropped and rebuilt. There are four options:

  • all - all models in the current Model Zoo are dropped and new models are added
  • none - no new models are added to the existing Model Zoo
  • newSituations - only the models needed for samples in the forecasting horizon that the current Model Zoo cannot handle are built and added to it
  • olderThan - same as newSituations, but models that are older than "time" are considered outdated and replaced with newly built ones. This is the only option where the "time" parameter has any effect. The user can specify any number of days, hours, quarterhours and samples.
"rebuildingPolicy": {
  "type": "OlderThan",
  "time": {
    "baseUnit": "Day",
    "value": 7
  }
}
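
The four options can be summarized in a small Python sketch that decides which existing models would be dropped. The representation of the Model Zoo as a mapping from model name to build time is an assumption made for this example, as is the function name; building the new models is not shown.

```python
from datetime import datetime, timedelta


def models_to_drop(policy_type, built_at, now, max_age=None):
    """Return the names of models that the given policy would drop.

    built_at is a hypothetical mapping {model name: build datetime}.
    """
    kind = policy_type.lower()
    if kind == "all":
        return sorted(built_at)
    if kind in ("none", "newsituations"):
        return []  # existing models are kept; newSituations only adds missing ones
    if kind == "olderthan":
        if max_age is None:
            raise ValueError("olderThan requires a time setting")
        return sorted(name for name, t in built_at.items() if now - t > max_age)
    raise ValueError(f"unknown rebuilding policy: {policy_type}")


zoo = {"m1": datetime(2021, 1, 1), "m2": datetime(2021, 1, 9)}
print(models_to_drop("OlderThan", zoo, datetime(2021, 1, 10), timedelta(days=7)))
```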

Model quality

Controls the model complexity / training time tradeoff. The higher the quality is, the longer it takes to build the Model Zoo. If not set, quality Combined will be used.

  • "Low" - dummy quality, these models can be used even without any data provided
  • "Medium" - models without offsets of target
  • "High" - models that use only a limited number of target offsets
  • "VeryHigh" - every model uses the closest target offset possible
  • "UltraHigh" - every model uses the closest offset possible for every single predictor
  • "Combined" - VeryHigh for the intraday and day-ahead forecasts, High for further forecasting horizons
"modelQuality": "High"

Features

TIM tries to enhance the model building process with new, artificially created features derived from the original predictors. Several transformations are available; by default TIM uses Polynomial, Time offsets, Identity, Intercept, Rest of week, Piecewise linear, Exponential moving average and Periodic.

To restrict the set, list only the transformations you want to use.

"features": ["TimeOffsets", "Identity", "PiecewiseLinear", "ExponentialMovingAverage",
             "SimpleMovingAverage", "Periodic", "Fourier", "RestOfWeek", "DayOfWeek",
             "PublicHolidays", "Month", "Trend", "Intercept", "Polynomial"]

Normalization

When normalization is on, predictors are scaled by their mean and standard deviation. Switching it off may help when modeling data with structural changes. If not provided or set to automatic, TIM will decide on its own.

"normalization": true
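
Scaling by mean and standard deviation can be sketched in a few lines of Python. The example uses the population standard deviation, which is an assumption; the text above does not say which variant TIM uses.

```python
from statistics import mean, pstdev


def normalize(values: list[float]) -> list[float]:
    """Center a predictor on its mean and divide by its standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumed)
    if sigma == 0:
        return [0.0 for _ in values]  # a constant predictor carries no variation
    return [(v - mu) / sigma for v in values]


scaled = normalize([10.0, 20.0, 30.0])
print(scaled)
```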

Model complexity

Determines the maximum number of terms in each model in the Model Zoo. Difficult datasets might require lower model complexity. If not set, TIM calculates the complexity automatically based on the sampling period of the dataset.

"maxModelComplexity": 50

Daily cycle

A boolean value that decides whether to build individual models for different times within a day. This is especially useful if the dynamics of the underlying problem change during the day. Switching it off leads to a common model building approach for all timestamps. If the parameter is not provided, TIM will decide automatically.

"dailyCycle": false

Prediction intervals

The prediction interval expresses the uncertainty of a prediction as an interval in which the actual value is likely to fall. The value is the probability, in percent, that the actual value will lie inside the symmetrical prediction interval. Therefore, the prediction intervals get wider as the value increases.

"predictionIntervals": 95
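
The relationship between the interval level and its width can be illustrated with a Python sketch that derives an interval from past forecast errors. This is not how TIM computes its intervals, which is not described here; it assumes an equal-tailed interval taken from empirical residual quantiles, and all names are hypothetical.

```python
def tail_probabilities(level_percent: float) -> tuple[float, float]:
    """Lower and upper quantile levels of an equal-tailed interval."""
    if not 0 < level_percent < 100:
        raise ValueError("level must be strictly between 0 and 100")
    alpha = 1 - level_percent / 100
    return alpha / 2, 1 - alpha / 2


def quantile(sorted_values: list[float], q: float) -> float:
    """Empirical quantile with linear interpolation between neighbors."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def prediction_interval(forecast: float, residuals: list[float], level_percent: float):
    """Shift the residual quantiles by the point forecast."""
    lo_q, hi_q = tail_probabilities(level_percent)
    ordered = sorted(residuals)
    return forecast + quantile(ordered, lo_q), forecast + quantile(ordered, hi_q)


errors = [float(e) for e in range(-10, 11)]
print(prediction_interval(100.0, errors, 50))
print(prediction_interval(100.0, errors, 90))  # wider than the 50% interval
```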

Prediction boundaries

For some datasets, values outside of certain boundaries do not make sense - e.g. negative values for energy production. TIM tries to figure these out automatically but there is an option to override these detected values. Both lower and upper boundary should be real values. It might be useful to turn them off for datasets with a visible trend.

"predictionBoundaries": {
  "type": "Explicit",
  "maxValue": 1000,
  "minValue": 0
}
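
The effect of explicit boundaries can be pictured as limiting each forecast to the allowed range. The sketch below assumes that out-of-range forecasts are clipped to the nearest boundary, which the text above does not spell out; apply_boundaries is a hypothetical helper.

```python
def apply_boundaries(forecasts, min_value=None, max_value=None):
    """Clip forecasts to [min_value, max_value]; None leaves that side open."""
    clipped = []
    for value in forecasts:
        if min_value is not None:
            value = max(value, min_value)
        if max_value is not None:
            value = min(value, max_value)
        clipped.append(value)
    return clipped


# Mirrors the example setting: minValue 0, maxValue 1000.
print(apply_boundaries([-5.0, 250.0, 1200.0], min_value=0, max_value=1000))
```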

Memory limit check

TIM tries to estimate whether the worker it currently runs on has enough memory to finish the model building and forecasting process. If it does not and the memory limit check is turned on, TIM drops some of the rows and columns of the dataset and turns off some of the transformations. The check is turned on by default. Turning it off may cause the operation to crash on big datasets.

"memoryLimitCheck": false

Target column

Position of the column which contains the target variable.

"targetColumn": 2

Holiday column

Name of the column which contains the holiday variable. If not provided, TIM assumes the dataset has no holiday column.

"holidayColumn": 5

Columns

List of all columns (given either by their names or order) that should be used in the model building. If not provided, TIM will use all. The target column will always be included.

"columns": [5, "y"]

Imputation

The imputation setting applies if there are missing values in the dataset. With this setting, TIM imputes all gaps in the data that are not longer than the maxLength parameter. Two imputation types are available: Linear, for linear interpolation, and LOCF, for imputation with the last non-missing observation carried forward. A third type, None, turns imputation off. The default setting is Linear with maxLength 6.

"imputation": {
  "type": "Linear",
  "maxLength": 1
}
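
The two imputation types and the role of maxLength can be sketched in Python as follows. Missing values are represented as None, and gaps at the very start of the series (or, for Linear, at the very end) are left untouched; both choices are assumptions of this example rather than documented TIM behavior.

```python
def impute(values, method="Linear", max_length=6):
    """Fill runs of None that are no longer than max_length."""
    if method not in ("Linear", "LOCF", "None"):
        raise ValueError(f"unknown imputation type: {method}")
    result = list(values)
    n = len(result)
    i = 0
    while i < n:
        if result[i] is not None:
            i += 1
            continue
        start = i
        while i < n and result[i] is None:
            i += 1
        end = i  # first index after the gap
        length = end - start
        if method == "None" or length > max_length or start == 0:
            continue  # imputation off, gap too long, or no left neighbor
        left = result[start - 1]
        if method == "LOCF":
            for k in range(start, end):
                result[k] = left
        elif end < n:  # Linear needs a right neighbor as well
            step = (result[end] - left) / (length + 1)
            for k in range(start, end):
                result[k] = left + step * (k - start + 1)
    return result


series = [1.0, None, None, 4.0]
print(impute(series, "Linear", max_length=6))
print(impute(series, "LOCF", max_length=6))
print(impute(series, "Linear", max_length=1))  # gap of 2 exceeds maxLength
```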

Time scale

Determines the rescaling of the original dataset to a new sampling period that should be used instead. The baseUnit of the rescaling is limited to one of Day, Hour, Minute and Second. If not set, the originally estimated sampling period will be used. Time scaling only works from shorter sampling periods to longer ones and does not work for data sampled in months.

"timeScale": {
  "baseUnit": "Day",
  "value": 2
}
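
Rescaling to a longer sampling period can be pictured as grouping the original samples into buckets of the new length. The Python sketch below averages the values in each bucket and aligns the buckets to the first timestamp; both the aggregation by mean and the alignment are assumptions for illustration, since the text above does not describe how TIM aggregates.

```python
from datetime import datetime, timedelta


def rescale(rows, period: timedelta):
    """Aggregate (timestamp, value) rows into buckets of length `period`.

    Hypothetical helper: buckets start at the first timestamp and each
    bucket is represented by the mean of its values.
    """
    if not rows:
        return []
    origin = rows[0][0]
    buckets = {}
    for ts, value in rows:
        index = (ts - origin) // period  # whole periods elapsed since the origin
        buckets.setdefault(index, []).append(value)
    return [
        (origin + index * period, sum(vals) / len(vals))
        for index, vals in sorted(buckets.items())
    ]


start = datetime(2021, 1, 1)
hourly = [(start + timedelta(hours=h), float(h + 1)) for h in range(4)]
print(rescale(hourly, timedelta(hours=2)))
```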