# Overview

We have put a lot of effort into creating a fully automatic model building engine, but sometimes, despite our best efforts, some models do not reach the accuracy they could. By adjusting the exposed parameters of the algorithm, you can make sure that even the toughest dataset is modeled properly.

The table below lists all available configuration parameters for TIM RTInstantML Forecasting together with their defaults. The subsections that follow go through each of these settings.

Parameter type | Configuration parameter | Default |
---|---|---|
Experiment setup | Prediction to | Sample + 1 |
Experiment setup | Prediction from | Sample + 1 |
Experiment setup | In-sample rows | All records except Out-of-sample |
Experiment setup | Out-of-sample rows | No records |
Experiment setup | Rolling window | 1 day (daily cycle) / Prediction to (nondaily cycle) |
Experiment setup | Rebuilding policy | New situations |
Tuning parameters | Model quality | Combined (Very High for D+0 and D+1, High otherwise) |
Tuning parameters | Features | Polynomial, Time offsets, Identity, Intercept, Rest of week, Piecewise linear, Exponential moving average, Periodic |
Tuning parameters | Normalization | true |
Tuning parameters | Model complexity | automatic |
Tuning parameters | Daily cycle | automatic |
Tuning parameters | Prediction intervals | 90% |
Tuning parameters | Prediction boundaries | automatic |
Tuning parameters | Memory limit check | true |
Dataset manipulation | Target column | First non-timestamp column |
Dataset manipulation | Holiday column | none |
Dataset manipulation | Columns | all |
Dataset manipulation | Imputation | Linear for gaps no longer than 6 |
Dataset manipulation | Time scale | Originally estimated from dataset |

## Prediction to

Defines the forecasting horizon. It consists of a **baseUnit** (one of *Month, Day, Hour, Minute, Second* and *Sample*) and a **value** (a non-negative integer). If not set, TIM defaults to one Sample ahead.

```
"predictionTo": {
  "baseUnit": "Day",
  "value": 2
}
```

### Defining PredictionTo with Samples

This is the easiest way to define the forecasting horizon. The engine will try to forecast **value** samples, starting after the last target observation in the dataset and spaced by the sampling period estimated from the dataset (or stored in the model).
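As a rough illustration of the sample-based horizon, the sketch below lists the timestamps that would be forecast. The function name `sample_horizon` and its arguments are hypothetical and not part of TIM; the sampling period is assumed to be already known.

```python
from datetime import datetime, timedelta

def sample_horizon(last_obs, sampling_period, value):
    """Timestamps of the next `value` samples after the last target observation.

    Hypothetical helper for illustration only, not a TIM function.
    """
    return [last_obs + k * sampling_period for k in range(1, value + 1)]

# Hourly data ending at 22:00, forecasting 3 samples ahead.
stamps = sample_horizon(datetime(2012, 1, 28, 22, 0), timedelta(hours=1), 3)
for ts in stamps:
    print(ts)
```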

### Defining PredictionTo with Month, Day, Hour, Minute and Second

It is often the case that one wishes to forecast the whole following day without counting how many samples this represents (the count changes depending on where the last target observation currently is). This notation works relative to the last target observation. If the user sets "predictionTo" to Day+1, TIM will forecast up to the last observation of the following day, regardless of where within the current day the target ends (parts of the datetime of the target end that are measured in a smaller granularity than **baseUnit** are ignored). This logic works similarly for **baseUnit** *Hour* and *QuarterHour* - see the table below with examples.

PredictionTo | Last target observation | Denotes all samples up until |
---|---|---|
D+1 | 28-01-2012 22:13:56 | 29-01-2012 23:59:59 |
D+0 | 28-01-2012 22:13:56 | 28-01-2012 23:59:59 |
H+1 | 28-01-2012 22:13:56 | 28-01-2012 23:59:59 |
H+0 | 28-01-2012 22:13:56 | 28-01-2012 22:59:59 |
Q+1 | 28-01-2012 22:13:56 | 28-01-2012 22:29:59 |
Q+0 | 28-01-2012 22:13:56 | 28-01-2012 22:14:59 |
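The rows of the table can be reproduced with a few lines of date arithmetic. This is only a sketch of the truncate-then-add logic described above, not TIM's implementation; the function `horizon_end` is a hypothetical name, and only the Day, Hour and QuarterHour cases from the table are covered.

```python
from datetime import datetime, timedelta

def horizon_end(last_obs, base_unit, value):
    """Last second covered by a relative horizon such as D+1 or H+0.

    Hypothetical helper: truncate the last observation to the start of its
    base unit, then move (value + 1) units forward and step back one second.
    """
    if base_unit == "Day":
        start = last_obs.replace(hour=0, minute=0, second=0, microsecond=0)
        step = timedelta(days=1)
    elif base_unit == "Hour":
        start = last_obs.replace(minute=0, second=0, microsecond=0)
        step = timedelta(hours=1)
    elif base_unit == "QuarterHour":
        quarter_start = last_obs.minute - last_obs.minute % 15
        start = last_obs.replace(minute=quarter_start, second=0, microsecond=0)
        step = timedelta(minutes=15)
    else:
        raise ValueError(f"unsupported base unit in this sketch: {base_unit}")
    return start + (value + 1) * step - timedelta(seconds=1)

last = datetime(2012, 1, 28, 22, 13, 56)
print(horizon_end(last, "Day", 1))          # 2012-01-29 23:59:59
print(horizon_end(last, "QuarterHour", 0))  # 2012-01-28 22:14:59
```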

## Prediction from

Complements 'predictionTo' and allows skipping the first samples of the forecasting horizon. If not set, TIM defaults to one Sample ahead, so nothing is skipped.

```
"predictionFrom": {
  "baseUnit": "Sample",
  "value": 3
}
```
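When both settings are given in samples, the forecast steps are simply the offsets between the two values. The helper below is a hypothetical sketch of that reading, assuming both bounds are inclusive.

```python
def forecast_steps(prediction_from=1, prediction_to=1):
    """Sample offsets (1 = first sample after the last target observation)
    that fall between predictionFrom and predictionTo, both inclusive.

    Hypothetical helper; inclusive bounds are an assumption of this sketch.
    """
    if prediction_from < 1 or prediction_to < prediction_from:
        return []
    return list(range(prediction_from, prediction_to + 1))

print(forecast_steps(3, 6))  # [3, 4, 5, 6]
print(forecast_steps())      # [1]
```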

## In-sample rows

Defines which samples should be used for model building. The user can specify in-sample timestamps as an array of timestamp ranges. If not set, all timestamps except those defined in 'outOfSample' will be used.

```
"inSample": [
  {
    "from": "2009-06-01 00:00:00",
    "to": "2009-06-10 23:00:00"
  },
  {
    "from": "2009-05-01 00:00:00",
    "to": "2009-05-10 23:00:00"
  }
]
```

Alternatively, this notation can be used:

```
"outOfSample": {
  "baseUnit": "Day",
  "value": 2
}
```

- Integer number n with a base unit (one of *Month, Day, Hour, Minute, Second* and *Sample*). Defines the time range starting from the end of the dataset (the newest observations of the target variable) and going backwards.

## Out-of-sample rows

Defines which samples should be used to backtest the Model Zoo. These samples are not used during model building, so forecast accuracy on this region is closer to what a real production setup would deliver. If not set, none will be used. There are two ways to set out-of-sample timestamps:

- Array of timestamp ranges.

```
"outOfSample": [
  {
    "from": "2020-06-01 00:00:00",
    "to": "2020-06-10 23:00:00"
  },
  {
    "from": "2020-05-01 00:00:00",
    "to": "2020-05-10 23:00:00"
  }
]
```

- Integer number n with a base unit (one of *Month, Day, Hour, Minute, Second* and *Sample*). Defines the time range starting from the end of the dataset (the newest observations of the target variable) and going backwards.

```
"outOfSample": {
  "baseUnit": "Day",
  "value": 2
}
```
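To make the relationship between the two regions concrete, the sketch below splits a list of timestamps using an array of out-of-sample ranges and treats everything else as in-sample, which mirrors the default described above. The helper names are hypothetical, and treating both range ends as inclusive is an assumption.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

def in_ranges(ts, ranges):
    """True if ts lies inside any {"from": ..., "to": ...} range (ends inclusive)."""
    return any(
        datetime.strptime(r["from"], FMT) <= ts <= datetime.strptime(r["to"], FMT)
        for r in ranges
    )

def split_rows(timestamps, out_of_sample):
    """Return (in_sample, out_of_sample) timestamps; hypothetical helper."""
    in_sample = [t for t in timestamps if not in_ranges(t, out_of_sample)]
    held_out = [t for t in timestamps if in_ranges(t, out_of_sample)]
    return in_sample, held_out

ranges = [{"from": "2020-06-01 00:00:00", "to": "2020-06-10 23:00:00"}]
rows = [datetime(2020, 5, 31, 23), datetime(2020, 6, 1, 0), datetime(2020, 6, 11, 0)]
print(split_rows(rows, ranges))
```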

## Rolling window

When TIM evaluates the models built on the in-sample and out-of-sample data, it rolls backwards from where the target ends to the start of the dataset and forecasts the whole forecasting horizon each time. The user can specify the length of this rolling window to control the size of the output (using any number of months, days, hours, minutes, seconds or samples). By default, datasets that have a daily cycle use a rolling window of 1 day. The rest use a rolling window of 1 sample.

```
"rollingWindow": {
  "baseUnit": "Day",
  "value": 2
}
```
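The backward rolling can be pictured as a sequence of forecast origins spaced one window apart. The sketch below only generates those origins; `rolling_origins` is a hypothetical name, and the real engine may align the origins differently.

```python
from datetime import datetime, timedelta

def rolling_origins(dataset_start, target_end, window):
    """Forecast origins obtained by stepping back from the target end
    in steps of `window` until the dataset start is passed.

    Hypothetical helper illustrating the rolling-window idea only.
    """
    if window <= timedelta(0):
        raise ValueError("window must be positive")
    origins = []
    current = target_end
    while current >= dataset_start:
        origins.append(current)
        current -= window
    return origins

# A larger window yields fewer origins and therefore a smaller output.
print(len(rolling_origins(datetime(2020, 1, 1), datetime(2020, 1, 31), timedelta(days=1))))
print(len(rolling_origins(datetime(2020, 1, 1), datetime(2020, 1, 31), timedelta(days=2))))
```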

## Rebuilding policy

Rebuilding policy controls which models in the attached Model Zoo should be rebuilt or dropped. There are 4 options:

- all - all models in the current Model Zoo are dropped and new models are added
- none - no new models are added to the existing Model Zoo
- newSituations - only models needed for samples in the forecasting horizon that the current Model Zoo cannot handle are built and added to it
- olderThan - same as newSituations, but models that are older than "time" are deemed useless and replaced with newly built ones. This is the only option where the "time" parameter has any effect. The user can specify any number of days, hours, quarterhours and samples.

```
"rebuildingPolicy": {
  "type": "OlderThan",
  "time": {
    "baseUnit": "Day",
    "value": 7
  }
}
```
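The four options can be read as a small decision rule over the existing Model Zoo. The sketch below is one possible interpretation, not TIM's internal logic: the model records (`situation`, `built_at`), the function `plan_rebuild` and its arguments are all hypothetical.

```python
from datetime import datetime, timedelta

def plan_rebuild(zoo, needed, policy, now=None, max_age=None):
    """Return (models to keep, situations to build) for a rebuilding policy.

    Hypothetical sketch: `zoo` is a list of dicts with "situation" and
    "built_at" keys, `needed` is the set of situations in the horizon.
    """
    if policy == "all":
        return [], sorted(needed)
    if policy == "none":
        return list(zoo), []
    stale = []
    if policy == "olderThan":
        stale = [m for m in zoo if now - m["built_at"] > max_age]
        zoo = [m for m in zoo if now - m["built_at"] <= max_age]
    elif policy != "newSituations":
        raise ValueError(f"unknown policy: {policy}")
    covered = {m["situation"] for m in zoo}
    # Build what is missing, plus replacements for models that were too old.
    to_build = (set(needed) - covered) | {m["situation"] for m in stale}
    return list(zoo), sorted(to_build)

zoo = [
    {"situation": "A", "built_at": datetime(2020, 1, 1)},
    {"situation": "B", "built_at": datetime(2020, 1, 9)},
]
print(plan_rebuild(zoo, {"B", "C"}, "olderThan",
                   now=datetime(2020, 1, 10), max_age=timedelta(days=7)))
```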

## Model quality

Controls the tradeoff between model complexity and training time. The higher the quality, the longer it takes to build the Model Zoo. If not set, quality *Combined* is used.

- "Low" - dummy quality, these models can be used even without any data provided
- "Medium" - models without offsets of the target
- "High" - models use only a limited number of target offsets
- "VeryHigh" - every model uses the closest possible target offset
- "UltraHigh" - every model uses the closest possible offset for every single predictor
- "Combined" - *VeryHigh* for the intraday and day-ahead forecasts, *High* quality for further forecasting horizons

```
"modelQuality": "High"
```
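The *Combined* rule can be written down as a function of how many calendar days ahead a forecast timestamp lies. This is a minimal sketch of the rule as stated above (VeryHigh for D+0 and D+1, High otherwise); `combined_quality` is a hypothetical name.

```python
from datetime import datetime

def combined_quality(last_obs, forecast_ts):
    """Quality used by the Combined setting for one forecast timestamp.

    Hypothetical helper: D+0 and D+1 map to VeryHigh, later days to High.
    """
    days_ahead = (forecast_ts.date() - last_obs.date()).days
    return "VeryHigh" if days_ahead <= 1 else "High"

last = datetime(2012, 1, 28, 22, 0)
print(combined_quality(last, datetime(2012, 1, 29, 12, 0)))  # VeryHigh
print(combined_quality(last, datetime(2012, 1, 30, 0, 0)))   # High
```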

## Features

TIM tries to enhance the model building process with new, artificially created features derived from the original predictors. Several transformations are available (those in bold are used by default):

- **Piecewise linear**
- **Periodic components**
- **Weekrest**
- Day of week
- **Intercept**
- **Polynomial**
- **Exponential Moving Average**
- Simple moving average
- **Time Offsets**
- **Identity**
- Fourier
- Trend
- Month
- Public Holidays

If you want to, you can try omitting some of them by listing only those you want to use.

```
"features": ["TimeOffsets", "Identity", "PiecewiseLinear", "ExponentialMovingAverage",
"SimpleMovingAverage", "Periodic", "Fourier", "RestOfWeek", "DayOfWeek",
"PublicHolidays", "Month", "Trend", "Intercept", "Polynomial"]
```
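One way to build a reduced list without typos is to start from the default set and drop what you do not want. The sketch below assumes that the default transformations map to the identifiers shown in the example above (for instance *Weekrest* to `RestOfWeek`); the helper `select_features` is hypothetical.

```python
import json

# Identifiers taken from the configuration example above.
ALL_FEATURES = [
    "TimeOffsets", "Identity", "PiecewiseLinear", "ExponentialMovingAverage",
    "SimpleMovingAverage", "Periodic", "Fourier", "RestOfWeek", "DayOfWeek",
    "PublicHolidays", "Month", "Trend", "Intercept", "Polynomial",
]
# Assumed mapping of the default (bold) transformations to identifiers.
DEFAULT_FEATURES = [
    "Polynomial", "TimeOffsets", "Identity", "Intercept", "RestOfWeek",
    "PiecewiseLinear", "ExponentialMovingAverage", "Periodic",
]

def select_features(omit=()):
    """Default features minus the ones listed in `omit` (hypothetical helper)."""
    unknown = [name for name in omit if name not in ALL_FEATURES]
    if unknown:
        raise ValueError(f"unknown feature names: {unknown}")
    return [name for name in DEFAULT_FEATURES if name not in omit]

print(json.dumps({"features": select_features(omit=["Polynomial"])}))
```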

## Normalization

When normalization is on, predictors are scaled by their mean and standard deviation. Switching it off may help to model data with structural changes. If not provided or set to automatic, TIM will decide on its own.

```
"normalization": true
```
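Scaling by mean and standard deviation is commonly done as a z-score, sketched below for a single predictor. Whether TIM uses the population or the sample standard deviation is not stated here, so the use of `pstdev` is an assumption.

```python
from statistics import mean, pstdev

def normalize(values):
    """Scale a predictor to zero mean and unit (population) standard deviation.

    Illustrative z-score only; the exact formula used by TIM is an assumption.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant predictor carries no variation to scale.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

print(normalize([10.0, 20.0, 30.0]))
```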

## Model complexity

Determines the maximum number of terms in each model in the Model Zoo. Difficult datasets might require lower model complexity. If not set, TIM calculates the complexity automatically based on the sampling period of the dataset.

```
"maxModelComplexity": 50
```

## Daily cycle

A boolean value that decides whether to build individual models for different times within a day. It is especially useful if the dynamics of the underlying problem change during the day. Switching it off leads to a common model building approach for all timestamps. If the parameter is not provided, TIM decides automatically. Learn more about the importance of this parameter **here**.

```
"dailyCycle": false
```

## Prediction intervals

The prediction interval expresses the uncertainty of a prediction by giving an interval in which the predicted value should probably lie. The value is the probability (in percent) that the prediction falls inside the symmetric prediction interval. Therefore, the higher the value, the wider the prediction intervals.

```
"predictionIntervals": 95
```
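To see why a higher value widens the interval, the sketch below computes a symmetric interval under the assumption of normally distributed forecast errors. That Gaussian assumption, the function `symmetric_interval` and the `residual_std` argument are illustrative only and say nothing about how TIM derives its intervals.

```python
from statistics import NormalDist

def symmetric_interval(forecast, residual_std, coverage_pct):
    """Symmetric interval around a forecast for a given coverage in percent.

    Assumes Gaussian errors with a known standard deviation (an assumption
    of this sketch, not a statement about TIM).
    """
    if not 0 < coverage_pct < 100:
        raise ValueError("coverage_pct must be strictly between 0 and 100")
    # Two-sided interval: split the remaining probability between both tails.
    z = NormalDist().inv_cdf(0.5 + coverage_pct / 200)
    return forecast - z * residual_std, forecast + z * residual_std

print(symmetric_interval(100.0, 10.0, 90))
print(symmetric_interval(100.0, 10.0, 95))
```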

## Prediction boundaries

For some datasets, values outside of certain boundaries do not make sense - e.g. negative values for energy production. TIM tries to figure these out automatically, but there is an option to override the detected values. Both the lower and the upper boundary should be real values. It might be useful to turn them off for datasets with a visible trend.

```
"predictionBoundaries": {
  "type": "Explicit",
  "maxValue": 1000,
  "minValue": 0
}
```
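A simple way to picture explicit boundaries is as clipping of the forecast to the allowed range. The sketch below assumes that this is the effect of the setting; `apply_boundaries` is a hypothetical helper, not a TIM function.

```python
def apply_boundaries(forecast, min_value, max_value):
    """Clip each forecast value into [min_value, max_value].

    Clipping is an assumed interpretation of explicit prediction boundaries.
    """
    if min_value > max_value:
        raise ValueError("min_value must not exceed max_value")
    return [min(max(v, min_value), max_value) for v in forecast]

print(apply_boundaries([-5.0, 500.0, 1200.0], 0, 1000))  # [0, 500.0, 1000]
```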

## Memory limit check

TIM tries to estimate whether the worker it is running on has enough memory to finish the model building and forecasting process. If it does not and the memory limit check is turned on, TIM drops some of the rows and columns of the dataset and turns off some of the transformations. The check is turned on by default. Turning it off may cause the operation to crash on big datasets.

```
"memoryLimitCheck": false
```

## Target column

Position of the column which contains the target variable.

```
"targetColumn": 2
```

## Holiday column

Name of the column which contains the holiday variable. If not provided, TIM assumes there is none.

```
"holidayColumn": 5
```

## Columns

List of all columns (given either by their names or order) that should be used in the model building. If not provided, TIM will use all. The target column will always be included.

```
"columns": [5, "y"]
```

## Imputation

The imputation setting applies if there are missing values in the dataset. With this setting, TIM imputes all gaps in the data that are no longer than the `maxLength` parameter. Two imputation types are available: `Linear` for linear interpolation and `LOCF` for carrying the last non-missing observation forward. A third type, `None`, turns imputation off. The default setting is `Linear` with `maxLength` 6.

```
"imputation": {
  "type": "Linear",
  "maxLength": 1
}
```
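The two imputation types and the role of `maxLength` can be illustrated on a plain list where `None` marks a missing value. This is a sketch of the general techniques, not TIM's code; the function `impute` is hypothetical, and leaving gaps at the very start or end untouched for `Linear` is an assumption.

```python
def impute(values, method="Linear", max_length=6):
    """Fill gaps (runs of None) no longer than max_length.

    "Linear" interpolates between the neighbours of a gap, "LOCF" repeats the
    last observed value, "None" leaves the data unchanged. Hypothetical helper.
    """
    out = list(values)
    n = len(out)
    i = 0
    while i < n:
        if out[i] is not None:
            i += 1
            continue
        j = i
        while j < n and out[j] is None:
            j += 1
        gap = j - i
        left = out[i - 1] if i > 0 else None
        right = out[j] if j < n else None
        if method != "None" and gap <= max_length:
            if method == "LOCF" and left is not None:
                for k in range(i, j):
                    out[k] = left
            elif method == "Linear" and left is not None and right is not None:
                step = (right - left) / (gap + 1)
                for k in range(i, j):
                    out[k] = left + step * (k - i + 1)
        i = j
    return out

series = [1.0, None, None, 4.0]
print(impute(series, "Linear", 6))  # [1.0, 2.0, 3.0, 4.0]
print(impute(series, "Linear", 1))  # gap of 2 exceeds maxLength 1, left as is
```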

## Time scale

Determines the rescaling of the original dataset to a new sampling period that should be used instead. The baseUnit of the rescaling is limited to one of *Day, Hour, Minute* and *Second*. If not set, the originally estimated sampling period is used. Time scaling only works from shorter sampling periods to longer ones and does not work for data sampled in months.

```
"timeScale": {
  "baseUnit": "Day",
  "value": 2
}
```