Configuration

We have put a lot of effort into creating a fully automatic model-building engine. Still, despite our best efforts, some models do not reach the highest possible accuracy. In such cases, users can often get even the toughest dataset modeled properly by adjusting the parameters the algorithm exposes.

The following subsections go through all of the available settings of TIM Forecasting. The table below shows all configuration parameters available for different job types.

Overall configuration

| Configuration parameter | build-model | rebuild-model | predict | default |
|---|---|---|---|---|
| Prediction to | ☑ | ☑ | ☑ | Sample + 1 |
| Prediction from | ☑ | ☑ | ☑ | Sample + 1 |
| Model quality | ☑ | ☑ | ☐ | Combined (Very High for D+0 and D+1, High otherwise) |
| Normalization | ☑ | ☑ | ☐ | true |
| Model complexity | ☑ | ☑ | ☐ | automatic |
| Features | ☑ | ☑ | ☐ | Polynomial, Time offsets, Identity, Intercept, Rest of week, Piecewise linear, Exponential moving average, Periodic |
| Daily cycle | ☑ | ☐ | ☐ | automatic |
| Allow offsets | ☑ | ☑ | ☐ | true |
| Offset limit | ☑ | ☑ | ☐ | automatic |
| Memory limit check | ☑ | ☑ | ☐ | true |
| Rebuilding policy | ☐ | ☑ | ☐ | New situations |
| Prediction intervals | ☑ | ☐ | ☐ | 90% |
| Prediction boundaries | ☑ | ☑ | ☑ | automatic |
| Rolling window | ☑ | ☑ | ☑ | 1 day (daily cycle) / Prediction to (nondaily cycle) |
| Backtest | ☑ | ☑ | ☐ | All |

☑ available in a given method
☐ not available in a given method

Prediction to​

This setting serves to define the forecasting horizon. It consists of a baseUnit (one of Month, Day, Hour, Minute, Second and Sample) and a value (non-negative integer). If not set, TIM will default to one Sample ahead.

"predictionTo": {
"baseUnit": "Day",
"value": 2
}

Defining PredictionTo with Samples

This is the easiest way to define the forecasting horizon. TIM forecasts the number of samples given by value, starting from the last target observation in the dataset and using a step size equal to the sampling period estimated from the dataset (or stored in the model).
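The Sample notation can be illustrated with a minimal Python sketch that lists the timestamps such a horizon covers. It assumes the sampling period is already known and passed in as a timedelta; sample_horizon is a hypothetical helper, not part of TIM's API.

```python
from datetime import datetime, timedelta
from typing import List

def sample_horizon(last_target: datetime, sampling_period: timedelta, value: int) -> List[datetime]:
    """Timestamps of the `value` samples that follow the last target observation."""
    # Step forward from the last observation, one sampling period at a time.
    return [last_target + k * sampling_period for k in range(1, value + 1)]

# predictionTo = {"baseUnit": "Sample", "value": 3} on hourly data
print(sample_horizon(datetime(2012, 1, 28, 22, 0), timedelta(hours=1), 3))
```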

Defining PredictionTo with Month, Day, Hour, Minute and Second

Often, a user wants to forecast the entire following day but does not want to count how many samples this represents (the count depends on where the last target observation falls). This notation works relative to the last target observation. Suppose the user sets "predictionTo" to Day+1. In that case, TIM recognizes that it should forecast up until the last observation of the following day, regardless of where within the current day the target ends (the parts of the target's end timestamp that are finer than baseUnit are ignored). The logic works the same way for the base units Hour and QuarterHour; see the examples in the table below.

| PredictionTo | Last target observation | Denotes all samples up until |
|---|---|---|
| D+1 | 28-01-2012 22:13:56 | 29-01-2012 23:59:59 |
| D+0 | 28-01-2012 22:13:56 | 28-01-2012 23:59:59 |
| H+1 | 28-01-2012 22:13:56 | 28-01-2012 23:59:59 |
| H+0 | 28-01-2012 22:13:56 | 28-01-2012 22:59:59 |
| Q+1 | 28-01-2012 22:13:56 | 28-01-2012 22:29:59 |
| Q+0 | 28-01-2012 22:13:56 | 28-01-2012 22:14:59 |
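The rule in the table (drop everything finer than baseUnit, then extend to the end of the value-th following unit) can be sketched in Python as follows. This illustration assumes one-second timestamp resolution and covers only Day, Hour and QuarterHour; horizon_end and floor_to_unit are hypothetical helpers, not TIM functions.

```python
from datetime import datetime, timedelta

# Length of each supported base unit (Month is left out because its length varies).
UNIT_LENGTH = {
    "Day": timedelta(days=1),
    "Hour": timedelta(hours=1),
    "QuarterHour": timedelta(minutes=15),
}

def floor_to_unit(ts: datetime, base_unit: str) -> datetime:
    """Drop the parts of `ts` that are finer than `base_unit`."""
    if base_unit == "Day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if base_unit == "Hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    if base_unit == "QuarterHour":
        return ts.replace(minute=ts.minute // 15 * 15, second=0, microsecond=0)
    raise ValueError(f"unsupported base unit: {base_unit}")

def horizon_end(last_target: datetime, base_unit: str, value: int) -> datetime:
    """Last second covered by predictionTo = {baseUnit, value}."""
    start = floor_to_unit(last_target, base_unit)
    # X+0 ends with the current unit, X+1 with the next one, and so on.
    return start + (value + 1) * UNIT_LENGTH[base_unit] - timedelta(seconds=1)

last = datetime(2012, 1, 28, 22, 13, 56)
print(horizon_end(last, "Day", 1))  # 2012-01-29 23:59:59
```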

Prediction from​

This setting complements 'predictionTo' and allows skipping the first samples in the forecasting horizon. If not set, TIM will default to one Sample ahead, not skipping anything.

"predictionFrom": {
"baseUnit": "Sample",
"value": 3
}
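Assuming both settings use the Sample base unit and both ends are inclusive (which matches the default of one Sample ahead, skipping nothing), the positions that remain in the horizon can be sketched as follows. horizon_steps is a hypothetical helper.

```python
from typing import List

def horizon_steps(prediction_from: int = 1, prediction_to: int = 1) -> List[str]:
    """Horizon positions kept when predictionFrom and predictionTo both use Sample."""
    if prediction_from < 1 or prediction_from > prediction_to:
        raise ValueError("predictionFrom must lie between 1 and predictionTo")
    # Positions before predictionFrom are skipped; both ends are inclusive.
    return [f"S+{k}" for k in range(prediction_from, prediction_to + 1)]

print(horizon_steps(3, 5))  # ['S+3', 'S+4', 'S+5']
```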

Model quality​

This setting is deprecated and was replaced by target offsets and predictor offsets. It controls the model complexity versus training time tradeoff. The higher the model quality, the longer it takes to build the Model Zoo. If not set, Combined will be used. Options are:

  • Low:
    • dummy quality, these models can be used even without any data provided,
    • is replaced by setting target offsets to None, selecting only the target column under columns in the data configuration, and leaving the predictor offsets unset or set to Common,
  • Medium:
    • models without offsets of the target variable,
    • is replaced by setting target offsets to None and leaving the predictor offsets unset or set to Common,
  • High:
    • models using only a limited amount of offsets of the target variable,
    • is replaced by setting target offsets to Common and leaving the predictor offsets unset or set to Common,
  • VeryHigh:
    • every model uses the closest target offset possible,
    • is replaced by setting target offsets to Close and leaving the predictor offsets unset or set to Common,
  • Combined:
    • VeryHigh quality for intra-day and day-ahead forecasts, High quality for further forecasting horizons,
    • is replaced by setting target offsets to Combined and leaving the predictor offsets unset or set to Common, and
  • UltraHigh:
    • every model uses the closest offset possible for every single predictor,
    • is replaced by setting predictor offsets to Close and leaving target offsets unset or set to Close.
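When migrating existing configurations, the mapping in the list above can be written down as a lookup table. This Python sketch only produces the key-value pairs; it does not show where targetOffsets, predictorOffsets and columns sit in the request body, and migrate_model_quality is a hypothetical name. For Low, the data configuration is additionally restricted to the target column.

```python
from typing import Dict, List, Tuple, Union

Column = Union[str, int]

def migrate_model_quality(quality: str, target_column: Column) -> Tuple[Dict[str, str], Dict[str, List[Column]]]:
    """Translate a deprecated modelQuality value into its replacement settings.

    Returns (model settings, data settings); the data settings are non-empty only for Low.
    """
    mapping = {
        "Low": ("None", "Common"),
        "Medium": ("None", "Common"),
        "High": ("Common", "Common"),
        "VeryHigh": ("Close", "Common"),
        "Combined": ("Combined", "Common"),
        "UltraHigh": ("Close", "Close"),
    }
    if quality not in mapping:
        raise ValueError(f"unknown modelQuality: {quality}")
    target_offsets, predictor_offsets = mapping[quality]
    model = {"targetOffsets": target_offsets, "predictorOffsets": predictor_offsets}
    # Low also keeps only the target column in the data configuration.
    data = {"columns": [target_column]} if quality == "Low" else {}
    return model, data

print(migrate_model_quality("High", "y"))
```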

Note: For the qualities Medium, High and VeryHigh, a selection of the offsets within a day is optimized to minimize training time. This may cause scenarios where two identical situations within two different prediction horizons can have slightly different models; e.g. models for S+1 may be different if the prediction horizon is in one case set to S+5 and in the other case to S+10.

"modelQuality": "High"

Target offsets​

This setting controls offsets of the target variable used in the model building process. Options are:

  • None: models without offsets of the target variable,
  • Common: models for situations within one day using only common target offsets,
  • Close: every model uses the closest target offset possible, and
  • Combined: Close offsets for the first two days from the last target timestamp and Common offsets for the rest of the forecasting horizon.

The more situation-specific offsets are used, the longer training takes. For longer forecasting horizons, using the closest possible target offsets stops improving accuracy and only slows down model building. If the forecasting horizon reaches too far ahead, the target offsets may not be used at all. Sometimes the use case itself dictates whether target offsets are used; e.g., soft sensors simulate the target variable from the predictors only.

This setting is linked to other settings. If predictor offsets are set to Close, the target offsets can only be None or Close. If allow offsets is set to false, they can only be None.

If not set, TIM chooses the default as follows: None if allow offsets is set to false, Close if predictor offsets are set to Close, and Combined in all other cases.
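The rules above can be sketched as a small resolver. Note that the string "None" is an option value, while Python's None stands for "not set". resolve_target_offsets is a hypothetical helper that mirrors the text, not TIM's own validation code.

```python
from typing import Optional

def resolve_target_offsets(target_offsets: Optional[str] = None,
                           predictor_offsets: str = "Common",
                           allow_offsets: bool = True) -> str:
    """Apply the default and the compatibility rules for targetOffsets."""
    if not allow_offsets:
        allowed, default = {"None"}, "None"
    elif predictor_offsets == "Close":
        allowed, default = {"None", "Close"}, "Close"
    else:
        allowed, default = {"None", "Common", "Close", "Combined"}, "Combined"
    if target_offsets is None:  # not set: fall back to the default
        return default
    if target_offsets not in allowed:
        raise ValueError(f"targetOffsets={target_offsets!r} is not allowed here; "
                         f"choose one of {sorted(allowed)}")
    return target_offsets

print(resolve_target_offsets(predictor_offsets="Close"))  # Close
```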

Note: For the Common, Close and Combined target offsets, a selection of the offsets within a day is optimized to minimize training time. This may cause scenarios where two identical situations within two different prediction horizons can have slightly different models, e.g. models for S+1 may be different if the prediction horizon is in one case set to S+5 and in the other case to S+10.

"targetOffsets": "Close"

Predictor offsets

The default setting is Common, which means that feature selection from predictors is done in batches by day. Setting it to Close causes each situation in the forecasting horizon to be trained individually with the closest possible offsets for predictors and target (if target and predictor offsets are allowed); however, this increases model-building time. Close can be appropriate for a short prediction horizon when not all predictors are available during the entire prediction horizon. Options are:

  • Common: models for situations within one day using only common predictor offsets, and
  • Close: every model uses the closest predictor offsets possible.

"predictorOffsets": "Common"

Note: Due to the individual training, Close can affect models even if only the target variable is available.

Normalization​

When normalization is on, predictors are scaled by their mean and standard deviation. Switching normalization off may help to model data with structural changes. If not provided or set to automatic, TIM will decide automatically.
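The text only says that predictors are scaled by their mean and standard deviation. The Python sketch below assumes the usual z-score standardization with the population standard deviation; TIM's exact formula may differ, and standardize is a hypothetical helper.

```python
from statistics import mean, pstdev
from typing import List

def standardize(column: List[float]) -> List[float]:
    """Scale a predictor column to zero mean and unit standard deviation."""
    mu = mean(column)
    sigma = pstdev(column)  # population standard deviation (an assumption)
    if sigma == 0:
        # A constant column cannot be scaled; only centre it.
        return [x - mu for x in column]
    return [(x - mu) / sigma for x in column]

print(standardize([10.0, 20.0, 30.0]))
```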

"normalization": true

Model complexity​

This setting determines the maximal possible number of terms in each model in the Model Zoo. Challenging datasets might require a lower model complexity. If not set, TIM will calculate the model complexity automatically based on the sampling period of the dataset.

"maxModelComplexity": 50

Features​

TIM tries to enhance the model-building process with new, artificially created features derived from the original predictors. By default, it uses Polynomial, Time offsets, Identity, Intercept, Rest of week, Piecewise linear, Exponential moving average and Periodic. The example below shows the names of the available transformations as they are used in the configuration.

It is possible to change the selection of features TIM can use by explicitly sending a list of the features to use (which may also omit features that are included by default).
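A feature list can be assembled by starting from the defaults and adding or removing names. This Python sketch assumes that the default features from the overall configuration table map to the identifiers used in this section's example configuration (e.g. Time offsets to TimeOffsets, Rest of week to RestOfWeek); feature_list is hypothetical.

```python
from typing import Iterable, List

# Identifiers as they appear in this section's example configuration.
AVAILABLE = {"TimeOffsets", "Identity", "PiecewiseLinear", "ExponentialMovingAverage",
             "SimpleMovingAverage", "Periodic", "Fourier", "RestOfWeek", "DayOfWeek",
             "PublicHolidays", "Month", "Trend", "Intercept", "Polynomial"}
# Defaults from the overall configuration table (identifier mapping assumed).
DEFAULT = ["Polynomial", "TimeOffsets", "Identity", "Intercept", "RestOfWeek",
           "PiecewiseLinear", "ExponentialMovingAverage", "Periodic"]

def feature_list(add: Iterable[str] = (), drop: Iterable[str] = ()) -> List[str]:
    """Start from the defaults, remove `drop`, then append `add` in order."""
    add, drop = list(add), set(drop)
    unknown = sorted(f for f in set(add) | drop if f not in AVAILABLE)
    if unknown:
        raise ValueError(f"unknown features: {unknown}")
    selected = [f for f in DEFAULT if f not in drop]
    for f in add:
        if f not in selected and f not in drop:
            selected.append(f)
    return selected

print({"features": feature_list(add=["Fourier"], drop=["Polynomial"])})
```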

"features": ["TimeOffsets", "Identity", "PiecewiseLinear", "ExponentialMovingAverage",
"SimpleMovingAverage", "Periodic", "Fourier", "RestOfWeek", "DayOfWeek",
"PublicHolidays", "Month", "Trend", "Intercept", "Polynomial"]

Daily cycle​

This setting is a boolean value determining whether or not to use an individual model building approach for different times within a day. Doing so is beneficial if the dynamics of the underlying problem change during the day. Switching it off leads to a common model building approach for all timestamps. If the parameter is not provided, TIM will decide automatically. Learn more about the importance of this parameter in the dedicated section on daily cycle.

"dailyCycle": false

Allow offsets​

Allow offsets is a boolean value that determines whether to use offsets of predictors in the model. If allow offsets is set to false, no time offsets, exponential moving average or simple moving average will be used in the model; they should not be explicitly deselected in the feature configuration. The piecewise linearity transformation will be made only from predictors that are available at the forecasted timestamp. If allow offsets is set to false, the explicit offset limit parameter cannot be set to anything other than 0. This setting is applied for all predictors including the target variable. Therefore, setting model quality to High, VeryHigh or Combined while setting allow offsets to false will return the same result as setting model quality to Medium. Calendar features may still occur in the model with offsets, since these are engine features and are obtained only from the forecasted timestamp.
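The effect of allow offsets on the feature selection can be sketched as a filter. This Python sketch is a simplification that assumes only the three offset-based transformations named in the paragraph above are removed; effective_features is hypothetical, and the restriction on piecewise linearity is only noted in a comment.

```python
from typing import List

# Offset-based transformations named in the paragraph above.
OFFSET_FEATURES = {"TimeOffsets", "ExponentialMovingAverage", "SimpleMovingAverage"}

def effective_features(features: List[str], allow_offsets: bool) -> List[str]:
    """Features that can still be used, given the allowOffsets value."""
    if allow_offsets:
        return list(features)
    # PiecewiseLinear is kept, but it is then built only from predictors
    # available at the forecasted timestamp (not modelled here).
    return [f for f in features if f not in OFFSET_FEATURES]

print(effective_features(["TimeOffsets", "Identity", "SimpleMovingAverage", "Periodic"], False))
```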

"allowOffsets": false

Offset limit​

Offset limit can be set as an explicit value; if it is not set, the value is determined automatically. This value is a negative number defining how far into the past offsets can go. The setting is mainly used to generate time offsets: only offsets in the range between the closest available offset of a variable and the offset limit are considered in the model-building process. The features exponential moving average, simple moving average and piecewise linearity are calculated from a variable only if the closest available offset of the variable is closer to the dataset end than the offset limit. The features public holidays, rest of week and day of week are not affected by this setting, since they are determined separately.

If allow offsets is set to false, the explicit offset limit cannot be set to anything other than 0. The offset limit that was used in model building can be found in the job log.

"offsetLimit": {
"type": "Explicit",
"value": -10
}
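The range of time offsets that the limit leaves open can be sketched as below. The sketch assumes offsets are counted in samples (negative values pointing into the past) and that both ends of the range are inclusive; candidate_offsets and check_offset_limit are hypothetical helpers.

```python
from typing import List, Optional

def candidate_offsets(closest: int, offset_limit: int) -> List[int]:
    """Offsets considered for one variable, from its closest offset back to the limit."""
    if offset_limit > 0 or closest > 0:
        raise ValueError("offsets are expressed as non-positive numbers")
    # Empty when the closest available offset already lies beyond the limit.
    return list(range(closest, offset_limit - 1, -1))

def check_offset_limit(allow_offsets: bool, explicit_value: Optional[int]) -> None:
    """With allowOffsets set to false, an explicit offset limit must be 0."""
    if not allow_offsets and explicit_value not in (None, 0):
        raise ValueError("offsetLimit must be 0 when allowOffsets is false")

print(candidate_offsets(-1, -10))
```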

Memory limit check​

TIM tries to estimate whether the worker it currently runs on has enough memory to finish the model-building and forecasting process. If it does not and the memory limit check is turned on, TIM drops some of the rows and columns of the dataset and turns off some of the transformations. The check is turned on by default; turning it off may cause the operation to crash on big datasets.

"memoryLimitCheck": false

Rebuilding policy​

The rebuilding policy controls which model(s) of the given parent job's Model Zoo should be rebuilt and which should be dropped. There are three different options:

  • all: all models in the current Model Zoo are dropped, and new models are added;
  • newSituations: only models that are needed for the given forecasting horizon that the current Model Zoo cannot handle are built and added to the Model Zoo;
  • olderThan timestamp: the same behavior as newSituations, but models older than "timestamp" are deemed useless and replaced with newly built ones too. This is the only option where it makes sense to include the time parameter. The user can specify any number of days, hours, quarter-hours or samples.
"rebuildingPolicy": {
"type": "OlderThan",
"time": {
"baseUnit": "Day",
"value": 7
}
}
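The three policies can be compared with a small Python sketch that decides which situations get a newly built model. It assumes the policy type names All, NewSituations and OlderThan (only OlderThan appears in the example above) and represents the Model Zoo as a hypothetical mapping from situation labels to build times; models_to_build is not a TIM function.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional, Set

def models_to_build(existing: Dict[str, datetime], needed: Set[str], policy: str,
                    now: datetime, older_than: Optional[timedelta] = None) -> Set[str]:
    """Situations whose models are (re)built under a given rebuilding policy.

    `existing` maps a situation label (e.g. "S+1") to the time its model was built.
    """
    missing = set(needed) - set(existing)
    if policy == "All":
        return set(needed)  # the whole zoo is dropped and rebuilt
    if policy == "NewSituations":
        return missing  # only what the current zoo cannot handle
    if policy == "OlderThan":
        if older_than is None:
            raise ValueError("OlderThan requires a time window")
        stale = {s for s, built in existing.items() if now - built > older_than}
        return missing | stale
    raise ValueError(f"unknown policy: {policy}")

zoo = {"S+1": datetime(2021, 1, 30), "S+2": datetime(2021, 1, 1)}
print(models_to_build(zoo, {"S+1", "S+2", "S+3"}, "OlderThan",
                      now=datetime(2021, 1, 31), older_than=timedelta(days=7)))
```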

Prediction intervals​

The prediction intervals setting expresses the uncertainty of a prediction by creating an interval within which the actual value is likely to fall. The value of this setting expresses the probability (in percent) that the actual value will lie inside the symmetric prediction interval. Therefore, the prediction intervals widen as the value increases.
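How a coverage level turns into a symmetric interval depends on the error model, which this page does not describe. Assuming normally distributed errors with a known standard deviation, the relationship can be sketched as follows. symmetric_interval is hypothetical and only illustrates why a higher value gives a wider interval.

```python
from statistics import NormalDist
from typing import Tuple

def symmetric_interval(point: float, error_std: float, level: float = 90) -> Tuple[float, float]:
    """Symmetric interval around `point` covering `level` percent under normal errors."""
    if not 0 < level < 100:
        raise ValueError("level must be a percentage between 0 and 100")
    # Two-sided quantile, e.g. level 95 -> z of about 1.96.
    z = NormalDist().inv_cdf(0.5 + level / 200)
    half_width = z * error_std
    return point - half_width, point + half_width

print(symmetric_interval(100.0, 5.0, 95))
```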

"predictionIntervals": 95

Prediction boundaries​

For some datasets, values outside certain boundaries do not make sense - e.g. negative values for energy production. TIM tries to figure these out automatically, but there is an option to override these detected values. Both the lower and upper boundaries should be real values. It might be useful to turn prediction boundaries off for datasets with a visible trend.

"predictionBoundaries": {
"type": "Explicit",
"maxValue": 1000,
"minValue": 0
}
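Assuming the boundaries are enforced by clipping, their effect on a set of predictions can be sketched as below. apply_boundaries is a hypothetical helper, not TIM's implementation.

```python
from typing import List, Optional

def apply_boundaries(predictions: List[float], min_value: Optional[float] = None,
                     max_value: Optional[float] = None) -> List[float]:
    """Clip predictions into [min_value, max_value]; a missing bound is not enforced."""
    if min_value is not None and max_value is not None and min_value > max_value:
        raise ValueError("minValue must not exceed maxValue")
    clipped = []
    for p in predictions:
        if min_value is not None:
            p = max(p, min_value)
        if max_value is not None:
            p = min(p, max_value)
        clipped.append(p)
    return clipped

print(apply_boundaries([-3.5, 420.0, 1250.0], min_value=0, max_value=1000))
```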

Rolling window​

When TIM evaluates the models built on the in-sample and out-of-sample data, it starts rolling backwards from where the target variable ends until the start of the dataset and forecasts the whole length of the forecasting horizon each time. The user can specify the length of this rolling window to control the size of the output (using any number of months, days, hours, minutes, seconds or samples). By default, for daily cycle datasets a rolling window of 1 day is used and for nondaily cycle datasets a rolling window of 1 sample is used.

"rollingWindow": {
"baseUnit": "Day",
"value": 2
}
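The backwards-rolling evaluation can be sketched by listing the forecast origins. This Python sketch assumes the window is a fixed timedelta and that each origin lies a whole number of windows before the last target observation; from each origin, the whole forecasting horizon would be forecast. forecast_origins is hypothetical.

```python
from datetime import datetime, timedelta
from typing import List

def forecast_origins(last_target: datetime, dataset_start: datetime, window: timedelta) -> List[datetime]:
    """Backtest forecast origins, rolling backwards from the last target observation."""
    origins = []
    origin = last_target
    while origin > dataset_start:
        origins.append(origin)
        origin -= window  # each step moves one rolling window into the past
    return origins

print(forecast_origins(datetime(2020, 1, 10), datetime(2020, 1, 1), timedelta(days=2)))
```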

Backtest​

This setting determines which types of forecasts should be returned. The Production option only returns the production forecast, the OutOfSample option also produces out-of-sample forecasts, and the All option also delivers in-sample forecasts.

"backtest": "All"

Data configuration

| Configuration parameter | build-model | rebuild-model | predict | default |
|---|---|---|---|---|
| In-sample rows | ☑ | ☑ | ☐ | All records except Out-of-sample |
| Out-of-sample rows | ☑ | ☑ | ☑ | No records |
| Imputation | ☑ | ☑ | ☑ | Linear for gaps no longer than 6 |
| Columns | ☑ | ☑ | ☐ | all |
| Target column | ☑ | ☐ | ☐ | First non-timestamp column |
| Holiday column | ☑ | ☐ | ☐ | none |
| Time scale | ☑ | ☐ | ☐ | Originally estimated from dataset |
| Aggregation | ☑ | ☐ | ☐ | Mean (numerical variables) / Maximum (boolean variables) |
| Alignment | ☑ | ☑ | ☑ | Determined from dataset end |
| Preprocessors | ☑ | ☑ | ☑ | No preprocessors |

☑ available in a given method
☐ not available in a given method

In-sample rows​

This setting defines which samples should be used for model building (training). The user can specify the in-sample timestamps as an array of timestamp ranges. If not set, all timestamps except the ones defined in the 'outOfSampleRows' will be used.

"inSampleRows": [
{
"from": "2009-06-01 00:00:00",
"to": "2009-06-10 23:00:00"
},
{
"from": "2009-05-01 00:00:00",
"to": "2009-05-10 23:00:00"
}
]

Alternatively, a relative notation can be used, expressed as an integer n with a base unit (one of Month, Day, Hour, Minute, Second and Sample), which defines the length of the time range. The type of the relative range defines where it starts and in which direction it is calculated: Last starts from the last non-missing target observation (the newest observation of the target variable) and goes backwards, while First starts from the first non-missing target observation (the oldest observation) and goes forward. If no type is specified, the default is Last.

"inSampleRows": {
"type": "Last",
"baseUnit": "Day",
"value": 2
}

If the inSampleRows intersect with the outOfSampleRows, observations in the intersection are treated as follows:

  • by default, observations in the intersection are treated as out-of-sample,
  • when outOfSampleRows are defined as a relative range starting from the first target timestamp (type First), the observations in the intersection are treated as in-sample; the reasoning is that, for out-of-sample validation, data towards the end of the dataset are more relevant.
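The relative ranges and the intersection rule can be sketched in Python as follows. The sketch assumes that a Last range of length L covers timestamps in (end - L, end] and a First range covers [start, start + L); relative_range and split_samples are hypothetical helpers that work on sets of timestamps.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Set, Tuple

def relative_range(timestamps: List[datetime], kind: str, length: timedelta) -> Set[datetime]:
    """Timestamps covered by a relative range such as {"type": "Last", "baseUnit": "Day", "value": 2}.

    `timestamps` are the non-missing target timestamps in ascending order.
    """
    if kind == "Last":
        end = timestamps[-1]
        return {t for t in timestamps if end - length < t <= end}
    if kind == "First":
        start = timestamps[0]
        return {t for t in timestamps if start <= t < start + length}
    raise ValueError(f"unknown range type: {kind}")

def split_samples(in_sample: Set[datetime], out_of_sample: Set[datetime],
                  out_of_sample_type: Optional[str] = None) -> Tuple[Set[datetime], Set[datetime]]:
    """Resolve the intersection: it stays in-sample only for a First out-of-sample range."""
    overlap = in_sample & out_of_sample
    if out_of_sample_type == "First":
        return in_sample, out_of_sample - overlap
    return in_sample - overlap, out_of_sample

hourly = [datetime(2021, 1, 1) + timedelta(hours=h) for h in range(96)]
oos = relative_range(hourly, "Last", timedelta(days=2))
ins, oos = split_samples(set(hourly[:72]), oos, "Last")
print(len(ins), len(oos))  # 48 48
```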

Out-of-sample rows

This setting defines which samples should be used to backtest (validate) the Model Zoo. These observations will not be used during model building (training), and therefore the forecasts' accuracy on this region more closely resembles that of the real production setup. If not set, no out-of-sample rows are used.

There are two ways to configure the out-of-sample rows:

  • as an array of timestamp ranges:
"outOfSampleRows": [
{
"from": "2020-06-01 00:00:00",
"to": "2020-06-10 23:00:00"
},
{
"from": "2020-05-01 00:00:00",
"to": "2020-05-10 23:00:00"
}
]
  • as an integer number n with base unit (one of Month, Day, Hour, Minute, Second and Sample), defining the relative time range and the type of the relative range defining the start and direction (First and Last calculated from the first / last non-missing target observation, default is Last).
"outOfSampleRows": {
"type": "Last",
"baseUnit": "Day",
"value": 2
}

Imputation​

The imputation setting applies if there are missing values in the dataset. Using this setting, TIM imputes all gaps in the data that are not longer than the maxLength parameter (in number of samples). There are two available imputation methods (types): Linear (linear interpolation) and LOCF (Last Observation Carried Forward, i.e., imputation with the last non-missing observation). The type None turns off imputation. The default setting is Linear with maxLength 6.

"imputation": {
"type": "Linear",
"maxLength": 1
}
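Both imputation types can be sketched for a single column, with None marking missing values. Gaps longer than maxLength are left untouched; the sketch assumes that a gap at the very start cannot be imputed by either method and that a trailing gap can be filled by LOCF but not by linear interpolation. impute is a hypothetical helper.

```python
from typing import List, Optional

def impute(values: List[Optional[float]], method: str = "Linear",
           max_length: int = 6) -> List[Optional[float]]:
    """Fill gaps (runs of None) that are no longer than max_length samples."""
    if method not in ("Linear", "LOCF", "None"):
        raise ValueError(f"unknown imputation type: {method}")
    out = list(values)
    if method == "None":
        return out
    n, i = len(out), 0
    while i < n:
        if out[i] is not None:
            i += 1
            continue
        start = i
        while i < n and out[i] is None:
            i += 1
        gap = i - start  # the gap covers out[start:i]
        if start == 0 or gap > max_length:
            continue  # no earlier observation, or the gap is too long
        prev = out[start - 1]
        if method == "LOCF":
            out[start:i] = [prev] * gap
        elif i < n:  # Linear needs an observation on both sides of the gap
            step = (out[i] - prev) / (gap + 1)
            out[start:i] = [prev + step * (k + 1) for k in range(gap)]
    return out

print(impute([1.0, None, None, 4.0]))
```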

Columns​

This setting lists all columns (given either by their names or numbers) that should be used for model building. If not provided, TIM will use all available columns. The target column should always be included.

"columns": [5, "y"]

Target column​

This setting defines the column (given either by its name or number) that contains the target variable.

"targetColumn": 2

Holiday column​

This setting defines the column (given either by its name or number) that contains the holiday variable. If not provided, TIM assumes there is no holiday column.

"holidayColumn": 5

Time scale​

This setting determines the rescaling of the original dataset to another sampling period. The baseUnit of the rescaling is limited to Day, Hour, Minute or Second. If not set, the originally estimated sampling period is used. Time scaling only works from shorter to longer sampling periods (e.g., from hourly to daily data) and does not work for data sampled monthly.

"timeScale": {
"baseUnit": "Day",
"value": 2
}

Aggregation​

This setting defines the aggregation function used for the target variable; predictor variables are always aggregated by the default aggregation function. Available aggregation types are Mean, Sum, Minimum and Maximum. The default aggregation is Mean for numerical variables and Maximum for boolean variables. This setting is related to the time scale parameter, which defines the sampling period to aggregate to.
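Rescaling with an aggregation function can be sketched by grouping samples into buckets of the new sampling period. The bucket alignment (counting from 1970-01-01) is an assumption of this sketch, and rescale is a hypothetical helper; in TIM, the chosen aggregation applies to the target only.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean
from typing import Dict, List, Optional, Tuple

# Aggregation types named in the text.
AGGREGATIONS = {"Mean": mean, "Sum": sum, "Minimum": min, "Maximum": max}
ORIGIN = datetime(1970, 1, 1)  # bucket alignment: an assumption of this sketch

def rescale(series: List[Tuple[datetime, Optional[float]]], period: timedelta,
            aggregation: str = "Mean") -> Dict[datetime, float]:
    """Aggregate (timestamp, value) pairs into buckets of length `period`."""
    func = AGGREGATIONS[aggregation]
    buckets: Dict[datetime, List[float]] = defaultdict(list)
    for ts, value in series:
        if value is None:
            continue  # missing values do not contribute
        start = ORIGIN + ((ts - ORIGIN) // period) * period
        buckets[start].append(value)
    return {start: func(vals) for start, vals in sorted(buckets.items())}

hourly = [(datetime(2020, 1, 1) + timedelta(hours=h), float(h)) for h in range(48)]
print(rescale(hourly, timedelta(days=1), "Sum"))
```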

"aggregation": "Mean"

Alignment​

The alignment setting makes it possible to set the alignment at the end of the dataset, which is useful for backtesting. It sets the timestamp of the last target observation (i.e., lastTargetTimestamp) from which the rolling window is applied and production forecasts are calculated. If not given, the last non-missing target timestamp from the original data is used. The last target timestamp cannot be earlier than any out-of-sample record. The availability of every variable other than the target may be given relative to the last non-missing target timestamp. If the alignment is not provided for a variable, the alignment from the original data is used, i.e., the difference between the last non-missing timestamp of that variable in the data and the last non-missing target timestamp in the data. For more details, check the data alignment section.

"alignment": {
"lastTargetTimestamp": "2021-01-31 00:00:00Z",
"dataUntil": [
{
"column": "Sales",
"baseUnit": "Hour",
"offset": -2
}
]
}
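Assuming each dataUntil offset is counted from lastTargetTimestamp in the given baseUnit, the availability cut-off of a column can be sketched as follows. data_until and apply_alignment are hypothetical helpers that simulate the masking for backtesting.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple

UNIT = {"Day": timedelta(days=1), "Hour": timedelta(hours=1), "Minute": timedelta(minutes=1)}

def data_until(last_target: datetime, base_unit: str, offset: int) -> datetime:
    """Last timestamp at which a column counts as available, relative to lastTargetTimestamp."""
    return last_target + offset * UNIT[base_unit]

def apply_alignment(series: List[Tuple[datetime, float]],
                    cutoff: datetime) -> List[Tuple[datetime, Optional[float]]]:
    """Blank out values after the cut-off, mimicking data that is not yet available."""
    return [(ts, v if ts <= cutoff else None) for ts, v in series]

last = datetime(2021, 1, 31, 0, 0, tzinfo=timezone.utc)  # "2021-01-31 00:00:00Z"
cutoff = data_until(last, "Hour", -2)  # the "Sales" entry from the example
print(cutoff.isoformat())  # 2021-01-30T22:00:00+00:00
```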

Preprocessors​

This setting provides an array of filters and transformations that will be applied on the data in the given order. (Currently only one preprocessor is defined.)

"preprocessors": [
{
"type": "CategoryFilter",
"value": {
"column": "ColumnName_1",
"categories": [1, 2, 3]
}
}
]

| Type | build-model | rebuild-model | predict | default |
|---|---|---|---|---|
| CategoryFilter | ☑ | ☑ | ☑ | All records |

☑ available in a given method
☐ not available in a given method

Category filter

The category filter filters the data to select only those rows with specified values, i.e. belonging to a specific category or set of categories. Currently, this filter is applied only to columns containing group keys. For more details, check the documentation section about category filters. By default, all rows are selected.

{
"type" : "CategoryFilter",
"value": {
"column": "ColumnName_1",
"categories": [1, 2, 3]
}
}
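The filter and the in-order application of preprocessors can be sketched over rows represented as dictionaries. The sketch does not check that the column holds group keys, and category_filter and apply_preprocessors are hypothetical names.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]

def category_filter(rows: List[Row], column: str, categories: Iterable[Any]) -> List[Row]:
    """Keep only the rows whose value in `column` is one of `categories`."""
    wanted = set(categories)
    return [row for row in rows if row.get(column) in wanted]

def apply_preprocessors(rows: List[Row], preprocessors: List[Dict[str, Any]]) -> List[Row]:
    """Apply the configured preprocessors one after another, in the given order."""
    for step in preprocessors:
        if step["type"] == "CategoryFilter":
            cfg = step["value"]
            rows = category_filter(rows, cfg["column"], cfg["categories"])
        else:
            raise ValueError(f"unsupported preprocessor: {step['type']}")
    return rows

rows = [{"ColumnName_1": 1, "y": 10.0},
        {"ColumnName_1": 4, "y": 12.0},
        {"ColumnName_1": 3, "y": 9.0}]
config = [{"type": "CategoryFilter",
           "value": {"column": "ColumnName_1", "categories": [1, 2, 3]}}]
print(apply_preprocessors(rows, config))
```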