
Overview

We have put a lot of effort into creating a fully automatic model building engine, but sometimes, despite our best efforts, some models do not reach the accuracy they could. By adjusting the exposed parameters of the algorithm, you can ensure that even the toughest dataset is modeled properly.

The following subsections go through all the available settings of TIM (RT)InstantML Forecasting. Each subsection consists of a label specifying whether it relates to InstantML or RTInstantML, a text description, and the TIM API notation. To learn more about the architectural differences between InstantML and RTInstantML forecasting in TIM, see this overview.

Below is a table with all the available configuration parameters for TIM (RT)InstantML Forecasting.

Configuration parameter   RTInstantML   InstantML model building   InstantML forecasting
Usage time                              xx
Prediction from                         xx
Prediction to             xx            xx
Backtesting               x
Quality                   x             x
Dictionaries              x             x
Allow offsets             x             x
Data normalization        x             x
Model complexity          x             x
Time specific             x             x
Interpolation             x             x
Prediction intervals      x
Boundaries                x             x
Prediction scope                                                   x

xx - required
x - optional

Usage

Relates to InstantML model building phase and RTInstantML

It is important to clearly communicate when forecasts need to be made and to what horizon they will relate. TIM needs information on both aspects in order to build the most appropriate models. Moreover, the quality of the models can also be configured.

InstantML model building

"usage": {
  "usageTime": [
    ...
  ],
  "predictionFrom": {
    ...
  },
  "predictionTo": {
    ...
  },
  "modelQuality": [
    ...
  ]
}

RTInstantML

"usage": {
  "predictionTo": {
    ...
  },
  "backtestLength": ...,
  "modelQuality": [
    ...
  ]
}
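
As a rough illustration of how the required ("xx") entries from the table map onto the usage object, the following Python sketch checks a configuration for missing required keys. The function missing_required and the REQUIRED_USAGE_KEYS mapping are hypothetical helpers written for this example, not part of the TIM API; only the key names come from the notation above, and placing "usage" at the top level of the configuration is an assumption.

```python
import json

# Required "usage" keys per approach, read off the table of parameters.
# This mapping is an illustrative helper, not part of the TIM API.
REQUIRED_USAGE_KEYS = {
    "InstantML": ["usageTime", "predictionFrom", "predictionTo"],
    "RTInstantML": ["predictionTo"],
}


def missing_required(config: dict, approach: str) -> list:
    """Return the required usage keys that are absent from the configuration."""
    usage = config.get("usage", {})
    return [key for key in REQUIRED_USAGE_KEYS[approach] if key not in usage]


rt_config = {
    "usage": {
        "predictionTo": {"baseUnit": "Day", "offset": 2},
        "backtestLength": 168,
        "modelQuality": [{"day": 0, "quality": "Automatic"}],
    }
}

# Nothing is missing for RTInstantML; for InstantML model building the
# usage time and prediction from entries would still have to be added.
print(missing_required(rt_config, "RTInstantML"))
print(missing_required(rt_config, "InstantML"))

# Serialize to JSON, the notation used throughout this page.
print(json.dumps(rt_config, indent=2))
```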

Usage time

Relates to InstantML model building phase

A cron-like notation that specifies the times at which the model will be used for forecasting. In the example below, the model is intended to be used every day at 08:30 and 12:30.

  "usageTime": [
    {
      "type": "Day",
      "value": "*"
    },
    {
      "type": "Hour",
      "value": "8,12"
    },
    {
      "type": "Minute",
      "value": "30"
    }
  ]
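
To make the cron-like notation more concrete, here is a minimal Python sketch that expands the Hour and Minute entries into times of day. It assumes that values are either "*" or a comma-separated list of integers, and that a missing entry behaves like "*"; the real notation may support more than that. The helper names expand_field and usage_times_of_day are hypothetical.

```python
from itertools import product


def expand_field(value: str, allowed: range) -> list:
    """Expand "*" to every allowed value, or split a comma-separated list."""
    if value == "*":
        return list(allowed)
    return [int(part) for part in value.split(",")]


def usage_times_of_day(usage_time: list) -> list:
    """Return the (hour, minute) pairs selected by the Hour and Minute entries."""
    fields = {entry["type"]: entry["value"] for entry in usage_time}
    # Assumption: an entry that is not listed is treated like "*".
    hours = expand_field(fields.get("Hour", "*"), range(24))
    minutes = expand_field(fields.get("Minute", "*"), range(60))
    return list(product(hours, minutes))


usage_time = [
    {"type": "Day", "value": "*"},
    {"type": "Hour", "value": "8,12"},
    {"type": "Minute", "value": "30"},
]

times = usage_times_of_day(usage_time)
print(times)  # [(8, 30), (12, 30)]
```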

Prediction from

Relates to InstantML model building phase

A relative time notation that specifies a positive offset, relative to the usage time, at which the prediction starts.

  "predictionFrom": {
    "baseUnit": "Sample",
    "offset": 1
  }

Prediction to

Relates to InstantML model building phase and RTInstantML

A relative time notation that specifies a positive offset at which the prediction ends, relative to the usage time (InstantML) or to the last target value (RTInstantML).

  "predictionTo": {
    "baseUnit": "Day",
    "offset": 2
  }

Backtesting

Relates to RTInstantML

If you do not want to follow the typical backtesting scheme of building a model on one set of data and predicting on another (the TIM InstantML approach), and wish to use RTInstantML right away, you can set the backtestLength parameter. Model building then ignores the last backtestLength samples, and predictions are calculated on those samples instead.
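
The hold-out idea described above can be sketched in a few lines of Python. This only illustrates which samples end up on which side of the split; the function split_for_backtest is a hypothetical helper, and what TIM does internally with each part is not shown here.

```python
def split_for_backtest(samples: list, backtest_length: int) -> tuple:
    """Split samples into (model building part, backtest part).

    The last `backtest_length` samples are held out of model building.
    """
    if backtest_length < 0 or backtest_length > len(samples):
        raise ValueError("backtest_length must be between 0 and len(samples)")
    cut = len(samples) - backtest_length
    return samples[:cut], samples[cut:]


# 1000 hourly samples with one week (168 hours) held out for the backtest.
samples = list(range(1000))
building_part, backtest_part = split_for_backtest(samples, 168)
print(len(building_part), len(backtest_part))  # 832 168
```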

"backtestLength": 168

Quality

Relates to InstantML model building phase and RTInstantML

This parameter indicates how accurate the models for forecasts on a particular day should be. The "day" and "quality" parameters specify the day and the quality of all models from the ModelZOO for that day, respectively. For example, "day": 0 would affect the quality of all intraday models whereas "day": 1 would affect day-ahead models.

  • "Automatic" - automatic choice of quality
  • "Low" - dummy quality, these models can be used even without any data provided
  • "Medium" - models without offsets of target
  • "High" - models use only a limited number of target offsets
  • "SuperHigh" - every model uses closest target offset possible
  • "UltraHigh" - every model uses closest offset possible for every single predictor

The higher the quality, the longer it usually takes to build the ModelZOO. Accuracy should also increase, although this is not guaranteed. The "Automatic" setting uses quality "UltraHigh" for Day+0 and Day+1 and quality "High" for all other days. If you set the quality for a day that no model will be built for, the setting is simply ignored.

"modelQuality": [
  {
    "day": 0,
    "quality": "Automatic"
  },
  {
    "day": 1,
    "quality": "High"
  }
]

Note: "UltraHigh" quality should only be used for experimental purposes or in cases where predictors are not available for the exact timestamp that we want to forecast. Most of the time, the gain in accuracy is not worth the additional computing time.
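
The rule for the "Automatic" setting can be written down as a small lookup. The Python sketch below resolves the effective quality for a given day from a modelQuality list; resolve_quality is a hypothetical helper, and treating days that are not listed as "Automatic" is an assumption made for this example.

```python
def resolve_quality(model_quality: list, day: int) -> str:
    """Return the effective quality for a day, resolving "Automatic"."""
    configured = {entry["day"]: entry["quality"] for entry in model_quality}
    # Assumption: a day that is not listed behaves like "Automatic".
    quality = configured.get(day, "Automatic")
    if quality == "Automatic":
        # "UltraHigh" for Day+0 and Day+1, "High" for all other days.
        return "UltraHigh" if day in (0, 1) else "High"
    return quality


model_quality = [
    {"day": 0, "quality": "Automatic"},
    {"day": 1, "quality": "High"},
]

for day in range(3):
    print(day, resolve_quality(model_quality, day))
```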

Dictionaries

Relates to InstantML model building phase and RTInstantML

TIM tries to enhance the model building process with new artificially created features derived from the original predictors. There are different transformations available (those in bold are used by default):

If you want to, you can try omitting some of them by listing only those you want to use.

"features": ["MovingAverage", "DayOfWeek", "PeriodicComponents", "Intercept",
             "PiecewiseLinear", "TimeOffsets", "Polynomial", "Identity",
             "SimpleMovingAverage", "Month", "Trend", "ExactDayOfWeek",
             "Fourier", "PublicHolidays"]

Allow offsets

Relates to InstantML model building phase and RTInstantML

Sometimes it is desirable to create models that do not use any offsets, only the value of a predictor for a given timestamp. This is especially useful in many anomaly detection applications. When set to false, TIM automatically disables dictionaries that would otherwise create offset features.

"allowOffsets": true
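
For readers unfamiliar with the term, the Python sketch below shows what an offset of a predictor looks like, under the assumption that an offset is the value of the series shifted back by a whole number of samples. The helper offset_feature is hypothetical and does not reflect how TIM builds its features internally.

```python
def offset_feature(values: list, offset: int) -> list:
    """Shift a series back by `offset` samples; positions without history are None."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    if offset == 0:
        return list(values)
    return [None] * min(offset, len(values)) + list(values[:-offset])


temperature = [10, 11, 12, 13]

# Offset 0 is the value for the given timestamp itself, which is the only
# kind of input left when offsets are not allowed.
print(offset_feature(temperature, 0))  # [10, 11, 12, 13]

# Offset 1 pairs each timestamp with the value one sample earlier.
print(offset_feature(temperature, 1))  # [None, 10, 11, 12]
```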

Data normalization

Relates to InstantML model building phase and RTInstantML

When normalization is on, predictors are scaled by their mean and standard deviation. For difficult datasets with structural changes (a change in the mean of the target time series), disabling normalization helps the TIM Engine settle into different working regimes faster, which results in improved accuracy. If the parameter is not provided or is set to automatic, TIM will decide on its own.

"dataNormalization": true
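
Scaling by mean and standard deviation can be illustrated with a short Python sketch. It uses the population standard deviation, which is an assumption; the text does not say which estimator TIM uses, and the function normalize is a hypothetical helper.

```python
from statistics import mean, pstdev


def normalize(values: list) -> list:
    """Scale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant series carries no scale information.
        return [0.0 for _ in values]
    return [(value - mu) / sigma for value in values]


# A predictor with a structural change: the mean jumps from about 11 to about 51.
predictor = [10.0, 12.0, 11.0, 50.0, 52.0, 51.0]
scaled = normalize(predictor)
print([round(value, 3) for value in scaled])
```

Because the mean and standard deviation are computed over the whole series, a jump in the level dominates both statistics, which gives some intuition for why switching normalization off may help with such data.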

Model complexity

Relates to InstantML model building phase and RTInstantML

Determines the maximal complexity of models. It is given as an integer and specifies the maximal number of terms in a model. Difficult datasets might require lower model complexity. If the parameter is not provided or is set to automatic, TIM will decide on its own.

"maxModelComplexity": 50

Time specific

Relates to InstantML model building phase and RTInstantML

A boolean value that decides whether to use an individual model building approach for different times within a day. It is especially useful if the dynamics of the underlying problem change during the day. Switching it off leads to a common model building approach for all timestamps. If the parameter is not provided or is set to automatic, TIM will decide automatically.

"timeSpecificModels": false

Note: Building only one model for all timestamps can be achieved by switching off the time specific models parameter and using a model quality other than "UltraHigh".

Interpolation

Relates to InstantML model building phase and RTInstantML

The interpolation setting applies if there are missing values in the dataset. Using this setting, TIM will interpolate all gaps in the data that are not longer than what the maxLength parameter indicates. There are two interpolation methods/types available - Linear and LastValue - for linear interpolation and interpolation with the last non-missing value respectively.

"interpolation": {
  "type": "Linear",
  "maxLength": 1
}
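
The two interpolation types and the effect of maxLength can be sketched as follows. This is a minimal Python illustration, not TIM's implementation: missing values are represented as None, gaps at the very start of the series are left untouched, and a trailing gap is not filled by linear interpolation because it has no right-hand neighbor. These edge-case choices and the name fill_gaps are assumptions.

```python
def fill_gaps(values: list, method: str, max_length: int) -> list:
    """Fill runs of None that are no longer than max_length."""
    if method not in ("Linear", "LastValue"):
        raise ValueError("method must be 'Linear' or 'LastValue'")
    result = list(values)
    n = len(result)
    i = 0
    while i < n:
        if result[i] is not None:
            i += 1
            continue
        start = i
        while i < n and result[i] is None:
            i += 1
        end = i  # first index after the gap
        gap = end - start
        if gap > max_length or start == 0:
            continue  # gap too long, or no previous value to start from
        left = result[start - 1]
        if method == "LastValue":
            for k in range(start, end):
                result[k] = left
        elif end < n:  # Linear needs a value on both sides of the gap
            step = (result[end] - left) / (gap + 1)
            for k in range(start, end):
                result[k] = left + step * (k - start + 1)
    return result


data = [1.0, None, 3.0, None, None, 9.0]
print(fill_gaps(data, "Linear", 1))     # only the single-sample gap is filled
print(fill_gaps(data, "Linear", 2))     # both gaps are filled
print(fill_gaps(data, "LastValue", 2))  # gaps repeat the last known value
```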

Prediction intervals

Relates to RTInstantML

The prediction interval expresses the uncertainty in a prediction by creating an interval in which the prediction is likely to fall. The confidenceLevel value equals the probability that the prediction will be inside the prediction interval. Therefore, prediction intervals get wider as the confidenceLevel value increases.

"predictionIntervals": {
  "confidenceLevel": 95
}
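
To see why a higher confidenceLevel gives a wider interval, consider a simple symmetric interval under the assumption of normally distributed forecast errors. This is a textbook construction used purely for illustration, not the method TIM uses; the function gaussian_interval and the error standard deviation are hypothetical, and the confidence level is assumed to be given in percent as in the example above.

```python
from statistics import NormalDist


def gaussian_interval(prediction: float, error_std: float, confidence_level: float) -> tuple:
    """Symmetric interval around a prediction, assuming normal errors.

    confidence_level is given in percent, for example 95.
    """
    if not 0 < confidence_level < 100:
        raise ValueError("confidence_level must be between 0 and 100")
    tail = (1 - confidence_level / 100) / 2
    z = NormalDist().inv_cdf(1 - tail)
    return prediction - z * error_std, prediction + z * error_std


low80, high80 = gaussian_interval(100.0, 10.0, 80)
low95, high95 = gaussian_interval(100.0, 10.0, 95)
print(round(high80 - low80, 2), round(high95 - low95, 2))
```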

Boundaries

Relates to InstantML model building phase and RTInstantML

For some datasets, values outside of certain boundaries do not make sense - e.g. negative values for energy production. TIM tries to figure these out automatically but there is an option to override these detected values. Both lower and upper boundary should be real values. It might be useful to turn them off for datasets with a visible trend.

"extendedOutputConfiguration": {
  "predictionBoundaries": {
    "type": "Explicit",
    "maxValue": 1000,
    "minValue": 0
  }
}
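
One plausible reading of explicit boundaries is that predictions outside the allowed range are clipped to the nearest boundary. The Python sketch below shows this under that assumption; apply_boundaries is a hypothetical helper, and how TIM itself enforces boundaries is not specified in this section.

```python
def apply_boundaries(predictions: list, boundaries: dict) -> list:
    """Clip predictions to [minValue, maxValue] for explicit boundaries."""
    if boundaries.get("type") != "Explicit":
        # Other boundary types are outside the scope of this sketch.
        return list(predictions)
    low = boundaries["minValue"]
    high = boundaries["maxValue"]
    return [min(max(value, low), high) for value in predictions]


boundaries = {"type": "Explicit", "maxValue": 1000, "minValue": 0}
clipped = apply_boundaries([-5.0, 250.0, 1200.0], boundaries)
print(clipped)  # [0, 250.0, 1000]
```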

Prediction scope

Relates to InstantML forecasting phase

Prediction scope is used as an input of the forecasting phase in the InstantML approach. It denotes the period for which a prediction is returned by evaluating an already built model. There are several ways to set the prediction scope:

  • "type": "Ranges" - a vector of range objects ("ranges": [{"from": "2020-01-01T00:00:00Z", "to": "2020-01-01T12:00:00Z"}]) where all timestamps from each range are included in the prediction scope.

  • "type": "CountFrom" - specific number of timestamps ("count": 48), starting from a specified timestamp ("from": "2020-03-31T09:00:00Z").

  • "type": "Count" - the number of timestamps ("count": 24) that are included in the scope. The initial timestamp is automatically calculated based on the forecasting horizon (prediction from and prediction to parameters) specified during the model building phase.

  • "type": "From" - the timestamp for which to start forecasting. The length of the forecast will be automatically calculated based on the forecasting horizon (prediction from and prediction to parameters).

  • "type": "Timestamps" - explicitly listed timestamps ("timestamps": ["2020-01-01T00:00:00Z", "2020-01-01T12:00:00Z"]) for which a prediction is desired.

"predictionScope": {
  "type": "CountFrom",
  "count": 48,
  "from": "2020-03-31T09:00:00Z"
}
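
As an illustration of the "CountFrom" type, the following Python sketch lists the timestamps such a scope would cover. It assumes an hourly sampling period and that the "from" timestamp itself is the first timestamp in the scope; both are assumptions, and count_from_timestamps is a hypothetical helper rather than part of the TIM API.

```python
from datetime import datetime, timedelta, timezone


def parse_utc(timestamp: str) -> datetime:
    """Parse a timestamp such as 2020-03-31T09:00:00Z as UTC."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def count_from_timestamps(scope: dict, sampling_period: timedelta) -> list:
    """List the timestamps covered by a "CountFrom" prediction scope."""
    if scope["type"] != "CountFrom":
        raise ValueError("this sketch only handles the CountFrom type")
    start = parse_utc(scope["from"])
    # Assumption: the "from" timestamp is included in the scope.
    return [start + i * sampling_period for i in range(scope["count"])]


scope = {"type": "CountFrom", "count": 48, "from": "2020-03-31T09:00:00Z"}
timestamps = count_from_timestamps(scope, timedelta(hours=1))
print(timestamps[0].isoformat(), timestamps[-1].isoformat())
```

With hourly data, the 48 timestamps run from 2020-03-31 09:00 UTC to 2020-04-02 08:00 UTC.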