Model Zoo

This section describes all the information that can be found in, and extracted from, the Model Zoo.

Sampling period

TIM estimates the sampling period of the data, or a user can set it manually. The sampling period used when building the Model Zoo is stored in the ISO 8601 duration format. The predict method uses this stored sampling period instead of estimating a new one.
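To illustrate the stored format, an ISO 8601 duration such as PT15M or P1D can be converted to a Python timedelta with a small parser. This is a minimal sketch, not TIM's own code: the function name parse_sampling_period is hypothetical, and only the fixed-length designators (weeks, days, hours, minutes, seconds) are handled.

```python
import re
from datetime import timedelta

# Matches durations such as "PT15M", "PT1H", "P1D" or "P1DT12H".
# Year and month designators are deliberately not supported here,
# because they do not have a fixed length.
_DURATION = re.compile(
    r"^P(?:(?P<weeks>\d+)W)?(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_sampling_period(text: str) -> timedelta:
    """Convert a simple ISO 8601 duration string to a timedelta (hypothetical helper)."""
    match = _DURATION.match(text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    # Keep only the designators that are actually present in the string.
    parts = {name: int(value) for name, value in match.groupdict().items() if value is not None}
    if not parts:
        raise ValueError(f"empty duration: {text!r}")
    return timedelta(**parts)
```

For example, parse_sampling_period("PT15M") yields a 15-minute timedelta, while a month-based value such as "P1M" is rejected by this sketch.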

Average training length

The average length of all datasets used to build the models contributing to this Model Zoo. This information allows TIM to warn the user when they try to forecast a new situation with too little data: a new model would have to be built, and its accuracy would not be on par with that of the originally built models.

Difficulty

The difficulty is a simple measure of how hard the given data is to model. It ranges from 0 to 100 percent and is calculated as 1 minus the ratio of the explained variance to the original variance when using a simple regression model, expressed as a percentage. Completely random data will have a difficulty close to 100 percent, whereas highly correlated data will have a difficulty close to 0 percent.
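The calculation can be sketched as follows. The text does not specify which simple regression model TIM uses, so this example assumes an ordinary least-squares fit of the target on a single predictor; the function name difficulty and its arguments are hypothetical.

```python
from statistics import fmean


def difficulty(x: list[float], y: list[float]) -> float:
    """Return 100 * (1 - explained variance / original variance).

    Assumption: the "simple regression model" is a one-predictor
    least-squares line fitted to (x, y).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_mean, y_mean = fmean(x), fmean(y)
    sxx = sum((a - x_mean) ** 2 for a in x)
    syy = sum((b - y_mean) ** 2 for b in y)  # original variance (unnormalized)
    if syy == 0:
        raise ValueError("the target is constant, so its variance is zero")
    sxy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    slope = sxy / sxx if sxx else 0.0
    intercept = y_mean - slope * x_mean
    fitted = [intercept + slope * a for a in x]
    explained = sum((f - y_mean) ** 2 for f in fitted)  # explained variance (unnormalized)
    return 100.0 * (1.0 - explained / syy)
```

A target that is an exact linear function of the predictor gives a difficulty of 0, while a target with no linear relationship to the predictor gives 100.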

Target name

The name of the target variable.

Holiday name

The name of the variable that was used to enhance the modeling with the public holiday dictionary.

Upper and lower boundaries

The numbers TIM uses to trim the forecast to be within a specific corridor. This setting can be configured in prediction boundaries.
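As a minimal sketch of what trimming to a corridor means, the following clamps each forecast value to the lower and upper boundary. The function name trim_forecast is hypothetical and only illustrates the idea; it is not TIM's implementation.

```python
def trim_forecast(values: list[float], lower: float, upper: float) -> list[float]:
    """Clamp every forecast value into the corridor [lower, upper]."""
    if lower > upper:
        raise ValueError("the lower boundary must not exceed the upper boundary")
    return [min(max(value, lower), upper) for value in values]
```

With a corridor of 0 to 10, the values -5, 3 and 12 become 0, 3 and 10.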

Daily cycle

A dataset and Model Zoo property defining whether TIM should distinguish between situations based on the time within a day.

Confidence level

This represents the confidence level of the prediction intervals.

In-sample rows

This represents the range of data that was used for the model building (training).

Out-of-sample rows

This represents the range of data that was not used for model building, but on which the Model Zoo was used to forecast (for backtesting or validation).

Variable properties

Variable properties contain all relevant information about each of the predictors, identified by their names.

Min and max

This represents the minimum and maximum of all values of the predictor observed so far.

Importance

The importance is a measure of how much the individual predictors entering the model building contribute to the model and whether it is worth collecting them. The individual importances sum up to 100 percent.

Data from

This field indicates how much history the model needs, which avoids transferring all available data to TIM when making predictions. For each of the original predictors, an integer value is returned that signifies how far into the predictor's history the model potentially looks (expressed in samples) when predicting a specific timestamp.
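One way to use this field is to cut each predictor's history down before sending it for prediction. The sketch below assumes that the value is a positive count of the most recent samples required; that interpretation, the function name trim_history, and the dictionary layout are all assumptions made for illustration.

```python
def trim_history(
    history: dict[str, list[float]],
    data_from: dict[str, int],
) -> dict[str, list[float]]:
    """Keep only the most recent samples each predictor needs.

    Assumption: data_from[name] is the number of most recent samples
    the model may look at for that predictor.
    """
    trimmed = {}
    for name, values in history.items():
        needed = data_from.get(name)
        if needed is None:
            # No information for this predictor: keep everything to be safe.
            trimmed[name] = list(values)
        elif needed <= 0:
            trimmed[name] = []
        else:
            trimmed[name] = values[-needed:]
    return trimmed
```

For instance, with five samples of history and a value of 2 for a predictor, only the last two samples of that predictor are kept.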

Aggregation

Aggregation defines how the variables entering model building were aggregated; predictor variables always have the default aggregation function. Available aggregation types are Mean, Sum, Minimum and Maximum. The default aggregation is Mean for numerical variables and Maximum for boolean variables. This is related to time scale, as the sampling period to aggregate to is defined there.

Models

A collection of models, each built to deal with specific situations.

Index

This represents the index of a given model.

Cases

Cases contain a collection of situations the model is equipped to deal with. Each case is defined by a combination of a specific daytime (or "all", meaning all possible daytimes, for data without a daily cycle) and a list of the closest usable predictors' data points.

Prediction intervals

Prediction intervals provide an interval of uncertainty around the prediction that would be produced by this model. This can be configured manually.

Daytime and Samples ahead

This pair is a unique model identifier. It should be read as "This model can forecast any of the samplesAhead samples ahead from the end of the target, as long as the timestamp of the forecast has the same time as the dayTime". Daytime is "All" for a dataset without a daily cycle.

The samplesAhead property is counted from the point where the target ends, not from the point in time at which you forecast. The reasoning is simple: the model's complexity and form stem from the data. When forecasting one day ahead with the latest data from three months ago, the forecast is actually made three months ahead rather than one day ahead. The models should reflect this, e.g. by being simpler.
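The counting rule can be sketched as follows: the number of samples ahead is the gap between the last target timestamp and the forecast timestamp, divided by the sampling period. The function name samples_ahead and its arguments are hypothetical, chosen only for this illustration.

```python
from datetime import datetime, timedelta


def samples_ahead(
    last_target: datetime,
    forecast_time: datetime,
    sampling_period: timedelta,
) -> int:
    """Count samples from the end of the target to the forecast timestamp."""
    gap = forecast_time - last_target
    if gap <= timedelta(0):
        raise ValueError("the forecast timestamp must lie after the end of the target")
    steps, remainder = divmod(gap, sampling_period)
    if remainder:
        raise ValueError("the forecast timestamp is not aligned with the sampling period")
    return steps
```

With hourly data whose target ends at midnight on 1 January, a forecast for midnight on 2 January is 24 samples ahead, regardless of when the forecast is requested.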

Variable offsets

TIM uses this information to determine how much continuous history of each predictor's data points it needs to be able to evaluate the model.

Model quality

This is an indicator of the quality of the model. The model quality is stored in the model as a numerical enumeration corresponding to qualities as follows:

  • 0 - Combined
  • 1 - Low
  • 2 - Medium
  • 3 - High
  • 4 - VeryHigh
  • 5 - UltraHigh
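When reading a stored model, the numerical value can be mapped back to its name with an enumeration such as the one below. The class name ModelQuality is hypothetical; the values and names mirror the list above.

```python
from enum import IntEnum


class ModelQuality(IntEnum):
    """Numerical model quality codes as listed above."""

    Combined = 0
    Low = 1
    Medium = 2
    High = 3
    VeryHigh = 4
    UltraHigh = 5
```

For example, ModelQuality(4).name gives "VeryHigh", and ModelQuality["Low"].value gives 1.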

Last target timestamp

TIM uses this information to assess how "old" a model really is. This is useful, especially when choosing the "older than" option in the rebuilding policy for the rebuild-model method.

RInv, g and mx

These mathematical parameters are used for the computation of the root cause analysis.

Terms

This is the mathematical transcript of the model. It can be read as a linear regression. Each transformation (called a "part") is multiplied by its beta coefficient, and all are added together to produce a forecast. If there are two transformations inside one term, their values should first be multiplied together; this is a so-called interaction. Each term has an importance that shows how much it contributes to the forecast. This metric is calculated from the transformed ratio of the variance explained by this term in the model building process.
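Reading the terms as a linear regression can be sketched as follows. The example assumes that the value of every part has already been evaluated for the timestamp being forecast; the dictionary keys "beta" and "parts" and the function name evaluate_terms are hypothetical and do not reflect the exact stored layout.

```python
from math import prod


def evaluate_terms(terms: list[dict]) -> float:
    """Sum beta * (product of part values) over all terms.

    A term with two parts is an interaction: its part values are
    multiplied together before the beta coefficient is applied.
    A term with no parts acts as a constant (prod of nothing is 1).
    """
    return sum(term["beta"] * prod(term["parts"]) for term in terms)


example_terms = [
    {"beta": 2.0, "parts": [3.0]},       # single transformation
    {"beta": 0.5, "parts": [4.0, 2.0]},  # interaction of two transformations
    {"beta": 1.0, "parts": []},          # constant term
]
forecast = evaluate_terms(example_terms)
```

Here the three terms contribute 6.0, 4.0 and 1.0 respectively, so the forecast is 11.0.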