In this section, we go through all the information that can be found in and extracted from the Model Zoo.
TIM estimates the sampling period of the data, or the user can set it manually. The sampling period used when building the Model Zoo is stored in the ISO 8601 duration format (for example, PT1H for one hour). When predicting, TIM uses this stored sampling period instead of recalculating it.
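To work with the stored sampling period outside TIM, you need to turn the ISO 8601 string into a time span. The sketch below does this under stated assumptions. The function name parse_sampling_period is hypothetical and not part of any TIM API. The function covers only the day and time components of ISO 8601 (D, H, M, S) and deliberately rejects years, months and weeks.

```python
import re
from datetime import timedelta

# Hypothetical helper: supports only the day/time subset of ISO 8601 durations,
# e.g. "PT15M", "PT1H", "P1D", "P1DT12H". Years, months and weeks are rejected.
_DURATION = re.compile(
    r"P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?"
)


def parse_sampling_period(value: str) -> timedelta:
    """Convert an ISO 8601 duration string into a timedelta."""
    match = _DURATION.fullmatch(value)
    # "P", "PT" or a trailing "T" carry no components and are not valid durations.
    if match is None or value.endswith(("P", "T")):
        raise ValueError(f"unsupported ISO 8601 duration: {value!r}")
    parts = {name: float(number) for name, number in match.groupdict().items() if number}
    return timedelta(**parts)
```

For example, parse_sampling_period("PT15M") yields a 15-minute timedelta. You can then use that value to step through forecast timestamps.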
Average training length
The average length of all datasets used to build the models contributing to this Model Zoo. This information allows TIM to warn the user who tries to forecast a new situation with too little data. In that case a new model would have to be built, and its accuracy would not be on par with that of the originally built models.
The difficulty is a simple measure of how hard the given data is to model. It ranges from 0 to 100 percent. It is calculated as 1 minus the ratio of the variance explained by a simple regression model to the original variance. Completely random data will have a difficulty close to 100, whereas highly correlated data will have a difficulty close to 0.
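The definition above can be made concrete with a short sketch. The text does not say which "simple regression model" TIM uses. The code therefore assumes an ordinary least-squares fit with one regressor. The helper lag1_difficulty, which regresses each value on the previous one, is a hypothetical choice of regressor. Because the fit includes an intercept, the result equals 100 · (1 − R²).

```python
def difficulty(target, regressor):
    """Difficulty in percent: 100 * (1 - explained variance / original variance).

    Assumption: the "simple regression model" is ordinary least squares
    of the target on a single regressor, with an intercept.
    """
    n = len(target)
    if n != len(regressor) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_x = sum(regressor) / n
    mean_y = sum(target) / n
    sxx = sum((x - mean_x) ** 2 for x in regressor)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(regressor, target))
    total = sum((y - mean_y) ** 2 for y in target)
    if sxx == 0 or total == 0:
        raise ValueError("regressor and target must both vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    fitted = [intercept + slope * x for x in regressor]
    explained = sum((f - mean_y) ** 2 for f in fitted)
    return 100.0 * (1.0 - explained / total)


def lag1_difficulty(series):
    """Hypothetical variant: regress each value on the value one sample earlier."""
    return difficulty(series[1:], series[:-1])
```

A perfectly linear series yields a difficulty of (nearly) 0. A target that is uncorrelated with its regressor yields 100.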
The name of the target variable.
The name of the variable that was used to enhance the modeling with the public holiday dictionary.
Upper and lower boundaries
The values TIM uses to clip the forecast so that it stays within a specific corridor. These boundaries can be configured in the prediction boundaries settings.
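The sketch below shows one way to apply the boundaries. It assumes that "trimming" means replacing every out-of-corridor value with the nearest boundary. It also assumes that a missing boundary leaves that side open. The function name apply_boundaries is hypothetical.

```python
def apply_boundaries(forecast, lower=None, upper=None):
    """Clip each forecast value into [lower, upper]; None means no bound on that side.

    Hypothetical helper illustrating the corridor described above.
    """
    if lower is not None and upper is not None and lower > upper:
        raise ValueError("lower boundary must not exceed upper boundary")
    clipped = []
    for value in forecast:
        if lower is not None:
            value = max(value, lower)  # raise values below the corridor
        if upper is not None:
            value = min(value, upper)  # lower values above the corridor
        clipped.append(value)
    return clipped
```

For example, apply_boundaries([-5, 3, 12], lower=0, upper=10) keeps 3 unchanged and clips the other two values to 0 and 10.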
A dataset and Model Zoo property defining whether TIM should distinguish between situations based on the time of day.
This represents the confidence level of the prediction intervals.
This represents the range of data that was used for model building (training).
This represents the range of data that was not used for model building but on which the Model Zoo was used to forecast (backtesting, i.e. validation).
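To reproduce the training and backtesting splits locally, you can select rows by the stored ranges. The sketch assumes each range is a pair of inclusive (start, end) timestamps. It also assumes each row is a (timestamp, value) pair. Both representations and the helper name rows_in_ranges are hypothetical and do not describe TIM's actual format.

```python
from datetime import datetime, timedelta


def rows_in_ranges(rows, ranges):
    """Keep (timestamp, value) rows whose timestamp falls in any inclusive (start, end) range."""
    return [(ts, value) for ts, value in rows if any(start <= ts <= end for start, end in ranges)]


# Hourly example data: six rows starting at midnight.
start = datetime(2023, 1, 1)
rows = [(start + timedelta(hours=h), float(h)) for h in range(6)]

in_sample = [(start, start + timedelta(hours=3))]  # used for model building
out_of_sample = [(start + timedelta(hours=4), start + timedelta(hours=5))]  # backtest only

training = rows_in_ranges(rows, in_sample)
backtest = rows_in_ranges(rows, out_of_sample)
```

Here training holds the first four hourly rows and backtest holds the last two.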
Variable properties contains all relevant information about each predictor, identified by its name.
Min and max
This represents the minimum and maximum of all values of the predictor observed so far.
Importance measures how much each predictor that entered model building contributes to the model, and therefore whether it is worth collecting. The individual importances sum up to 100 percent.
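The sketch below only illustrates why the importances add up to 100 percent: any non-negative contribution scores become percentages once each is divided by their total. How TIM computes the raw scores is not described here. The input scores and the helper name to_percent_importances are hypothetical.

```python
def to_percent_importances(raw_scores):
    """Normalize non-negative raw contribution scores so that they sum to 100.

    The raw scores are hypothetical inputs; TIM's own scoring is not reproduced here.
    """
    if any(score < 0 for score in raw_scores.values()):
        raise ValueError("scores must be non-negative")
    total = sum(raw_scores.values())
    if total == 0:
        raise ValueError("at least one score must be positive")
    return {name: 100.0 * score / total for name, score in raw_scores.items()}
```

For example, raw scores of 3 and 1 become 75 and 25 percent.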
This field indicates how much history is needed, so you do not have to send all available data to TIM when making predictions. For each of the original predictors, an integer is returned. It signifies how far into the predictor's history the model potentially looks (expressed in samples) when predicting a specific timestamp. Knowing this, you can transfer only the data the model actually needs.
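The sketch below trims each predictor's history before sending it. It assumes that an integer N means the N most recent samples suffice; the exact off-by-one convention is an assumption. Predictors missing from the mapping are kept in full as a safe default. The names trim_history and data_from are hypothetical.

```python
def trim_history(predictors, data_from):
    """Keep only the most recent samples each predictor needs.

    predictors: mapping of predictor name -> list of values in time order.
    data_from:  mapping of predictor name -> number of samples of history needed
                (hypothetical interpretation of the integer described above).
    """
    trimmed = {}
    for name, series in predictors.items():
        needed = data_from.get(name, len(series))  # unknown predictor: keep everything
        start = max(len(series) - needed, 0)
        trimmed[name] = series[start:]
    return trimmed
```

For example, with data_from = {"temp": 2, "price": 5}, a five-sample temp series is cut to its last two values. A three-sample price series is kept whole.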
Aggregation defines how the variables entering model building were aggregated; predictor variables always use the default aggregation function. Available aggregation types are Mean, Sum, Minimum and Maximum. The default aggregation is Mean for numerical variables and Maximum for boolean variables. Aggregation is tied to the time scale setting, which defines the sampling period to aggregate to.
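The four aggregation types and their defaults can be sketched as follows. This is a simplification under stated assumptions. It assumes regularly sampled data already aligned to the target sampling period. It groups a fixed number of consecutive samples (factor), for example 4 when going from 15-minute to hourly data, instead of bucketing by timestamp. A trailing partial group is aggregated as-is. The function names are hypothetical.

```python
from statistics import mean

# The four aggregation types listed above.
AGGREGATIONS = {"Mean": mean, "Sum": sum, "Minimum": min, "Maximum": max}


def default_aggregation(values):
    """Maximum for boolean variables, Mean for numerical ones."""
    return "Maximum" if all(isinstance(v, bool) for v in values) else "Mean"


def aggregate(values, factor, aggregation=None):
    """Aggregate consecutive groups of `factor` samples into one sample each.

    Hypothetical helper; assumes regular sampling aligned to group boundaries.
    """
    if factor < 1:
        raise ValueError("factor must be at least 1")
    func = AGGREGATIONS[aggregation or default_aggregation(values)]
    return [func(values[i:i + factor]) for i in range(0, len(values), factor)]
```

Numeric data defaults to the mean of each group. Boolean data defaults to the maximum, so a group is True if any sample in it is True.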
A collection of models, each built to deal with specific situations.
This represents the index of a given model.
Cases contains the collection of situations the model is equipped to deal with. Each case is defined by a combination of a specific daytime and a list of the closest usable data points of each predictor. For data without a daily cycle, the daytime is "all", covering all possible daytimes.
Daytime and Samples ahead
This pair uniquely identifies a model. It should be read as: "This model can forecast any of the samplesAhead horizons, counted in samples from the end of the target, as long as the timestamp of the forecast has the same time of day as dayTime." Daytime is "All" for a dataset without a daily cycle.
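The identifier described above can be used to look up the right model for a forecast. The sketch assumes each model is a dict with hypothetical keys "index", "dayTime" (an "HH:MM:SS" string or "All") and "samplesAhead" (a list of horizons). This layout is illustrative and not TIM's actual JSON schema.

```python
from datetime import datetime, time


def find_model(models, forecast_ts, samples_ahead):
    """Return the index of the model matching the forecast's time of day and horizon, or None.

    The dict keys used here are hypothetical.
    """
    day_time = forecast_ts.time()
    for model in models:
        matches_time = (
            model["dayTime"] == "All"  # no daily cycle: any time of day matches
            or time.fromisoformat(model["dayTime"]) == day_time
        )
        if matches_time and samples_ahead in model["samplesAhead"]:
            return model["index"]
    return None
```

For example, a forecast for 09:00 two samples ahead is served by the model whose dayTime is "09:00:00" and whose samplesAhead list contains 2.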
The samplesAhead property is counted from the point where the target ends, not from the point in time when you forecast. The reason is simple: the model's complexity and form stem from the data. Suppose you forecast one day ahead, but the latest target data is from three months ago. The forecast is then effectively made three months ahead rather than one day ahead. The models should reflect this, e.g. by being simpler.
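The counting rule above can be illustrated with a few lines of arithmetic. The sketch assumes the forecast timestamp lies an exact whole number of sampling periods after the last target timestamp. The function name samples_ahead is hypothetical.

```python
from datetime import datetime, timedelta


def samples_ahead(last_target_ts, forecast_ts, sampling_period):
    """Number of samples between the end of the target and the forecast timestamp.

    Counted from the last target timestamp, not from the moment of forecasting.
    """
    delta = forecast_ts - last_target_ts
    if delta <= timedelta(0) or delta % sampling_period:
        raise ValueError("forecast must lie a whole number of periods after the target end")
    return delta // sampling_period
```

With daily data ending on 2023-01-01, a forecast for 2023-04-02 is 91 samples ahead. That holds even if the forecast is requested on 2023-04-01, "one day ahead".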
TIM uses this information to determine how much continuous history of each predictor's data points it needs to be able to evaluate the model.
Model quality - deprecated
Removed in model version v5.6 and higher.
This is an indicator of the quality of the model. The model quality is stored as a numerical enumeration that maps to qualities as follows:
- 0 - Combined
- 1 - Low
- 2 - Medium
- 3 - High
- 4 - VeryHigh
- 5 - UltraHigh
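When reading a Model Zoo built with a model version older than v5.6, the numeric value can be mapped back to its name. The enumeration values below come from the list above. The key name "modelQuality" used to read the value from a model dict is a hypothetical assumption. A missing value is treated as "not available", as in newer model versions.

```python
from enum import IntEnum


class ModelQuality(IntEnum):
    """Deprecated model quality enumeration (removed in model version v5.6)."""
    Combined = 0
    Low = 1
    Medium = 2
    High = 3
    VeryHigh = 4
    UltraHigh = 5


def quality_of(model):
    """Return the ModelQuality of a model dict, or None if the field is absent.

    "modelQuality" is a hypothetical key name used for illustration.
    """
    raw = model.get("modelQuality")
    return None if raw is None else ModelQuality(raw)
```

For example, a stored value of 4 maps to VeryHigh.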
Last target timestamp
RInv, g and mx
These mathematical parameters are used to compute the root cause analysis.
This is the mathematical transcript of the model, and it can be read as a linear regression. Each term consists of one or more transformations (called "parts"). Each term is multiplied by its beta coefficient, and all the products are added together to produce the forecast. If a term contains two transformations, their values are multiplied first; this is a so-called interaction. Each term also has an importance that shows how much it contributes to the forecast. This metric is calculated from the transformed ratio of the variance explained by the term during model building.
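The way a forecast is assembled from terms can be sketched as follows. Each part is represented here as a plain Python function of the input values. The dict keys "beta" and "parts" are hypothetical. TIM's actual transformation types and storage format are not modeled. A term with no parts acts as an intercept, because the product of an empty sequence is 1.

```python
from math import prod


def evaluate(terms, inputs):
    """Forecast = sum over terms of beta * (product of the term's part values).

    Multiplying the parts inside one term models an interaction.
    The term layout ({"beta": ..., "parts": [...]}) is hypothetical.
    """
    return sum(term["beta"] * prod(part(inputs) for part in term["parts"]) for term in terms)


terms = [
    {"beta": 2.0, "parts": []},                                        # intercept
    {"beta": 0.5, "parts": [lambda x: x["temp"]]},                     # single transformation
    {"beta": 0.25, "parts": [lambda x: x["temp"], lambda x: x["wind"]]},  # interaction
]
forecast = evaluate(terms, {"temp": 20.0, "wind": 5.0})
```

With temp = 20 and wind = 5, the terms contribute 2, 10 and 25, so the forecast is 37.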