Model Zoo and how to read it
In this section we will go through all outputs that can be found and extracted from the Model Zoo JSON file.
The sampling period that TIM estimated from the data. It is given either in seconds or in months, because a month has no fixed length in seconds and the two units cannot be converted into each other unambiguously. If a Model Zoo is attached to the forecasting call, the stored sampling period is used instead of calculating a new one.
Average training length
The average length of all datasets that were used to build the models contributing to this Model Zoo. TIM uses this information to warn the user who tries to forecast a new situation with only a small amount of data: a new model would have to be built, and its accuracy would not be on par with the originally built models.
A simple measure of how difficult it was to model the given data. It ranges from 0 to 100 percent and is calculated as 1 minus the ratio of the explained variance to the original variance when a simple regression model is used. Completely random data will have a difficulty close to 100, whereas data that the simple regression model explains almost entirely will have a difficulty close to 0.
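The formula above can be illustrated with a minimal sketch. It assumes a one-predictor least-squares fit as the "simple regression model"; the model TIM actually uses is not specified here, and the function name `difficulty_percent` is hypothetical.

```python
from statistics import fmean, pvariance


def difficulty_percent(target, predictor):
    """Return 100 * (1 - explained variance / original variance).

    Assumption: the simple regression model is an ordinary least-squares
    line fitted on a single predictor. This is only an illustration.
    """
    mean_x, mean_y = fmean(predictor), fmean(target)
    # Least-squares fit of target = intercept + slope * predictor.
    sxx = sum((x - mean_x) ** 2 for x in predictor)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predictor, target))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    fitted = [intercept + slope * x for x in predictor]
    # Variance of the fitted values is the variance the model explains.
    return 100.0 * (1.0 - pvariance(fitted) / pvariance(target))


# Target fully explained by the predictor: difficulty is 0.
perfect = difficulty_percent([2.0, 4.0, 6.0, 8.0], [1.0, 2.0, 3.0, 4.0])
# Target uncorrelated with the predictor: difficulty is 100.
unrelated = difficulty_percent([1.0, -1.0, -1.0, 1.0], [1.0, 2.0, 3.0, 4.0])
```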
The name of the target variable.
The name of the variable that was used to enhance the modelling with the public holiday dictionary.
Upper and lower boundaries
The numbers TIM uses to trim the forecast so that it stays within a specific corridor. They can be set in the prediction boundaries.
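Trimming a forecast to a corridor can be pictured as clipping every value to the two boundaries. The sketch below is only an illustration of that idea, not TIM's implementation; the function name `trim_to_corridor` and the sample numbers are made up.

```python
def trim_to_corridor(forecast, lower, upper):
    """Clip every forecast value into the closed interval [lower, upper]."""
    return [min(max(value, lower), upper) for value in forecast]


# Values below the lower boundary or above the upper boundary are replaced
# by the boundary itself; values inside the corridor stay untouched.
trimmed = trim_to_corridor([-5.0, 12.5, 140.0], lower=0.0, upper=100.0)
```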
A property of the dataset and the Model Zoo that defines whether TIM should distinguish between situations on the basis of the time within a day.
A collection of all necessary information about each of the predictors listed by their names.
Min and max
Minimum and maximum of all values of the predictor observed so far.
A measure of how much each predictor entering the model building contributes to the model, and therefore whether it is worth collecting. The individual importances sum up to 100 percent.
So that you do not have to pass all your data to TIM every time you make a prediction, this field tells you how much history is really needed. For each of the original predictors you get an integer that tells you how far back in history (in number of samples) the model potentially looks when predicting a specific timestamp. This way you know what your model needs, and you do not have to transfer unnecessary amounts of data when predicting with TIM.
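As a rough illustration of how these integers could be used on the client side, the sketch below keeps only the most recent samples of each predictor before sending data. It assumes an offset of n means "the last n samples are enough"; the exact convention should be checked against your Model Zoo. The names `trim_history`, `history` and `offsets` are hypothetical.

```python
def trim_history(history, offsets):
    """Keep only as many trailing samples per predictor as the model needs."""
    trimmed = {}
    for name, values in history.items():
        needed = offsets[name]
        # values[-0:] would return the whole list, so zero is handled explicitly.
        trimmed[name] = values[-needed:] if needed > 0 else []
    return trimmed


history = {
    "temperature": [11, 12, 13, 14, 15, 16],
    "wind": [3, 4, 5, 6, 7, 8],
}
offsets = {"temperature": 2, "wind": 4}  # made-up values for illustration
payload = trim_history(history, offsets)
```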
In sample rows
The range of data that was used for model building.
Out of sample rows
The range of data that was not used for model building but on which the Model Zoo was used to forecast (backtest).
A collection of models each built to deal with specific situations.
A collection of the situations the model is equipped to deal with. Each situation is defined by a combination of a specific daytime (or "all", meaning every possible daytime, for data without a daily cycle) and a list of the closest usable data points of the predictors.
Day time and Samples ahead
This pair is a unique model identifier. Read it as: "This model is able to forecast any of the samplesAhead numbers of samples ahead from the end of the target, as long as the timestamp of the forecast has the same time of day as dayTime". Daytime is "All" in the case of a dataset without a daily cycle. The samplesAhead property is counted from the point where the target ends, not from the point in time at which you forecast. The reasoning is simple: the complexity and form of a model stem from the data. If you forecast one day ahead but your latest data come from three months ago, you are technically forecasting three months ahead rather than one day ahead. Your models should reflect this, for example by being simpler.
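The counting rule can be sketched as follows, assuming a fixed sampling period expressed in seconds (a monthly period would need calendar arithmetic instead). The function name `samples_ahead` and the timestamps are hypothetical, and the format of the day time string is chosen only for the example.

```python
from datetime import datetime, timedelta


def samples_ahead(last_target_timestamp, forecast_timestamp, sampling_period):
    """Count samples from the end of the target to the forecast timestamp."""
    steps, remainder = divmod(
        forecast_timestamp - last_target_timestamp, sampling_period
    )
    if remainder:
        raise ValueError("forecast timestamp is not on the sampling grid")
    return steps


# Hourly data whose target ends at midnight on 1 January.
last_target = datetime(2021, 1, 1, 0, 0)
forecast_for = datetime(2021, 1, 2, 12, 0)

# The distance is measured from the last target value, not from "now".
ahead = samples_ahead(last_target, forecast_for, timedelta(hours=1))
day_time = forecast_for.strftime("%H:%M:%S")
```

Here the forecast for noon on 2 January is 36 samples ahead of the target, regardless of when the forecasting call itself is made.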
TIM uses this information to know how much continuous history of each predictor's data points it needs to be able to evaluate the model.
An indicator of the quality of the model.
Last target timestamp
TIM uses this information to assess how "old" the model really is. This is useful especially when choosing the "older than" option in the rebuilding policy.
RInv, g and mx
These mathematical parameters are used for the computation of the root cause analysis.
This is the mathematical transcript of the model. You can read it as a linear regression: each transformation (called a "part") is multiplied by its beta coefficient, and the results are added together to produce a forecast. If there are two transformations inside one term, their values are first multiplied with each other; this is a so-called interaction. Each term has its own importance, so you can see how much it contributes to the forecast. This metric is calculated from the transformed ratio of the variance explained by this term in the model building process.
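Reading the terms as a linear regression can be sketched like this. The dictionary layout, the key names `beta` and `parts`, and the part names are all hypothetical and do not reproduce the actual Model Zoo JSON schema; the values of the parts are assumed to be already computed.

```python
from math import prod


def evaluate_model(terms, part_values):
    """Sum beta * (product of the term's part values) over all terms."""
    forecast = 0.0
    for term in terms:
        # A term with several parts is an interaction: multiply them first.
        forecast += term["beta"] * prod(part_values[name] for name in term["parts"])
    return forecast


terms = [
    {"beta": 2.0, "parts": ["intercept"]},
    {"beta": 0.5, "parts": ["temperature_lag_1"]},
    # Interaction of two transformations within one term.
    {"beta": -0.1, "parts": ["temperature_lag_1", "is_holiday"]},
]
part_values = {"intercept": 1.0, "temperature_lag_1": 20.0, "is_holiday": 1.0}

# 2.0 * 1.0 + 0.5 * 20.0 - 0.1 * (20.0 * 1.0)
forecast = evaluate_model(terms, part_values)
```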