# Model Zoo

This section describes all the information that can be found in, and extracted from, the Model Zoo.

## Sampling period

TIM estimates **the sampling period** of the data, or a user can set it manually. The sampling period used when building the Model Zoo is stored in **the ISO 8601 duration format**. For the **predict** method, the stored sampling period will be used instead of recalculating a new one.
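As a rough illustration, a stored sampling period such as `PT1H` could be converted into a Python `timedelta` as sketched below. The function name `parse_sampling_period` and the example strings are assumptions made for illustration and are not part of TIM; the sketch only handles day, hour, minute and second designators.

```python
import re
from datetime import timedelta

# Day and time designators only, e.g. "P1D", "PT1H", "PT15M", "P1DT12H".
# Years and months are left out on purpose: they have no fixed length.
_DURATION = re.compile(
    r"P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?"
)


def parse_sampling_period(text):
    """Convert an ISO 8601 duration string into a timedelta (hypothetical helper)."""
    match = _DURATION.fullmatch(text)
    if match is None or text.endswith("T"):
        raise ValueError(f"unsupported ISO 8601 duration: {text!r}")
    # Keep only the components that were actually present in the string.
    parts = {
        name: float(value)
        for name, value in match.groupdict().items()
        if value is not None
    }
    if not parts:
        raise ValueError(f"empty ISO 8601 duration: {text!r}")
    return timedelta(**parts)


hourly = parse_sampling_period("PT1H")
quarter_hourly = parse_sampling_period("PT15M")
```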

## Average training length

The average length of all datasets used to build the models contributing to this Model Zoo. This information allows TIM to warn the user when they try to forecast a new situation with too little data; in that case a new model would have to be built, and its accuracy would not be on par with the originally built models.

## Difficulty

The difficulty is a simple measure of how hard the given data is to model. It ranges from 0 to 100 percent and is calculated as 1 minus the ratio of the explained variance to the original variance when using a simple regression model, expressed as a percentage. Completely random data will have a difficulty close to 100, whereas highly correlated data will have a difficulty close to 0.
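The calculation can be sketched as follows. The text does not specify which simple regression model TIM uses, so this example assumes a one-predictor least-squares fit; the function name `difficulty` and the sample data are hypothetical.

```python
from statistics import fmean


def difficulty(predictor, target):
    """Return 100 * (1 - explained variance / original variance).

    Assumption: the "simple regression model" is an ordinary least-squares
    fit of the target on a single predictor.
    """
    x_mean, y_mean = fmean(predictor), fmean(target)
    sxx = sum((x - x_mean) ** 2 for x in predictor)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(predictor, target))
    syy = sum((y - y_mean) ** 2 for y in target)
    if sxx == 0 or syy == 0:
        raise ValueError("predictor and target must both vary")
    slope = sxy / sxx
    explained = slope * slope * sxx  # variation captured by the fitted line
    return 100.0 * (1.0 - explained / syy)


# A perfectly linear relationship leaves nothing unexplained.
easy = difficulty([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
# A target with no linear relation to the predictor is maximally difficult.
hard = difficulty([1.0, 2.0, 3.0, 4.0], [1.0, -1.0, -1.0, 1.0])
```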

## Target name

The name of the target variable.

## Holiday name

The name of the variable that was used to enhance the modeling with the **public holiday dictionary**.

## Upper and lower boundaries

The numbers TIM uses to trim the forecast to be within a specific corridor. This setting can be configured in **prediction boundaries**.
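A minimal sketch of such trimming, assuming it amounts to clipping each forecast value to the closed interval between the lower and upper boundary. The function name `trim_forecast` and the numbers are illustrative only.

```python
def trim_forecast(values, lower, upper):
    """Clip every forecast value into [lower, upper] (assumed behaviour)."""
    if lower > upper:
        raise ValueError("lower boundary must not exceed upper boundary")
    return [min(max(value, lower), upper) for value in values]


trimmed = trim_forecast([-5.0, 3.0, 120.0], lower=0.0, upper=100.0)
```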

## Daily cycle

A dataset and Model Zoo property defining whether TIM should distinguish between **situations** based on the time within a day.

## Confidence level

This represents the confidence level of the prediction intervals.

## In-sample rows

This represents the range of data that was used for the model building (training).

## Out-of-sample rows

This represents the range of data that was not used for model building, but on which the Model Zoo was used to forecast (backtesting; validation).

## Variable properties

Variable properties contains all relevant information about each of the predictors, identified by their names.

### Min and max

This represents the minimum and maximum of all values of the predictor observed so far.

### Importance

The importance is a measure of how much the individual predictors entering the model building contribute to the model and whether it is worth collecting them. The individual importances sum up to 100 percent.

### Data from

For each of the original predictors, this field returns an integer that signifies how far into the predictor's history the model potentially looks (expressed in samples) when predicting a specific timestamp. It indicates how much history the model needs, so there is no need to transfer all available data to TIM when making predictions.

### Aggregation

Aggregation defines how the variables entering model building were aggregated; predictor variables always have the default aggregation function. Available aggregation types are *Mean*, *Sum*, *Minimum* and *Maximum*. The default aggregation is *Mean* for numerical variables and *Maximum* for boolean variables. This is related to **time scale**, as the sampling period to aggregate to is defined there.
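The aggregation types and their defaults can be illustrated with a small sketch. It groups a fixed number of consecutive samples into each bucket, which is a simplification: the actual aggregation works on timestamps and the configured time scale. The names `AGGREGATIONS`, `default_aggregation` and `aggregate` are hypothetical.

```python
from statistics import fmean

# The four aggregation types named in the text, mapped to Python callables.
AGGREGATIONS = {"Mean": fmean, "Sum": sum, "Minimum": min, "Maximum": max}


def default_aggregation(values):
    """Maximum for boolean variables, Mean for numerical ones."""
    return "Maximum" if all(isinstance(v, bool) for v in values) else "Mean"


def aggregate(values, bucket_size, aggregation=None):
    """Aggregate consecutive buckets of `bucket_size` samples.

    The last bucket may hold fewer samples if the length is not a multiple
    of `bucket_size`.
    """
    func = AGGREGATIONS[aggregation or default_aggregation(values)]
    return [
        func(values[start:start + bucket_size])
        for start in range(0, len(values), bucket_size)
    ]


numeric = aggregate([1.0, 3.0, 5.0, 7.0], bucket_size=2)          # uses Mean
flags = aggregate([True, False, False, False], bucket_size=2)     # uses Maximum
totals = aggregate([1, 2, 3, 4], bucket_size=2, aggregation="Sum")
```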

## Models

A collection of models, each built to deal with specific **situations**.

### Index

This represents the index of a given model.

### Cases

Cases contains a collection of situations the model is equipped to deal with. Each case is defined by a combination of a specific daytime (or "all", covering all possible daytimes, for data without a daily cycle) and a list of the closest usable predictors' data points.

### Prediction intervals

Prediction intervals provide an **interval** of uncertainty around the prediction that would be produced by this model. This can be **configured** manually.

### Daytime and Samples ahead

This pair is a unique model identifier. It should be read as "This model can forecast any number of the *samplesAhead* samples ahead from the end of the target as long as the timestamp of the forecast has the same time as the *dayTime*". Daytime is "All" in the case of a dataset without a daily cycle (i.e. nondaily cycle).

The *SamplesAhead* property is counted from the point where the target ends, not from the point in time at which you forecast. The reasoning is simple: the model's complexity and form stem from the data. When forecasting one day ahead with the latest data from three months ago, the forecast is effectively made three months ahead rather than one day ahead. The models should reflect this, e.g. by being simpler.
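To make the counting rule concrete, the sketch below derives the number of samples ahead from the last target timestamp rather than from the moment of forecasting. The function name `samples_ahead` and the dates are made up for illustration; TIM's internal computation may differ.

```python
from datetime import datetime, timedelta


def samples_ahead(last_target_timestamp, forecast_timestamp, sampling_period):
    """Count samples between the end of the target and the forecast timestamp."""
    gap = forecast_timestamp - last_target_timestamp
    if gap <= timedelta(0):
        raise ValueError("forecast timestamp must lie after the end of the target")
    steps, remainder = divmod(gap, sampling_period)
    if remainder:
        raise ValueError("forecast timestamp is not aligned with the sampling period")
    return steps


# Daily data whose target ends on 1 January; the forecast is requested for
# 2 April. Even if it is requested on 1 April ("one day ahead"), the model
# has to bridge the whole gap since the end of the target.
stale = samples_ahead(datetime(2021, 1, 1), datetime(2021, 4, 2), timedelta(days=1))
print(stale)  # 91
```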

### Variable offsets

TIM uses this information to determine how much continuous history of each predictor's data points it needs to be able to evaluate the model.

### Model quality

This is an indicator of **the quality** of the model. The model quality is stored in the model as a numerical enumeration corresponding to qualities as follows:

- 0 - *Combined*
- 1 - *Low*
- 2 - *Medium*
- 3 - *High*
- 4 - *VeryHigh*
- 5 - *UltraHigh*
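When reading a Model Zoo programmatically, the numerical enumeration above could be mirrored with an `IntEnum`, as in this sketch. The class name `ModelQuality` is an assumption for illustration and not a TIM identifier.

```python
from enum import IntEnum


class ModelQuality(IntEnum):
    """Numerical model quality codes and their names (hypothetical class)."""

    Combined = 0
    Low = 1
    Medium = 2
    High = 3
    VeryHigh = 4
    UltraHigh = 5


# Translate a stored numeric value into its readable name.
label = ModelQuality(3).name
```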

### Last target timestamp

TIM uses this information to assess how "old" a model really is. This is useful, especially when choosing the "older than" option in the **rebuilding policy** for the **rebuild-model** method.

### RInv, g and mx

These mathematical parameters are used for the computation of the **root cause analysis**.

### Terms

This is the mathematical transcript of the model. It can be read as a **linear regression**. Each **transformation** (called a "part") is multiplied by its beta coefficient, and all terms are added together to produce a forecast. If there are two transformations inside one term, their values should first be multiplied with each other; this is a so-called interaction. Each term also has an importance, which shows how much it contributes to the forecast. This metric is calculated from the transformed ratio of the variance explained by this term in the model building process.
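Reading the terms as a linear regression can be sketched like this. The dictionary layout with `beta` and `parts` keys is a hypothetical simplification in which each part has already been evaluated to a number; the actual structure stored in the Model Zoo may look different.

```python
from math import prod


def evaluate_terms(terms):
    """Sum beta * (product of the term's part values) over all terms.

    A term with two parts is an interaction: the part values are multiplied
    with each other before the beta coefficient is applied.
    """
    return sum(term["beta"] * prod(term["parts"]) for term in terms)


terms = [
    {"beta": 2.0, "parts": [3.0]},        # single transformation
    {"beta": 0.5, "parts": [2.0, 4.0]},   # interaction of two transformations
]
forecast = evaluate_terms(terms)  # 2.0 * 3.0 + 0.5 * (2.0 * 4.0)
```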