Multi-situational layer and Model Zoo
When building a model, TIM first tries to recognize all situations that might occur when using this model in production. A situation in this context is a combination of the time of forecasting, the forecasting horizon and the data availability. Usually, many different situations occur. TIM creates a separate model for each situation, optimized given the situation's conditions, and then combines all of these models into one Model Zoo. When TIM is asked to make a forecast, the current situation is automatically recognized and the most appropriate model is dispatched.
This enables the creation of very simple models for straightforward situations, such as solar production at night, and more complex models for more difficult situations. In other words, TIM is able to include necessary complexity in certain situations while eliminating redundant complexity in others. Because a lot of focus is put on TIM's fast forecasting capabilities, it is important that TIM can recognize how these situations differ from each other and intelligently exploit the similarities between models during model building.
This optimization of all individual models is run by TIM's multi-situational layer.
Let's take a look at the solar forecasting example where we forecast each day at 8 AM and 4 PM and wish to know the future starting from the next sample up until the end of tomorrow. We know the target value in real time and have forecasts of all predictors available up until the end of tomorrow. Data are sampled quarter-hourly. You can also download the sample model generated by TIM. If we look closely, we can see that 287 different models were generated! This might seem like a lot, but let's count. When we forecast at 8 AM, we forecast 63 samples inside the same day and 96 samples for the next one. When we forecast at 4 PM, we forecast 31 samples inside the same day and 96 samples for the next one. If we were to build a separate model for each of these "situations", we would get 63+96+31+96 = 286 models. Indeed, TIM has decided that for this particular dataset it is worth building a separate model for each situation. It has also included 1 special safety model (the last one) that does not use any data and can be evaluated even when data are missing.
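The count above can be reproduced with a few lines of arithmetic. This is only a sketch and not TIM code: the helper name `horizon_samples` is hypothetical, and it assumes that the forecast starts at the next sample and that the 00:00 timestamp belongs to the following day, which is the convention the numbers above imply.

```python
SAMPLES_PER_HOUR = 4                      # quarter-hourly data
SAMPLES_PER_DAY = 24 * SAMPLES_PER_HOUR   # 96 samples in a full day


def horizon_samples(forecast_hour):
    """Return (same-day, next-day) sample counts for a forecast made at forecast_hour.

    Assumption: the horizon starts at the next sample after the forecasting
    time, and the 00:00 timestamp is counted as part of the next day.
    """
    same_day = (24 - forecast_hour) * SAMPLES_PER_HOUR - 1
    return same_day, SAMPLES_PER_DAY


morning = horizon_samples(8)      # forecast made at 8 AM
afternoon = horizon_samples(16)   # forecast made at 4 PM

situation_models = sum(morning) + sum(afternoon)
total_models = situation_models + 1   # plus the single safety model
```

Here `morning` is `(63, 96)`, `afternoon` is `(31, 96)`, and `total_models` is 287, matching the size of the sample Model Zoo.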
Let's dive deeper and understand how these models differ. If you look closely, you can see that each model has 2 properties accompanying it: samplesAhead and dayTime. This pair is a unique model identifier and you should read it as "This model is able to forecast samplesAhead samples ahead from the end of the target, as long as the timestamp of the forecast has the same time of day as dayTime".
If we filter only those that have dayTime = "00:00:00", we get 2 models that were generated to create forecasts for midnight. Their samplesAhead values differ: one is 32 and the other one is 64. The first one therefore covers the midnight forecast done at 4 PM: there are 8 hours = 8*4 = 32 samples between 4 PM and midnight. The other one does the same for 8 AM: 16 hours = 16*4 = 64 samples. Each model then consists of terms and their beta coefficients (see GAM). However, these models have only one term, Intercept, and its coefficient is 0. We can basically read this model as y = 0. This makes sense: there was no solar production at midnight in the past.
There are also exactly 2 situations where we forecast 1 step ahead. It should make sense that the first one has the tag dayTime = "08:15:00" (one step ahead from 8 AM) and the second one "16:15:00" (one step ahead from 4 PM). Both models are more complicated (have more terms), as the dynamics are richer when the sun actually shines.
We can also find 4 different models for dayTime = "22:00:00" - because we forecast this hour twice a day and for two different days (intra-day and day-ahead).
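To make the (samplesAhead, dayTime) pairs concrete, the sketch below enumerates them for the example setup. It is an illustration only: the `situations` helper and the calendar date are hypothetical, and it assumes that the target ends at the forecasting time and that the horizon runs up to the last sample of the following day (23:45).

```python
from datetime import datetime, timedelta

STEP = timedelta(minutes=15)  # quarter-hourly sampling


def situations(forecast_times, day="2021-01-01"):
    """Enumerate (samplesAhead, dayTime) pairs for each forecasting time.

    Hypothetical helper: the target is assumed to end at the forecasting
    time, and the horizon ends with the last sample of the next day.
    """
    pairs = []
    for hhmm in forecast_times:
        start = datetime.strptime(f"{day} {hhmm}", "%Y-%m-%d %H:%M")
        # Midnight that opens the day after tomorrow: exclusive end of horizon.
        end = start.replace(hour=0, minute=0) + timedelta(days=2)
        samples_ahead = 1
        stamp = start + STEP
        while stamp < end:
            pairs.append((samples_ahead, stamp.strftime("%H:%M:%S")))
            samples_ahead += 1
            stamp += STEP
    return pairs


zoo = situations(["08:00", "16:00"])

# Filter the enumerated situations the same way the text filters the models.
midnight = sorted(s for s, t in zoo if t == "00:00:00")
one_step = [t for s, t in zoo if s == 1]
ten_pm = sorted(s for s, t in zoo if t == "22:00:00")
```

Under these assumptions every pair is unique, two pairs fall on midnight, two have samplesAhead equal to 1, and four fall on 22:00:00, which agrees with the filtered views described above.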
The samplesAhead property is counted from the point where the target ends and not from the point in time when you forecast. The reasoning is simple: the complexity and form of a model stem from the data. If you forecast one day ahead but your latest data come from three months ago, you technically forecast 3 months ahead rather than 1 day ahead. Your models should reflect this, e.g. by being simpler.
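The distinction can be illustrated with a small calculation. The function below is a hypothetical helper, not part of TIM; it simply counts sampling steps between the last known target value and the timestamp being forecast, assuming quarter-hourly data.

```python
from datetime import datetime, timedelta


def samples_ahead(target_end, forecast_timestamp, step=timedelta(minutes=15)):
    """Count sampling steps from the end of the target to the forecast timestamp."""
    return (forecast_timestamp - target_end) // step


midnight = datetime(2021, 1, 2, 0, 0)

# Target known in real time: it ends at the 4 PM forecasting time.
fresh = samples_ahead(datetime(2021, 1, 1, 16, 0), midnight)

# Target delayed by 4 hours: same forecasting time, but the data end at noon.
stale = samples_ahead(datetime(2021, 1, 1, 12, 0), midnight)
```

With fresh data the midnight forecast is 32 samples ahead; with a 4-hour delay in the target the same forecast is 48 samples ahead, even though the forecasting time has not changed.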
Model Zoo does not have to be this big. Sometimes TIM creates models that have dayTime = "all" and are usable for any hour of the day. Sometimes models cover multiple samplesAhead forecasts at once - this happens when you lower the quality of the models built for that particular horizon.
Multiple model dispatch
When forecasting with a Model Zoo, TIM tries to pick the best possible model for each of the samples. The inner logic is as follows:
- Consider only models that can be evaluated using the data available. If there are none, use a safety model. You will find safety models in the zoo with the tag Quality = 1.
- Out of these model candidates, keep only models that were built to solve the particular situation. If more than one remains, pick the one that uses more recent data.
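The two steps above can be sketched roughly as follows. This is not TIM's actual implementation: the `Model` fields (`required_columns`, `data_recency`, `is_safety`) and the `dispatch` function are hypothetical names chosen for illustration, and falling back to the safety model when no evaluable model matches the situation is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Model:
    index: int
    samples_ahead: Optional[int]    # None for the safety model
    day_time: Optional[str]         # "HH:MM:SS", "all", or None for the safety model
    required_columns: frozenset     # hypothetical: data the model needs
    data_recency: int = 0           # hypothetical: higher means more recent data used
    is_safety: bool = False


def dispatch(zoo, samples_ahead, day_time, available_columns):
    """Pick a model for one forecasted sample (illustrative logic only)."""
    # Step 1: keep only models that can be evaluated with the available data.
    evaluable = [
        m for m in zoo
        if not m.is_safety and m.required_columns <= available_columns
    ]
    # Step 2: keep only models built for this particular situation.
    matching = [
        m for m in evaluable
        if m.samples_ahead == samples_ahead and m.day_time in (day_time, "all")
    ]
    if matching:
        # Several candidates: prefer the one that uses more recent data.
        return max(matching, key=lambda m: m.data_recency)
    # Assumption: with no usable candidate, fall back to the safety model.
    return next(m for m in zoo if m.is_safety)


example_zoo = [
    Model(0, 32, "00:00:00", frozenset({"target"}), data_recency=1),
    Model(1, 32, "00:00:00", frozenset({"target", "irradiance"}), data_recency=2),
    Model(2, None, None, frozenset(), is_safety=True),
]

full = dispatch(example_zoo, 32, "00:00:00", frozenset({"target", "irradiance"}))
partial = dispatch(example_zoo, 32, "00:00:00", frozenset({"target"}))
nothing = dispatch(example_zoo, 32, "00:00:00", frozenset())
```

In this toy zoo, `full` is the model with index 1, `partial` is the model with index 0, and `nothing` is the safety model with index 2. The column name `irradiance` is made up for the example.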
Each forecast that TIM returns also carries information about the model quality and the index of the particular model, so you can trace the model responsible for it.
It is important to note that when backtesting (forecasting on already gathered historical data), all models satisfy the first condition. That is why TIM uses artificial restrictions to properly emulate forecasting situations. Learn more in Raw and Aggregated Predictions.