Daily and nondaily cycle data
The graph below shows the production of a typical solar farm during one day. The goal of this use case is to forecast this production. To forecast the production at 3:00, the model production(t)=0 probably produces good results, since the sun does not shine at night. This model would, however, fail to accurately forecast production at 12:00. In other words, the required model complexity differs considerably depending on the time of day being forecast.
Such datasets have a "daily cycle", as opposed to those that are "nondaily cycle". The difference between these two types of datasets can easily be spotted in the images below.
How TIM detects a daily cycle
TIM can detect a daily cycle in a dataset if it meets certain requirements, namely:
- the dataset has a sampling period between 10 minutes and 12 hours (it has to make sense to look for patterns within a day), and
- the dataset has to span at least half a year (there has to be enough data to reliably detect patterns).
A sampling period shorter than 10 minutes (i.e. a sampling frequency higher than one sample per 10 minutes) causes high signal variance, and building a separate model for each time of day would become ineffective. With a sampling period longer than 12 hours, a day no longer contains at least two samples, so it makes no sense to distinguish different behaviors within a day. The lower limit on data length prevents models from being trained on too little data.
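The two requirements above can be sketched as a simple eligibility check. This is only an illustration, not TIM's implementation: the function name, the inclusive bounds, the interpretation of "half a year" as 183 days, and the assumption of regularly sampled timestamps are all choices made for this example.

```python
from datetime import datetime, timedelta

# Assumed bounds for this sketch; both ends of the sampling range are treated as inclusive.
MIN_SAMPLING_PERIOD = timedelta(minutes=10)
MAX_SAMPLING_PERIOD = timedelta(hours=12)
MIN_SPAN = timedelta(days=183)  # assumption: "half a year" taken as 183 days


def is_eligible_for_daily_cycle(timestamps):
    """Return True if regularly sampled, sorted timestamps meet both requirements."""
    if len(timestamps) < 2:
        return False
    # Assumes regular sampling, so the first gap is the sampling period.
    sampling_period = timestamps[1] - timestamps[0]
    span = timestamps[-1] - timestamps[0]
    return (MIN_SAMPLING_PERIOD <= sampling_period <= MAX_SAMPLING_PERIOD
            and span >= MIN_SPAN)


start = datetime(2021, 1, 1)
hourly_year = [start + timedelta(hours=i) for i in range(365 * 24)]
daily_year = [start + timedelta(days=i) for i in range(365)]
hourly_month = [start + timedelta(hours=i) for i in range(30 * 24)]

print(is_eligible_for_daily_cycle(hourly_year))   # True: hourly data spanning a year
print(is_eligible_for_daily_cycle(daily_year))    # False: sampling period exceeds 12 hours
print(is_eligible_for_daily_cycle(hourly_month))  # False: spans less than half a year
```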
For datasets that satisfy these conditions, TIM performs a simple autocorrelation analysis to see whether the autocorrelation simply decreases as the lag grows, or whether regular spikes appear at daily intervals. The latter is a strong indication that the dataset follows a daily cycle.
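To make the idea of a daily autocorrelation spike concrete, here is a minimal sketch using NumPy. It is not TIM's detection algorithm; the heuristic (comparing the autocorrelation at a one-day lag with the half-day lag), the threshold of 0.5, and the function names are assumptions chosen for illustration.

```python
import numpy as np


def autocorrelation(values, lag):
    """Sample autocorrelation of a 1-D series at a positive lag."""
    centered = np.asarray(values, dtype=float) - np.mean(values)
    return float(np.dot(centered[:-lag], centered[lag:]) / np.dot(centered, centered))


def has_daily_spike(values, samples_per_day, threshold=0.5):
    """Hypothetical heuristic: the one-day lag is strongly correlated
    and stands out above the half-day lag."""
    day = autocorrelation(values, samples_per_day)
    half_day = autocorrelation(values, samples_per_day // 2)
    return day > threshold and day > half_day


rng = np.random.default_rng(0)
hours = np.arange(24 * 200)  # 200 days of hourly samples

# Solar-like toy signal: zero at night, a bump during the day, plus small noise.
solar_like = np.clip(np.sin(2 * np.pi * (hours % 24 - 6) / 24), 0, None)
solar_like = solar_like + 0.05 * rng.standard_normal(hours.size)

# Random walk: its autocorrelation decays with the lag and has no daily spikes.
random_walk = np.cumsum(rng.standard_normal(hours.size))

print(has_daily_spike(solar_like, samples_per_day=24))
print(has_daily_spike(random_walk, samples_per_day=24))
```

The solar-like series is flagged because its values one day apart are nearly identical while values half a day apart are negatively correlated. The random walk is not flagged, since its autocorrelation only decays as the lag grows.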
What TIM does differently for daily cycle datasets
The main distinction TIM makes between daily cycle and nondaily cycle datasets is that TIM builds different models to handle separate times of the day for daily cycle datasets. This has many consequences, such as different backtesting results and different behavior for model rebuilding.
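The difference between time-specific and time-generic models can be illustrated with a deliberately trivial toy example. The "models" below are just historical means, and all names and data are made up for this sketch; TIM's actual models are far richer.

```python
from collections import defaultdict
from statistics import mean


def fit_time_specific(hours, values):
    """Fit one trivial model (the historical mean) per hour of the day."""
    by_hour = defaultdict(list)
    for hour, value in zip(hours, values):
        by_hour[hour].append(value)
    return {hour: mean(samples) for hour, samples in by_hour.items()}


def fit_time_generic(values):
    """Fit a single trivial model (the overall mean) for all times."""
    return mean(values)


# Toy solar-like history: zero at night, peak of 6 at noon, repeated for 3 days.
profile = [max(0, 6 - abs(h - 12)) for h in range(24)]
hours = list(range(24)) * 3
values = profile * 3

time_specific = fit_time_specific(hours, values)
time_generic = fit_time_generic(values)

print(time_specific[3], time_specific[12])  # 0 6: night and noon get their own model
print(time_generic)                         # 1.5: one value for every time of day
```

In this toy setting, the time-specific approach reproduces both the zero production at 3:00 and the peak at 12:00, whereas the single generic model is wrong at both times.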
Keep in mind that the presence of a daily cycle is a property of the dataset, and therefore the whole Model Zoo will consist of either time-specific models (daily cycle data) or time-generic models (nondaily cycle data).
It is possible to force TIM to use a daily cycle Model Zoo on data that meets the requirements, even if TIM does not detect such a pattern. It is also possible to prevent TIM from trying to detect a daily cycle and force it to treat the dataset as nondaily cycle data. These options give domain experts as much freedom in configuration as possible, but in most cases TIM's own detection will produce optimal results. Both overrides are set through the configuration.
When working with daily cycle models, it is important to correctly handle time zones. The section on time zones includes an example of the impact incorrect time zone specification can have when building and interpreting daily cycle models.