We have put a lot of effort into creating a fully automatic model building engine, but sometimes, despite our best efforts, some models do not reach the accuracy they could. By adjusting the exposed mathematical parametrization of our algorithm, you can ensure that even the toughest dataset is modeled properly.
Where to set this in TIM connector and API
modelBuilding:
  configuration:
    normalBehaviorModellingConfiguration:
Where to set this in TIM Studio
Quality (Forecasting Only)
This parameter sets how accurate the models should be for forecasts of a particular day. The first value applies to all models created for forecasts within the same day the forecast is made, regardless of data availability (we call this day Day+0, day zero). The second value sets the quality for Day+1 (one day ahead) forecasts, and so on.
- "Automatic" - denotes automatic choice of quality
- "Low (1)" - denotes dummy quality, these models can be used even without any data provided
- "Medium (2)" - denotes models without offsets of target
- "High (3)" - denotes models that use only a limited number of target offsets
- "UltraHigh (4)" - denotes quality where every model uses closest target offset possible
- "SuperHigh (5)" - denotes quality where every model uses closest offset possible for every single predictor
The higher the quality, the longer it usually takes to build the modelZoo. Accuracy usually increases as well, but this is not guaranteed. The automatic setting uses quality "UltraHigh" for Day+0 and Day+1 and quality "High" for all other days. If you set the quality for a day that no model will be built for, the setting is simply ignored.
usage:
  modelQuality:
    - day: 0
      quality: UltraHigh
    - day: 1
      quality: High
    - day: 2
      quality: Medium
This parameter only adjusts how offsets are used. "SuperHigh" quality should only be used when predictors are not available for the exact timestamp being forecast, or for experimental purposes. Most of the time, the gain in accuracy is not worth the additional computing time.
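The automatic defaults described above can be sketched in a few lines of Python. This is only an illustration of the documented rule (UltraHigh for Day+0 and Day+1, High for every other day, settings for days without a model ignored). The function name `resolve_quality` and the dictionary layout are hypothetical and not part of TIM, and treating a day that is not listed as "Automatic" is an assumption.

```python
# Hypothetical helper for illustration only; none of these names come from TIM.
AUTOMATIC_DEFAULTS = {0: "UltraHigh", 1: "UltraHigh"}  # Day+0 and Day+1
FALLBACK_QUALITY = "High"  # every other day under the automatic setting


def resolve_quality(user_settings, forecast_days):
    """Return the effective quality for each day a model is built for.

    user_settings: list of {"day": int, "quality": str} entries.
    forecast_days: day indices for which models are actually built.
    """
    requested = {entry["day"]: entry["quality"] for entry in user_settings}
    resolved = {}
    for day in forecast_days:
        # Assumption: a day that is not listed behaves like "Automatic".
        quality = requested.get(day, "Automatic")
        if quality == "Automatic":
            quality = AUTOMATIC_DEFAULTS.get(day, FALLBACK_QUALITY)
        resolved[day] = quality
    return resolved


settings = [
    {"day": 0, "quality": "UltraHigh"},
    {"day": 1, "quality": "High"},
    {"day": 5, "quality": "Medium"},  # no model is built for Day+5, so this is ignored
]
print(resolve_quality(settings, forecast_days=[0, 1, 2]))
```

The entry for Day+5 has no effect because only Day+0 to Day+2 are requested, and Day+2 falls back to the automatic choice.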
TIM tries to enhance the model building process with new artificially created features derived from the original predictors. You can choose from the following features (those in bold are used by default):
- Piecewise linear
- Periodic components
- Day of week
- Exact day of week
- Moving average
- Simple moving average
- Time Offsets
If you want to, you can try omitting some of them by listing only those you want to use.
features: [MovingAverage, DayOfWeek, PeriodicComponents, Intercept, PiecewiseLinear, TimeOffsets, Polynomial]
Sometimes it is desirable to create models that do not use any offsets, only the value of a predictor for a given timestamp. This is especially useful in many anomaly detection applications. When set to false, TIM automatically disables dictionaries that would otherwise create offset features.
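To make the term concrete, an offset feature is simply a copy of a series shifted back in time, so that the model sees an earlier value at each timestamp. The snippet below is a generic sketch of that idea in plain Python. It is not TIM's implementation, and the function name `offset_feature` is hypothetical.

```python
def offset_feature(values, offset):
    """Return `values` shifted back by `offset` steps.

    Positions without enough history are filled with None.
    An offset of 0 returns the series unchanged (no offset used).
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    values = list(values)
    return [values[i - offset] if i >= offset else None for i in range(len(values))]


temperature = [10, 20, 30, 40]
print(offset_feature(temperature, 1))  # value from one step earlier
print(offset_feature(temperature, 0))  # same-timestamp value only
```

A model restricted to offset 0 only ever sees the predictor value for the timestamp being modeled, which is the situation described above.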
When normalization is on, predictors are scaled by their mean and standard deviation. For difficult datasets with structural changes (a change in the mean of the target time-series), disabling normalization helps TIM Engine settle into different working regimes faster, which results in improved accuracy. If not provided or set to automatic, TIM will decide on its own.
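Scaling by mean and standard deviation is commonly written as z = (x - mean) / std. The sketch below shows this in Python on a series whose level jumps halfway through. It is a generic illustration, not TIM's code: the function name `normalize` is hypothetical, and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def normalize(values):
    """Scale values to zero mean and unit standard deviation (z-score)."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        # A constant series carries no scale information.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# A series with a structural change: the level jumps from 1 to 11.
series = [1, 1, 1, 11, 11, 11]
print(normalize(series))
```

Because the mean and standard deviation are computed over the whole series, both regimes influence the scaling of every value.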
Determines the maximal complexity of models, given as an integer percentage. Difficult datasets might require lower model complexity. If not provided or set to automatic, TIM will decide on its own.
A boolean value that decides whether to use an individual model building approach for different times within a day. It is especially useful if the dynamics of the underlying problem change during the day. Switching it off leads to a common model building approach for all timestamps. If the parameter is not provided or set to automatic, TIM will decide automatically.
Note: Building only one model for all timestamps can be achieved by switching off the time specific models parameter and using a model quality other than "UltraHigh".
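One way to picture the difference between the two approaches is to look at how timestamps would be grouped before model building. The sketch below partitions timestamps by time of day when the switch is on and puts everything into a single group when it is off. It only illustrates the idea; how TIM organizes its models internally is not described here, and the name `split_by_time_of_day` is hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def split_by_time_of_day(timestamps, time_specific=True):
    """Group timestamps per time of day, or into one common group."""
    groups = defaultdict(list)
    for ts in timestamps:
        # One group per clock time when time specific, otherwise a shared group.
        key = ts.strftime("%H:%M") if time_specific else "all"
        groups[key].append(ts)
    return dict(groups)


stamps = [
    datetime(2021, 1, 1, 0, 0),
    datetime(2021, 1, 1, 12, 0),
    datetime(2021, 1, 2, 0, 0),
    datetime(2021, 1, 2, 12, 0),
]
print(sorted(split_by_time_of_day(stamps, time_specific=True)))
print(sorted(split_by_time_of_day(stamps, time_specific=False)))
```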
For some datasets, values outside of certain boundaries do not make sense, e.g. negative values for energy production. TIM tries to detect these boundaries automatically, but you can override the detected values. Both the lower and the upper boundary should be real numbers. It might be useful to turn the boundaries off for datasets with a visible trend.
forecasting:
  configuration:
    extendedOutputConfiguration:
      predictionBoundaries:
        type: None
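The effect of boundaries on a forecast can be sketched as a simple clipping step. The Python snippet below is a minimal illustration under the assumption that out-of-range predictions are replaced by the nearest boundary. The function name `apply_boundaries` is hypothetical and not part of TIM.

```python
def apply_boundaries(predictions, lower=None, upper=None):
    """Clip predictions to [lower, upper]; a None boundary is not applied."""
    clipped = []
    for value in predictions:
        if lower is not None and value < lower:
            value = lower
        if upper is not None and value > upper:
            value = upper
        clipped.append(value)
    return clipped


forecast = [-3.0, 12.5, 140.0]
# Energy production cannot be negative; 100.0 is an illustrative upper limit.
print(apply_boundaries(forecast, lower=0.0, upper=100.0))
# With both boundaries turned off, the forecast is returned unchanged.
print(apply_boundaries(forecast))
```

For a dataset with a visible trend, a fixed upper boundary like the one above would eventually cut off legitimate values, which is why turning boundaries off can help in that case.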
Using target offsets (Anomaly Detection Only)
Decides whether modelling can use offsets of the target variable itself. This tends to improve modelling accuracy, but the model may then follow the target so closely that anomaly detection gets worse.