Technically, to build a model, TIM requires only data and a defined KPI. Having a reasonable model is crucial for finding the right anomalies. The main factors that affect the quality of the model can be divided into two parts:
- Math/Machine learning - The goal is to automate the anomaly detection process on your data. This covers feature expansion, feature reduction, selection of the normal behavior model and the anomalous behavior model together with their parameters, and creation of the anomaly indicator. TIM does this part of the work automatically.
- Domain specifics - The goal is to design the experiment in the most reasonable way for your domain or problem. This relates to the data: defining the KPI, choosing the right influencers and, if needed, adjusting the update times of your data, the sensitivity and the perspective from which you look at anomalies. Even though TIM has an automatic/default configuration mode, a domain expert is the right person to adjust it, which results in fewer false positives and false negatives.
First, you put your data in the required format, determine the KPI on which you want to detect anomalies and, if available, include influencers that affect your KPI. The selection of influencers and of the data period used for model building can significantly affect the results. You build your model on historical data and then, based on this model, evaluate new measurements as they flow into your database. Technically, if you have data, you can create a model automatically.
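The build-then-detect workflow described above can be illustrated with a toy example. This is only a sketch under stated assumptions: the linear fit, the functions `build_model` and `detect`, and all numbers are hypothetical stand-ins chosen for illustration. They are not TIM's algorithm or API.

```python
from statistics import mean, pstdev


def build_model(influencer, kpi):
    """Fit kpi ~ slope * influencer + intercept on historical data (toy stand-in)."""
    mx, my = mean(influencer), mean(kpi)
    sxx = sum((x - mx) ** 2 for x in influencer)
    sxy = sum((x - mx) * (y - my) for x, y in zip(influencer, kpi))
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [y - (slope * x + intercept) for x, y in zip(influencer, kpi)]
    # Typical residual size on the history; used to scale residuals of new data.
    scale = pstdev(residuals) or 1.0
    return {"slope": slope, "intercept": intercept, "scale": scale}


def detect(model, influencer, kpi):
    """Return a toy anomaly indicator: residual size relative to the history."""
    return [
        abs(y - (model["slope"] * x + model["intercept"])) / model["scale"]
        for x, y in zip(influencer, kpi)
    ]


# Historical data used for model building (made-up numbers).
model = build_model([1, 2, 3, 4, 5, 6], [2.1, 3.9, 6.2, 7.8, 10.1, 11.9])

# New measurements arriving later; the second KPI value breaks the pattern.
scores = detect(model, [7, 8], [14.0, 30.0])
```

In this sketch the model is built once on history and then reused for every new batch of measurements. A new point whose KPI departs from the learned relationship to the influencer receives a much larger indicator value than one that follows it.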
However, to have a model that meets your domain specifics, you have to define the data updates, because understanding what your data looks like at the moment of detection is also important. In the default mode, TIM expects aligned data, which is often the case in anomaly detection. The detection perspectives, that is, the ways in which you look at anomalies, are another means of adapting the anomaly detection to your preferences. Finally, a customizable sensitivity parameter lets you fine-tune the sensitivity to potential anomalies based on your business risk profile; it is also possible to let TIM find a reasonable sensitivity automatically.
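To make the role of a sensitivity parameter more concrete, here is a minimal sketch. It assumes that sensitivity is read as the percentage of model building points that may be flagged as anomalous. That reading is an assumption made for illustration and may differ from TIM's exact definition, and the helper `threshold_from_sensitivity` is hypothetical.

```python
import math


def threshold_from_sensitivity(train_indicator, sensitivity_pct):
    """Pick a threshold so that about sensitivity_pct percent of the
    training indicator values lie strictly above it (assumes no ties)."""
    if not 0 <= sensitivity_pct <= 100:
        raise ValueError("sensitivity_pct must be between 0 and 100")
    ordered = sorted(train_indicator, reverse=True)
    n_flagged = math.floor(len(ordered) * sensitivity_pct / 100)
    if n_flagged >= len(ordered):
        return -math.inf  # every training point is flagged
    # The first value that should NOT be flagged becomes the threshold.
    return ordered[n_flagged]


# Made-up indicator values from a model building period.
train = [float(i) for i in range(1, 21)]

threshold = threshold_from_sensitivity(train, 10)
flagged = [v for v in train if v > threshold]
```

Under this assumption, a higher sensitivity lowers the threshold and flags more points, which trades fewer false negatives for more false positives.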
The model building task returns not only a model but also other outputs. The most important one is the anomaly indicator, whose values are returned for the entire model building period. Analyzing the results helps you decide whether the model was configured appropriately.
Suppose you are not satisfied with the results and want to tweak TIM's performance. In that case, you can manually adjust the configuration of both the domain specifics mentioned above and the mathematical settings. If you are satisfied with the model, you can use it for detection as often as new data arrives in your database.
To sum up, data is a must for creating a reasonable model. To get the best possible model, you have to use your domain knowledge to define the model building scenario. A built model can be used for detection as often as required. We will go into detail on all of the topics mentioned.
Most often, you have built a model suitable for your problem and want to use it for real-time detection. TIM requires two things for detection: a model and new data.
There are a couple of things you have to take care of regarding the data. You have to make sure that the data you send to TIM is in the same form as the data used for model building. It might also be useful to first read the section about required data properties. Also, to make detection possible for the chosen period, you have to include at least as much data as the underlying model requires, see image below. Otherwise, the model cannot evaluate data points that lack some of the expected inputs. The length of data required often differs from predictor to predictor.
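The data length requirement can be sketched as a small calculation. The example assumes that each predictor needs a known number of past samples. The function `earliest_required_timestamp`, the predictor names and the lag counts are hypothetical and only illustrate the idea; the actual requirements come from the underlying model.

```python
from datetime import datetime, timedelta


def earliest_required_timestamp(detection_start, sampling_period, lags_per_predictor):
    """Return the first timestamp that must be present in the data so that
    every predictor has enough history at the start of the detection period."""
    longest_lag = max(lags_per_predictor.values(), default=0)
    return detection_start - longest_lag * sampling_period


# Hypothetical requirements: number of past samples each predictor needs.
lags = {"temperature": 24, "load": 168}

data_start = earliest_required_timestamp(
    detection_start=datetime(2024, 1, 10, 0, 0),
    sampling_period=timedelta(hours=1),
    lags_per_predictor=lags,
)
```

Because the requirement differs from predictor to predictor, the longest one determines how far back the data you send has to reach.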
The typical approach in automatic model building is to find and tune the best model possible, store it and then apply it to new data to make a detection. After a while, you can rebuild it with new data. As we already know from model building, building a reasonable model is crucial; with such a model, you can detect anomalies for both old and new data points. Now imagine a situation where you have built a model whose normal behavior you can rely on (you are satisfied with your experiment - KPI, included influencers and math settings), but you want to tune the domain specifics (perspectives/sensitivities) or update your model with new incoming data. In that case, rebuilding is the right way to go.
What you need is a model and input data in the same form as when building the model. You can then specify the rebuilding configuration, namely the domain specifics, which override their configuration in the input model, and the rebuild type, which determines which part of the model is rebuilt or reconfigured. Depending on whether domain specifics are configured and on the chosen rebuild type, a specific rebuild functionality is applied. If you only want to reconfigure detection perspectives or their sensitivities, set rebuildType to DomainSpecifics. If domain specifics are not provided, they are read from the input model. This logic is visualized in the following schema:
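In code terms, the fallback rule for domain specifics described above could look like the following sketch. Only `rebuildType` and `DomainSpecifics` come from the text. The dictionary layout, the `domainSpecifics` key, the perspective names and the helper `resolve_domain_specifics` are hypothetical and do not describe the actual TIM request format.

```python
def resolve_domain_specifics(input_model, rebuild_config):
    """Use domain specifics from the rebuild configuration when provided,
    otherwise fall back to the ones stored in the input model."""
    override = rebuild_config.get("domainSpecifics")  # hypothetical key
    if override:
        return override
    return input_model["domainSpecifics"]


# Hypothetical input model with two placeholder perspectives.
input_model = {
    "domainSpecifics": [
        {"perspective": "perspective_a", "sensitivity": 1.0},
        {"perspective": "perspective_b", "sensitivity": 0.5},
    ]
}

# Reconfigure only the domain specifics, overriding what the model stores.
rebuild_config = {
    "rebuildType": "DomainSpecifics",
    "domainSpecifics": [{"perspective": "perspective_a", "sensitivity": 2.0}],
}

overridden = resolve_domain_specifics(input_model, rebuild_config)
inherited = resolve_domain_specifics(input_model, {"rebuildType": "DomainSpecifics"})
```

The second call shows the fallback: with no domain specifics in the configuration, the ones stored in the input model are reused unchanged.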
TIM allows uploading a pre-built model to a use case by creating an upload-model job. Such a job can be created either by specifying an existing job that contains a model or by uploading a model that was previously downloaded from another job. The upload-model job can upload the model to a use case that is associated with a different dataset than the one used to build the model; however, the new dataset has to be compatible with the model (identical column names, sampling rate, etc.). An upload-model job can be used as a parent job for running detection or model rebuilding jobs.
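Before uploading a model to a use case with a different dataset, it can help to check compatibility yourself. The sketch below covers only the two criteria named above, column names and sampling rate. The metadata layout and the helper `compatibility_problems` are hypothetical, and any further criteria hidden behind "etc." are not covered.

```python
def compatibility_problems(model_meta, dataset_meta):
    """Return a list of human-readable problems; an empty list means the
    dataset passes the two checks implemented here."""
    problems = []
    model_cols = set(model_meta["columns"])
    data_cols = set(dataset_meta["columns"])
    missing = sorted(model_cols - data_cols)
    extra = sorted(data_cols - model_cols)
    if missing:
        problems.append(f"missing columns: {missing}")
    if extra:
        problems.append(f"unexpected columns: {extra}")
    if model_meta["sampling_period_s"] != dataset_meta["sampling_period_s"]:
        problems.append("sampling period differs")
    return problems


# Hypothetical metadata of the model and of two candidate datasets.
model_meta = {"columns": ["timestamp", "kpi", "temperature"], "sampling_period_s": 3600}
good_dataset = {"columns": ["timestamp", "kpi", "temperature"], "sampling_period_s": 3600}
bad_dataset = {"columns": ["timestamp", "kpi", "humidity"], "sampling_period_s": 900}

ok = compatibility_problems(model_meta, good_dataset)
issues = compatibility_problems(model_meta, bad_dataset)
```

Column names are compared as sets here, so column order is ignored; that choice is an assumption of this sketch.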